Set SSH server keep alive options

When os-net-config applies the network configuration on the overcloud nodes,
SSH connections can be dropped.

Since we have SSH retries set to 8 in ansible.cfg, Ansible retries the task
because it failed with an SSH connection error.
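
Roughly, the retry behaviour described above can be modelled as follows. This is a simplified, hypothetical sketch, not Ansible's ssh connection plugin: `run_with_retries` and `run_remote` are illustrative names, and the sketch assumes that `retries` counts extra attempts after the first one. It relies only on ssh(1) exiting with status 255 when the SSH layer itself fails. What it shows is that a connection error re-runs the whole remote command, even though the first copy may still be executing on the node.

```python
SSH_ERROR_EXIT = 255  # ssh(1) exits with 255 when the SSH layer itself fails


def run_with_retries(run_remote, retries=8):
    """Hypothetical, simplified model of retrying on SSH connection errors.

    run_remote is a callable that runs the task's command over SSH and
    returns its exit status. Status 255 is treated as a connection error
    and the *whole command* is run again; any other status is the
    command's own result. Assumes `retries` counts extra attempts after
    the first one, so at most retries + 1 attempts are made.
    """
    status = SSH_ERROR_EXIT
    for attempt in range(1, retries + 2):
        status = run_remote()
        if status != SSH_ERROR_EXIT:
            return status, attempt
    return status, retries + 1


# First attempt loses the connection, the retry runs the command again.
results = iter([255, 0])
print(run_with_retries(lambda: next(results)))  # (0, 2)
```

Nothing in this model can tell whether the first attempt is still running on the remote side; it only sees the 255 exit status.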

However, the first task is actually still running, and it eventually succeeds.

The second task, started by Ansible as a retry, sees that the deployment is
already applied, but the notification file (*.notify.json) does not exist yet
because the first task is still in progress. This causes the second task to
fail with the error reported in the bug, and the whole ansible-playbook run
then fails.
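
The race can be reproduced in miniature with two threads. Everything here is hypothetical: the `deployment.applied` marker, the `deployment.notify.json` name (chosen only to match the `*.notify.json` pattern), and the check in `retry_attempt` are stand-ins for whatever the real deployment task does. Events are used instead of sleeps so the interleaving is deterministic.

```python
import os
import tempfile
import threading


def first_attempt(work_dir, started, finish):
    # Hypothetical stand-in for the original task: the deployment counts as
    # applied immediately, but the notify file is written only at the end.
    open(os.path.join(work_dir, "deployment.applied"), "w").close()
    started.set()
    finish.wait()  # still running, e.g. while the network is reconfigured
    with open(os.path.join(work_dir, "deployment.notify.json"), "w") as f:
        f.write("{}")


def retry_attempt(work_dir):
    # Hypothetical stand-in for the retried task's check.
    applied = os.path.exists(os.path.join(work_dir, "deployment.applied"))
    notified = os.path.exists(os.path.join(work_dir, "deployment.notify.json"))
    if applied and not notified:
        return "failed"
    return "ok"


def simulate():
    with tempfile.TemporaryDirectory() as work_dir:
        started, finish = threading.Event(), threading.Event()
        worker = threading.Thread(target=first_attempt,
                                  args=(work_dir, started, finish))
        worker.start()
        started.wait()
        # The SSH connection drops here and a retry starts while the first
        # attempt is still in progress.
        during = retry_attempt(work_dir)
        finish.set()
        worker.join()
        # Once the first attempt completes, the same check passes.
        after = retry_attempt(work_dir)
    return during, after


print(simulate())  # ('failed', 'ok')
```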

Setting the ServerAliveInterval and ServerAliveCountMax SSH options seems to
fix the issue, as SSH does not drop the first connection when they are
configured.
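
As a rough worked example of what these two options do: per ssh_config(5), ssh sends a keepalive request after ServerAliveInterval seconds without data from the server and disconnects after ServerAliveCountMax unanswered requests. The client therefore tolerates roughly interval * count seconds of silence. ServerAliveInterval defaults to 0, which disables these messages. The helper names below are hypothetical.

```python
def keepalive_ssh_args(interval, count_max):
    """Build -o options for SSH keepalives (option names from ssh_config(5))."""
    return (f"-o ServerAliveInterval={interval} "
            f"-o ServerAliveCountMax={count_max}")


def unresponsive_timeout(interval, count_max):
    """Approximate seconds of server silence before ssh disconnects.

    ssh sends a keepalive after `interval` seconds without data from the
    server and gives up after `count_max` unanswered keepalives.
    """
    return interval * count_max


print(keepalive_ssh_args(5, 5))    # -o ServerAliveInterval=5 -o ServerAliveCountMax=5
print(unresponsive_timeout(5, 5))  # 25
```

With the values used in this change, that is about 25 seconds. ssh_config(5) gives the same arithmetic for its own example: an interval of 15 with the default count of 3 disconnects after about 45 seconds.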

Change-Id: I08781fe2aa6472d3fae5c5f5d0babd1f7a3b9b2d
Closes-Bug: #1792343
James Slagle 2018-09-20 13:36:03 -04:00 committed by Quique Llorente
parent d5b1651fd5
commit c0f41cae9f
2 changed files with 8 additions and 1 deletion


@@ -0,0 +1,5 @@
+---
+fixes:
+  - The ServerAliveInterval and ServerAliveCountMax SSH options are now set in
+    the mistral ansible action so that when networking configuration is
+    performed on the overcloud nodes SSH will not drop the connection.


@@ -56,7 +56,9 @@ def write_default_ansible_cfg(work_dir,
                '-o UserKnownHostsFile=/dev/null '
                '-o StrictHostKeyChecking=no '
                '-o ControlMaster=auto '
-               '-o ControlPersist=30m')
+               '-o ControlPersist=30m '
+               '-o ServerAliveInterval=5 '
+               '-o ServerAliveCountMax=5')
     config.set('ssh_connection', 'control_path_dir',
                os.path.join(work_dir, 'ansible-ssh'))
     config.set('ssh_connection', 'retries', '8')
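
The resulting [ssh_connection] section can be seen by replaying the settings from the diff with the standard library's configparser. This is a sketch. write_ssh_connection_section is a hypothetical helper that reproduces only the config.set calls visible in the hunk and returns the text instead of writing a file. The hunk shows only part of the real write_default_ansible_cfg.

```python
import configparser
import io
import os


def write_ssh_connection_section(work_dir):
    """Hypothetical sketch of the [ssh_connection] settings from the diff."""
    config = configparser.ConfigParser()
    config.add_section('ssh_connection')
    # The two keepalive options are appended to the existing ssh_args.
    config.set('ssh_connection', 'ssh_args',
               '-o UserKnownHostsFile=/dev/null '
               '-o StrictHostKeyChecking=no '
               '-o ControlMaster=auto '
               '-o ControlPersist=30m '
               '-o ServerAliveInterval=5 '
               '-o ServerAliveCountMax=5')
    config.set('ssh_connection', 'control_path_dir',
               os.path.join(work_dir, 'ansible-ssh'))
    config.set('ssh_connection', 'retries', '8')
    # Render to a string rather than to an ansible.cfg file on disk.
    out = io.StringIO()
    config.write(out)
    return out.getvalue()


print(write_ssh_connection_section('/tmp/work'))
```

Printing the result shows the two new options on the same ssh_args line as the existing ones, next to the unchanged retries = 8 setting.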