Set SSH server keep alive options
When os-net-config applies the network configuration on the overcloud
nodes, SSH connections can be dropped. Because ssh retries is set to 8
in ansible.cfg, ansible retries any task that failed with an SSH
connection error. However, the first task is actually still running and
eventually succeeds.

The second task, kicked off by ansible as a retry, sees that the
deployment is already applied, but the notification file
(*.notify.json) does not exist yet because the first task is still in
progress. This causes the second task to fail with the error reported
in the bug, and the whole ansible-playbook run then fails.

Setting the ServerAliveInterval and ServerAliveCountMax ssh options
seems to fix the issue, as ssh doesn't drop the first connection when
these are configured.

Change-Id: I08781fe2aa6472d3fae5c5f5d0babd1f7a3b9b2d
Closes-Bug: #1792343
parent d5b1651fd5
commit c0f41cae9f
@@ -0,0 +1,5 @@
+---
+fixes:
+  - The ServerAliveInterval and ServerAliveCountMax SSH options are now set in
+    the mistral ansible action so that when networking configuration is
+    performed on the overcloud nodes SSH will not drop the connection.
@@ -56,7 +56,9 @@ def write_default_ansible_cfg(work_dir,
                '-o UserKnownHostsFile=/dev/null '
                '-o StrictHostKeyChecking=no '
                '-o ControlMaster=auto '
-               '-o ControlPersist=30m')
+               '-o ControlPersist=30m '
+               '-o ServerAliveInterval=5 '
+               '-o ServerAliveCountMax=5')
     config.set('ssh_connection', 'control_path_dir',
                os.path.join(work_dir, 'ansible-ssh'))
     config.set('ssh_connection', 'retries', '8')
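For reviewers who want to see the resulting ssh_connection settings outside the diff, here is a minimal standalone sketch. It is not the project's write_default_ansible_cfg: the helper names build_ansible_cfg, render and approx_disconnect_seconds are hypothetical, and only the option values are taken from the hunk above. The last helper assumes the behaviour described in the ssh_config man page, where an unresponsive server is dropped after roughly interval times count seconds.

```python
import configparser
import io
import os


def build_ansible_cfg(work_dir, interval=5, count_max=5):
    """Build an [ssh_connection] section like the one in the hunk above.

    Hypothetical helper for illustration only; the real function in the
    patched module takes more arguments and sets more sections.
    """
    config = configparser.ConfigParser()
    config.add_section('ssh_connection')
    ssh_args = ('-o UserKnownHostsFile=/dev/null '
                '-o StrictHostKeyChecking=no '
                '-o ControlMaster=auto '
                '-o ControlPersist=30m '
                '-o ServerAliveInterval=%d '
                '-o ServerAliveCountMax=%d' % (interval, count_max))
    config.set('ssh_connection', 'ssh_args', ssh_args)
    config.set('ssh_connection', 'control_path_dir',
               os.path.join(work_dir, 'ansible-ssh'))
    config.set('ssh_connection', 'retries', '8')
    return config


def render(config):
    """Return the config as the text that would be written to ansible.cfg."""
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


def approx_disconnect_seconds(interval, count_max):
    """Approximate time an unresponsive server is tolerated by the client.

    Assumption: ssh sends one keepalive per interval and gives up after
    count_max unanswered keepalives, as documented for ssh_config.
    """
    return interval * count_max


if __name__ == '__main__':
    cfg = build_ansible_cfg('/tmp/example-work-dir')
    print(render(cfg))
    print('approximate tolerance in seconds:',
          approx_disconnect_seconds(5, 5))
```

With the values from this change, the client keeps probing an unresponsive node for roughly 25 seconds before giving up, which is presumably long enough to ride out the network reconfiguration described in the commit message.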