Set SSH server keep alive options
When os-net-config applies the network configuration on the overcloud
nodes, SSH connections can be dropped.

Because ssh retries is set to 8 in ansible.cfg, ansible retries the task
when it fails with an SSH connection error. However, the first task is
actually still running and eventually succeeds.

The second task, kicked off by ansible as a retry, sees that the
deployment is already applied, but the notification file (*.notify.json)
does not yet exist because the first task is still in progress. This
causes the second task to fail with the error reported in the bug, and
the whole ansible-playbook run then fails.

Setting the ServerAliveInterval and ServerAliveCountMax ssh options seems
to fix the issue, as ssh no longer drops the first connection when these
are configured.

Change-Id: I08781fe2aa6472d3fae5c5f5d0babd1f7a3b9b2d
Closes-Bug: #1792343
(cherry picked from commit c0f41cae9f)
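
As a rough illustration of what the two options mean: according to the OpenSSH client documentation, ssh sends a keepalive probe after ServerAliveInterval seconds without data from the server and disconnects once ServerAliveCountMax probes go unanswered. The helper name below is hypothetical; the values 5 and 5 are the ones set by this change.

```python
def keepalive_window(interval_s: int, count_max: int) -> int:
    """Approximate number of seconds an unresponsive server is tolerated
    before the ssh client gives up and closes the connection."""
    return interval_s * count_max


# Values from this change: ServerAliveInterval=5, ServerAliveCountMax=5.
window = keepalive_window(5, 5)
print(window)
```

This is only an approximation of the client-side timeout. It does not say how long a network reconfiguration by os-net-config actually takes.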
parent 9c856b0101
commit 56bf1d6db5
@@ -0,0 +1,5 @@
+---
+fixes:
+  - The ServerAliveInterval and ServerAliveCountMax SSH options are now set in
+    the mistral ansible action so that when networking configuration is
+    performed on the overcloud nodes SSH will not drop the connection.
@@ -48,7 +48,9 @@ def write_default_ansible_cfg(work_dir,
                '-o UserKnownHostsFile=/dev/null '
                '-o StrictHostKeyChecking=no '
                '-o ControlMaster=auto '
-               '-o ControlPersist=30m')
+               '-o ControlPersist=30m '
+               '-o ServerAliveInterval=5 '
+               '-o ServerAliveCountMax=5')
     config.set('ssh_connection', 'control_path_dir',
                os.path.join(work_dir, 'ansible-ssh'))
     config.set('ssh_connection', 'retries', '8')
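
The hunk above shows only a fragment of write_default_ansible_cfg. A minimal, self-contained sketch of how the resulting [ssh_connection] section could be built and rendered with Python's configparser follows. The function name build_ssh_connection_cfg and the example work directory are assumptions for illustration, not part of the change; the option values are copied from the diff.

```python
import configparser
import io
import os


def build_ssh_connection_cfg(work_dir):
    """Hypothetical stand-in for the ssh_connection part of
    write_default_ansible_cfg; returns the populated parser."""
    config = configparser.ConfigParser()
    config.add_section('ssh_connection')
    # Adjacent string literals are concatenated, so every option except
    # the last needs a trailing space.
    config.set('ssh_connection', 'ssh_args',
               '-o UserKnownHostsFile=/dev/null '
               '-o StrictHostKeyChecking=no '
               '-o ControlMaster=auto '
               '-o ControlPersist=30m '
               '-o ServerAliveInterval=5 '
               '-o ServerAliveCountMax=5')
    config.set('ssh_connection', 'control_path_dir',
               os.path.join(work_dir, 'ansible-ssh'))
    config.set('ssh_connection', 'retries', '8')
    return config


# Example work directory (assumed); nothing is written to disk here.
config = build_ssh_connection_cfg('/tmp/example-work-dir')
buf = io.StringIO()
config.write(buf)
rendered = buf.getvalue()
print(rendered)
```

Rendering to an in-memory buffer makes it easy to check that the trailing space after ControlPersist=30m was added, so the new options are not glued onto the previous one.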