commit 0a6a14de3cc7c0ebc283dbdb98940e59415aaef3
Merge: 2ac4dee 8798518
Author: Zuul <zuul@review.openstack.org>
Date:   2018-11-19 15:19:37 +0000

    Merge "Migration document update."

 doc/source/install/migration.rst | 252 insertions(+), 118 deletions(-)

Migration Strategy
==================

This document details an in-place migration strategy from ML2/OVS to ML2/OVN
in either ovs-firewall or ovs-hybrid mode for a TripleO OpenStack deployment.

For non-TripleO deployments, please refer to the file ``migration/README.rst``
and the ansible playbook ``migration/migrate-to-ovn.yml``.

Overview
--------
The migration process is orchestrated through the shell script
``ovn_migration.sh``, which is provided with networking-ovn.

The administrator uses ``ovn_migration.sh`` to perform the readiness steps
and the migration from the undercloud node.
The readiness steps, such as host inventory production and DHCP and MTU
adjustments, prepare the environment for the procedure.

Subsequent steps start the migration via Ansible.

Plan for a 24-hour wait after the setup-mtu-t1 step to allow VMs to catch up
with the new MTU size. The default neutron ML2/OVS configuration has a
dhcp_lease_duration of 86400 seconds (24 hours).

Also, if there are instances using static IP assignment, the administrator
should be ready to update the MTU of those instances to the new value, which
is 8 bytes less than the ML2/OVS (VXLAN) MTU. For example, a typical 1500 MTU
underlay network, on which VXLAN tenant networks use a 1450-byte MTU, will
need to change to 1442 under Geneve. On the same underlay, a GRE-encapsulated
tenant network would use a 1458 MTU, but again 1442 for Geneve.

If there are instances which use DHCP but don't support lease update during
the T1 period, the administrator will need to reboot them to ensure that the
MTU is updated inside those instances.

Steps for migration
-------------------

Perform the following steps in the overcloud/undercloud
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Ensure that you have updated to the latest openstack/neutron version.

Perform the following steps in the undercloud
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Install python-networking-ovn-migration-tool.

   .. code-block:: console

      yum install python-networking-ovn-migration-tool

2. Create a working directory on the undercloud, and copy the ansible
   playbooks:

   .. code-block:: console

      mkdir ~/ovn_migration
      cd ~/ovn_migration
      cp -rfp /usr/share/ansible/networking-ovn-migration/playbooks .

3. Create or edit the ``overcloud-deploy-ovn.sh`` script in your ``$HOME``.
   This script must source your stackrc file, and then execute an ``openstack
   overcloud deploy`` with your original deployment parameters, plus
   the following environment files, added to the end of the command
   in the following order:

   When your network topology is DVR and your compute nodes have connectivity
   to the external network:

   .. code-block:: console

      -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovn-dvr-ha.yaml \
      -e $HOME/ovn-extras.yaml

   When your compute nodes don't have external connectivity and you don't use
   DVR:

   .. code-block:: console

      -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovn-ha.yaml \
      -e $HOME/ovn-extras.yaml

   Make sure that all users have execution privileges on the script, because
   it will be called by ovn_migration.sh/ansible during the migration process.

   .. code-block:: console

      $ chmod a+x ~/overcloud-deploy-ovn.sh

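Put together, a minimal sketch of such a script could look like the following (the DVR variant). This is an illustration only, not a drop-in script: your actual ``openstack overcloud deploy`` parameters must match your original deployment, and the script is written to ``/tmp`` here purely to keep the sketch self-contained.

```shell
# Illustrative sketch of overcloud-deploy-ovn.sh (DVR variant).
# In practice the script lives in $HOME and repeats your original
# deployment parameters before the OVN environment files.
cat > /tmp/overcloud-deploy-ovn.sh <<'EOF'
#!/bin/bash
source ~/stackrc
openstack overcloud deploy --templates \
    -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovn-dvr-ha.yaml \
    -e $HOME/ovn-extras.yaml
EOF
# All users need execution privileges, as noted above.
chmod a+x /tmp/overcloud-deploy-ovn.sh
```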
4. To configure the parameters of your migration you can set the environment
   variables that will be used by ``ovn_migration.sh``. You can skip setting
   any values matching the defaults.

   * STACKRC_FILE - must point to your stackrc file in your undercloud.
     Default: ~/stackrc

   * OVERCLOUDRC_FILE - must point to your overcloudrc file in your
     undercloud.
     Default: ~/overcloudrc

   * OVERCLOUD_OVN_DEPLOY_SCRIPT - must point to the script described in
     step 3.
     Default: ~/overcloud-deploy-ovn.sh

   * PUBLIC_NETWORK_NAME - Name of your public network.
     Default: 'public'.
     To support migration validation, this network must have available
     floating IPs, and those floating IPs must be pingable from the
     undercloud. If that's not possible please configure VALIDATE_MIGRATION
     to False.

   * IMAGE_NAME - Name/ID of the glance image to use for booting a test
     server.
     Default: 'cirros'.
     It will be automatically downloaded during the pre-validation /
     post-validation process.

   * VALIDATE_MIGRATION - Create migration resources to validate the
     migration. The migration script, before starting the migration, boots a
     server and validates that the server is reachable after the migration.
     Default: True.

   * SERVER_USER_NAME - User name to use for logging in to the migration
     instances.
     Default: 'cirros'.

   * DHCP_RENEWAL_TIME - DHCP renewal time in seconds to configure in the
     DHCP agent configuration file.
     Default: 30

   .. warning::

      Please note that VALIDATE_MIGRATION requires enough quota (2
      available floating ips, 2 networks, 2 subnets, 2 instances,
      and 2 routers as admin).

   For example:

   .. code-block:: console

      $ export PUBLIC_NETWORK_NAME=my-public-network
      $ ovn_migration.sh .........

5. Run ``ovn_migration.sh generate-inventory`` to generate the inventory
   file ``hosts_for_migration`` and ``ansible.cfg``. Please review
   ``hosts_for_migration`` for correctness.

   .. code-block:: console

      $ ovn_migration.sh generate-inventory

6. Run ``ovn_migration.sh setup-mtu-t1``. This lowers the T1 parameter
   of the internal neutron DHCP servers by configuring ``dhcp_renewal_time``
   in /var/lib/config-data/puppet-generated/neutron/etc/neutron/dhcp_agent.ini
   on all the nodes where the DHCP agent is running.

   .. code-block:: console

      $ ovn_migration.sh setup-mtu-t1

7. If you are using VXLAN or GRE tenant networking, ``wait at least 24 hours``
   before continuing. This will allow VMs to catch up with the new MTU size
   of the next step.

   .. warning::

      If you are using VXLAN or GRE networks, this 24-hour wait step is
      critical. If you are using VLAN tenant networks you can proceed to the
      next step without delay.

   .. warning::

      If you have any instance with static IP assignment on VXLAN or
      GRE tenant networks, you must manually modify the configuration of
      those instances to configure the new geneve MTU, which is the current
      VXLAN MTU minus 8 bytes. For instance, if the VXLAN-based MTU was 1450,
      change it to 1442. If your instances don't honor the T1 parameter of
      DHCP they will need to be rebooted.

   .. note::

      24 hours is the time based on the default configuration. It actually
      depends on the
      /var/lib/config-data/puppet-generated/neutron/etc/neutron/dhcp_agent.ini
      dhcp_renewal_time and
      /var/lib/config-data/puppet-generated/neutron/etc/neutron/neutron.conf
      dhcp_lease_duration parameters (defaults to 86400 seconds).

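Since the required wait derives from the configured lease duration, you can read the value from the configuration and convert it to hours. A minimal sketch, parsing a stand-in sample file rather than the real neutron.conf path quoted in the note above:

```shell
# Derive the required wait from dhcp_lease_duration. A sample file
# stands in for the real containerized neutron.conf path.
cat > /tmp/sample-neutron.conf <<'EOF'
[DEFAULT]
dhcp_lease_duration = 86400
EOF
lease=$(awk -F' *= *' '/^dhcp_lease_duration/ {print $2}' /tmp/sample-neutron.conf)
echo "wait at least $((lease / 3600)) hours before continuing"
```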
   .. note::

      Please note that migrating a deployment which uses VLAN for
      tenant/project networks is not recommended at this time because of a
      bug in core ovn; full support is being worked on here:
      https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/347594.html

   One way to verify that the T1 parameter has propagated to existing VMs
   is to connect to one of the compute nodes and run ``tcpdump`` on one
   of the VM taps attached to a tenant network. If T1 propagation was
   successful, you should see requests arrive on an interval of
   approximately 30 seconds.

   .. code-block:: console

      [heat-admin@overcloud-novacompute-0 ~]$ sudo tcpdump -i tap52e872c2-e6 port 67 or port 68 -n
      tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
      13:17:56.241156 IP 192.168.99.5.bootpc > 192.168.99.3.bootps: BOOTP/DHCP, Request from fa:16:3e:6b:41:3d, length 300
      13:17:56.249899 IP 192.168.99.3.bootps > 192.168.99.5.bootpc: BOOTP/DHCP, Reply, length 355

   .. note::

      This verification is not possible with cirros VMs. The cirros
      udhcpc implementation does not obey DHCP option 58 (T1). Please
      try this verification on a port that belongs to a full Linux VM.
      We recommend that you check all the different types of workloads your
      system runs (Windows, different flavors of Linux, etc.).

8. Run ``ovn_migration.sh reduce-mtu``.

   This lowers the MTU of the pre-migration VXLAN and GRE networks. The
   tool will ignore non-VXLAN/GRE networks, so if you use VLAN for tenant
   networks it is fine if you find this step not doing anything.

   .. code-block:: console

      $ ovn_migration.sh reduce-mtu

   This step will go network by network reducing the MTU, and tagging with
   ``adapted_mtu`` the networks which have been already handled.

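The per-network arithmetic this step applies can be sketched as follows. This is illustrative only: the real tool drives the neutron API and tags each processed network, but the deltas match the numbers given earlier (Geneve overhead is 8 bytes more than VXLAN and 16 bytes more than GRE).

```shell
# Illustrative sketch of the MTU reduction rule applied per network
# type; VLAN/flat networks are left untouched, as noted above.
reduced_mtu() {
    local net_type=$1 mtu=$2
    case $net_type in
        vxlan) echo $((mtu - 8)) ;;   # Geneve needs 8 bytes more than VXLAN
        gre)   echo $((mtu - 16)) ;;  # Geneve needs 16 bytes more than GRE
        *)     echo "$mtu" ;;         # non-VXLAN/GRE networks are ignored
    esac
}
reduced_mtu vxlan 1450   # prints 1442
reduced_mtu gre   1458   # prints 1442
reduced_mtu vlan  1500   # prints 1500
```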
9. Make TripleO ``prepare the new container images`` for OVN.

   If your deployment didn't have a containers-prepare-parameter.yaml, you
   can create one with:

   .. code-block:: console

      $ test -f $HOME/containers-prepare-parameter.yaml || \
          openstack tripleo container image prepare default \
          --output-env-file $HOME/containers-prepare-parameter.yaml

   If you had to create the file, please make sure it's included at the end
   of your $HOME/overcloud-deploy-ovn.sh and $HOME/overcloud-deploy.sh.

   Change the neutron_driver in the containers-prepare-parameter.yaml file
   to ovn:

   .. code-block:: console

      $ sed -i -E 's/neutron_driver:([ ]\w+)/neutron_driver: ovn/' $HOME/containers-prepare-parameter.yaml

   You can verify with:

   .. code-block:: console

      $ grep neutron_driver containers-prepare-parameter.yaml
      neutron_driver: ovn

   Then update the images:

   .. code-block:: console

      $ openstack tripleo container image prepare \
          --environment-file /home/stack/containers-prepare-parameter.yaml

   .. note::

      It's important to provide the full path to your
      containers-prepare-parameter.yaml, otherwise the command will finish
      very quickly and won't work (the current version doesn't seem to
      output any error).

   TripleO will validate the containers and push them to your local
   registry.

10. Run ``ovn_migration.sh start-migration`` to kick-start the migration
    process.

    .. code-block:: console

       $ ovn_migration.sh start-migration

    Under the hood, this is what will happen:

    * Create pre-migration resources (network and VM) to validate the
      existing deployment and the final migration.

    * Update the overcloud stack to deploy OVN alongside the reference
      implementation services using a temporary bridge "br-migration"
      instead of br-int.

    * Start the migration process:

      1. generate the OVN north db by running the neutron-ovn-db-sync util
      2. clone the existing resources from br-int to br-migration, so ovn
         finds the same resource UUIDs over br-migration
      3. re-assign ovn-controller to br-int instead of br-migration
      4. clean up network namespaces (fip, snat, qrouter, qdhcp)
      5. remove any unnecessary patch ports on br-int
      6. remove the br-tun and br-migration ovs bridges
      7. delete qr-*, ha-* and qg-* ports from br-int (via neutron netns
         cleanup)

    * Delete neutron agents and neutron HA internal networks from the
      database via API.

    * Validate connectivity on pre-migration resources.

    * Delete pre-migration resources.

    * Create post-migration resources.

    * Validate connectivity on post-migration resources.

    * Clean up post-migration resources.

    * Re-run the deployment tool to update OVN on br-int.


Migration is complete !!!