author: Moshe Levi <moshele@mellanox.com> 2018-07-15 16:14:21 +0300
committer: Moshe Levi <moshele@mellanox.com> 2019-01-08 13:30:45 +0200
commit: f358fbdde9a1cadc838327b8bf34ee54a7e7f43a
tree: 2629b204372c10874e88791dca4e9293fb28d33f
parent: a6daa1cd33cfd6d35a1e7694d1226d159ea95ed3
Add Support for Smart NIC
Notes (review):
  Code-Review+2: Julia Kreger <juliaashleykreger@gmail.com>
  Code-Review+2: Shivanand Tendulker <stendulker@gmail.com>
  Workflow+1: Julia Kreger <juliaashleykreger@gmail.com>
  Verified+2: Zuul
  Submitted-by: Zuul
  Submitted-at: Tue, 22 Jan 2019 23:39:23 +0000
  Reviewed-on: https://review.openstack.org/582767
  Project: openstack/ironic-specs
  Branch: refs/heads/master
-rwxr-xr-x  specs/approved/support-smart-nic.rst         412
l---------  specs/not-implemented/support-smart-nic.rst    1
2 files changed, 413 insertions, 0 deletions
diff --git a/specs/approved/support-smart-nic.rst b/specs/approved/support-smart-nic.rst
new file mode 100755
index 0000000..13f5b62
--- /dev/null
+++ b/specs/approved/support-smart-nic.rst
@@ -0,0 +1,412 @@
1..
2 This work is licensed under a Creative Commons Attribution 3.0 Unported
3 License.
4
5 http://creativecommons.org/licenses/by/3.0/legalcode
6
7====================
8Smart NIC Networking
9====================
10
11https://storyboard.openstack.org/#!/story/2003346
12
13This spec describes proposed changes to Ironic to enable a generic,
14vendor-agnostic, baremetal networking service running on smart NICs,
15enabling baremetal networking with feature parity to the virtualization
16use-case.
17
18Problem description
19===================
20
21While Ironic today supports Neutron provisioned network connectivity for
22baremetal servers through an ML2 mechanism driver, the existing support
23is based largely on configuration of TORs through vendor-specific mechanism
24drivers, with limited capabilities.
25
26Proposed change
27===============
28
29There is a wide range of smart/intelligent NICs emerging on the market.
30These NICs generally incorporate one or more general purpose CPU cores along
31with data-plane packet processing acceleration, and can efficiently run
32virtual switches such as OVS, while maintaining the existing interfaces to the
33SDN layer.
34
35The proposal is to extend Ironic to enable use of smart NICs to implement
36generic networking services for Bare Metal servers. The goal is to enable
37running the standard Neutron Open vSwitch L2 agent, providing a generic,
38vendor-agnostic bare metal networking service with feature parity compared
39to the virtualization use-case. The Neutron Open vSwitch L2 agent manages the
40OVS bridges on the smart NIC.
41
42In this proposal, we address two use-cases:
43
44#. Neutron OVS L2 agent runs locally on the smart NIC.
45
46 This use case requires a smart NIC capable of running OpenStack control
47 services such as the Neutron OVS L2 agent. This use case strives to view
48 the smart NIC as an isolated hypervisor for the baremetal node, with the
49 smart NIC providing the services to the bare metal image running on the host
50 (as a hypervisor would provide services to a VM). While this spec initially
51 targets Neutron OVS L2 agent, the same implementation would naturally and
52 easily be extended to any other ML2 plugin as well as to additional
53 agents/services (for example exposing emulated NVMe storage devices
54 back-ended by a storage initiator on the smart NIC).
55
56#. Neutron OVS L2 agent(s) run remotely and manage
57 the OVS bridges for all the baremetal smart NICs.
58
59
60The enhancements for the Neutron OVS L2 agent are captured in [1]_, [2]_ and [3]_.
61
62* Set the smart NIC configuration
63
64 The smart NIC configuration includes the following:
65
66 #. ``is_smartnic`` - a boolean field added to the ironic port. (defaults to False)
67 #. smart NIC hostname - the hostname of the server/smart NIC where the Neutron
68 OVS agent is running. (required)
69 #. smart NIC port id - the name of the port that needs to be plugged into the
70 integration bridge; B in the diagram below. (required)
71 #. smart NIC SSH public key - the SSH public key of the smart NIC.
72 (required only for remote)
73 #. smart NIC OVSDB SSL certificate - the OVSDB SSL certificate of the OVS on
74 the smart NIC. (required only for remote)
75
76 The OVS ML2 mechanism driver will determine whether the Neutron OVS agent runs
77 locally or remotely based on the smart NIC configuration passed from ironic.
78 The smart NIC configuration attributes will be stored in the
79 ``local_link_information`` of the baremetal port.
80
81 In the scope of this spec the smart NIC config will be set manually by
82 the admin.
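
The required/optional split of the configuration items above can be illustrated with a small validation helper. This is only a sketch: the key names (``hostname``, ``port_id``, ``ssh_public_key``, ``ovsdb_ssl_certificate``) and the rule that the presence of the remote-only keys selects the remote use case are assumptions made for the example; the spec does not fix either.

```python
# Hypothetical key names; the spec lists the items but not their spelling.
REQUIRED_KEYS = ("hostname", "port_id")
REMOTE_ONLY_KEYS = ("ssh_public_key", "ovsdb_ssl_certificate")


def validate_smartnic_config(local_link_information):
    """Return "local" or "remote", or raise ValueError if incomplete.

    Assumption for this sketch: the remote use case is recognised by the
    presence of the remote-only keys.
    """
    missing = [k for k in REQUIRED_KEYS if not local_link_information.get(k)]
    if missing:
        raise ValueError("missing required keys: %s" % ", ".join(missing))

    present = [k for k in REMOTE_ONLY_KEYS if local_link_information.get(k)]
    if not present:
        return "local"
    if len(present) != len(REMOTE_ONLY_KEYS):
        raise ValueError(
            "remote mode needs all of: %s" % ", ".join(REMOTE_ONLY_KEYS))
    return "remote"
```

For example, a dictionary holding only a hostname and a port id would be treated as the local use case, while one that also carries both remote-only items would be treated as remote.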
83
84* Deployment Interfaces
85
86 Extend the ramdisk, direct, iscsi and ansible deploy interfaces to support the
87 smart NIC use cases.
88
89 The Deployment Interfaces call network interface methods such as:
90 add_provisioning_network, remove_provisioning_network,
91 configure_tenant_networks, unconfigure_tenant_networks, add_cleaning_network
92 and remove_cleaning_network.
93
94 These network methods are ordinarily called when the baremetal is
95 powered down, ensuring proper network configuration on the TOR before booting
96 the bare metal.
97
98 Smart NICs share their power state with the baremetal, requiring the baremetal
99 to be powered up before configuring the network. This leads to a potential
100 race where the baremetal boots and accesses the network before the network
101 is properly configured on the OVS within the smart NIC.
102
103 To ensure proper network configuration prior to baremetal boot, the
104 deployment interfaces will temporarily boot the baremetal into the BIOS
105 shell, providing a state where the OVS on the smart NIC may be configured
106 properly before rebooting the bare metal into the actual guest image or
107 ramdisk.
108
109
110 The following logic wraps the configure/unconfigure network calls:
111
112 .. code-block:: python
113
     from ironic.common import boot_devices
     from ironic.common import states
     from ironic.conductor import utils as manager_utils

     if task.driver.network.need_power_on(task):
         old_power_state = task.driver.power.get_power_state(task)
         if old_power_state == states.POWER_OFF:
             # Set the next boot to BIOS to halt the baremetal boot.
             manager_utils.node_set_boot_device(task, boot_devices.BIOS,
                                                persistent=False)
             manager_utils.node_power_action(task, states.POWER_ON)

     # ...
     # call task.driver.network method(s)
     # ...

     if task.driver.network.need_power_on(task):
         # Restore the power state recorded before network configuration.
         manager_utils.node_power_action(task, old_power_state)
128
129 The following deploy interface methods call one or more of the
130 configure/unconfigure network methods and should be updated with the logic
131 above.
132
133 * iscsi Deploy Interface
134
135 - iscsi_deploy::prepare
136 - iscsi_deploy::deploy
137 - iscsi_deploy::tear_down
138
139 * ansible Deploy Interface
140
141 - ansible/deploy::reboot_and_finish_deploy
142 - ansible/deploy::prepare
143 - ansible/deploy::tear_down
144 - ansible/deploy::prepare_cleaning
145 - ansible/deploy::tear_down_cleaning
146
147 * direct Interface
148
149 - agent::prepare
150 - agent::tear_down
151 - agent::deploy
152 - agent::rescue
153 - agent::unrescue
154 - agent_base_vendor::reboot_and_finish_deploy
155 - agent_base_vendor::_finalize_rescue
156
157 * RAM Disk Interface
158
159 - pxe::deploy
160
161 * Common cleaning methods
162
163 - deploy_utils::prepare_inband_cleaning
164 - deploy_utils::tear_down_inband_cleaning
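
Since the same power-on/restore logic has to be applied in every method listed above, one way to avoid repeating it is a context manager. The sketch below is not part of the spec: it uses plain callables and local stand-ins for the power states so that it runs without ironic, and it restores the previous power state even when the wrapped network call fails, which is an assumption beyond what the spec describes.

```python
import contextlib

# Local stand-ins for ironic.common.states values (assumed for this sketch).
POWER_ON = "power on"
POWER_OFF = "power off"


@contextlib.contextmanager
def powered_on_for_network_config(need_power_on, get_power_state,
                                  set_boot_device_bios, power_action):
    """Keep the node powered on, halted in BIOS, while the body runs.

    All four arguments are hypothetical hooks: a boolean and three
    callables wrapping the power and management interfaces.
    """
    old_power_state = None
    if need_power_on:
        old_power_state = get_power_state()
        if old_power_state == POWER_OFF:
            # Halt in BIOS so the host cannot reach the network yet.
            set_boot_device_bios()
            power_action(POWER_ON)
    try:
        yield
    finally:
        if old_power_state is not None:
            # Return the node to the state it was in before the call.
            power_action(old_power_state)
```

A deploy interface method could then wrap each network call in a ``with powered_on_for_network_config(...)`` block; for nodes without smart NIC ports the wrapper does nothing.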
165
166* Network Interface
167
168 Extend the base `network_interface` with a ``need_power_on`` method that
169 returns True if any ironic port attached to the node is a smart NIC.
170
171 Extend the ironic.common.neutron add_ports_to_network/
172 remove_ports_from_network methods for the smart NIC case:
173
174 * In add_ports_to_network, when the node has a smart NIC port:
175
176 - verify that the Neutron agent is alive
177 - create the Neutron port
178 - verify that the Neutron port is in the active state
179
180 * In remove_ports_from_network, when the node has a smart NIC port:
181
182 - verify that the Neutron agent is alive
183 - delete the Neutron port
184 - verify that the Neutron port has been removed
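
The ``need_power_on`` check and the add-port sequence described above could look roughly as follows. This is a sketch under stated assumptions: ``client`` is a hypothetical wrapper object with ``is_agent_alive``, ``create_port`` and ``get_port_status`` methods (none of these names come from the spec or from a real client library), and the timeout and polling interval are arbitrary.

```python
import time


def need_power_on(ports):
    """Return True if any port attached to the node is a smart NIC."""
    return any(getattr(port, "is_smartnic", False) for port in ports)


def wait_until(predicate, timeout, interval, sleep=time.sleep):
    """Poll predicate() until it is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


def add_smartnic_port(client, agent_host, port_body,
                      timeout=30.0, interval=1.0, sleep=time.sleep):
    """Create a port only once the agent is alive, then wait for ACTIVE."""
    if not wait_until(lambda: client.is_agent_alive(agent_host),
                      timeout, interval, sleep):
        raise RuntimeError("Neutron agent on %s is not alive" % agent_host)

    port_id = client.create_port(port_body)

    if not wait_until(lambda: client.get_port_status(port_id) == "ACTIVE",
                      timeout, interval, sleep):
        raise RuntimeError("port %s did not become ACTIVE" % port_id)
    return port_id
```

The removal path would mirror this: check the agent, delete the port, then poll until the port can no longer be found.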
185
186
187* Neutron ml2 OVS changes:
188
189 - Introduce a new vnic_type for ``smart-nic``.
190 - Update the Neutron ML2 OVS mechanism driver to bind ports of vnic_type
191 ``smart-nic`` using the smart NIC config in `binding:profile`.
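
To make the binding input concrete, the request body that ironic might send to Neutron for a smart NIC port is sketched below. ``binding:vnic_type`` and ``binding:profile`` are standard Neutron port attributes; the helper name, the set of other fields included, and the layout of the smart NIC config inside ``local_link_information`` are assumptions for illustration only.

```python
def build_smartnic_port_body(network_id, mac_address, local_link_information):
    """Build a Neutron port-create body for a smart NIC port (sketch)."""
    return {
        "port": {
            "network_id": network_id,
            "mac_address": mac_address,
            # New vnic_type proposed by this spec.
            "binding:vnic_type": "smart-nic",
            # The smart NIC config travels in the binding profile.
            "binding:profile": {
                "local_link_information": [dict(local_link_information)],
            },
        }
    }
```

The mechanism driver would then read the smart NIC config back out of ``binding:profile`` when deciding how to bind the port.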
192
193* Neutron OVS agent changes:
194
195Example of smart NIC model::
196
197 +---------------------+
198 | baremetal |
199 | +-----------------+ |
200 | | OS Server | | |
201 | | | | |
202 | | +A | | |
203 | +------|--------+ | |
204 | | | |
205 | +------|--------+ | |
206 | | OS SmartNIC | | |
207 | | +-+B-+ | | |
208 | | |OVS | | | |
209 | | +-+C-+ | | |
210 | +------|--------+ | |
211 +--------|------------+
212 |
213
214 A - port on the baremetal host.
215 B - port that represents the baremetal port in the smart NIC.
216 C - port that represents the physical port in the smart NIC.
217
218 Add/remove port B to/from the OVS br-int with external-ids.
219
220 In our case the Neutron OVS agent will plug the port on a port update
221 event with the following external-ids: iface-id, iface-status, attached-mac
222 and node-uuid.
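
For illustration, plugging port B with the external-ids listed above corresponds to an ``ovs-vsctl`` invocation like the one built below. This only sketches the resulting OVS configuration; the agent itself may well talk to OVSDB directly rather than shell out, and the helper name and the ``active`` value for ``iface-status`` are assumptions.

```python
def build_plug_port_command(port_name, iface_id, attached_mac, node_uuid,
                            bridge="br-int"):
    """Return the ovs-vsctl argument list that plugs a port with external-ids."""
    return [
        "ovs-vsctl", "--may-exist", "add-port", bridge, port_name,
        "--", "set", "Interface", port_name,
        "external-ids:iface-id=%s" % iface_id,
        "external-ids:iface-status=active",
        "external-ids:attached-mac=%s" % attached_mac,
        "external-ids:node-uuid=%s" % node_uuid,
    ]
```

The returned list can be passed to ``subprocess.run`` as is; unplugging would use ``ovs-vsctl --if-exists del-port`` with the same bridge and port name.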
223
224
225Alternatives
226------------
227
228* Delay the Neutron port binding (port binding means setting all the
229 OVSDB/OpenFlow config on the SmartNIC) to be performed by Neutron
230 later (once the bare metal is powered up). The problem with this
231 approach is that there is no guarantee of if or when the rules will be
232 programmed, so the baremetal may inadvertently boot while
233 the smart NIC is still programmed on the old network.
234
235Data model impact
236-----------------
237
238A new ``is_smartnic`` boolean field will be added to the Port object.
239
240
241State Machine Impact
242--------------------
243
244None
245
246REST API impact
247---------------
248
249The port REST API will be modified to support the new ``is_smartnic``
250field. The field will be readable by users with the baremetal observer role
251and writable by users with the baremetal admin role.
252
253Updates to the ``is_smartnic`` field of ports will be restricted in the
254same way as for other connectivity related fields (local link connection, etc.):
255they will be restricted to nodes in the ``enroll``, ``inspecting`` and
256``manageable`` states.
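
The state restriction amounts to a simple membership check before a port update is accepted. The sketch below is illustrative only; the function name and the shape of the ``patch`` argument (a list of JSON-patch style operations) are assumptions, not ironic's actual API code.

```python
# Provision states in which connectivity fields may be updated (from the text).
ALLOWED_STATES = frozenset({"enroll", "inspecting", "manageable"})


def may_update_port(provision_state, patch):
    """Return False if the patch touches is_smartnic in a disallowed state."""
    touches_field = any(op.get("path") == "/is_smartnic" for op in patch)
    return (not touches_field) or provision_state in ALLOWED_STATES
```

A patch that leaves ``is_smartnic`` alone passes this check in any state.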
257
258Client (CLI) impact
259-------------------
260
261
262"ironic" CLI
263~~~~~~~~~~~~
264
265None
266
267"openstack baremetal" CLI
268~~~~~~~~~~~~~~~~~~~~~~~~~
269
270The openstack baremetal CLI will be updated to support getting and setting the
271``is_smartnic`` field on ports.
272
273RPC API impact
274--------------
275
276None
277
278Driver API impact
279-----------------
280
281None
282
283Nova driver impact
284------------------
285
286None
287
288Ramdisk impact
289--------------
290
291None
292
293Security impact
294---------------
295
296* Smart NIC Isolation
297
298Both use cases run infrastructure functionality on the smart NIC, with
299the first use case also running control plane functionality.
300
301This requires proper isolation between the untrusted bare metal host and the
302smart NIC, preventing any/all direct or indirect access, both through the
303network interface exposed to the host and through side channels such as the
304platform BMC.
305
306Such isolation is implemented by the smart NIC device and/or the hardware
307platform vendor. There are multiple approaches for such isolation,
308ranging from completely physical disconnection of the smart NIC from the
309platform BMC to a platform with a trusted BMC wherein the BMC considers
310the baremetal host an untrusted entity and restricts its capabilities/access
311to the platform.
312
313In the absence of such isolation, the untrusted baremetal tenant
314may be able to gain access to the provisioning network, and in the second
315use case may be able to compromise the control plane.
316
317Proper isolation is dependent on the platform hardware/firmware, and cannot
318be directly enforced/guaranteed by ironic. Users of the smart NIC use cases should
319be made well aware of this via explicit documentation, and should be guided
320to verify the proper isolation exists on their platform when enabling such
321use cases.
322
323* Security Groups
324
325This will allow use of the Neutron OVS agent pipeline. One of the features in the
326pipeline is security groups, which will enhance the security model when using
327baremetal in a cloud.
328
329* Security credentials
330
331The node running the Neutron OVS agent (smart NIC or remote, according to use
332case) should be configured with the message bus credentials for the Neutron
333server.
334
335In addition, for the second use case, the SSH public key and OVSDB SSL
336certificate should be configured for the smart NIC port.
337
338
339Other end user impact
340---------------------
341
342* Baremetal admin needs to update the SmartNIC config manually.
343
344Scalability impact
345------------------
346
347None
348
349Performance Impact
350------------------
351
352None
353
354Other deployer impact
355---------------------
356
357None
358
359Developer impact
360----------------
361
362None
363
364Implementation
365==============
366
367Assignee(s)
368-----------
369
370Primary assignee:
371 hamdyk - hamdy@mellanox.com
372
373Work Items
374----------
375
376* Update the Neutron network interface to populate the Smart NIC config from
377 the ironic port to the Neutron port `binding:profile` attribute.
378* Update the network_interface and common.neutron as described above
379* Update deployment interfaces as described above
380* Documentation updates.
381
382
383Dependencies
384============
385
386None, but the Neutron specs [1]_, [2]_ and [3]_ depend on this spec.
387
388Testing
389=======
390
391* Mellanox CI Jobs testing with Bluefield SmartNIC
392
393Upgrades and Backwards Compatibility
394====================================
395
396None
397
398
399Documentation Impact
400====================
401
402* Update the multitenancy.rst with setting the SmartNIC config
403* Document the security implications/guidelines under admin/security.rst
404
405References
406==========
407
408.. [1] https://review.openstack.org/#/c/619920/
409
410.. [2] https://review.openstack.org/#/c/595402/
411
412.. [3] https://review.openstack.org/#/c/595512/
diff --git a/specs/not-implemented/support-smart-nic.rst b/specs/not-implemented/support-smart-nic.rst
new file mode 120000
index 0000000..f2f01cf
--- /dev/null
+++ b/specs/not-implemented/support-smart-nic.rst
@@ -0,0 +1 @@
../approved/support-smart-nic.rst
\ No newline at end of file