authorSpyros Trigazis <strigazi@gmail.com>2016-11-09 16:28:28 +0100
committerSpyros Trigazis <strigazi@gmail.com>2016-11-09 16:35:04 +0100
commit8b0a99e44b7d2c7b9cb9126ab9a4ea9d8b0d818a (patch)
tree8a1ee5e3af1ec291598a4575be3c57f3d20e12f6
parent74da0acf5d11efd3e7647f4f6175cc6470310662 (diff)
Import all implemented specs from magnum repo
* Add all implemented specs to specs/implemented. * Create ocata directory to store the corresponding specs. Change-Id: I36da3a26ece3b6f7d27b7af80460303f4a6c4226
Notes
Notes (review): Code-Review+2: Adrian Otto <adrian.otto@rackspace.com> Code-Review-1: Hieu LE <hieulq@vn.fujitsu.com> Code-Review+1: Randall Burt <randall.burt@rackspace.com> Code-Review+2: Jaycen Grant <jaycen.v.grant@intel.com> Workflow+1: Jaycen Grant <jaycen.v.grant@intel.com> Verified+2: Jenkins Submitted-by: Jenkins Submitted-at: Fri, 18 Nov 2016 14:51:12 +0000 Reviewed-on: https://review.openstack.org/395674 Project: openstack/magnum-specs Branch: refs/heads/master
l---------doc/source/specs1
-rw-r--r--specs/implemented/async-container-operation.rst452
-rw-r--r--specs/implemented/bay-drivers.rst344
-rw-r--r--specs/implemented/container-networking-model.rst458
-rw-r--r--specs/implemented/container-volume-integration-model.rst500
-rw-r--r--specs/implemented/containers-service.rst400
-rw-r--r--specs/implemented/create-trustee-user-for-each-bay.rst186
-rw-r--r--specs/implemented/magnum-horizon-plugin.rst171
-rw-r--r--specs/implemented/open-dcos.rst177
-rw-r--r--specs/implemented/resource-quotas.rst252
-rw-r--r--specs/implemented/tls-support-magnum.rst226
11 files changed, 3167 insertions, 0 deletions
diff --git a/doc/source/specs b/doc/source/specs
new file mode 120000
index 0000000..87a4030
--- /dev/null
+++ b/doc/source/specs
@@ -0,0 +1 @@
../../specs \ No newline at end of file
diff --git a/specs/implemented/async-container-operation.rst b/specs/implemented/async-container-operation.rst
new file mode 100644
index 0000000..fa8ba73
--- /dev/null
+++ b/specs/implemented/async-container-operation.rst
@@ -0,0 +1,452 @@
1=================================
2Asynchronous Container Operations
3=================================
4
5Launchpad blueprint:
6
7https://blueprints.launchpad.net/magnum/+spec/async-container-operations
8
9At present, container operations are done in a synchronous way, end-to-end.
10This model does not scale well, and it forces the client to stay blocked
11until the operation completes.
12
13Problem Description
14-------------------
15
16At present Magnum-Conductor executes the container operation as part of
17processing the request forwarded from Magnum-API. For
18container-create, if the image needs to be pulled down, it may take
19a while depending on the responsiveness of the registry, which can be a
20substantial delay. At the same time, experiments suggest that even for
21a pre-pulled image, the time taken by each operation, namely
22create/start/delete, is of the same order, as it involves a complete round
23trip between the magnum-client and the COE-API, via Magnum-API and
24Magnum-Conductor [1].
25
26Use Cases
27---------
28
29For wider enterprise adoption of Magnum, we need it to scale better.
30For that, we need to replace some of these synchronous behaviors with
31suitable asynchronous alternatives.
32
33To understand the use-case better, we can have a look at the average
34time spent during container operations, as noted in [1].
35
36Proposed Changes
37----------------
38
39The design has been discussed over the ML[6]. The conclusions have been kept
40on the 'whiteboard' of the Blueprint.
41
42The amount of code change is expected to be significant. To ease
43adoption, code review, and functional testing, a phased implementation
44approach may be required. We can define the scope of the three phases of
45the implementation as follows -
46
47* Phase-0 will bring in the basic feature of asynchronous mode of operation in
48 Magnum - (A) from API to Conductor and (B) from Conductor to COE-API. During
49 phase-0, this mode will be optional through configuration.
50
51 Both the communication paths (A) and (B) are proposed to be made asynchronous
52 to realize the full benefit. If we do (A) alone, it does not gain us much, as
53 (B) consumes the larger share of the operation's cycles. If we do (B) alone, it does
54 not make sense, as (A) will synchronously wait for no meaningful data.
55
56* Phase-1 will concentrate on making the feature persistent to address various
57 scenarios of conductor restart, worker failure etc. We will support this
58 feature for multiple Conductor-workers in this phase.
59
60* Phase-2 will select asynchronous mode of operation as the default mode. At
61 the same time, we can evaluate to drop the code for synchronous mode, too.
62
63
64Phase-0 is required as a meaningful temporary step, to establish the
65importance and tangible benefits of phase-1. This is also to serve as a
66proof-of-concept at a lower cost of code changes with a configurable option.
67This will enable developers and operators to have a taste of the feature,
68before bringing in the heavier dependencies and changes proposed in phase-1.
69
70A reference implementation for the phase-0 items has been submitted for review [2].
71
72The following is a summary of the design -
73
741. Configurable mode of operation - async
75-----------------------------------------
76
77For ease of adoption, the async mode of communication between API and
78conductor, and between conductor and COE, can be controlled using a
79configuration option, so the code paths for sync mode and async mode would
80co-exist for now. To achieve this with minimal or no code duplication and a
81cleaner interface, we are using openstack/futurist[4]. The Futurist interface
82hides the details of the type of executor being used. In the async
83configuration, a green thread pool of the configured pool size gets created.
84Here is a sample of how the config would look: ::
85
86 [DEFAULT]
87 async_enable = False
88
89 [conductor]
90 async_threadpool_max_workers = 64
91
92The Futurist library is used by oslo.messaging and is therefore, in effect,
93used by almost all OpenStack projects. Futurist makes it possible to run the
94same code under different execution models, which avoids potential duplication
95of code.
96
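To make the configurable mode concrete, the following is a minimal, illustrative
sketch (not Magnum's actual code) of how a conductor could pick an executor with
Futurist based on the options shown above: ::

    # Illustrative sketch only; option names mirror the sample config above.
    import futurist
    from oslo_config import cfg

    CONF = cfg.CONF
    CONF.register_opts([cfg.BoolOpt('async_enable', default=False)])
    CONF.register_opts(
        [cfg.IntOpt('async_threadpool_max_workers', default=64)],
        group='conductor')

    def build_executor():
        """Return an executor that matches the configured mode."""
        if CONF.async_enable:
            # Async mode: requests run on a pool of green threads
            # (requires eventlet in a real service).
            return futurist.GreenThreadPoolExecutor(
                max_workers=CONF.conductor.async_threadpool_max_workers)
        # Sync mode: the caller's thread runs the operation inline.
        return futurist.SynchronousExecutor()

    executor = build_executor()
    future = executor.submit(lambda: 'container-create submitted')
    print(future.result())

Because both branches expose the same executor interface, the calling code stays
identical in sync and async mode, which is the deduplication benefit described
above.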
97
982. Type of operations
99---------------------
100
101There are two classes of container operations - those that can be made async,
102namely create/delete/start/stop/pause/unpause/reboot, which do not need to
103return data about the container, and those that require data, namely
104container-logs. For async-type container operations, Magnum-API will be
105using 'cast' instead of 'call' from oslo_messaging[5].
106
107'cast' from oslo.messaging.rpcclient is used to invoke a method and return
108immediately, whereas 'call' invokes a method and waits for a reply. While
109operating in asynchronous mode, it is intuitive to use the 'cast' method,
110as the result of the response may not be available immediately.
111
112Magnum-api first fetches the details of a container by doing
113'get_rpc_resource'. This function uses magnum objects and hence
114uses a 'call' method underneath. Once magnum-api gets back the details,
115it issues the container operation, using another 'call' method.
116The above proposal is to replace the second 'call' with 'cast'.
117
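As a hedged illustration of the 'call' versus 'cast' distinction (the topic and
method names below are invented for this example, not Magnum's actual RPC
interface): ::

    # Illustrative only; topic and method names are hypothetical.
    import oslo_messaging
    from oslo_config import cfg

    transport = oslo_messaging.get_transport(cfg.CONF)
    target = oslo_messaging.Target(topic='magnum-conductor')
    client = oslo_messaging.RPCClient(transport, target)
    ctxt = {}

    # 'call' blocks until the conductor replies (or the RPC timeout expires).
    details = client.call(ctxt, 'container_show', uuid='example-uuid')

    # 'cast' returns immediately; the conductor processes the request later.
    client.cast(ctxt, 'container_start', uuid='example-uuid')
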
118If a user issues a container operation when there is no listening
119conductor (because of a process failure), there will be an RPC timeout at the
120first 'call' method. In this case, the user will observe the request
121block at the client and finally fail with an HTTP 500 error after the RPC
122timeout, which is 60 seconds by default. This behavior is independent of the
123usage of 'cast' or 'call' for the second message, mentioned above. This
124behavior does not influence our design, but it is documented here for clarity
125of understanding.
126
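For clarity, a rough sketch of how such a timeout might surface (illustrative
only; the helper and exception mapping are assumptions, not Magnum's actual API
code): ::

    # Illustrative only; shows the failure path, not Magnum's real handler.
    import oslo_messaging

    class ConductorUnavailable(Exception):
        """Placeholder for whatever error the API maps to HTTP 500."""

    def show_container(client, ctxt, uuid):
        try:
            # Blocks for up to rpc_response_timeout (60 seconds by default).
            return client.call(ctxt, 'container_show', uuid=uuid)
        except oslo_messaging.MessagingTimeout:
            # No conductor consumed the request in time; the API would
            # report an HTTP 500 error to the end user.
            raise ConductorUnavailable('magnum-conductor did not respond')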
127
1283. Ensuring the order of execution - Phase-0
129--------------------------------------------
130
131Magnum-conductor needs to ensure that for a given bay and given container,
132the operations are executed in sequence. In phase-0, we want to demonstrate
133how asynchronous behavior helps scaling. Asynchronous mode of container
134operations would be supported for single magnum-conductor scenario, in
135phase-0. If magnum-conductor crashes, there will be no recovery for the
136operations accepted earlier - which means no persistence in phase-0, for
137operations accepted by magnum-conductor. Multiple conductor scenario and
138persistence will be addressed in phase-1 [please refer to the next section
139for further details]. If the COE crashes or does not respond, the error will
140be detected, as it is in sync mode, and reflected in the container status.
141
142Magnum-conductor will maintain a job-queue. The job-queue is indexed by bay-id
143and container-id. A job-queue entry would contain the sequence of operations
144requested for a given bay-id and container-id, in temporal order. A
145greenthread will execute the tasks/operations in order for a given job-queue
146entry, until the queue empties. Using a greenthread in this fashion saves us
147from the cost and complexity of locking while preserving functional correctness.
148When a request for a new operation comes in, it gets appended to the
149corresponding queue entry.
150
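A minimal sketch of such an in-memory job-queue (names and structure are
assumptions for illustration, not the reference implementation [2]): ::

    # Illustrative sketch of the phase-0 in-memory job-queue; names and
    # structure are assumptions, not Magnum's actual conductor code.
    import collections

    import eventlet

    _job_queues = {}  # (bay_id, container_id) -> deque of pending operations

    def submit_operation(bay_id, container_id, operation):
        """Append an operation and start a drainer greenthread if needed."""
        key = (bay_id, container_id)
        queue = _job_queues.get(key)
        if queue is None:
            _job_queues[key] = queue = collections.deque()
            queue.append(operation)
            # One greenthread per (bay, container) drains the queue in order,
            # so no explicit locking is needed under cooperative scheduling.
            eventlet.spawn_n(_drain, key)
        else:
            queue.append(operation)

    def _drain(key):
        queue = _job_queues[key]
        while queue:
            operation = queue.popleft()
            operation()  # invoke the COE API for this container operation
        del _job_queues[key]
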
151For a sequence of container operations, if an intermediate operation fails,
152we will stop the sequence. The community feels more confident starting
153with this strictly defensive policy[17]. The failure will be logged
154and saved into the container object, which will help keep an operator better
155informed about the result of the sequence of container operations. We may
156revisit this policy later, if we find it too restrictive.
157
1584. Ensuring the order of execution - phase-1
159--------------------------------------------
160
161The goal is to execute requests for a given bay and a given container in
162sequence. In phase-1, we want to address persistence and the capability of
163supporting multiple magnum-conductor processes. To achieve this, we will
164reuse the concepts laid out in phase-0 and use a standard library.
165
166We propose to use taskflow[7] for this implementation. Magnum-conductors
167will consume the AMQP message and post a task[8] on a taskflow jobboard[9].
168Greenthreads from magnum-conductors would subscribe to the taskflow
169jobboard as taskflow-conductors[10]. Taskflow jobboard is maintained with
170a choice of persistent backend[11]. This will help address the concern of
171persistence for accepted operations, when a conductor crashes. Taskflow
172will ensure that tasks, namely container operations, in a job, namely a
173sequence of operations for a given bay and container, would execute in
174sequence. We can easily notice that some of the concepts used in phase-0
175are reused as is. For example, the job-queue maps to the jobboard here, and the
176use of greenthreads maps to taskflow's conductor concept. Hence, we expect an
177easier migration from phase-0 to phase-1, with the choice of taskflow.
178
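The following rough sketch, modeled on the taskflow documentation [7][9][10][11],
shows the intended shape of the phase-1 flow; the board name, connection
strings, and job details are illustrative, and the exact taskflow arguments may
differ: ::

    # Rough sketch only; backends, names and details are illustrative.
    import contextlib

    from taskflow.conductors import backends as conductor_backends
    from taskflow.jobs import backends as job_backends
    from taskflow.persistence import backends as persistence_backends

    JOBBOARD_CONF = {'board': 'zookeeper', 'path': '/magnum/jobboard'}
    PERSISTENCE_CONF = {'connection': 'zookeeper://'}  # MySQL once supported

    def post_container_job(bay_id, container_id, operation):
        """magnum-conductor posts an accepted operation as a taskflow job."""
        persistence = persistence_backends.fetch(PERSISTENCE_CONF)
        with contextlib.closing(persistence):
            board = job_backends.fetch('magnum-jobboard', JOBBOARD_CONF,
                                       persistence=persistence)
            board.connect()
            with contextlib.closing(board):
                board.post('container-ops',
                           details={'bay_id': bay_id,
                                    'container_id': container_id,
                                    'operation': operation})

    def run_conductor():
        """A greenthread claims and runs jobs as a taskflow conductor."""
        persistence = persistence_backends.fetch(PERSISTENCE_CONF)
        board = job_backends.fetch('magnum-jobboard', JOBBOARD_CONF,
                                   persistence=persistence)
        board.connect()
        conductor = conductor_backends.fetch('blocking', 'magnum-conductor',
                                             board, persistence=persistence,
                                             engine='serial')
        conductor.run()  # blocks; claims jobs and runs their flows in order
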
179For the taskflow jobboard[11], the available choices of backend are Zookeeper
180and Redis. But we plan to use MySQL as the default choice of backend for the
181magnum conductor jobboard use-case. This support will be added to taskflow.
182Later, we may choose to support the flexibility of other backends like
183ZK/Redis via configuration. But phase-1 will keep the implementation simple
184with a MySQL backend and revisit this, if required.
185
186Let's consider the scenarios of Conductor crashing -
187 - If a task is added to the jobboard, and the conductor crashes after that,
188 taskflow can assign the corresponding job to any available greenthread agent
189 from other conductor instances. If the system was running with a single
190 magnum-conductor, it will wait for the conductor to come back and join.
191 - A task is picked up and magnum-conductor crashes. In this case, the task
192 is not complete from the jobboard's point of view. When taskflow detects the
193 conductor going away, it assigns the job to another available conductor.
194 - When a conductor picks up a message from AMQP, it will acknowledge AMQP
195 only after persisting it to the jobboard. This will prevent losing the message
196 if the conductor crashes after picking up the message from AMQP. Explicit
197 acknowledgement from the application may use NotificationResult.HANDLED[12]
198 to AMQP (see the sketch below). We may use the at-least-once-guarantee[13]
199 feature in oslo.messaging[14], as it becomes available.
200
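As referenced in the last point above, a hedged sketch of explicit
acknowledgement with oslo.messaging's notification listener (the topic, endpoint
and persistence helper are assumptions, not Magnum's actual code): ::

    # Illustrative only: explicit acknowledgement with oslo.messaging.
    import oslo_messaging
    from oslo_config import cfg

    def persist_to_jobboard(payload):
        pass  # placeholder: post the operation onto the taskflow jobboard

    class ContainerOpsEndpoint(object):
        def info(self, ctxt, publisher_id, event_type, payload, metadata):
            persist_to_jobboard(payload)  # must succeed before acking
            # Only now acknowledge the message back to AMQP.
            return oslo_messaging.NotificationResult.HANDLED

    transport = oslo_messaging.get_notification_transport(cfg.CONF)
    targets = [oslo_messaging.Target(topic='magnum_notifications')]
    listener = oslo_messaging.get_notification_listener(
        transport, targets, [ContainerOpsEndpoint()],
        executor='eventlet', allow_requeue=True)
    listener.start()
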
201To summarize some of the important outcomes of this proposal -
202 - A taskflow job represents the sequence of container operations on a given
203 bay and given container. At a given point in time, the sequence may contain
204 a single operation or multiple operations.
205 - There will be a single jobboard for all conductors.
206 - Task-flow conductors are multiple greenthreads from a given
207 magnum-conductor.
208 - Taskflow-conductor will run in 'blocking' mode[15], as those greenthreads
209 have no other job than claiming and executing the jobs from jobboard.
210 - Individual jobs are supposed to maintain a temporal sequence. So the
211 taskflow-engine would be 'serial'[16].
212 - The proposed model for a 'job' is to consist of a temporal sequence of
213 'tasks' - operations on a given bay and a given container. Hence,
214 it is expected that while a given operation, say container-create, is in
215 progress, a request for container-start may come in. Adding the new task to
216 the existing job is the intuitive way to maintain the sequence of operations.
217
218To fit taskflow exactly into our use-case, we may need to make two enhancements
219to taskflow -
220- Supporting a mysql plugin as a DB backend for jobboard. Support for redis
221exists, so it will be similar.
222We do not see any technical roadblock for adding mysql support for the taskflow
223jobboard. If the proposal does not get approved by the taskflow team, we may
224have to use redis as an alternative option.
225- Support for dynamically adding tasks to a job on the jobboard. This also looks
226feasible, as discussed over #openstack-state-management [unfortunately,
227this channel is not logged, but if we agree on this direction, we can initiate
228a discussion over the ML, too].
229If the taskflow team does not allow adding this feature, even though they have
230agreed for now, we will use the dependency feature in taskflow. We will explore
231and elaborate on this further, if required.
232
233
2345. Status of progress
235---------------------
236
237The progress of execution of a container operation is reflected in the status
238of the container, e.g. 'create-in-progress', 'delete-in-progress', etc.
239
240Alternatives
241------------
242
243Without an asynchronous implementation, Magnum will continue to suffer from
244poor scalability and slowness.
245
246In this design, stack-lock[3] has been considered as an alternative to
247taskflow. The following are the reasons for preferring taskflow over
248stack-lock, as of now -
249- Stack-lock used in Heat is not a library, so it would require making a copy
250for Magnum, which is not desirable.
251- Taskflow is a relatively mature, well supported, feature-rich library.
252- Taskflow has built-in capacity to scale out (or in) as conductors
253can join (or leave) the cluster.
254- Taskflow has a failure detection and recovery mechanism. If a process
255crashes, then worker threads from other conductors may continue the execution.
256
257In this design, we describe futurist[4] as the choice of implementation for
258avoiding duplication of code between the async and sync modes. For this
259purpose, we could not find any other solution to compare against.
260
261Data model impact
262-----------------
263
264Phase-0 has no data model impact. But phase-1 may introduce an additional
265table into the Magnum database. As per the present proposal for using taskflow
266in phase-1, we have to introduce a new table for jobboard under magnum db.
267This table will be exposed to the taskflow library as a persistent db plugin.
268Alternatively, an implementation with stack-lock would also require the
269introduction of a new table for stack-lock objects.
270
271REST API impact
272---------------
273
274None.
275
276Security impact
277---------------
278
279None.
280
281Notifications impact
282--------------------
283
284None
285
286Other end user impact
287---------------------
288
289None
290
291Performance impact
292------------------
293
294The asynchronous mode of operation helps scalability. Hence, it improves
295responsiveness and reduces the turnaround time significantly.
296A small test on devstack, comparing both modes,
297demonstrates this with numbers [1].
298
299Other deployer impact
300---------------------
301
302None.
303
304Developer impact
305----------------
306
307None
308
309Implementation
310--------------
311
312Assignee(s)
313-----------
314
315Primary assignee:
316 suro-patz(Surojit Pathak)
317
318Work Items
319----------
320
321For phase-0
322* Introduce config knob for asynchronous mode of container operations.
323
324* Changes for Magnum-API to use CAST instead of CALL for operations eligible
325 for asynchronous mode.
326
327* Implement the in-memory job-queue in Magnum conductor, and integrate futurist
328 library.
329
330* Unit tests and functional tests for async mode.
331
332* Documentation changes.
333
334For phase-1
335* Get the dependencies on taskflow being resolved.
336
337* Introduce jobboard table into Magnum DB.
338
339* Integrate taskflow in Magnum conductor to replace the in-memory job-queue
340 with taskflow jobboard. Also, we need conductor greenthreads to subscribe
341 as workers to the taskflow jobboard.
342
343* Add unit tests and functional tests for persistence and multiple conductor
344 scenario.
345
346* Documentation changes.
347
348For phase-2
349* We will promote asynchronous mode of operation as the default mode of
350operation.
351
352* We may decide to drop the code for synchronous mode and corresponding config.
353
354* Documentation changes.
355
356
357Dependencies
358------------
359
360For phase-1, if we choose to implement using taskflow, we need to get the
361following two features added to taskflow first -
362* Ability to add new task to an existing job on jobboard.
363* mysql plugin support as persistent DB.
364
365Testing
366-------
367
368All the existing test cases are run to ensure async mode does not break them.
369Additionally, more functional tests and unit tests specific to
370async mode will be added.
371
372Documentation Impact
373--------------------
374
375Magnum documentation will include a description of the option for asynchronous
376mode of container operations and its benefits. We will also add
377developer documentation with guidelines for implementing a container operation
378in both modes - sync and async. We will add a section on 'how to debug
379container operations in async mode'. The phase-0 and phase-1 implementation
380and their support for single or multiple conductors will be clearly documented
381for the operators.
382
383References
384----------
385
386[1] - Execution time comparison between sync and async modes:
387
388https://gist.github.com/surojit-pathak/2cbdad5b8bf5b569e755
389
390[2] - Proposed change under review:
391
392https://review.openstack.org/#/c/267134/
393
394[3] - Heat's use of stacklock
395
396http://docs.openstack.org/developer/heat/_modules/heat/engine/stack_lock.html
397
398[4] - openstack/futurist
399
400http://docs.openstack.org/developer/futurist/
401
402[5] - openstack/oslo.messaging
403
404http://docs.openstack.org/developer/oslo.messaging/rpcclient.html
405
406[6] - ML discussion on the design
407
408http://lists.openstack.org/pipermail/openstack-dev/2015-December/082524.html
409
410[7] - Taskflow library
411
412http://docs.openstack.org/developer/taskflow/
413
414[8] - task in taskflow
415
416http://docs.openstack.org/developer/taskflow/atoms.html#task
417
418[9] - job and jobboard in taskflow
419
420http://docs.openstack.org/developer/taskflow/jobs.html
421
422[10] - conductor in taskflow
423
424http://docs.openstack.org/developer/taskflow/conductors.html
425
426[11] - persistent backend support in taskflow
427
428http://docs.openstack.org/developer/taskflow/persistence.html
429
430[12] - oslo.messaging notification handler
431
432http://docs.openstack.org/developer/oslo.messaging/notification_listener.html
433
434[13] - Blueprint for at-least-once-guarantee, oslo.messaging
435
436https://blueprints.launchpad.net/oslo.messaging/+spec/at-least-once-guarantee
437
438[14] - Patchset under review for at-least-once-guarantee, oslo.messaging
439
440https://review.openstack.org/#/c/229186/
441
442[15] - Taskflow blocking mode for conductor
443
444http://docs.openstack.org/developer/taskflow/conductors.html#taskflow.conductors.backends.impl_executor.ExecutorConductor
445
446[16] - Taskflow serial engine
447
448http://docs.openstack.org/developer/taskflow/engines.html
449
450[17] - Community feedback on policy to handle failure within a sequence
451
452http://eavesdrop.openstack.org/irclogs/%23openstack-containers/%23openstack-containers.2016-03-08.log.html#t2016-03-08T20:41:17
diff --git a/specs/implemented/bay-drivers.rst b/specs/implemented/bay-drivers.rst
new file mode 100644
index 0000000..57bc7fc
--- /dev/null
+++ b/specs/implemented/bay-drivers.rst
@@ -0,0 +1,344 @@
1..
2 This work is licensed under a Creative Commons Attribution 3.0 Unported
3 License.
4
5 http://creativecommons.org/licenses/by/3.0/legalcode
6
7======================================
8Container Orchestration Engine drivers
9======================================
10
11Launchpad blueprint:
12
13https://blueprints.launchpad.net/magnum/+spec/bay-drivers
14
15Container Orchestration Engines (COEs) are different systems for managing
16containerized applications in a clustered environment, each having their own
17conventions and ecosystems. Three of the most common, which also happen to be
18supported in Magnum, are: Docker Swarm, Kubernetes, and Mesos. In order to
19successfully serve developers, Magnum needs to be able to provision and manage
20access to the latest COEs through its API in an effective and scalable way.
21
22
23Problem description
24===================
25
26Magnum currently supports the three most popular COEs, but as more emerge and
27existing ones change, it needs an effective and scalable way of managing
28them over time.
29
30One of the problems with the current implementation is that COE-specific logic,
31such as Kubernetes replication controllers and services, is situated in the
32core Magnum library and made available to users through the main API. Placing
33COE-specific logic in a core API introduces tight coupling and forces
34operators to work with an inflexible design.
35
36By formalising a more modular and extensible architecture, Magnum will be
37in a much better position to help operators and consumers satisfy custom
38use-cases.
39
40Use cases
41---------
42
431. Extensibility. Contributors and maintainers need a suitable architecture to
44 house current and future COE implementations. Moving to a more extensible
45 architecture, where core classes delegate to drivers, provides a more
46 effective and elegant model for handling COE differences without the need
47 for tightly coupled and monkey-patched logic.
48
49 One of the key use cases is allowing operators to customise their
50 orchestration logic, such as modifying Heat templates or even using their
51 own tooling like Ansible. Moreover, operators will often expect to use a
52 custom distro image with lots of software pre-installed and many special
53 security requirements, which is extremely difficult or impossible to do with
54 the current upstream templates. COE drivers solve these problems.
55
562. Maintainability. Moving to a modular architecture will be easier to manage
57 in the long-run because the responsibility of maintaining non-standard
58 implementations is shifted into the operator's domain. Maintaining the
59 default drivers which are packaged with Magnum will also be easier and
60 cleaner since logic is now demarcated from core codebase directories.
61
623. COE & Distro choice. In the community there has been a lot of discussion
63 about which distro and COE combination to support with the templates.
64 Having COE drivers allows for people or organizations to maintain
65 distro-specific implementations (e.g CentOS+Kubernetes).
66
674. Addresses dependency concerns. One of the direct results of
68 introducing a driver model is the ability to give operators more freedom
69 about choosing how Magnum integrates with the rest of their OpenStack
70 platform. For example, drivers would remove the necessity for users to
71 adopt Barbican for secret management.
72
735. Driver versioning. The new driver model allows operators to modify existing
74 drivers or create custom ones, release new bay types based on the newer
75 version, and subsequently launch new bays running the updated
76 functionality. Existing bays which are based on older driver versions would
77 be unaffected in this process and would still be able to have lifecycle
78 operations performed on them. If one were to list their details from the
79 API, it would reference the old driver version. An operator can see which
80 driver version a bay type is based on through its ``driver`` value,
81 which is exposed through the API.
82
83Proposed change
84===============
85
861. The creation of new directory at the project root: ``./magnum/drivers``.
87 Each driver will house its own logic inside its own directory. Each distro
88 will house its own logic inside that driver directory. For example, the
89 Fedora Atomic distro using Swarm will have the following directory
90 structure:
91
92 ::
93
94 drivers/
95 swarm_atomic_v1/
96 image/
97 ...
98 templates/
99 ...
100 api.py
101 driver.py
102 monitor.py
103 scale.py
104 template_def.py
105 version.py
106
107
108 The directory name should be a string which uniquely identifies the driver
109 and provides a descriptive reference. The driver version number and name are
110 provided in the manifest file and will be included in the bay metadata at
111 cluster build time.
112
113 There are two workflows for rolling out driver updates:
114
115 - if the change is relatively minor, they modify the files in the
116 existing driver directory and update the version number in the manifest
117 file.
118
119 - if the change is significant, they create a new directory
120 (either from scratch or by forking).
121
122 Further explanation of the top-level files and directories:
123
124 - an ``image`` directory is *optional* and should contain documentation
125 which tells users how to build the image and register it to glance. This
126 directory can also hold artifacts for building the image, for instance
127 diskimagebuilder elements, scripts, etc.
128
129 - a ``templates`` directory is *required* and will (for the foreseeable
130 future) store Heat template YAML files. In the future drivers will allow
131 operators to use their own orchestration tools like Ansible.
132
133 - ``api.py`` is *optional*, and should contain the API controller which
134 handles custom API operations like Kubernetes RCs or Pods. It will be
135 this class which accepts HTTP requests and delegates to the Conductor. It
136 should contain a uniquely named class, such as ``SwarmAtomicXYZ``, which
137 extends from the core controller class. The COE class would have the
138 opportunity of overriding base methods if necessary.
139
140 - ``driver.py`` is *required*, and should contain the logic which maps
141 controller actions to COE interfaces. These include: ``bay_create``,
142 ``bay_update``, ``bay_delete``, ``bay_rebuild``, ``bay_soft_reboot`` and
143 ``bay_hard_reboot``.
144
145 - ``version.py`` is *required*, and should contain the version number of
146 the bay driver. This is defined by a ``version`` attribute and is
147 represented in the ``1.0.0`` format. It should also include a ``Driver``
148 attribute whose value is a descriptive driver name such as ``swarm_atomic``.
149
150 Due to the varying nature of COEs, it is up to the bay
151 maintainer to implement this in their own way. Since a bay is a
152 combination of a COE and an image, ``driver.py`` will also contain
153 information about the ``os_distro`` property which is expected to be
154 set on the Glance image.
155
156 - ``monitor.py`` is *optional*, and should contain the logic which monitors
157 the resource utilization of bays.
158
159 - ``template_def.py`` is *required* and should contain the COE's
160 implementation of how orchestration templates are loaded and matched to
161 Magnum objects. It would probably contain multiple classes, such as
162 ``class SwarmAtomicXYZTemplateDef(BaseTemplateDefinition)``.
163
164 - ``scale.py`` is *optional* per bay specification and should contain the
165 logic for scaling operations.
166
1672. Renaming the ``coe`` attribute of BayModel to ``driver``. Because this
168 value would determine which driver classes and orchestration templates to
169 load, it would need to correspond to the name of the driver as it is
170 registered with stevedore_ and setuptools entry points (see the loading sketch after this list).
171
172 During the lifecycle of an API operation, top-level Magnum classes (such as
173 a Bay conductor) would then delegate to the driver classes which have been
174 dynamically loaded. Validation will need to ensure that whichever value
175 is provided by the user is correct.
176
177 By default, drivers are located under the main project directory and their
178 namespaces are accessible via ``magnum.drivers.foo``. But a use case that
179 needs to be looked at and, if possible, provided for is drivers which are
180 situated outside the project directory, for example in
181 ``/usr/share/magnum``. This will suit operators who want greater separation
182 between customised code and Python libraries.
183
1843. The driver implementations for the current COE and image combinations:
185 Docker Swarm Fedora, Kubernetes Fedora, Kubernetes CoreOS, and Mesos
186 Ubuntu. Any templates would need to be moved from
187 ``magnum/templates/{coe_name}`` to
188 ``magnum/drivers/{driver_name}/templates``.
189
1904. Removal of the following files:
191
192 ::
193
194 magnum/magnum/conductor/handlers/
195 docker_conductor.py
196 k8s_conductor.py
197
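To make the driver loading in point 2 concrete, here is an illustrative sketch;
the entry-point namespace, base class, and method signatures are assumptions,
not Magnum's final interface: ::

    # Illustrative sketch only; namespace, base class and signatures are
    # assumptions, not Magnum's actual driver interface.
    import abc

    from stevedore import driver as stevedore_driver

    class BayDriver(metaclass=abc.ABCMeta):
        """Hypothetical base class that concrete COE drivers would extend."""

        @abc.abstractmethod
        def bay_create(self, context, bay):
            """Provision the bay, e.g. by creating a Heat stack."""

        @abc.abstractmethod
        def bay_delete(self, context, bay):
            """Tear down the bay and its resources."""

    def load_driver(name):
        """Load a registered bay driver by its entry-point name.

        A driver such as swarm_atomic_v1 would be registered in setup.cfg,
        for example:

            [entry_points]
            magnum.drivers =
                swarm_atomic_v1 = magnum.drivers.swarm_atomic_v1.driver:Driver
        """
        manager = stevedore_driver.DriverManager(
            namespace='magnum.drivers',  # assumed namespace for the example
            name=name,
            invoke_on_load=True)
        return manager.driver
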
198Design Principles
199-----------------
200
201- Minimal, clean API without a high cognitive burden
202
203- Ensure Magnum's priority is to do one thing well, but allow extensibility
204 by external contributors
205
206- Do not force ineffective abstractions that introduce feature divergence
207
208- Formalise a modular and loosely coupled driver architecture that removes
209 COE logic from the core codebase
210
211
212Alternatives
213------------
214
215This alternative relates to #2 of the Proposed Change section. Instead of
216registering drivers using stevedore_ and setuptools entry points, an
217alternative is to use the Magnum config.
218
219
220Data model impact
221-----------------
222
223Since drivers would be implemented for the existing COEs, there would be
224no loss of functionality for end-users.
225
226
227REST API impact
228---------------
229
230Attribute change when creating and updating a BayModel (``coe`` to
231``driver``). This would occur before v1 of the API is frozen.
232
233COE-specific endpoints would be removed from the core API.
234
235
236Security impact
237---------------
238
239None
240
241
242Notifications impact
243--------------------
244
245None
246
247
248Other end user impact
249---------------------
250
251There will be deployer impacts because deployers will need to select
252which drivers they want to activate.
253
254
255Performance Impact
256------------------
257
258None
259
260
261
262Other deployer impact
263---------------------
264
265In order to utilize new functionality and bay drivers, operators will need
266to update their installation and configure bay models to use a driver.
267
268
269Developer impact
270----------------
271
272Due to the significant impact on the current codebase, a phased implementation
273approach will be necessary. This is defined in the Work Items section.
274
275Code will be contributed for COE-specific functionality in a new way, and will
276need to abide by the new architecture. Documentation and a good first
277implementation will play an important role in helping developers contribute
278new functionality.
279
280
281Implementation
282==============
283
284
285Assignee(s)
286-----------
287
288Primary assignee:
289murali-allada
290
291Other contributors:
292jamiehannaford
293strigazi
294
295
296Work Items
297----------
298
2991. New ``drivers`` directory
300
3012. Change ``coe`` attribute to ``driver``
302
3033. COE drivers implementation (swarm-fedora, k8s-fedora, k8s-coreos,
304 mesos-ubuntu). Templates should remain in the directory tree until their
305 accompanying driver has been implemented.
306
3074. Delete old conductor files
308
3095. Update client
310
3116. Add documentation
312
3137. Improve the user experience for operators forking or creating new
314 drivers. One way we could do this is by creating new client commands or
315 scripts. This is orthogonal to this spec, and will be considered after
316 its core implementation.
317
318Dependencies
319============
320
321None
322
323
324Testing
325=======
326
327Each commit will be accompanied with unit tests, and Tempest functional tests.
328
329
330Documentation Impact
331====================
332
333A set of documentation for this architecture will be required. We should also
334provide a developer guide for creating a new bay driver and updating existing
335ones.
336
337
338References
339==========
340
341`Using Stevedore in your Application
342<http://docs.openstack.org/developer/stevedore/tutorial/index.html/>`_.
343
344.. _stevedore: http://docs.openstack.org/developer/stevedore/
diff --git a/specs/implemented/container-networking-model.rst b/specs/implemented/container-networking-model.rst
new file mode 100644
index 0000000..70d8f67
--- /dev/null
+++ b/specs/implemented/container-networking-model.rst
@@ -0,0 +1,458 @@
1..
2 This work is licensed under a Creative Commons Attribution 3.0 Unported
3 License.
4
5 http://creativecommons.org/licenses/by/3.0/legalcode
6
7=================================
8Magnum Container Networking Model
9=================================
10
11Launchpad Blueprint:
12
13https://blueprints.launchpad.net/magnum/+spec/extensible-network-model
14
15For Magnum to prosper, the project must support a range of networking tools
16and techniques, while maintaining a simple, developer-focused user
17experience. The first step in achieving this goal is to standardize the
18process of allocating networking to containers, while providing an
19abstraction for supporting various networking capabilities through
20pluggable back-end implementations. This document recommends using Docker's
21libnetwork library to implement container networking abstractions and
22plugins. Since libnetwork is not a standard and the container ecosystem
23is rapidly evolving, the Magnum community should continue evaluating
24container networking options on a frequent basis.
25
26Problem Description
27===================
28
29The container networking ecosystem is undergoing rapid changes. The
30networking tools and techniques used in today's container deployments are
31different than twelve months ago and will continue to evolve. For example,
32Flannel [6]_, Kubernetes' preferred networking implementation, was initially
33released in July of 2014 and was not considered preferred until early 2015.
34
35Furthermore, the various container orchestration engines have not
36standardized on a container networking implementation and may never. For
37example, Flannel is the preferred container networking implementation for
38Kubernetes but not for Docker Swarm. Each container networking implementation
39comes with its own API abstractions, data model, tooling, etc. Natively
40supporting each container networking implementation can be a burden on the
41Magnum community and codebase. By supporting only a subset of container
42networking implementations, the project may not be widely adopted or may
43provide a suboptimal user experience.
44
45Lastly, Magnum has limited support for advanced container networking
46functionality. Magnum instantiates container networks behind the scenes
47through Heat templates, exposing little-to-no user configurability. Some
48users require the ability to customize their container environments,
49including networking details. However, networking needs to "just work" for
50users that require no networking customizations.
51
52Roles
53-----
54
55The following are roles that the Magnum Container Networking Model takes
56into consideration. Roles are an important reference point when creating
57user stories. This is because each role provides different functions and
58has different requirements.
59
601. Cloud Provider (CP): Provides standard OpenStack cloud infrastructure
61 services, including the Magnum service.
62
632. Container Service Provider (CSP): Uses Magnum to deliver
64 Containers-as-a-Service (CaaS) to users. CSPs are a consumer of CP
65 services and a CaaS provider to users.
66
673. Users: Consume Magnum services to provision and manage clustered
68 container environments and deploy apps within the container clusters.
69
70The container ecosystem focuses on the developer user type. It is imperative
71that the Magnum Container Networking Model meets the needs of this user type.
72
73These roles are not mutually exclusive. For example:
74
751. A CP can also be a CSP. In this case, the CP/CSP provisions and manages
76 standard OpenStack services, the Magnum service, and provides CaaS
77 services to users.
78
792. A User can also be a CSP. In this case, the user provisions their own
80 baymodels, bays, etc. from the CP.
81
82Definitions
83-----------
84
85COE
86 Container Orchestration Engine
87
88Baymodel
89 An object that stores template information about the bay which is
90 used to create new bays consistently.
91
92Bay
93 A Magnum resource that includes at least one host to run containers on,
94 and a COE to manage containers created on hosts within the bay.
95
96Pod
97 The smallest deployable unit that can be created, scheduled, and
98 managed within Kubernetes.
99
100Additional Magnum definitions can be found in the Magnum Developer
101documentation [2]_.
102
103Use Cases
104----------
105
106This document does not intend to address each use case. The use cases are
107provided as reference for the long-term development of the Magnum Container
108Networking Model.
109
110As a User:
111
1121. I need to easily deploy containerized apps in an OpenStack cloud.
113 My user experience should be similar to how I deploy containerized apps
114 outside of an OpenStack cloud.
115
1162. I need to have containers communicate with vm-based apps that use
117 OpenStack networking.
118
1193. I need the option to preserve the container's IP address so I can
120 manage containers by IP address, not just by port.
121
1224. I need to block unwanted traffic to/from my containerized apps.
123
1245. I need the ability for my containerized apps to be highly available.
125
1266. I need confidence that my traffic is secure from other tenants' traffic.
127
128As a CSP:
129
1301. I need to easily deploy a bay for consumption by users. The bay must
131 support the following:
132
133 A. One or more hosts to run containers.
134 B. The ability to choose between virtual or physical hosts to run
135 containers.
136 C. The ability to automatically provision networking to containers.
137
1382. I need to provide clustering options that support different
139 container/image formats and technologies.
140
1413. After deploying my initial cluster, I need the ability to provide ongoing
142 management, including:
143
144 A. The ability to add/change/remove networks that containers connect to.
145 B. The ability to add/change/remove nodes within the cluster.
146
1474. I need to deploy a Bay without admin rights to OpenStack services.
148
1495. I need the freedom to choose different container networking tools and
150 techniques offered by the container ecosystem beyond OpenStack.
151
152As a CP:
153
1541. I need to easily and reliably add the Magnum service to my existing
155 OpenStack cloud environment.
156
1572. I need to easily manage (monitor, troubleshoot, etc.) the Magnum
158 service, including the ability to mirror ports to capture traffic
159 for analysis.
160
1613. I need to make the Magnum services highly-available.
162
1634. I need to make Magnum services highly performant.
164
1655. I need to easily scale-out Magnum services as needed.
166
1676. I need Magnum to be robust regardless of failures within the container
168 orchestration engine.
169
170Proposed Changes
171================
172
1731. Currently, Magnum supports Flannel [6]_ as the only multi-host container
174 networking implementation. Although Flannel has become widely accepted
175 for providing networking capabilities to Kubernetes-based container
176 clusters, other networking tools exist and future tools may develop.
177
178 This document proposes extending Magnum to support specifying a
179 container networking implementation through a combination of user-facing
180 baymodel configuration flags. Configuration parameters that are common
181 across Magnum or all networking implementations will be exposed as unique
182 flags. For example, a flag named network-driver can be used to instruct
183 Magnum which network driver to use for implementing a baymodel
184 container/pod network. Network driver examples may include:
185
186 flannel, weave, calico, midonet, netplugin, etc.
187
188 Here is an example of creating a baymodel that uses Flannel as the
189 network driver: ::
190
191 magnum baymodel-create --name k8sbaymodel \
192 --image-id fedora-21-atomic-5 \
193 --keypair-id testkey \
194 --external-network-id 1hsdhs88sddds889 \
195 --dns-nameserver 8.8.8.8 \
196 --flavor-id m1.small \
197 --docker-volume-size 5 \
198 --coe kubernetes \
199 --network-driver flannel
200
201 If no network-driver parameter is supplied by the user, the baymodel is
202 created using the default network driver of the specified Magnum COE (a selection sketch appears after this list).
203 Each COE must support a default network driver and each driver must
204 provide reasonable default configurations that allow users to instantiate
205 a COE without supplying labels. The default network driver for each COE
206 should be consistent with existing Magnum default settings. Where current
207 defaults do not exist, the defaults should be consistent with upstream
208 network driver projects.
209
2102. Each network driver supports a range of configuration parameters that
211 should be observed by Magnum. This document suggests using an attribute
212 named "labels" for supplying driver-specific configuration parameters.
213 Labels consist of one or more arbitrary key/value pairs. Here is an
214 example of using labels to change default settings of the Flannel
215 network driver: ::
216
217 magnum baymodel-create --name k8sbaymodel \
218 --image-id fedora-21-atomic-5 \
219 --keypair-id testkey \
220 --external-network-id ${NIC_ID} \
221 --dns-nameserver 8.8.8.8 \
222 --flavor-id m1.small \
223 --docker-volume-size 5 \
224 --coe kubernetes \
225 --network-driver flannel \
226 --labels flannel_network_cidr=10.0.0.0/8,\
227 flannel_network_subnetlen=22,\
228 flannel_backend=vxlan
229
230 With Magnum's current implementation, this document would support
231 labels for the Kubernetes COE type. However, labels are applicable
232 beyond Kubernetes, as the Docker daemon, images and containers now
233 support labels as a mechanism for providing custom metadata. The labels
234 attribute within Magnum should be extended beyond Kubernetes pods, so a
235 single mechanism can be used to pass arbitrary metadata throughout the
236 entire system. A blueprint [9]_ has been registered to expand the scope
237 of labels for Magnum. This document intends to adhere to the
238 expand-labels-scope blueprint.
239
240 Note: Support for daemon-labels was added in Docker 1.4.1. Labels for
241 containers and images were introduced in Docker 1.6.0.
242
243 If the --network-driver flag is specified without any labels, default
244 configuration values of the driver will be used by the baymodel. These
245 defaults are set within the Heat template of the associated COE. Magnum
246 should ignore label keys and/or values not understood by any of the
247 templates during the baymodel operation.
248
249 Magnum will continue to CRUD bays in the same way:
250
251 magnum bay-create --name k8sbay --baymodel k8sbaymodel --node-count 1
252
2533. Update python-magnumclient to understand the new Container Networking
254 Model attributes. The client should also be updated to support passing
255 the --labels flag according to the expand-labels-scope blueprint [9]_.
256
2574. Update the conductor template definitions to support the new Container
258 Networking Model attributes.
259
2605. Refactor Heat templates to support the Magnum Container Networking Model.
261 Currently, Heat templates embed Flannel-specific configuration within
262 top-level templates. For example, the top-level Kubernetes Heat
263 template [8]_ contains the flannel_network_subnetlen parameter. Network
264 driver specific configurations should be removed from all top-level
265 templates and instead be implemented in one or more template fragments.
266 As it relates to container networking, top-level templates should only
267 expose the labels and generalized parameters such as network-driver.
268 Heat templates, template definitions and definition entry points should
269 be suited for composition, allowing for a range of supported labels. This
270 document intends to follow the refactor-heat-templates blueprint [3]_ to
271 achieve this goal.
272
2736. Update unit and functional tests to support the new attributes of the
274 Magnum Container Networking Model.
275
2767. The spec will not add support for natively managing container networks.
277 Due to each network driver supporting different API operations, this
278 document suggests that Magnum not natively manage container networks at
279 this time and instead leave this job to native tools. References [4]_ [5]_
280 [6]_ [7]_
281 provide additional details on common label operations.
282
2838. Since implementing the expand-labels-scope blueprint [9]_ may take a while,
284 exposing network functionality through baymodel configuration parameters
285 should be considered as an interim solution.
286
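As referenced in point 1, a small illustrative sketch of default network-driver
selection and label parsing; the per-COE defaults shown are placeholders, not
Magnum's actual defaults: ::

    # Illustrative helper sketch; defaults and parsing rules are assumptions.

    # Hypothetical per-COE defaults (illustrative values only).
    DEFAULT_NETWORK_DRIVERS = {
        'kubernetes': 'flannel',
        'swarm': 'docker',
        'mesos': 'docker',
    }

    def resolve_network_driver(coe, network_driver=None):
        """Fall back to the COE's default driver when none is supplied."""
        return network_driver or DEFAULT_NETWORK_DRIVERS[coe]

    def parse_labels(labels_arg):
        """Turn 'k1=v1,k2=v2' from the CLI into a dict of driver settings."""
        labels = {}
        for pair in labels_arg.split(','):
            key, _, value = pair.partition('=')
            labels[key.strip()] = value.strip()
        return labels

    print(resolve_network_driver('kubernetes'))  # -> flannel
    print(parse_labels('flannel_backend=vxlan,flannel_network_subnetlen=22'))
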
287Alternatives
288------------
289
290
2911. Observe all networking configuration parameters, including labels
292 within a configuration file instead of exposing the labels attribute to
293 the user.
294
2952. Only support a single networking implementation such as Flannel. Flannel
296 is currently supported for the Kubernetes COE type. It can be ported to
297 support the swarm COE type.
298
2993. Add support for managing container networks. This will require adding
300 abstractions for each supported network driver or creating an
301 abstraction layer that covers all possible network drivers.
302
3034. Use the Kuryr project [10]_ to provide networking to Magnum containers.
304 Kuryr currently contains no documentation or code, so this alternative
305 is highly unlikely if the Magnum community requires a pluggable
306 container networking implementation in the near future. However, Kuryr
307 could become the long-term solution for container networking within
308 OpenStack. A decision should be made by the Magnum community whether
309 to move forward with Magnum's own container networking model or to wait
310 for Kuryr to mature. In the meantime, this document suggests the Magnum
311 community become involved in the Kuryr project.
312
313Data Model Impact
314-----------------
315
316This document adds the labels and network-driver attributes to the baymodel
317database table. A migration script will be provided to support the attributes
318being added (an illustrative migration sketch follows the table). ::
319
320 +-------------------+-----------------+---------------------------------------------+
321 | Attribute | Type | Description |
322 +===================+=================+=============================================+
323 | labels | JSONEncodedDict | One or more arbitrary key/value pairs |
324 +-------------------+-----------------+---------------------------------------------+
325 | network-driver | string | Container networking backend implementation |
326 +-------------------+-----------------+---------------------------------------------+
327
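As noted above, an illustrative migration sketch; the revision identifiers,
column sizes, and the plain Text stand-in for Magnum's JSON-encoding column
type are assumptions: ::

    # Illustrative alembic migration sketch; revision ids, column types and
    # sizes are assumptions, not the actual Magnum migration.
    from alembic import op
    import sqlalchemy as sa

    # Hypothetical revision identifiers.
    revision = 'add_network_driver_and_labels'
    down_revision = None

    def upgrade():
        # Magnum would likely use its JSONEncodedDict type here; sa.Text is
        # a stand-in so the sketch stays self-contained.
        op.add_column('baymodel',
                      sa.Column('labels', sa.Text(), nullable=True))
        op.add_column('baymodel',
                      sa.Column('network_driver', sa.String(255),
                                nullable=True))

    def downgrade():
        op.drop_column('baymodel', 'network_driver')
        op.drop_column('baymodel', 'labels')
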
328REST API Impact
329---------------
330
331This document adds the labels and network-driver attributes to the BayModel
332API class. ::
333
334 +-------------------+-----------------+---------------------------------------------+
335 | Attribute | Type | Description |
336 +===================+=================+=============================================+
337 | labels | JSONEncodedDict | One or more arbitrary key/value pairs |
338 +-------------------+-----------------+---------------------------------------------+
339 | network-driver | string | Container networking backend implementation |
340 +-------------------+-----------------+---------------------------------------------+
341
342Security Impact
343---------------
344
345Supporting more than one network driver increases the attack
346surface of Magnum.
347
348Notifications Impact
349--------------------
350
351None
352
353Other End User Impact
354---------------------
355
356Most end users will never use the labels configuration flag
357and simply use the default network driver and associated
358configuration options. For those that wish to customize their
359container networking environment, it will be important to understand
360what network drivers and labels are supported, along with their
361associated configuration options, capabilities, etc.
362
363Performance Impact
364------------------
365
366Performance will depend upon the chosen network driver and its
367associated configuration. For example, when creating a baymodel with
368"--network-driver flannel" flag, Flannel's default configuration
369will be used. If the default for Flannel is an overlay networking technique
370(i.e. VXLAN), then networking performance will be less than if Flannel used
371the host-gw configuration that does not perform additional packet
372encapsulation to/from containers. If additional performance is required
373when using this driver, Flannel's host-gw configuration option could be
374exposed by the associated Heat template and instantiated through the labels
375attribute.
376
377Other Deployer Impact
378---------------------
379
380Currently, container networking and OpenStack networking are different
381entities. Since no integration exists between the two, deployers/operators
382will be required to manage each networking environment individually.
383However, Magnum users will continue to deploy baymodels, bays, containers,
384etc. without having to specify any networking parameters. This will be
385accomplished by setting reasonable default parameters within the Heat
386templates.
387
388Developer impact
389----------------
390
391None
392
393Implementation
394==============
395
396Assignee(s)
397-----------
398
399Primary assignee:
400Daneyon Hansen (danehans)
401
402Other contributors:
403Ton Ngo (Tango)
404Hongbin Lu (hongbin)
405
406Work Items
407----------
408
4091. Extend the Magnum API to support new baymodel attributes.
4102. Extend the Client API to support new baymodel attributes.
4113. Extend baymodel objects to support new baymodel attributes. Provide a
412 database migration script for adding attributes.
4134. Refactor Heat templates to support the Magnum Container Networking Model.
4145. Update Conductor template definitions and definition entry points to
415 support Heat template refactoring.
4166. Extend unit and functional tests to support new baymodel attributes.
417
418Dependencies
419============
420
421Although adding support for these new attributes does not depend on the
422following blueprints, it's highly recommended that the Magnum Container
423Networking Model be developed in concert with the blueprints to maintain
424development continuity within the project.
425
4261. Common Plugin Framework Blueprint [1]_.
427
4282. Expand the Scope of Labels Blueprint [9]_.
429
4303. Refactor Heat Templates, Definitions and Entry Points Blueprint [3]_.
431
432Testing
433=======
434
435Each commit will be accompanied with unit tests. There will also be
436functional tests which will be used as part of a cross-functional gate
437test for Magnum.
438
439Documentation Impact
440====================
441
442The Magnum Developer Quickstart document will be updated to support the
443configuration flags introduced by this document. Additionally, background
444information on how to use these flags will be included.
445
446References
447==========
448
449.. [1] https://blueprints.launchpad.net/magnum/+spec/common-plugin-framework
450.. [2] http://docs.openstack.org/developer/magnum/
451.. [3] https://blueprints.launchpad.net/magnum/+spec/refactor-heat-templates
452.. [4] https://github.com/docker/libnetwork/blob/master/docs/design.md
453.. [5] https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/networking.md
454.. [6] https://github.com/coreos/flannel
455.. [7] https://github.com/coreos/rkt/blob/master/Documentation/networking.md
456.. [8] https://github.com/openstack/magnum/blob/master/magnum/templates/kubernetes/kubecluster.yaml
457.. [9] https://blueprints.launchpad.net/magnum/+spec/expand-labels-scope
458.. [10] https://github.com/openstack/kuryr
diff --git a/specs/implemented/container-volume-integration-model.rst b/specs/implemented/container-volume-integration-model.rst
new file mode 100644
index 0000000..ba5c348
--- /dev/null
+++ b/specs/implemented/container-volume-integration-model.rst
@@ -0,0 +1,500 @@
1..
2 This work is licensed under a Creative Commons Attribution 3.0 Unported
3 License.
4
5 http://creativecommons.org/licenses/by/3.0/legalcode
6
7=========================================
8Magnum Container Volume Integration Model
9=========================================
10
11Launchpad Blueprint:
12
13https://blueprints.launchpad.net/magnum/+spec/magnum-integrate-with-cinder
14
15Storage is a key part of any computing system. Containers in particular have
16the interesting characteristic that local storage by default is ephemeral:
17any changes to the file system disappear when the container is deleted. This
18introduces the need for persistent storage to retain and share data between
19containers, and this is currently an active area of development in all
20container orchestration engines (COE).
21
22As the component in OpenStack for managing COEs, Magnum must fully enable the
23features for persistent storage in the COEs. To achieve this goal, we propose
24in this specification to generalize the process for utilizing persistent
25storage with containers so that it is applicable for different bay types.
26Despite the complexity, we aim to maintain a good user experience by a simple
27abstraction for working with various volume capabilities. For the rest of this
28specification, we will use the term Volume to refer to persistent storage, and
29Volume Driver as the plugin in a COE to support the particular persistent
30storage.
31
32Problem Description
33===================
34
35Containers require full life cycle management such as create, run, stop,
36delete, etc., and a key operation is to manage the data - making the data
37persistent, reusing the data, sharing data between containers, etc.
38In this area, the support for container volume is undergoing rapid change
39to bring more integration with open source software and third party
40storage solutions.
41
42Clear evidence of this growth is the many volume plugin drivers [1]_ [4]_
43such as NFS, GlusterFS, EBS, etc. They provide different functionality, use
44different storage backends and have different requirements. The COEs are
45naturally motivated to be flexible and allow as many choices as possible for
46the users with respect to the storage backend. Since Magnum's role is to
47support the COEs within OpenStack, the goal is to be transparent and enable
48these same storage backends for the COEs through the COEs' lifecycle
49operations.
50
51Currently, Magnum provides limited support for managing container volumes.
52The only option available is to specify the docker-volume-size for a
53pre-allocated block storage in the COE to host the containers. Magnum
54instantiates container volumes through Heat templates, exposing no other
55mechanism to configure and operate on volumes. In practice, some users
56require the ability to manage volumes easily in the COEs.
57
58Note that we are not proposing to create a new volume management interface
59in Magnum. After the users create the baymodel and bays, we assume that the
60users would manage the volumes through existing techniques:
61
621. Log in to the COE, use COE specific CLI or GUI to manage volumes.
63
642. Use native tools to manage volumes.
65
66The initial implementation will focus on OpenStack Cinder integration; as
67other alternatives become available, contributors are welcome through
683rd-party maintained projects.
69
70
71Definitions
72-----------
73
74COE
75 Container Orchestration Engine
76
77Baymodel
78 An object that stores template information about the bay which is
79 used to create new bays consistently.
80
81Bay
82 A Magnum resource that includes at least one host to run containers on,
83 and a COE to manage containers created on hosts within the bay.
84
85Pod
86 Is the smallest deployable unit that can be created, scheduled, and
87 managed within Kubernetes.
88
89Volume
90 storage that is persistent
91
92Volume plugin
93 COE specific code that supports the functionality of a type of volume.
94
95Additional Magnum definitions can be found in the Magnum Developer
96documentation [7]_.
97
98Use Cases
99----------
100
101This document does not intend to address all use cases. We list below a number
102of use cases for 3 different roles; they should be useful as reference for the
103long-term development of the Magnum Container Volume Integration.
104
105As a User:
106
107As mentioned above, our goal is to preserve the user experience specific to
108the COE in managing the volumes. Therefore, we expect the use cases for the
109users will be fulfilled by the COE's themselves; Magnum will simply ensure
110that the necessary support is in place.
111
1121. I need to easily create a volume for containers to use as a persistent
113 data store.
114
1152. I need the ability to create and mount a data volume container for cross
116 container sharing.
117
1183. I need to mount a host directory as a data volume.
119
1204. I need to easily attach a known volume to a container to use the
121 existing data.
122
1235. I need the ability to delete the volume.
124
1256. I need to list and view the details of the volume.
126
1277. I need to modify the volume.
128
129
130As a CSP:
131
1321. I need to easily deploy a bay for consumption by users. The bay must
133 support the following:
134
135 A. One or more hosts to run containers.
136 B. The ability to choose between virtual or physical hosts to
137 run containers.
138 C. The ability to automatically enable volume plugins to containers.
139
1402. I need to provide clustering options that support different volume plugins
141 per COE.
142
1433. After deploying my initial cluster, I need the ability to provide lifecycle
144 management, including:
145
146    A. The ability to add/remove volumes that containers use.
147    B. The ability to add/remove nodes within the cluster with the necessary
148       adjustment to the volumes.
149
150As a CP:
151
1521. I need to easily and reliably add the Magnum service to my existing
153 OpenStack cloud environment.
154
1552. I need to make the Magnum services highly-available.
156
1573. I need to make Magnum services highly performant.
158
1594. I need to easily scale-out Magnum services as needed.
160
161
162Proposed Changes
163================
164
165We propose extending Magnum as follows.
166
167
168
1691. The new attribute volume-driver for a baymodel specifies the volume backend
170 driver to use when deploying a bay.
171
172 Volume drivers may include:
173
174   rexray, flocker, nfs, glusterfs, etc.
175
176 Here is an example of creating a Docker Swarm baymodel that uses rexray [5]_
177 [6]_ as the volume driver: ::
178
179
180 magnum baymodel-create --name swarmbaymodel \
181 --image-id fedora-21-atomic-5 \
182 --keypair-id testkey \
183 --external-network-id 1hsdhs88sddds889 \
184 --dns-nameserver 8.8.8.8 \
185 --flavor-id m1.small \
186 --docker-volume-size 5 \
187 --coe swarm\
188 --network-driver flannel \
189 --volume-driver rexray
190
191 When a Swarm bay is created with this bay model, the REX-Ray storage
192 subsystem will be installed, configured and started on the Swarm nodes,
193 then the REX-Ray volume plugin will be registered in Docker. When a container
194 is created with rexray as the volume driver, the container will have full
195 access to the REX-Ray capabilities such as creating, mounting, deleting
196 volumes [6]_. REX-Ray in turn will interface with Cinder to manage the
197 volumes in OpenStack.
198
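   As a minimal illustration only, a user could then create and consume a
   Cinder-backed volume directly from the Docker CLI on a Swarm node; the
   volume name ``redisdata`` and the image are examples, and the available
   create options depend on how REX-Ray is configured: ::

      # create a volume through the rexray plugin (backed by Cinder)
      docker volume create --driver rexray --name redisdata
      # run a container that mounts the volume via the rexray driver
      docker run -d --volume-driver rexray -v redisdata:/data redis
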
199 Here is an example of creating a Kubernetes baymodel that uses Cinder [2]_
200 [3]_ as the volume driver: ::
201
202 magnum baymodel-create --name k8sbaymodel \
203 --image-id fedora-21-atomic-5 \
204 --keypair-id testkey \
205 --external-network-id 1hsdhs88sddds889 \
206 --dns-nameserver 8.8.8.8 \
207 --flavor-id m1.small \
208 --docker-volume-size 5 \
209 --coe kubernetes\
210 --network-driver flannel \
211 --volume-driver cinder
212
213 When the Kubernetes bay is created using this bay model, the kubelet will be
214 configured so that an existing Cinder volume can be mounted in a pod by
215 specifying the volume ID in the pod manifest as follows: ::
216
217 volumes:
218 - name: mysql-persistent-storage
219 cinder:
220 volumeID: bd82f7e2-wece-4c01-a505-4acf60b07f4a
221 fsType: ext4
222
223
224
225Here is an example of creating a mesos baymodel that uses rexray as the
226volume driver: ::
227
228 magnum baymodel-create --name mesosbaymodel \
229 --image-id ubuntu-mesos\
230 --keypair-id testkey \
231 --external-network-id 1hsdhs88sddds889 \
232 --dns-nameserver 8.8.8.8 \
233 --flavor-id m1.small \
234 --coe mesos\
235 --network-driver docker \
236 --volume-driver rexray
237
238When the Mesos bay is created using this bay model, the bay will be
239configured so that an existing Cinder volume can be mounted in a container
240by setting the parameters to mount the Cinder volume in the JSON file: ::
241
242 "parameters": [
243 { "key": "volume-driver", "value": "rexray" },
244 { "key": "volume", "value": "redisdata:/data" }
245 ]
246
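For illustration, these parameters would typically be part of a Marathon
application definition that is posted to the Marathon REST API running in the
bay; the file name and the Marathon endpoint below are placeholders: ::

    # redis.json holds the app definition containing the docker container
    # settings and the "volume-driver"/"volume" parameters shown above
    curl -X POST -H "Content-Type: application/json" \
         -d @redis.json http://<marathon-host>:8080/v2/apps
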
247If no volume-driver parameter is supplied by the user, the baymodel is
248created using the default volume driver of the particular COE.
249Magnum will provide a default volume driver for each COE as well as a
250reasonable default configuration for each driver so that
251users can instantiate a COE without supplying a volume driver and
252associated labels. Generally the defaults should be consistent with upstream
253volume driver projects.
254
2552. Each volume driver supports a range of configuration parameters that are
256 handled by the "labels" attribute.
257
258 Labels consist of one or more arbitrary key/value pairs.
259 Here is an example of using labels to choose the "storage-provider" for the
260 rexray volume driver: ::
261
262
263 magnum baymodel-create --name k8sbaymodel \
264 --image-id fedora-21-atomic-5 \
265 --keypair-id testkey \
266 --external-network-id ${NIC_ID} \
267 --dns-nameserver 8.8.8.8 \
268 --flavor-id m1.small \
269 --docker-volume-size 5 \
270 --coe kubernetes \
271 --volume-driver rexray \
272 --labels storage-provider=openstack \
273 [, key2=value2...]
274
275
276 If the --volume-driver flag is specified without any labels, default
277 configuration values of the driver will be used by the baymodel.
278
279 Magnum will validate the labels together with the driver specified before
280 creating the bay and will return an error if the validation fails.
281
282 Magnum will continue to CRUD bays in the same way:
283
284 magnum bay-create --name k8sbay --baymodel k8sbaymodel --node-count 1
285
2863. Update python-magnumclient to handle the new container volume-
287 driver attributes.
288
2894. Update the conductor template definitions to support the new container
290 volume-driver model attributes.
291
2925. Refactor Heat templates to support the Magnum volume driver plugin.
293 Configurations specific to volume drivers should be
294 implemented in one or more template fragments.
295 Top-level templates should only
296 expose the labels and generalized parameters such as volume-driver.
297 Heat templates, template definitions and definition entry points should
298 be designed for composition, allowing for a range of supported labels.
299
3006. Update unit and functional tests to support the new attributes of the
301 Magnum container volume driver.
302
3037. Preserve the user experience by ensuring that any operation on volume will
304 be identical between a COE deployed by Magnum and a COE deployed by other
305 methods.
306
307
308Alternatives
309------------
310
3111. Without the support proposed, the user will need to manually enable and
312 configure the volume plugin. This will require the user to log into the
313 nodes in the cluster and understand the low level infrastructure of the
314 cluster as deployed by the heat templates.
3152. We can add full support for managing container volumes in the Magnum user
316   interface itself. This will require adding abstractions for each supported
317   COE volume plugin or creating an abstraction layer that covers all
318 possible COE volume drivers.
319
320Data Model Impact
321-----------------
322
323This document adds the volume-driver attribute to the baymodel
324database table. A migration script will be provided to support the attribute
325being added. ::
326
327 +-------------------+-----------------+---------------------------------------------+
328 | Attribute | Type | Description |
329 +===================+=================+=============================================+
330 +-------------------+-----------------+---------------------------------------------+
331 | volume-driver | string | Container volume backend implementation |
332 +-------------------+-----------------+---------------------------------------------+
333
334REST API Impact
335---------------
336
337This document adds the volume-driver attribute to the BayModel
338API class. ::
339
340 +-------------------+-----------------+---------------------------------------------+
341 | Attribute | Type | Description |
342 +===================+=================+=============================================+
343 +-------------------+-----------------+---------------------------------------------+
344 | volume-driver | string | Container volume backend implementation |
345 +-------------------+-----------------+---------------------------------------------+
346
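As an illustrative sketch only (the endpoint, token and attribute values below
are placeholders, and the request body simply extends the existing BayModel
schema with the new attribute), the attribute could be supplied when creating
a baymodel through the REST API: ::

    curl -X POST http://<magnum-api>:9511/v1/baymodels \
         -H "X-Auth-Token: $OS_TOKEN" \
         -H "Content-Type: application/json" \
         -d '{"name": "k8sbaymodel",
              "coe": "kubernetes",
              "image_id": "fedora-21-atomic-5",
              "keypair_id": "testkey",
              "external_network_id": "1hsdhs88sddds889",
              "flavor_id": "m1.small",
              "volume_driver": "cinder"}'
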
347Security Impact
348---------------
349
350Supporting volume drivers can potentially increase the attack surface
351on containers.
352
353Notifications Impact
354--------------------
355
356None
357
358Other End User Impact
359---------------------
360
361There is no impact if the user does not use a volume driver.
362We anticipate that most users would not use the labels for volume
363and would simply use the default volume driver and associated
364configuration options. For those who wish to customize their
365container volume driver environment, it will be important to understand
366what volume-driver and labels are supported, along with their
367associated configuration options, capabilities, etc..
368
369Performance Impact
370------------------
371
372There is no impact if the user does not use a volume driver.
373When a volume driver is used, the performance will depend upon the specific
374volume driver and its associated storage backends. For example, Kubernetes
375supports Cinder and awsEBS; the two types of volumes can have different
376performance.
377
378An example of the impact of the storage backend is a Docker Swarm bay with
379"--volume-driver rexray" where the rexray driver's storage provider is
380OpenStack Cinder. The resulting container performance may vary depending
381on the storage backend used by Cinder. As listed in [8]_, Cinder supports
382many storage drivers. Besides this, different container volume drivers can
383also cause performance variance.
384
385
386High-Availability Impact
387------------------------------
388
389
390
391+-----------------+--------------------+--------------------------+
392| COE | Master HA | Pod/Container/App HA |
393+=================+====================+==========================+
394| Kubernetes | No | Yes |
395+-----------------+--------------------+--------------------------+
396| Docker Swarm | No | Yes |
397+-----------------+--------------------+--------------------------+
398| Mesos | No | No |
399+-----------------+--------------------+--------------------------+
400
401"No" means that the volume doesn't affect the high-availability.
402"Yes" means that the volume affect the high-availability.
403
404Kubernetes does support pod high-availability through the replication
405controller; however, this doesn't work when a pod with a volume attached
406fails. Refer to the link [11]_ for details.
407
408Docker Swarm doesn't support rescheduling containers when a node fails, so
409a volume can not be automatically detached by the volume driver. Refer to
410the link [12]_ for details.
411
412Mesos supports application high-availability when a node fails, which
413means the application would be started on a new node, and volumes can be
414automatically attached to the new node by the volume driver.
415
416Other Deployer Impact
417---------------------
418
419Currently, both the Kubernetes and Docker communities support a number of
420volume plugins. The changes proposed will enable these volume plugins in Magnum.
421However, Magnum users will be able to continue to deploy baymodels, bays,
422containers, etc. without having to specify any parameters for volume.
423This will be accomplished by setting reasonable default parameters within
424the Heat templates.
425
426Developer impact
427----------------
428
429None
430
431Implementation
432==============
433
434Assignee(s)
435-----------
436
437Primary assignee:
438
439- Kai Qiang Wu (Kennan)
440
441Other contributors:
442
443- Qun Wang (wangqun)
444- Ton Ngo (Tango)
445
446
447Work Items
448----------
449
4501. Extend the Magnum API to support new baymodel attributes.
4512. Extend the Client API to support new baymodel attributes.
4523. Extend baymodel objects to support new baymodel attributes. Provide a
453 database migration script for adding attributes.
4544. Refactor Heat templates to support the Magnum container volume driver.
4555. Update Conductor template definitions and definition entry points to
456 support Heat template refactoring.
4576. Extend unit and functional tests to support new baymodel attributes.
4587. Document how to use the volume drivers with examples.
459
460Dependencies
461============
462
463Although adding support for these new attributes does not depend on the
464following blueprints, it's highly recommended that the Magnum Container
465Volume Integration Model be developed in concert with the blueprints to maintain
466development continuity within the project.
467https://blueprints.launchpad.net/magnum/+spec/ubuntu-image-build
468
469Kubernetes with Cinder support needs Kubernetes version >= 1.1.1.
470Swarm needs version >= 1.8.3, as Kubernetes 1.1.1 upgraded to that version.
471
472Testing
473=======
474
475Each commit will be accompanied with unit tests. There will also be
476functional tests which will be used as part of a cross-functional gate
477test for Magnum.
478
479Documentation Impact
480====================
481
482The Magnum Developer Quickstart document will be updated to support the
483configuration flags introduced by this document. Additionally, background
484information on how to use these flags will be included.
485
486References
487==========
488
489.. [1] http://kubernetes.io/v1.1/docs/user-guide/volumes.html
490.. [2] http://kubernetes.io/v1.1/examples/mysql-cinder-pd/
491.. [3] https://github.com/kubernetes/kubernetes/tree/master/pkg/volume/cinder
492.. [4] http://docs.docker.com/engine/extend/plugins/
493.. [5] https://github.com/emccode/rexray
494.. [6] http://rexray.readthedocs.org/en/stable/user-guide/storage-providers/openstack
495.. [7] http://docs.openstack.org/developer/magnum/
496.. [8] http://docs.openstack.org/liberty/config-reference/content/section_volume-drivers.html
497.. [9] http://docs.openstack.org/admin-guide-cloud/blockstorage_multi_backend.html#
498.. [10] http://docs.openstack.org/user-guide-admin/dashboard_manage_volumes.html
499.. [11] https://github.com/kubernetes/kubernetes/issues/14642
500.. [12] https://github.com/docker/swarm/issues/1488
diff --git a/specs/implemented/containers-service.rst b/specs/implemented/containers-service.rst
new file mode 100644
index 0000000..ca48b39
--- /dev/null
+++ b/specs/implemented/containers-service.rst
@@ -0,0 +1,400 @@
1..
2 This work is licensed under a Creative Commons Attribution 3.0 Unported
3 License.
4
5 http://creativecommons.org/licenses/by/3.0/legalcode
6
7==================
8Containers Service
9==================
10
11Launchpad blueprint:
12
13https://blueprints.launchpad.net/nova/+spec/containers-service
14
15Containers share many features in common with Nova instances. For the common
16features, virt drivers for Nova can be used to surface basic instance
17functionality. For features that go beyond what can be naturally fit within
18a virt driver, we propose a new API service that allows for advanced features
19to be added without conflating the worlds of instances and containers.
20
21Some examples of container-specific features are setting shell environment
22variables, and accepting a shell command to execute at runtime. Capturing the
23STDIO of the process(es) within a container, and tracking the return status
24of processes are all beyond the scope of what was contemplated for Nova. All
25of these features will be implemented in the Containers Service.
26
27
28Problem description
29===================
30Container technology is rapidly gaining popularity as a way to bundle and
31deploy applications. Recognizing and adapting to this trend will position
32OpenStack to be useful not only to clouds that employ bare metal and virtual
33machine instances, but also to remain competitive in offering container
34services as well.
35
36Nova's concepts of an instance, and the actions that may be taken on it do not
37match completely with containers.
38
39Use cases
40---------
411. App Consolidation. End-user wants to run multiple small applications in
42 separate operating system environments, but wants to optimize for efficiency
43 to control hosting costs. Each application belongs to the same tenant, so
44 security isolation between applications is nice-to-have but not critical.
45 Isolation is desired primarily for simplified management of the execution
46 environment for each application.
472. App Portability. End-user wants to create a single container image, and
48 deploy the same image to multiple hosting environments, including OpenStack.
49 Other environments may include local servers, dedicated servers, private
50 clouds, and public clouds. Switching environments requires passing database
51 connection strings by environment variables at the time a container starts
52 to allow the application to use the services available in each environment
53 without changing the container image.
543. Docker Compatibility. End-user has a Dockerfile used to build an application
55 and its runtime environment and dependencies in a Docker container image.
56 They want an easy way to run the resulting Docker image on an OpenStack
57 cloud.
584. LXC Compatibility. End-user wants an easy way to remotely create multiple
59 LXC containers within a single Nova instance.
605. OpenVZ Compatibility. End-user wants an easy way to remotely create multiple
61 OpenVZ containers within a single Nova instance.
626. Containers-Centric World View. End-user wants to communicate with a single
63 OpenStack API, and request the addition of containers, without the need to
64 be concerned with keeping track of how many containers are already running
65 on a given Nova instance, and when more need to be created. They want to
66 simply create and remove containers, and allow the appropriate resource
67 scheduling to happen automatically.
687. Platform Integration. Cloud operator already has an OpenStack cloud, and
69 wants to add a service/application centric management system on top.
70 Examples of such systems are Cloud Foundry, Kubernetes, Apache Mesos, etc.
71 The selected system is already Docker compatible. Allow this cloud operator
72 easy integration with OpenStack to run applications in containers. The
73 Cloud Operator now harnesses the power of both the management system, and
74 OpenStack, and does not need to manage a second infrastructure for his/her
75 application hosting needs. All details involving the integration of
76 containers with Nova instances are managed by OpenStack.
778. Container network. End-user wants to define a custom overlay network for
78 containers, and wants to have admin privilege to manage the network
79 topology. Building a container network can decouple application deployment
80 and management from the underlying network infrastructure, and enable
81 additional usage scenarios, such as (i) software-defined networking, and
82 (ii) extending the container network (i.e. connecting various resources from
83 multiple hosting environments). End-users want a single service that could
84 help them build the container network, and dynamically modify the network
85 topology by adding or removing containers to or from the network.
869. Permit secure use of native REST APIs. Provide two models of operation with
87 Magnum. The first model allows Magnum to manage the lifecycle of Pods,
88 ReplicationControllers, and Services. The second model allows end-users to
89 manage the lifecycle of Pods, ReplicationControllers, and Services by
90 providing direct secure access to the native ReST APIs in Kubernetes and
91 possibly Docker.
92
93Long Term Use Cases
94-------------------
95These use cases have been identified by the community as important, but
96unlikely to be tackled in short term (especially prior to incubation). We wish
97to adapt to these use cases in long term, but this is not a firm project
98commitment.
99
1001. Multi-region/multi-cloud support. End-user wants to deploy applications to
101 multiple regions/clouds, and dynamically relocate deployed applications
102 across different regions/clouds. In particular, they want a single service
103 that could help them (i) provision nodes from multiple regions/clouds, thus
104 running containers on top of them, and (ii) dynamically relocate containers
105 (e.g. through container migration) between nodes regardless of the
106 underlying infrastructure.
107
108Proposed change
109===============
110Add a new API service for CRUD and advanced management of containers.
111If cloud operators only want to offer basic instance features for their
112containers, they may use nova with an alternate virt-driver, such as
113libvirt/lxc or nova-docker. For those wanting a full-featured container
114experience, they may offer the Containers Service API as well, in combination
115with Nova instances that contain an OpenStack agent that connects to the
116containers service through a security controlled agent (daemon) that allows
117the OpenStack control plane to provision and control containers running on
118Compute Hosts.
119
120The Containers Service will call the Nova API to create one or more Nova
121instances inside which containers will be created. The Nova instances may
122be of any type, depending on the virt driver(s) chosen by the cloud operator.
123This includes bare-metal, virtual machines, containers, and potentially other
124instance types.
125
126This allows the following configurations of containers in OpenStack.
127
128* Containers in Virtual Machine Instances
129* Containers in Bare Metal Instances
130* Containers in Container Instances (nested)
131
132Nesting containers is currently possible if the parent container
133runs in privileged mode. Patches to the linux kernel are being developed to
134allow nesting of non-privileged containers as well, which provides a higher
135level of security.
136
137The spirit of this plan aims to duplicate as little as possible between Nova
138and the Containers Service. Common components like the scheduler are expected
139to be abstracted into modules, such as Gantt that can be shared by multiple
140projects. Until Gantt is ready for use by the Containers Service, we will
141implement only two provisioning schemes for containers:
142
1431. Create a container on a specified instance by using a nova instance guid.
1442. Auto-create instances (applies only until the Gantt scheduler is used)
145 2.1. Fill them sequentially until full.
146 2.2. Remove them automatically when they become empty.
147
148The above orchestration will be implemented using Heat. This requires some
149kind of hypervisor painting (such as host aggregates) for security reasons.
150
151The diagram below offers an overview of the system architecture. The OSC box
152indicates an OpenStack client, which will communicate with the Containers
153Service through a REST API. The containers service may silently create Nova
154instances if one with enough capacity to host the requested container is not
155already known to the Containers service. The containers service will maintain
156a database "Map" of containers, and what Nova instance each belongs to.
157Instances are created in Nova, and containers belong only
158to the Containers Service, and run within a Nova instance. If the instance
159includes the agent software "A", then it may be included in the inventory of
160the Containers service. Instances that do not contain an agent may not interact
161with the Containers Service, and can be controlled only by a Nova virt driver.
162
163::
164
165                            +---------+
166                            |   OSC   |
167                            +----+----+
168                                 |
169                            +----+----+
170 +-------- Nova -------+  +-+  REST   +-- Containers -+
171 |                     |  | +---------+    Service    |
172 |                     |  |                           |
173 |           +-------+ +--+ +-----+                   |
174 |           | Gantt | |  | | Map |                   |
175 |           +-------+ |  | +-----+                   |
176 |                     |  |                           |
177 +-----------+---------+  +---------------+-----------+
178             |                            |            
179 +-----------+----+ Compute Host ---------|-----------+
180 |                                    +---+---+       |
181 |                               +----+ Relay +---+   |
182 |                               |    +-------+   |   |
183 |                               |                |   |
184 | +-- Instance --+ +-- Instance |-+ +-- Instance |-+ |
185 | |              | |            | | |            | | |
186 | |              | |        +---+ | |        +---+ | |
187 | |              | |        |   | | |        |   | | |
188 | |              | |        | A | | |        | A | | |
189 | |              | |        |   | | |        |   | | |
190 | |              | |        +---+ | |        +---+ | |
191 | |              | |              | |              | |
192 | |              | | +---+  +---+ | | +---+  +---+ | |
193 | |              | | |   |  |   | | | |   |  |   | | |
194 | |              | | | C |  | C | | | | C |  | C | | |
195 | |              | | |   |  |   | | | |   |  |   | | |
196 | |              | | +---+  +---+ | | +---+  +---+ | |
197 | |              | |              | |              | |
198 | +--------------+ +--------------+ +--------------+ |
199 |                                                    |
200 +----------------------------------------------------+
201 +---+
202 | |
203 | A | = Agent
204 | |
205 +---+
206 +---+
207 | |
208 | C | = Container
209 | |
210 +---+
211
212
213Design Principles
214-----------------
2151. Leverage existing OpenStack projects for what they are good at. Do not
216 duplicate functionality, or copy code that can be otherwise accessed through
217 API calls.
2182. Keep modifications to Nova to a minimum.
2193. Make the user experience for end users simple and familiar.
2204. Allow for implementation of all features containers are intended to offer.
221
222
223Alternatives
224------------
225
2261. Extending Nova's existing feature set to offer container features
2271.1. Container features don't fit into Nova's idea of compute (VM/Server)
2282. A completely separate containers service forked from Nova.
2292.1. Would result in large overlap and duplication in features and code
230
231
232Data model impact
233-----------------
234For Nova, None. All new data planned will be in the Containers Service.
235
236
237REST API impact
238---------------
239For Nova, none. All new API calls will be implemented in the Containers
240Service. The OpenStack Containers Service API will be a superset of the
241functionality offered by the `Docker Remote API
242<https://docs.docker.com/reference/api/docker_remote_api/>`_,
243with additions to make it suitable for general use regardless of the backend
244container technology used, and to be compatible with OpenStack multi-tenancy
245and Keystone authentication.
246
247Specific Additions:
248
2491. Support for the X-Auth-Project-Id HTTP request header to allow for
250 multi-tenant use.
2512. Support for the X-Auth-Token HTTP request header to allow for authentication
252 with keystone.
253
254If either of the above headers is missing, a 401 Unauthorized response will
255be generated.
256
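As an illustration only (the service endpoint and the ``/v1`` prefix are
placeholders; the request path mirrors the Docker Remote API that this API is
a superset of), a client request would carry both headers: ::

    # list the caller's containers; both Keystone headers are required
    curl -H "X-Auth-Token: $OS_TOKEN" \
         -H "X-Auth-Project-Id: $OS_PROJECT_ID" \
         http://containers.example.com/v1/containers/json
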
257Docker CLI clients may communicate with a Swarmd instance that is configured
258to use the OpenStack Containers API as the backend for libswarm. This will
259allow for tool compatibility with the Docker ecosystem using the officially
260supported means for integration of a distributed system.
261
262The scope of the full API will cause this spec to be too long to review, so
263the intent is to deal with the specific API design as a series of Gerrit
264reviews that submit API code as Not Implemented stubs with docstrings that
265clearly document the design, to allow for approval and further implementation.
266
267Security impact
268---------------
269Because Nova will not be changed, there should be no security impacts to Nova.
270The Containers Service implementation will have the following security-related
271issues:
272
273* Need to authenticate against keystone using python-keystoneclient.
274* A trust token from Nova will be needed in order for the Containers Service
275 to call the Nova API on behalf of a user.
276* Limits must be implemented to control resource consumption in accordance with
277 quotas.
278* Providing STDIO access may generate a considerable amount of network chatter
279 between containers and clients through the relay. This could lead to
280 bandwidth congestion at the relays, or API nodes. An approach similar to
281 how we handle serial console access today will need to be considered to
282 mitigate this concern.
283
284Using containers implies a range of security considerations for cloud
285operators. These include:
286
287* Containers in the same instance share an operating system. If the kernel is
288 exploited using a security vulnerability, processes in one container may
289 escape the constraints of the container and potentially access other
290 resources on the host, including contents of other containers.
291* Output of processes may be persisted by the containers service in order to
292 allow asynchronous collection of exit status, and terminal output. Such
293 content may include sensitive information. Features may be added to mitigate
294 the risk of this data being replicated in log messages, including errors.
295* Creating containers usually requires root access. This means that the Agent
296 may need to be run with special privileges, or be given a method to
297 escalate privileges using techniques such as sudo.
298* User provided data is passed through the API. This will require sensible
299 data input validation.
300
301
302Notifications impact
303--------------------
304
305Contemplated features (in subsequent release cycles):
306
307* Notify the end user each time a Nova instance is created or deleted by
308 the Containers service, if (s)he has registered for such notifications.
309* Notify the user on each CRUD operation on containers, with start and end
310  notifications (compute.container.create/delete/etc).
311* Notify the user periodically of the existence of containers managed by the
312  containers service (e.g. compute.container.exists).
313
314
315Other end user impact
316---------------------
317
318The user interface will be a REST API. On top of that API will be an
319implementation of the libswarm API to allow for tools designed to use Docker
320to treat OpenStack as an upstream system.
321
322
323Performance Impact
324------------------
325
326The Nova API will be used to create instances as needed. If the Container to
327Instance ratio is 10, then the Nova API will be called at least once for every
32810 calls to the Containers Service. Instances that are left empty will be
329automatically deleted, so in the example of a 10:1 ratio, the Nova API will be
330called to perform a delete for every 10 deletes in the Container Service.
331Depending on the configuration, the ratio may be as low as 1:1.
332The Containers Service will only access Nova through its API, not by accessing
333its database.
334
335
336
337Other deployer impact
338---------------------
339
340Deployers may want to adjust the default flavor used for Nova Instances created
341by the Containers Service.
342
343There should be no impact on users of prior releases, as this introduces a new
344API.
345
346Developer impact
347----------------
348
349Minimal. There will be minimal changes required in Nova, if any.
350
351
352Implementation
353==============
354
355
356Assignee(s)
357-----------
358
359Primary assignee:
360aotto
361
362Other contributors:
363andrew-melton
364ewindisch
365
366
367Work Items
368----------
369
3701. Agent
3712. Relay
3723. API Service
3734. IO Relays
374
375
376Dependencies
377============
378
3791. <Links to Agent Blueprint and Spec here, once ready>
3802. Early implementations may use libswarm, or a python port of libswarm to
381 implement Docker API compatibility.
382
383Testing
384=======
385
386Each commit will be accompanied with unit tests, and Tempest functional tests.
387
388
389Documentation Impact
390====================
391
392A set of documentation for this new service will be required.
393
394
395References
396==========
397
398* Link to high level draft proposal from the Nova Midcycle Meetup for Juno:
399 `PDF <https://wiki.openstack.org/w/images/5/51/Containers_Proposal.pdf>`_
400* `Libswarm Source <https://github.com/docker/libswarm>`_
diff --git a/specs/implemented/create-trustee-user-for-each-bay.rst b/specs/implemented/create-trustee-user-for-each-bay.rst
new file mode 100644
index 0000000..5ad38cc
--- /dev/null
+++ b/specs/implemented/create-trustee-user-for-each-bay.rst
@@ -0,0 +1,186 @@
1==================================
2Create a trustee user for each bay
3==================================
4
5https://blueprints.launchpad.net/magnum/+spec/create-trustee-user-for-each-bay
6
7Some services which are running in a bay need to access OpenStack services.
8For example, Kubernetes load balancer [1]_ needs to access Neutron. Docker
9registry [2]_ needs to access Swift. In order to access OpenStack services,
10we can create a trustee for each bay and delegate a limited set of rights to
11the trustee. [3]_ and [4]_ give a brief introduction to Keystone's trusts
12mechanism.
13
14Problem description
15===================
16
17Some services which are running in a bay need to access OpenStack services,
18so we need to pass user credentials into the vms.
19
20Use Cases
21---------
22
231. Kubernetes load balancer needs to access Neutron [1]_.
242. For persistent storage, Cloud Provider needs to access Cinder to
25 mount/unmount block storage to the node as volume [5]_.
263. The TLS cert is generated in the VMs and needs to be uploaded to Magnum [6]_ and
27 [7]_.
284. Docker registry needs to access Swift [2]_.
29
30Project Priority
31----------------
32
33High
34
35Proposed change
36===============
37When a user (the "trustor") wants to create a bay, steps for trust are as
38follows.
39
401. Create a new service account (the "trustee") without any role in a domain
41 which is dedicated for trust. Without any role, the service account can do
42 nothing in OpenStack.
43
442. Define a trust relationship between the trustor and the trustee. The trustor
45 can delegate a limited set of roles to the trustee. We can add an option
46 named trust_roles in baymodel. Users can add roles which they want to
47 delegate into trust_roles. If trust_roles is not provided, we delegate all
48 the roles to the trustee.
49
503. Services in the bay can access OpenStack services with the trustee
51 credentials and the trust.
52
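For illustration, the equivalent operations could be performed manually with
the OpenStack CLI roughly as follows; the domain, role and user names are
placeholders, and Magnum itself would perform these steps programmatically
through the Keystone client rather than the CLI: ::

    # step 1: a trustee user with no roles, in a dedicated domain
    openstack user create --domain magnum_trust --password <password> \
        bay-xyz-trustee
    # step 2: delegate a limited set of roles from the trustor to the trustee
    openstack trust create --project <project-id> --role <role-name> \
        <trustor-user-id> <trustee-user-id>
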
53The roles which are delegated to the trustee should be limited. If the services
54in the bay only need access to Neutron, we should not allow the services to
55access to other OpenStack services. But there is a limitation that a trustor
56must have the role which is delegated to a trustee [4]_.
57
58Magnum now only allows the user who creates the bay to get the certificate, to
59avoid the security risk introduced by Docker [8]_. For example, if other users
60in the same tenant can get the certificate, then they can use the Docker API to
61access the host file system of a bay node and get anything they want::
62
63 docker run --rm -v /:/hostroot ubuntu /bin/bash \
64 -c "cat /hostroot/etc/passwd"
65
66If Keystone doesn't allow creating new service accounts when LDAP is used as
67the backend for Keystone, we can use a pre-created service account for all
68bays. In this situation, all the bays use the same service account but
69different trusts. We should add a config option to choose this method.
70
71Alternatives
72------------
73
74Magnum can create a user for each bay with roles to access OpenStack services
75in a dedicated domain. This method has one disadvantage: the user which is
76created by Magnum may get access to OpenStack services which the original user
77could not access before. For example, a user who can not access the Swift
78service creates a bay, and Magnum creates a service account for this bay with
79roles to access Swift. If the user logs in to the VMs and gets the credentials,
80the user can use these credentials to access Swift.
81
82Alternatively, Magnum doesn't prepare credentials and the user who creates a bay
83needs to log in to the nodes to manually add credentials in config files for services.
84
85Data model impact
86-----------------
87
88The trustee id, trustee password and trust id are added to the Bay table in the Magnum
89database.
90
91REST API impact
92---------------
93
94Only the user who creates a bay can get the certificate of this bay. Other
95users in the same tenant can not get the certificate now.
96
97Security impact
98---------------
99
100The trustee id and trustee password are encrypted in the Magnum database. When
101Magnum passes these parameters to Heat to create a stack, the transmission is
102encrypted by TLS, so we don't need to encrypt these credentials. These
103credentials are hidden in Heat; users can not query them in stack parameters.
104
105The trustee id, trustee password and trust id can be obtained in the VMs.
106Anyone who can log in to the VMs can get them and use these credentials to
107access OpenStack services. In a production environment, these VMs must be
108secured properly to prevent unauthorized access.
109
110Only the user who creates the bay can get the certificate to access the COE
111API, so it is not a security risk even if the COE API is not safe.
112
113Notifications impact
114--------------------
115
116None
117
118Other end user impact
119---------------------
120
121None
122
123Performance impact
124------------------
125
126None
127
128Other deployer impact
129---------------------
130
131None
132
133Developer impact
134----------------
135
136None
137
138Implementation
139==============
140
141Assignee(s)
142-----------
143
144Primary assignee:
145 humble00 (wanghua.humble@gmail.com)
146Other contributors:
147 None
148
149Work Items
150----------
151
1521. Create a trustee for each bay.
1532. Change the policy so that only the user who creates a bay can get the
154   certificate of the bay.
155
156Dependencies
157============
158
159None
160
161Testing
162=======
163
164Unit test and functional test for service accounts and the policy change.
165
166Documentation Impact
167====================
168
169The user guide and troubleshooting guide will be updated with details
170regarding the service accounts.
171
172References
173==========
174.. [1] http://docs.openstack.org/developer/magnum/dev/dev-kubernetes-load-balancer.html
175.. [2] https://blueprints.launchpad.net/magnum/+spec/registryv2-in-master
176.. [3] http://blogs.rdoproject.org/5858/role-delegation-in-keystone-trusts
177.. [4] https://wiki.openstack.org/wiki/Keystone/Trusts
178.. [5] https://github.com/kubernetes/kubernetes/blob/release-1.1/examples/mysql-cinder-pd/README.md
179.. [6] https://bugs.launchpad.net/magnum/+bug/1503863
180.. [7] https://review.openstack.org/#/c/232152/
181.. [8] https://docs.docker.com/engine/articles/security/#docker-daemon-attack-surface
182
183History
184=======
185
186None
diff --git a/specs/implemented/magnum-horizon-plugin.rst b/specs/implemented/magnum-horizon-plugin.rst
new file mode 100644
index 0000000..65bb3c2
--- /dev/null
+++ b/specs/implemented/magnum-horizon-plugin.rst
@@ -0,0 +1,171 @@
1..
2 This work is licensed under a Creative Commons Attribution 3.0 Unported
3 License.
4
5 http://creativecommons.org/licenses/by/3.0/legalcode
6
7===================================
8Web Interface for Magnum in Horizon
9===================================
10
11Launchpad blueprint:
12
13https://blueprints.launchpad.net/magnum/+spec/magnum-horizon-plugin
14
15Currently there is no way for a user to interact with Magnum through a web
16based user interface, as they are used to doing with other OpenStack
17components. This implementation aims to introduce this interface as an
18extension of Horizon (the OpenStack Dashboard) and expose all the features of
19Magnum in a way familiar to users.
20
21Problem description
22===================
23
24In order to increase adoption and usability of Magnum we need to introduce a UI
25component for users and administrators to interact with Magnum without the need
26to use the command line. The proposed UI will model all of the
27features currently available in the Magnum REST API and will be built using the
28Horizon plugin architecture to remain in line with other OpenStack UI projects
29and minimise the amount of new code that needs to be added.
30
31Use Cases
32----------
331. An end user wanting to use Magnum with OpenStack who is not comfortable in
34 issuing commands with the python client will use the web user interface to
35 interact with Magnum.
362. An administrator may use the user interface to provide a quick overview of
37 what Magnum has deployed in their OpenStack environment.
38
39Proposed change
40===============
41
42The first step will be to extend the Horizon API to include CRUD operations
43that are needed to interact with Magnum. Assuming that there are no issues here
44and API changes/additions are not required to Magnum, we can begin to
45design/implement the interface. We will aim to minimize the amount of Magnum
46specific UI code that will need to be maintained by reusing components from
47Horizon. This will also speed up the development significantly.
48
49It is suggested the initial implementation of Magnum UI will include basic CRUD
50operations on BayModel and Bay resources. This will be the starting point for
51development and upon completion this will represent version 1.
52
53Future direction includes adding CRUD operations for other Magnum features
54(Pod, Container, Service, ReplicationController) and will be tracked by new
55blueprints as they represent significant additional effort. The ultimate goal
56is that a user should be able to perform all normal interactions with Magnum
57through the UI with no need for interaction with the python client.
58
59Suggestions for further improvement include visualising Magnum resources to
60provide a quick overview of how resources are deployed.
61
62Bugs/Blueprints relating specifically to the Magnum UI will be tracked here:
63
64https://launchpad.net/magnum-ui
65
66Mockups/Designs will be shared using the OpenStack Invision account located
67here:
68
69https://openstack.invisionapp.com
70
71Alternatives
72------------
73
74One alternative to this approach is to develop an entirely separate UI
75specifically for Magnum. We will not use this approach as it does not fall in
76line with how other projects are managing their user interfaces and this
77approach would ultimately result in a significantly larger effort with much
78duplication with Horizon.
79
80Data model impact
81-----------------
82
83None
84
85REST API impact
86---------------
87
88For Magnum, none. The Horizon API will need to be extended to include Create,
89Read, Update, Delete operations for all features available in the Magnum REST
90API. However, this extension to the Horizon API will live in the Magnum UI tree
91not the upstream Horizon tree.
92
93Security impact
94---------------
95
96None
97
98Notifications impact
99--------------------
100
101None
102
103Other end user impact
104---------------------
105
106None
107
108Performance Impact
109------------------
110
111The Magnum API will be called from the user interface to return information to
112the user about the current state of Magnum objects and perform new interactions
113with Magnum. For every action a user performs from the user interface at least
114one API call to Magnum will need to be made.
115
116Other deployer impact
117---------------------
118
119As the Magnum user interface will be managed and stored outside of the Horizon
120project, deployers will need to pull down the Magnum UI code and add this to
121their Horizon install.
122
123In order to add the Magnum UI to Horizon, the deployer will have to copy an
124enable file to openstack_dashboard/local/enabled/ in their Horizon directory
125and then run Horizon as they would normally.
126
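A minimal sketch of that step, assuming the plugin ships an enabled file (the
file name and both paths below are hypothetical and depend on the actual
install locations): ::

    # copy the panel "enabled" file from the magnum-ui checkout into Horizon
    cp magnum-ui/enabled/_50_project_containers_panel.py \
       horizon/openstack_dashboard/local/enabled/
    # then restart the web server serving Horizon to pick up the new panel
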
127Developer impact
128----------------
129
130None
131
132Implementation
133==============
134
135Assignee(s)
136-----------
137
138Primary assignee:
139 bradjones
140
141Work Items
142----------
143
1441. Extend the Horizon API to include Magnum calls
1452. CRUD operations on BayModel and Bay resources
1463. CRUD operations on other Magnum features (Pod, Container, Service, etc.)
1474. Refine the user experience
148
149Dependencies
150============
151
152None
153
154Testing
155=======
156
157Each commit will be accompanied with unit tests. There will also be functional
158tests which will be used as part of a cross-functional gate test for Magnum.
159This additional gate test will be non-voting as failures will not indicate
160issues with Magnum but instead serve as advance warning of any changes that
161could potentially break the UI.
162
163Documentation Impact
164====================
165
166An installation guide will be required.
167
168References
169==========
170
171None
diff --git a/specs/implemented/open-dcos.rst b/specs/implemented/open-dcos.rst
new file mode 100644
index 0000000..b450c0d
--- /dev/null
+++ b/specs/implemented/open-dcos.rst
@@ -0,0 +1,177 @@
1..
2 This work is licensed under a Creative Commons Attribution 3.0 Unported
3 License.
4
5 http://creativecommons.org/licenses/by/3.0/legalcode
6
7=================================
8Magnum and Open DC/OS Integration
9=================================
10
11Launchpad Blueprint:
12
13https://blueprints.launchpad.net/magnum/+spec/mesos-dcos
14
15Open DC/OS [1]_ is a distributed operating system based on the Apache Mesos
16distributed systems kernel. It enables the management of multiple machines as
17if they were a single computer. It automates resource management, schedules
18process placement, facilitates inter-process communication, and simplifies
19the installation and management of distributed services. Its included web
20interface and available command-line interface (CLI) facilitate remote
21management and monitoring of the cluster and its services.
22
23Open DC/OS now supports both the Docker containerizer and the Mesos
24containerizer. The Mesos containerizer supports both the Docker and AppC image
25specs, and it can manage Docker containers well even if the Docker daemon is
26not running.
27
28End users can install Open DC/OS in different ways, such as vagrant, cloud,
29local etc. For cloud, Open DC/OS only supports AWS now, and an end user can
30deploy a DC/OS cluster quickly with a template. For a local install, there
31are many steps to install an Open DC/OS cluster.
32
33Problem Description
34===================
35
36COEs (Container Orchestration Engines) are first class citizens in Magnum;
37there are different COEs in Magnum now, including Kubernetes, Swarm and Mesos.
38All of those COEs focus on Docker container management, but the concept of a
39container is not limited to Docker containers; it also covers others such as
40AppC, Linux containers etc. Open DC/OS is planning to support different
41containers by leveraging the Mesos unified container feature,
42and Open DC/OS has a better management console for container orchestration.
43
44Currently, Magnum provides limited support for the Mesos Bay as there is only
45one framework, named Marathon, running on top of Mesos. Compared with Open
46DC/OS, the current Mesos Bay lacks the following features:
47
481. App Store for application management. Open DC/OS has the Universe package
49   repository to provide app store functions (see the example after this list).
50
512. Different container technology support. Open DC/OS supports different
52   container technologies, such as Docker, AppC etc., and may introduce OCI
53   support in the future. Introducing an Open DC/OS Bay can enable Magnum to
54   support more container technologies.
55
563. Better external storage integration. Open DC/OS is planning to introduce
57   docker volume isolator support in the next release; the docker volume
58   isolator leverages the docker volume driver API to integrate with 3rd party
59   distributed storage platforms, such as OpenStack Cinder, GlusterFS, Ceph
60   etc.
61
624. Better network management. Open DC/OS is planning to introduce a CNI
63   network isolator in the next release; the CNI network isolator leverages CNI
64   technologies to manage networks for containers.
65
665. Loosely coupled with the docker daemon. Open DC/OS can work well for docker
67   containers even if the docker daemon is not running. The docker daemon now
68   has some issues in large scale clusters, so this approach avoids the
69   limitations of the docker daemon but still enables end users to get some
70   docker features in large scale clusters.
71
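As an illustration of the app store functionality referenced in item 1 above
(the package names below are examples only, taken from the DC/OS Universe), an
Open DC/OS user installs applications with the DC/OS CLI: ::

    # search the Universe package repository and install a package
    dcos package search redis
    dcos package install marathon-lb
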
72
73Proposed Changes
74================
75
76We propose extending Magnum as follows.
77
781. Leverage bay driver work and structure this new COE as a bay driver.
79
802. Leverage mesos-slave-flags [3]_ to customize Open DC/OS.
81
82 Here is an example of creating an Open DC/OS baymodel that uses
83 docker/volume as isolator, linux as launcher and docker as image
84 provider: ::
85
86 magnum baymodel-create --name dcosbaymodel \
87 --image-id dcos-centos-7.2 \
88 --keypair-id testkey \
89 --external-network-id 1hsdhs88sddds889 \
90 --dns-nameserver 8.8.8.8 \
91 --flavor-id m1.small \
92 --docker-volume-size 5 \
93 --coe dcos \
94 --labels isolation=docker/volume,\
95 launcher=linux, \
96 image_providers=docker
97
98 Magnum will validate the labels together with the driver specified before
99 creating the bay and will return an error if the validation fails.
100
101 Magnum will continue to CRUD bays in the same way:
102
103 magnum bay-create --name dcosbay --baymodel dcosbaymodel --node-count 1
104
1053. Keep the old Mesos Bay and add a new Open DC/OS Bay. Once the Open DC/OS Bay
106 is stable, deprecate the Mesos Bay.
107
1084. Update unit and functional tests to support the Open DC/OS Bay; it is also
109   an option to verify the Open DC/OS Bay in the gate.
110
1115. Preserve the user experience by ensuring that any operation on Open DC/OS
112 Bay will be identical between a COE deployed by Magnum and a COE deployed
113 by other methods.
114
115
116REST API Impact
117---------------
118
119There will be no REST API exposed from Magnum for end users to operate Open
120DC/OS; end users can log on to the Open DC/OS dashboard or call the Open DC/OS
121REST API directly to manage the containers or the applications.
122
123Implementation
124==============
125
126Assignee(s)
127-----------
128
129Primary assignee:
130
131- Guang Ya Liu (jay-lau-513)
132
133Other contributors:
134
135- Qun Wang (wangqun)
136- Gao Jin Cao
137
138
139Work Items
140----------
141
1421. Build VM image for Open DC/OS Bay.
1432. Add Open DC/OS Bay driver.
1443. Add Heat template for Open DC/OS Bay.
1454. Add Open DC/OS Bay monitor.
1465. Document how to use the Open DC/OS Bay.
147
148Dependencies
149============
150
1511. This blueprint will focus on running Open DC/OS on CentOS 7.2.
152
1532. Depends on the blueprint
154
155https://blueprints.launchpad.net/magnum/+spec/mesos-slave-flags
156
157Testing
158=======
159
160Each commit will be accompanied with unit tests. There will also be
161functional tests which will be used as part of a cross-functional gate
162test for Magnum.
163
164Documentation Impact
165====================
166
167The Magnum Developer Quickstart document will be updated to cover the Open
168DC/OS Bay introduced here by including a short example and full documentation,
169with an explanation of all the labels, in the user guide. Additionally,
170background information on how to use the Open DC/OS Bay will be included.
171
172References
173==========
174
175.. [1] https://dcos.io/docs/1.7/overview/what-is-dcos/
176.. [2] https://dcos.io/install/
177.. [3] https://blueprints.launchpad.net/magnum/+spec/mesos-slave-flags
diff --git a/specs/implemented/resource-quotas.rst b/specs/implemented/resource-quotas.rst
new file mode 100644
index 0000000..20edae0
--- /dev/null
+++ b/specs/implemented/resource-quotas.rst
@@ -0,0 +1,252 @@
1..
2 This work is licensed under a Creative Commons Attribution 3.0 Unported
3 License.
4
5 http://creativecommons.org/licenses/by/3.0/legalcode
6
7==========================
8Quota for Magnum Resources
9==========================
10
11Launchpad blueprint:
12
13https://blueprints.launchpad.net/magnum/+spec/resource-quota
14
15There are multiple ways to slice an OpenStack cloud. Imposing quota on these
16various slices puts a limitation on the amount of resources that can be
17consumed, which helps to guarantee "fairness" or fair distribution of resources
18at creation time. If a particular project needs more resources, the
19concept of quota gives the ability to increase the resource count on-demand,
20given that the system constraints are not exceeded.
21
22
23Problem description
24===================
25At present Magnum does not have the concept of Quota on Magnum resources. As
26a result, as long as the underlying Infrastructure as a Service (IaaS)
27layer has resources, any user can consume as many resources as they want, with
28the hardlimit associated with the tenant/project being the upper bound for the
29resources to be consumed. Quotas are tied closely to physical resources and are
30a billable entity, and hence from Magnum's perspective it makes sense to limit
31the creation and consumption of a particular kind of resource to a certain value.
32
33Use cases
34---------
35Alice is the admin. She would like to have the feature which will give her
36details of Magnum resource consumption so that she can manage her resource
37appropriately.
38
39a. Ability to know current resource consumption.
40b. Ability to prohibit overuse by a project.
41c. Prevent the situation where users in one project get starved because users
42   in another project consume all the resources. Alice feels something like
43   "Quota Management" would help to guarantee "fairness".
44d. Prevent DOS kind of attack, abuse or error by users where an excessive
45 amount of resources are created.
46
47Proposed change
48===============
49The proposed change is to introduce a Quota table which will primarily store
50the quota assigned to each resource in a project. For Mitaka, we will restrict
51the scope to Bays, which are Magnum resources. As a first step we will start
52by imposing a quota on the number of Bays that can be created in a project.
53The change also plans to introduce REST APIs for GET/PUT/POST/DELETE. CLIs to
54get Quota information for a particular project will also be provided.
55
56For Mitaka, we will restrict the scope to resources explicitly created and
57managed by Magnum. Specifically, for Mitaka we will focus on the number of
58Bays only. Going forward we might add quota for containers, etc. The
59resources a Bay is constructed from are inherently not only Magnum resources
60but also involve resources from Nova, Cinder, Neutron, etc. Limiting the
61consumption of those resources is out of the scope of this spec and needs
62close collaboration with the quota management framework of the orchestration
63layer, since the orchestration layer can invoke the respective IaaS projects'
64APIs and get the consumption details before provisioning. As of now the
65orchestration layer used by Magnum, Heat, does not have the concept of quota,
66so we will start by imposing quota on the resource which Magnum manages, the
67Bay, more specifically for Mitaka.
68
69When a project is created and the Magnum service is running, the default
70quota for Magnum resources will be set from the values configured in
71magnum.conf. Other OpenStack projects like Nova [2]_ and Cinder [3]_ follow a
72similar pattern, and we will do so as well, hence there will be no separate
73CLI for quota-create. Later, if the user wants to change the quota of a
74resource, the option to do so will be provided via magnum quota-update. In a
75situation where all of the quota for a specific Magnum resource (Bay) has
76been consumed and is in use, the admin will be allowed to set the quota to
77any value lower than the usage or hard limit in order to prohibit users of
78the project from creating new Bays. This gives the admin more flexibility and
79better control over resource consumption. Until the resource is explicitly
80deleted, the quota usage associated with the project for that resource won't
81be decreased. In short, quota-update support will take the new hard limit for
82a resource, specified by the admin, and will set the new value for this
83resource.
84
85Before the resource is created, Magnum will check the current count of the
86resource (Bays) created for the project. If the resource (Bay) count is less
87than the hard limit set for Bays, creation of a new Bay will be allowed. Since
88Bay creation is a long-running operation, special care will be taken while
89computing the available quota. For example, the 'in_progress' field in the
90Quota usages table will be updated when the resource (Bay) creation is
91initiated and is in progress. Let's say the quota hard limit is 5, 3 Bays have
92already been created, and two new requests come in to create new Bays. Since
93we have 3 Bays already created, the 'used' field will be set to 3. The
94'in_progress' field will be set to 2 until the Bay creation succeeds. Once
95the Bay creation is done this field will be reset to 0, and the 'used'
96count will be updated from 3 to 5. So at this moment, hard_limit is 5, used
97is 5 and in_progress is 0. If one more request now comes in to create a
98new Bay, that request will be rejected since there is not enough quota
99available.
100
101For Bays,
102
103available = hard_limit - [in_progress + used]
104
105In general,
106
107Resource quota available = Resource hard_limit - [
108(Resource creation in progress + Resources already created for project)]
109
110Alternatives
111------------
112At present there is no quota infrastructure in Magnum.
113
114Adding a quota management layer at the orchestration layer, Heat, could be an
115alternative. Doing so would give a finer-grained view of resource consumption
116at the IaaS layer, which could be used while provisioning Magnum resources
117that depend on the IaaS layer [1]_.
118
119Data model impact
120-----------------
121New Quota and Quota usages tables will be introduced into the Magnum database
122to store the quota and quota consumption for each resource in a project.
123
124Quota Table :
125
126+------------+--------------+------+-----+---------+----------------+
127| Field | Type | Null | Key | Default | Extra |
128+------------+--------------+------+-----+---------+----------------+
129| id | int(11) | NO | PRI | NULL | auto_increment |
130| created_at | datetime | YES | | NULL | |
131| updated_at | datetime | YES | | NULL | |
132| project_id | varchar(255) | YES | MUL | NULL | |
133| resource | varchar(255) | NO | | NULL | |
134| hard_limit | int(11) | YES | | NULL | |
135+------------+--------------+------+-----+---------+----------------+
136
137Quota usages table :
138
139+---------------+--------------+------+-----+---------+----------------+
140| Field | Type | Null | Key | Default | Extra |
141+---------------+--------------+------+-----+---------+----------------+
142| created_at | datetime | YES | | NULL | |
143| updated_at | datetime | YES | | NULL | |
144| id | int(11) | NO | PRI | NULL | auto_increment |
145| project_id | varchar(255) | YES | MUL | NULL | |
146| resource | varchar(255) | NO | | NULL | |
147| in_progress | int(11) | NO | | NULL | |
148| used | int(11) | NO | | NULL | |
149+---------------+--------------+------+-----+---------+----------------+
150
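A minimal SQLAlchemy sketch of these two tables follows; it mirrors the
columns above, but the class and table names are assumptions for
illustration only::

    from sqlalchemy import Column, DateTime, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()


    class Quota(Base):
        """Per-project hard limit for a Magnum resource (e.g. 'bay')."""
        __tablename__ = 'quotas'

        id = Column(Integer, primary_key=True, autoincrement=True)
        created_at = Column(DateTime)
        updated_at = Column(DateTime)
        project_id = Column(String(255), index=True)
        resource = Column(String(255), nullable=False)
        hard_limit = Column(Integer)


    class QuotaUsage(Base):
        """Per-project usage counters for a Magnum resource."""
        __tablename__ = 'quota_usages'

        id = Column(Integer, primary_key=True, autoincrement=True)
        created_at = Column(DateTime)
        updated_at = Column(DateTime)
        project_id = Column(String(255), index=True)
        resource = Column(String(255), nullable=False)
        in_progress = Column(Integer, nullable=False, default=0)
        used = Column(Integer, nullable=False, default=0)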
151
152REST API impact
153---------------
154REST API will be added for :
155
1561. quota-defaults Lists all default quotas for all tenants.
1572. quota-show Lists the currently set quota values for a tenant.
1583. quota-update Updates quotas for a tenant.
1594. quota-usage Lists quota usage for a tenant.
1605. quota-list Lists quotas for all tenants.
161
162A user with the "admin" role will be able to do all of the above operations,
163but a user with the "non-admin" role will be restricted to getting/listing the
164quota associated with his/her own tenant. A user with the "non-admin" role is
165a member of the tenant who does not hold the "admin" role.
166
167The REST API for resources which will have quota imposed will be enhanced:
168
1691. Bay create
170Will check whether there is quota available for Bay creation; if so, the
171request proceeds, otherwise an exception is raised indicating that not
172enough quota is available (see the sketch below).
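
Purely as a sketch (get_quota, get_usage and QuotaExceeded are hypothetical
helpers, not names defined by this spec), the enforcement in the Bay create
path could look like::

    def check_bay_quota(context, project_id):
        """Reserve quota for one new Bay or raise if none is available."""
        quota = get_quota(context, project_id, resource='bay')
        usage = get_usage(context, project_id, resource='bay')
        available = quota.hard_limit - (usage.in_progress + usage.used)
        if available < 1:
            raise QuotaExceeded(resource='bay', limit=quota.hard_limit)
        # Record the pending creation so concurrent requests see it.
        usage.in_progress += 1
        usage.save()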
173
174Security impact
175---------------
176None
177
178Notifications impact
179--------------------
180None
181
182Other end user impact
183---------------------
184End users will have the option to look at the quota set on resources and the
185quota usage of a particular project.
186
187Performance Impact
188------------------
189None
190
191Other deployer impact
192---------------------
193None
194
195Developer impact
196----------------
197None
198
199Implementation
200==============
201
202Assignee(s)
203-----------
204
205Primary assignee:
206vilobhmm
207
208Other contributors:
209None
210
211Work Items
212----------
213
2141. Introduce Quota and Quota usages table in Magnum database.
2152. Introduce API to set/update Quota for a resource, specifically
216 bay, for Mitaka release.
2173. Introduce API to create Quota entry, by default, for a resource.
2184. Provide config options that will allow users/admins to set Quota.
2195. Make sure that when a resource is deleted, the 'used' count in the
220   quota_usages table is decremented by the number of resources
221   deleted. For example, if Bays are deleted, then the 'used' entry
222   for the bay resource in the Quota usages table should be decremented
223   by the number of Bays deleted.
2246. Provide CLI options to view the quota details :
225 a. magnum quota-show <project-id>
226 b. magnum quota-update <project-id> <resource> <hard-limit>
227 c. magnum quota-defaults <project-id>
228 d. magnum quota-usage <project-id>
229 e. magnum quota-list
2307. Add a conf setting for the default Bay quota, since we will focus
231   on Bays for Mitaka (see the sketch after this list).
232
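A possible shape for that config option, using oslo.config (the option name,
group and default value are placeholders, not decisions made by this spec)::

    from oslo_config import cfg

    quota_opts = [
        cfg.IntOpt('max_bays_per_project',
                   default=20,
                   help='Default number of Bays allowed per project.'),
    ]

    CONF = cfg.CONF
    CONF.register_opts(quota_opts, group='quotas')

    # The default quota for a new project would then be read as
    # CONF.quotas.max_bays_per_project.
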
233Dependencies
234============
235None
236
237Testing
238=======
239
2401. Each commit will be accompanied with unit tests.
2412. Gate functional tests will also be covered.
242
243Documentation Impact
244====================
245None
246
247References
248==========
249
250.. [1] http://lists.openstack.org/pipermail/openstack-dev/2015-December/082266.html
251.. [2] https://github.com/openstack/nova/blob/master/nova/quota.py
252.. [3] https://github.com/openstack/cinder/blob/master/cinder/quota.py
diff --git a/specs/implemented/tls-support-magnum.rst b/specs/implemented/tls-support-magnum.rst
new file mode 100644
index 0000000..87bc72b
--- /dev/null
+++ b/specs/implemented/tls-support-magnum.rst
@@ -0,0 +1,226 @@
1=====================
2TLS support in Magnum
3=====================
4
5Launchpad blueprint:
6
7https://blueprints.launchpad.net/magnum/+spec/secure-kubernetes
8
9Currently there is no authentication in Magnum to provide access control to
10limit communication between the Magnum service and the Kubernetes service so
11that Kubernetes can not be controlled by a third party. This implementation
12closes this security loophole by using TLS as an access control mechanism.
13Only the Magnum server will have the key to communicate with any given
14Kubernetes API service under its control. An additional benefit of this
15approach is that communication over the network will be encrypted, reducing
16the chance of eavesdropping on the communication stream.
17
18Problem Description
19-------------------
20
21Magnum currently controls Kubernetes API services using unauthenticated HTTP.
22If an attacker knows the api_address of a Kubernetes Bay, (s)he can control
23the cluster without any access control.
24
25Use Cases
26---------
27
281. Operators expect system level control to be protected by access control that
29is consistent with industry best practices. Lack of this feature may result in
30rejection of Magnum as an option for hosting containerized workloads.
31
32Proposed Changes
33----------------
34
35The complete implementation of TLS support in Magnum can be decomposed into
36the smaller pieces of work below.
37
381. TLS support in Kubernetes Client Code.
39-----------------------------------------
40
41The current implementation of the Kubernetes client code does not perform any
42authentication, so this work will change the client code to
43authenticate using TLS.
44
45Launchpad blueprint:
46
47https://blueprints.launchpad.net/magnum/+spec/tls-pythonk8sclient
48
492. Generating certificates
50----------------------------
51
52This task mainly concerns how certificates for both the client
53(magnum-conductor) and the server (kube-apiserver) will be generated and who
54will act as the certificate authority (CA).
55
56These files can be generated in two ways:
57
582.1. Magnum script
59-------------------
60
61This implementation will use a standard tool to generate certificates and
62keys. The script will be registered on the Kubernetes master node while
63creating the bay. It will generate the certificates, start the secure
64kube-apiserver and then register the client certificates with Magnum.
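
Purely for illustration, and assuming openssl is the "standard tool", a
fragment of such a script could shell out to openssl as below (file names
and the certificate subject are placeholders)::

    import subprocess

    def generate_client_cert(ca_crt='ca.crt', ca_key='ca.key'):
        """Generate a client key and certificate signed by the bay CA."""
        # Private key for the client (magnum-conductor).
        subprocess.check_call(
            ['openssl', 'genrsa', '-out', 'client.key', '2048'])
        # Certificate signing request for that key.
        subprocess.check_call(
            ['openssl', 'req', '-new', '-key', 'client.key',
             '-out', 'client.csr', '-subj', '/CN=magnum-client'])
        # Sign the CSR with the CA generated for this bay.
        subprocess.check_call(
            ['openssl', 'x509', '-req', '-in', 'client.csr',
             '-CA', ca_crt, '-CAkey', ca_key, '-CAcreateserial',
             '-out', 'client.crt', '-days', '365'])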
65
662.2. Using Barbican
67-------------------
68
69Barbican can also be used as a CA using Dogtag. This implementation will use
70Barbican to generate certificates.
71
723. TLS Support in Magnum code
73------------------------------
74
75This work mainly involves deploying a secure bay and supporting the use of
76certificates in Magnum to call Kubernetes API. This implementation can be
77decomposed into smaller tasks.
78
793.1. Create secure bay
80----------------------
81
82This implementation will deploy a secure kube-apiserver running on the
83Kubernetes master node. To do so, the following things need to be done:
84
85* Generate certificates
86* Copy certificates
87* Start a secure kube-apiserver
88
893.1.1. Generate certificates
90----------------------------
91
92The certificates will be generated using either of the implementations
93described in section 2.
94
953.1.2. Copy certificates
96------------------------
97
98This depends on how the cert and key are generated; the implementation will
99differ in each case.
100
1013.1.2.1. Using Magnum script
102----------------------------
103
104This script will generate both server and client certificates on the
105Kubernetes master node. Hence only the client certificates need to be copied
106to the Magnum host node. To copy these files, the script will call
107magnum-api to store them (see the sketch below).
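
Purely as an illustration of that call (the endpoint is described under
"REST API impact" later in this spec; the URL and payload fields here are
assumptions)::

    import requests

    def register_client_cert(magnum_api_url, token, bay_uuid):
        """Upload the generated client certificate to magnum-api."""
        with open('client.crt') as f:
            cert_pem = f.read()
        resp = requests.post(
            '%s/certs' % magnum_api_url,
            headers={'X-Auth-Token': token},
            json={'bay_uuid': bay_uuid, 'certificate': cert_pem})
        resp.raise_for_status()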
108
1093.1.2.2. Using Barbican
110-----------------------
111
112When using Barbican, the cert and key will be generated and stored in Barbican
113itself. Either magnum-conductor can fetch the certificates from Barbican and
114copy them to the Kubernetes master node, or the Kubernetes master node can
115fetch them from Barbican directly.
116
1173.1.3. Start a secure kube-apiserver
118------------------------------------
119
120The certificates generated above will be used to start a secure kube-apiserver
121running on the Kubernetes master node.
122
123Now that we have a secure Kubernetes cluster running, any API call to
124Kubernetes will be secure.
125
126
1273.2. Support https
128------------------
129
130When calling any Kubernetes resource-related API, magnum-conductor will
131fetch the certificate from the Magnum database or Barbican and use it to make
132a secure API call (a sketch follows below).
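
For example, with the python requests library the conductor-side call might
look like the following sketch (paths, file names and port are placeholders)::

    import requests

    def list_pods(api_address, cert='client.crt', key='client.key',
                  ca='ca.crt'):
        """Call the secure kube-apiserver using the bay's client cert."""
        url = 'https://%s:6443/api/v1/pods' % api_address
        resp = requests.get(url, cert=(cert, key), verify=ca)
        resp.raise_for_status()
        return resp.json()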
133
1344. Barbican support to store certificates securely
135----------------------------------------------------
136
137Barbican is a REST API designed for the secure storage, provisioning and
138management of secrets. The client cert and key must be stored securely. This
139implementation will support Barbican in Magnum to store the sensitive data.
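
A rough sketch of storing and fetching a secret with python-barbicanclient
is shown below; the secret name is a placeholder and error handling is
omitted::

    from barbicanclient import client as barbican_client

    def store_client_key(keystone_session, key_pem):
        """Store the client private key as a Barbican secret."""
        barbican = barbican_client.Client(session=keystone_session)
        secret = barbican.secrets.create(name='magnum-client-key',
                                         payload=key_pem)
        return secret.store()  # returns the secret reference (a URL)

    def fetch_client_key(keystone_session, secret_ref):
        """Fetch the client private key back from Barbican."""
        barbican = barbican_client.Client(session=keystone_session)
        return barbican.secrets.get(secret_ref).payload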
140
141Data model impact
142-----------------
143
144New table 'cert' will be introduced to store the certificates.
145
146REST API impact
147---------------
148
149New API /certs will be introduced to store the certificates.
150
151Security impact
152---------------
153
154With this support, Magnum will be secure enough to be used in an actual
155production environment, since all communication with the Kubernetes master
156node will be secured.
157The certificates will be generated by Barbican or by a standard tool and
158signed by a trusted CA.
159The certificates will be stored safely in Barbican when the Barbican cert
160storage option is selected by the administrator.
161
162Notifications impact
163--------------------
164
165None
166
167Other end user impact
168---------------------
169
170None
171
172Performance impact
173------------------
174
175None
176
177Other deployer impact
178---------------------
179
180The deployer will need to install Barbican in order to store certificates in
it.
181
182Developer impact
183----------------
184
185None
186
187Implementation
188--------------
189
190Assignee(s)
191-----------
192
193Primary assignee
194 madhuri(Madhuri Kumari)
195 yuanying(Motohiro Otsuka)
196
197Work Items
198----------
199
2001. TLS Support in Kubernetes Client code
2012. Support for generating keys in Magnum
2023. Support creating secure Kubernetes cluster
2034. Support Barbican in Magnum to store certificates
204
205Dependencies
206------------
207
208Barbican (optional)
209
210Testing
211-------
212
213Each commit will be accompanied by unit tests. There will also be functional
214tests that exercise both good and bad certificates.
215
216Documentation Impact
217--------------------
218
219Add a document explaining how the TLS cert and keys can be generated, and
220update the guide to describe how to use the secure model of bays.
221
222
223References
224----------
225
226None