zun/specs/cpuset-container.rst

4.8 KiB

Supporting CPU sets in ZUN

Related Launchpad Blueprint:

https://blueprints.launchpad.net/zun/+spec/cpuset-container

ZUN presently does not have a way to allow users to specify dedicated resources for workloads that require higher performance. Such workloads can be classified as Network Function Virtualization (NFV) based, AI based or HPC based. This spec takes a first step towards supporting such workloads with dedicated resources. The first of such resources can be the cpusets or cores on a given physical host.

Problem description

Exposing cpusets to the cloud users cannot be done in its raw form. This is because, exposing such parameters to the end user breaks the cloudy model of doing things.

Exposing cpusets can be broadly thought of as combination of user policies and host capabilities.

The user policies are applied against compute host capabilities and if it matches, the user is allowed to perform the CRUD operations on a container.

Proposed change

  1. Piggy back on the work done for host capabilities.

More details of this work would be covered on a seperate blueprint: https://blueprints.launchpad.net/zun/+spec/expose-host-capabilities

2. Hydrate the schema with information obtained via calling driver specific methods that obtain the details of a host inventory. For cpusets, lscpu -p can be used to obtain the required information. Implement a periodic task that inventories the host at regular intervals.

3. Define 2 user cpu-policies called "dedicated" and "shared". The first policy signifies that the user wants to use dedicated cpusets for their workloads. The shared mode is very similar to the default behavior. If unless specified, the behavior will be defaulted to "shared".

4. Write driver methods to provision containers with dedicated cpusets. The logic of 'what' cpusets should be picked up for a given requests lies in the control of the zun code and is not exposed to the user.

5. The cpu-policy parameter is specified in conjuction with the vcpus field for container creation. The number of vcpus shall determine the number of cpusets requested for dedicated usage.

6. If this feature is being used with the zun scheduler, then the scheduler needs to be aware of the host capabilities to choose the right host.

For example:

$ zun run -i -t --name test --cpu 4 --cpu-policy dedicated

We would try to support scheduling using both of these policies on the same host.

How it works internally?

Once the user specifies the number of cpus, we would try to select a numa node that has the same or more number of cpusets unpinned that can satisfy the request.

Once the cpusets are determined by the scheduler and it's corresponding numa node, a driver method should be called for the actual provisoning of the request on the compute node. Corresponding updates would be made to the inventory table.

In case of the docker driver - this can be achieved by a docker run equivalent:

$ docker run -d ubuntu --cpusets-cpu="1,3" --cpuset-mems="1,3"

The cpuset-mems would allow the memory access for the cpusets to stay localized.

If the container is in paused/stopped state, the DB will still continue to block the pinset information for the container instead of releasing it.

Design Principles

  1. Build a host capability model that can be leveraged by the zun scheduler.

2. Create abstract user policies for the cloud user instead of raw values.

Alternatives

None

Data model impact

  • Introduce a new field in the container object called 'cpu_policy'.
  • Add a new numa.py object to store the inventory information.
  • Add numa topology obtained to the compute_node object.

REST API impact

The existing container CRUD APIs should allow a new parameter for cpu policy.

Security impact

None

Notifications impact

None

Other end user impact

None

Performance Impact

None

Other deployer impact

None

Developer impact

None

Implementation

Assignee(s)

Primary assignee: Sudipta Biswas (sbiswas7@in.ibm.com)

Other contributors: Hongbin Lu, Pradeep Singh

Work Items

  1. Create the new schema.
  2. Add cpu_policy field in the REST APIs and zun client.
  3. Write logic to hydrate the inventory tables.
  4. Implement a periodic task that inventories the host.
  5. Write logic to check the cpusets of a given host.
  6. Implement unit/integration test.

Dependencies

Testing

Each patch will have unit tests.

Documentation Impact

A set of documentation for this new feature will be required.