diff --git a/specs/queens/python-build-install-simplification.rst b/specs/queens/python-build-install-simplification.rst new file mode 100644 index 0000000..ef86afa --- /dev/null +++ b/specs/queens/python-build-install-simplification.rst @@ -0,0 +1,339 @@ +Python Build/Install Process Simplification +########################################### +:date: 2017-09-06 13:00 +:tags: python, build, source, repo + +The current python wheel/venv build process is not easily understood, and the +install process has become complicated. This blueprint aims to work towards +making it simpler to deploy, simpler to understand and to make many of the +current features which are forced on all deployers to be opt-in. + +Launchpad Blueprint: +https://blueprints.launchpad.net/openstack-ansible/+spec/python-build-install-simplification + +Problem description +=================== + +Building +~~~~~~~~ +The Python repository used for OpenStack-Ansible deployments is used to +prepare `Python wheels`_ for any git- or pypi-sourced packages for an +environment. Using wheels speeds up the installation of the package and +takes away the need to install the distribution packages required to +compile the package when installing. + +The repository preparation process also prepares `Python virtualenvs`_ +for all OSA roles with the prefix ``os_`` (which are expected to be +OpenStack services) in order to speed up the deployment of the services +by downloading a complete virtualenv instead of installing the packages +individually for every host that needs the service. + +The ``py_pkgs`` lookup, which pulls together the information used by the +build process. It is a black box in terms of what it does, making some +decisions about the information it reads and outputs which are not +documented anywhere other than in the code itself. The code is not easily +modified without breaking the process and is therefore most often left +alone and not well maintained, resulting in an increasing amount of technical +debt. The subsequent jinja in the repo-build role which processes it is +tough to work through and not easily maintained. While both of these could +be adjusted to make use of different plugins or filters, it would remain +a set of black boxes which are complex to untangle. + +The way that git repositories are specified and parameters are provided +to the build process does not scale very well. Each git repo requires at +least two flat variables to be set (``git_repo`` and ``git_install_branch``) +and can optionally have more set. This model of setting variables makes it +really easy to override individual settings, but requires the use of a +pattern match mechanism to discover all the settings (which is why we use +the lookup to do it). The settings are also put in disparate places, making +them hard to find - defaults/repo_packages, role/defaults. It is not very +obvious to most newcomers how to change them and it is not obvious to many +veterans what many of the settings mean. It often requires a lot of code +walking to understand the meaning of some settings like ``venvwithindex`` +and ``ignorerequirements``. + +The git clone process used to fetch the git sources in order to use when +building is done asynchronously in order to improve the time to completion, +however individual asynchronous tasks cannot be retried in Ansible, and the +git clones commonly fail. This is an Ansible limitation which we could work +around by implementing our own action module, but this would increase the +technical debt as we would have to constantly keep the module code updated +as we update to later versions of Ansible. + +When building wheels, pip has no way of resolving all dependencies up-front. +The only capability it has is to resolve the requirements for the current +package. It then processes each package requirement in turn. To do so requires +downloading the package and unpacking it to read the requirements. This is a +sequential process and therefore takes a long time when processing packages +with a lot of requirements as is typical for OpenStack projects. + +In Kilo the OpenStack requirements management process did not have the jobs +which tested the co-installability of all OpenStack packages and produced the +``upper-constraints.txt`` file as a manifest of which package versions worked +together. We therefore needed to do our own processing of all python packages +which would be installed by the roles and had to compile a set of requirements +and constraints across them all for the purpose of building the wheels, and +ensuring that the installed set were consistent for a build. When the +OpenStack requirements repository started publishing the upper-constraints +file we adopted it immediately to help keep builds more consistent. However, +we still produce our own ``requirements_absolute_requirements.txt`` file +which is used for all pip install tasks in order to ensure consistency and +to ensure that the packages we built from git are used (instead of making +the install in the role do the install from the git source, it installs from +the wheel held in the repo server). However this is not practical any more +as there are requirements for different services and needs which are not +resolvable down to a common set - we need to be able to allow the installation +of any version of packages and only apply constraints when needed. + +Some of the venvs we build do not adhere to the OpenStack requirements process +and therefore sometimes cannot be built using the upper constraints file. +There has also been some interest in being able to do mixed series deployments +instead of homogenous deployments. This would involve preparing a venv +containing packages from a different series with a different set of +constraints. Currently the constraints used in the repo build process are +global - we only have the ability to enable/disable their use when building +venvs. It would be better to be able to specify a global fallback for +constraints, but to allow per venv constraints too. + +The use of Python 2.7 for OpenStack and Ansible is waning and the need to +shift everything to use Python 3.5 has arisen as a new requirement. The +tooling will need to be shifted to implement the venvs using Python 3.5 +where applicable, but may still need to prepare venvs using Python 2.7 if +a service does not yet support running in a Python 3.5 environment. + +In Newton we introduced the ability to do multi-architecture builds to cater +for multiple architectures, then had to also split out multi-distro builds due +do wheel/venvs references to C libraries being different for each distro due +to the libraries available being different. Currently this is working, but it +makes the repo build process much more complex and take a lot more time. The +process to synchronise the per-distro and per-architecture built artifacts is +error prone and confuses many newcomers to the project. + +In order to facilitate using the repo-server to respond to pip index queries, +multiple directories and symlinks have been used to prepare the appropriate +structure so that the correct responses are given back to pip. The process of +setting up all the symlinks is very time consuming and in some places the +process may cause dead links, especially when rebuilding for a specific release +tag. + +Storing +~~~~~~~ +Once the wheels, venvs and other artifacts are built for an environment they +are stored and synchronised between the repo containers using a combination of +rsync and lsyncd. While this sync process is generally OK, it is commonly a +cause for confusion and requires a complex troubleshooting process to figure +out why packages are not present. + +Installing +~~~~~~~~~~ +The consumption of the prepared wheels and virtualenvs has changed over +time. With the introduction of ``developer_mode`` into the roles there +is a lot of code and functionality duplication between the repo build +process and the role installation process. + +The need to cater for the optional inclusion of a variety of plugins/drivers +in the venvs either through the use of additional Python packages or by +symlinking system packages into the venv (when the package is proprietary or +unavailable via git or pypi) causes further complexity in the process. + +When executing a pip installation, pip always looks for packages in the +following order: local cache, local folder, default index, extra indexes. +Pip will always check all locations before deciding which to use for the +installation. This means that if there are multiple indexes used, it queries +them all. This can be slow if any of those are not local to the environment. + +.. _Python wheels: http://pythonwheels.com +.. _Python virtualenvs: https://virtualenv.pypa.io + +Proposed change +=============== + +* Change the repo build process to, by default, only build wheels for git + sources it is given without also building the dependencies. The ability + to build all wheels will still be there, but will not be the default + behaviour. This will cut down the time taken in this process when in CI, + development environments or small online environments where it is not + necessary to build/store all the wheels. The full build will only be + necessary for offline deployments and for environments where the deployer + specifically opts-in to ensuring that everything is built. + +* Replace the current storage structure for wheels with a flat directory. + This directory will be served via the pypi API provided by the very simple + `pypiserver`_ application. If we need to continue to provide per-distro + or per-architecture wheels then we could implement distro/arch indexes + which are supplied by individual folders. However, it is unlikely that + this will be necessary. + +* Use nginx as a reverse proxy which responds to requests from pip by first + trying against the local pypiserver, then against tarballs.openstack.org + and then against pypi. This will allow nginx to cache all downloaded + packages (speeding up subsequent requests) without the repo server having + build them. + +* Implement changes to the roles to allow service-specific constraints to + be applied when building venvs. This allows a CI process to build service + venvs and to publish the list of tested versions for that service. Then + for production builds the published list can be used as a constraint for + the venv to ensure that production builds use the same versions. This + solves a problem we have today where some projects (eg: tempest, rally, + gnocchi) have to be built unconstrained as they do not conform to the + global requirements process. + +* Implement changes to each role to handle the wheel building and venv + building, but do it in such a way that only the build can be executed + by using tags, setting a specific flag, or include_role and tasks_from. + The specific dependencies can then be itemised in the role and the role + can be used for artifact preparation. + +* Remove all pip install activities from hosts, replacing them with the + use of distro packages exclusively for any python requirements on the + hosts. We should avoid implementing as many python packages on the host + as possible and focus all efforts on implementing everything we need + (including the Ansible requirements for targeted hosts) into venvs. + All Ansible tasks should then specifically use the appropriate venv + when executing tasks, avoiding the use of any python libraries on the + host. This prevents system package conflicts and will reduce the host + package installation requirements. + +* Implement a playbook which is optionally used to prepare pre-built venvs + for an environment as they are today. If a deployer wishes to prepare + the venvs in a build process, the playbook should be exercised in the + build process and should be executed on a designated 'build host' which + will make use of ephemeral containers and/or virtual machines on the build + host to exercise the builds for the necessary distribution and architecture + combinations. + +* Remove the complex git caching/staging process which exists today and + make the use of the repo server for git caching for the services that + require it (eg: nova-console uses novnc/spice from git) entirely optional. + +* Implement a playbook which can be used to stage offline installs by + downloading all built artifacts (completed, perhaps by a CI job) to the + deployment host, then distributing them appropriately. + +* Simplify the constraints management by implementing the use of + --constraints in the following order: + + --constraint user-specified-constraints.txt + --constraint openstack-ansible-pins.txt + --constraint openstack-upper-constraints.txt + + This would replace the current method which merges the various constraints + into one file, requiring a fair amount of jinja magic because a single + file cannot have two constraints and resolve successfully into a single + result as we need in our current mechanism. + +* Implement changes to roles to ensure that the build process and the + packages only required when building (dev headers, etc) are only + used when a build is being executed. The build packages and the runtime + packages will be changed into separate lists so that the runtime environment + is only installing the packages it needs. + +* Ensure that 'optional' pip packages are installed into the venv during the + build stage, rather than during the install stage. + +.. _pypiserver: https://pypiserver.readthedocs.io + +Alternatives +------------ + +* The build process can remain as-is, continuing to confuse deployers and + difficult to maintain. + +* The build process can be changed to only build and store wheels for packages + which are pip installed onto the hosts, and only to build and store the + venvs for distribution. + +Playbook/Role impact +-------------------- + +Playbooks will be added to cater for the build process and the staging +process. The roles will be adjusted to properly separate out the build +tasks and the distro packages to install for the build (versus those +required when using pre-built wheels). + +Upgrade impact +-------------- + +Care will be taken to ensure that upgrades happen as they do today. + +Security impact +--------------- + +The security posture should be improved by the reduction of packages installed +onto hosts and containers when a full set of artifacts are built. + +Performance impact +------------------ + +The performance of the deployment should be improved due to the reduction in +time taken to deploy with pre-built packages if a full set of artifacts are +built. + +End user impact +--------------- + +There is no end-user impact for consumers of an OpenStack cloud, except +perhaps that upgrades will be quicker to execute, thus resulting in reduced +maintenance slot requirements. + +Deployer impact +--------------- + +* As deployments and upgrades will be quicker to execute, deployers will be + able to execute them in shorter maintenance slots. + +* Deployers will need to understand how better to utilise the CI process to + prepare the required artifacts to speed up deployments. + +Developer impact +---------------- + +As the build process will be integrated into the roles, it will be easier to +understand how it works and what it does. + +Dependencies +------------ + +This spec will be implemented in partnership with +https://blueprints.launchpad.net/openstack-ansible/+spec/deployment-stages + +Implementation +============== + +Assignee(s) +----------- + +Primary assignee: + jesse-pretorius (odyssey4me) + +Work items +---------- + +Each of the roles implemented in the default AIO will be worked through in +sequence to re-arrange and optimise based on this workflow. The work items +are not being detailed here but will be reflected in gerrit through the +blueprint's topic and will be visible in launchpad. + +Testing +======= + +As this process matures, it may be simpler to use the integrated build for +all role testing instead of having two seperate test implementations. This +reduces technical debt for the project. + +Documentation impact +==================== + +This work will need to include documentation updates which describe the new +way that deployments can be implemented using full artifact builds and +how to implement offline installs. + +References +========== + +* https://12factor.net/ + +* http://www.clearlytech.com/2014/01/04/12-factor-apps-plain-english/ + +