diff --git a/doc/source/index.rst b/doc/source/index.rst
index 6ca43c2..a70cddb 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -31,6 +31,7 @@ permits.
    :glob:
    :maxdepth: 1
 
+   specs/cleanup-test-node-python
    specs/deploy-ci-dashboard
    specs/jenkins-job-builder_2.0.0-api-changes
    specs/nodepool-drivers
diff --git a/specs/cleanup-test-node-python.rst b/specs/cleanup-test-node-python.rst
new file mode 100644
index 0000000..bd206bf
--- /dev/null
+++ b/specs/cleanup-test-node-python.rst
@@ -0,0 +1,164 @@
+::
+
+  Copyright 2020 OpenStack Foundation
+
+  This work is licensed under a Creative Commons Attribution 3.0
+  Unported License.
+  http://creativecommons.org/licenses/by/3.0/legalcode
+
+=====================================
+Cleanup Test Node Python Installation
+=====================================
+
+https://storyboard.openstack.org/TODO
+
+The OpenDev Nodepool builders use minimal distro elements to build
+our test node images up from scratch. We have done this in order to
+reduce the size of images, control what goes in them, and to install
+glean which has been required for images to boot properly in some
+clouds. Unfortunately, because we install glean, a python project, we
+drag in a python toolchain (pip, virtualenv, etc.) from pypi. This can create
+problems if jobs later expect these tools to be distro package installed.
+
+Problem Description
+===================
+
+As noted above we build our test node images from scratch. One of the reasons
+for this is to install the glean utility via pip. In order to do that we
+pull in latest pip to install glean for us. We also create several virtualenvs
+for os-testr, bindep, and a zuul-cloner compatibility shim. To do this we
+use latest pip to install latest virtualenv. Finally, we install tox using
+latest pip as many of our jobs leverage it to drive testing.
+
+Historically this has been fine as we have primarily tested python software
+that wants to install using up to date python development tools. Over time
+we've shifted to be more of a general purpose CI platform and jobs that don't
+want latest python development tools have had to work around the decisions
+we have made on our images.
+
+Recently this was made worse by a virtualenv release that was incompatible
+with older virtualenv and the tools built around it. In debugging this we
+discovered that we use ``python3 -m venv`` and ``virtualenv`` on different
+platforms to create our system level virtualenvs for os-testr, bindep, and
+zuul-cloner. This resulted in different behaviors on different platforms
+and made debugging difficult.
+
+Ideally we would use a consistent set of tooling for system level python
+utilities and avoid assuming global latest pip on the images entirely.
+This would lead to consistent behavior for our utilities across platforms,
+and jobs that aren't testing python from source can interact with the system
+in the manner they choose.
+
+Proposed Change
+===============
+
+All platforms we use today support python3 (including latest CentOS 7). This
+allows us to use ``python3 -m venv`` on all platforms to create system level
+virtualenvs for tools like os-testr, bindep, and zuul-cloner. Additionally,
+we can move glean and tox into system-level virtualenvs using
+``python3 -m venv``. If we do this we can avoid installing pip and virtualenv
+from pypi at a global level.
+
+This will get us consistent utility behavior across platforms and make life
+easier for jobs that don't assume latest python development tools are
+preinstalled.
+
+We will need to accommodate existing jobs that assume an up to date python
+development utility set. Those that use tox can simply refer
+to the tox that has been installed in a system level virtualenv. Jobs
+that need virtualenv and/or pip will need to install these tools
+at job runtime. We can update base jobs as necessary to do that automatically
+for most jobs. In order to reduce the cost of this installation we can
+precache get-pip.py as well as wheels for these tools and their dependencies.
+
+Alternatives
+------------
+
+We could use distro packages for python development tools. These tend to end
+up out of date, and will result in different behaviors across platforms.
+
+We could bootstrap everything at job runtime. This will put unwanted pressure
+on our mirrors and caches and the vast majority of jobs will now install
+a consistent set of tools.
+
+We could replace glean with a non python project. Unfortunately glean encodes
+so many random cloud behaviors that rewriting it would be a fairly significant
+effort that we don't have time for.
+
+We could continue with the current image build processes, but provide a zuul
+job role that cleans up python development tools for jobs that expect system
+packages.
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  TBD
+
+Gerrit Topic
+------------
+
+Use Gerrit topic "cleanup-test-image-python" for all patches related to this spec.
+
+.. code-block:: bash
+
+    git-review -t cleanup-test-image-python
+
+Work Items
+----------
+
+* Communicate this spec and its changes broadly as it has the chance to impact
+  a number of projects, teams, and jobs.
+* Make the following changes on a new test image and label:
+
+  * Remove pip-and-virtualenv from our image element dependency list.
+  * Install python3 and python3-venv in all image builds.
+  * Replace inconsistent system level virtualenvs with ``python3 -m venv``
+    virtualenvs.
+  * Add new system level virtualenvs for glean and tox.
+
+* Apply the above changes to our production images and labels once tested
+  and working.
+
+Repositories
+------------
+
+openstack/project-config will have its nodepool elements as well as
+nodepool-builder configs updated.
+
+Servers
+-------
+
+This will affect all of our single use test nodes.
+
+DNS Entries
+-----------
+
+None
+
+Documentation
+-------------
+
+We will update the OpenDev Test Environment docs:
+https://docs.openstack.org/infra/manual/testing.html
+
+Security
+--------
+
+N/A
+
+Testing
+-------
+
+We will apply these changes to a new image and label so that production
+images are unaffected. Once this new image/label is available in Zuul we
+can run a representative set of jobs against it to ensure the expected
+behavior.
+
+Dependencies
+============
+
+None
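
Reviewer note: as a rough illustration of the virtualenv work items in this spec, the sketch below creates one ``python3 -m venv`` virtualenv per tool and installs the tool into it. The ``/usr/local/venvs`` root, the tool list, and the ``run`` and ``make_tool_venvs`` helpers are assumptions made for this example only; the spec does not define any of them. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only. The install root and the tool list are assumptions for
# illustration; the spec does not fix either of them.
set -eu

# Set DRY_RUN=0 to really create the virtualenvs (needs network access).
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create one isolated virtualenv per tool under the given root directory.
# Usage: make_tool_venvs ROOT TOOL...
make_tool_venvs() {
    root="$1"
    shift
    for tool in "$@"; do
        # python3 -m venv bootstraps pip inside the virtualenv, so no
        # global pip or virtualenv from pypi is needed on the image.
        run python3 -m venv "$root/$tool"
        run "$root/$tool/bin/pip" install "$tool"
    done
}

# Hypothetical root and tool list.
make_tool_venvs /usr/local/venvs glean tox bindep
```

A job that wants tox would then call the binary inside that virtualenv (for example ``/usr/local/venvs/tox/bin/tox`` under the assumed root) instead of relying on a globally installed copy.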