Spec to clean up python dev tools on our test images

Our test images are constructed in a particular manner, largely driven by
the historical python focus of our CI system. That focus is no longer
present, and we'd like to make our images more predictable and
consistent.

This spec outlines the plan for doing so.

Change-Id: I067b4d2650d3950fc6fa24b3b93d069f66b09dde
Clark Boylan 2020-02-24 10:20:23 -08:00
parent cab1a48a1a
commit e6be520789
2 changed files with 165 additions and 0 deletions


@ -31,6 +31,7 @@ permits.
:glob:
:maxdepth: 1
specs/cleanup-test-node-python
specs/deploy-ci-dashboard
specs/jenkins-job-builder_2.0.0-api-changes
specs/nodepool-drivers


@ -0,0 +1,164 @@
::

  Copyright 2020 OpenStack Foundation

  This work is licensed under a Creative Commons Attribution 3.0
  Unported License.
  http://creativecommons.org/licenses/by/3.0/legalcode

=====================================
Cleanup Test Node Python Installation
=====================================

https://storyboard.openstack.org/TODO

The OpenDev Nodepool builders use minimal distro elements to build
our test node images up from scratch. We have done this in order to
reduce the size of images, control what goes in them, and to install
glean, which has been required for images to boot properly in some
clouds. Unfortunately, because we install glean, a python project, we
drag in a python toolchain (pip, virtualenv, etc) from pypi. This can create
problems if jobs later expect these tools to be installed from distro packages.

Problem Description
===================

As noted above, we build our test node images from scratch. One of the reasons
for this is to install the glean utility via pip. In order to do that we
pull in the latest pip to install glean for us. We also create several virtualenvs
for os-testr, bindep, and a zuul-cloner compatibility shim. To do this we
use the latest pip to install the latest virtualenv. Finally, we install tox using
the latest pip, as many of our jobs leverage it to drive testing.

Historically this has been fine, as we have primarily tested python software
that wants to install using up to date python development tools. Over time
we've shifted to be more of a general purpose CI platform, and jobs that don't
want the latest python development tools have had to work around the decisions
we have made on our images.

Recently this was made worse by a virtualenv release that was incompatible
with older virtualenv and the tools built around it. In debugging this we
discovered that we use `python3 -m venv` and `virtualenv` on different
platforms to create our system level virtualenvs for os-testr, bindep, and
zuul-cloner. This resulted in different behaviors on different platforms
and made debugging difficult.

Ideally we would use a consistent set of tooling for system level python
utilities and avoid assuming a global latest pip on the images entirely.
This would lead to consistent behavior for our utilities across platforms,
and jobs that aren't testing python from source could interact with the system
in the manner they choose.

Proposed Change
===============

All platforms we use today support python3 (including latest CentOS 7). This
allows us to use `python3 -m venv` on all platforms to create system level
virtualenvs for tools like os-testr, bindep, and zuul-cloner. Additionally,
we can move glean and tox into system-level virtualenvs using
`python3 -m venv`. If we do this we can avoid installing pip and virtualenv
from pypi at a global level.

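
As a rough sketch of what this could look like in an image build step, the
snippet below creates one virtualenv per tool with `python3 -m venv` and
installs the tool using the pip inside that virtualenv. The `VENV_ROOT`
location, the `make_tool_venv` helper, and the `DRY_RUN` switch are
assumptions made for illustration; this spec does not prescribe paths or
helper names.

```shell
#!/bin/sh
# Sketch only: VENV_ROOT, make_tool_venv, and DRY_RUN are hypothetical names.
# The spec does not say where the system level virtualenvs should live.
VENV_ROOT="${VENV_ROOT:-/usr/local/venvs}"

# Run a command, or only print it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a virtualenv for one tool with the stdlib venv module, then install
# the tool with the pip that venv bootstraps inside that virtualenv, so no
# global pip or virtualenv is required.
make_tool_venv() {
    tool="$1"
    run python3 -m venv "${VENV_ROOT}/${tool}" &&
        run "${VENV_ROOT}/${tool}/bin/pip" install "${tool}"
}
```

Calling `make_tool_venv tox`, for example, would leave the tox entry point at
`${VENV_ROOT}/tox/bin/tox`, which jobs could then reference directly.
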
This will get us consistent utility behavior across platforms and make life
easier for jobs that don't assume latest python development tools are
preinstalled.

We will need to accommodate existing jobs that assume an up to date python
development utility set. Jobs that use tox can simply refer
to the tox that has been installed in a system level virtualenv. Jobs
that need virtualenv and/or pip will need to install these tools
at job runtime. We can update base jobs as necessary to do that automatically
for most jobs. In order to reduce the cost of this installation we can
precache get-pip.py as well as wheels for these tools and their dependencies.

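
The runtime installation described above might be sketched as follows. The
`CACHE_DIR` path and its layout, the `install_python_tools` helper, the
`DRY_RUN` switch, and the choice of a `--user` install are all assumptions
for illustration; the spec only says that get-pip.py and the relevant wheels
can be precached on the image.

```shell
#!/bin/sh
# Sketch only: CACHE_DIR, its layout, and DRY_RUN are hypothetical.
CACHE_DIR="${CACHE_DIR:-/opt/cache/pip}"

# Run a command, or only print it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Bootstrap pip from the cached get-pip.py, then install virtualenv.
# --no-index with --find-links restricts pip to the precached wheels, so
# neither step needs to reach a mirror or pypi. --user is an assumption here
# to keep the install out of the system site-packages.
install_python_tools() {
    run python3 "${CACHE_DIR}/get-pip.py" --user --no-index \
        --find-links "${CACHE_DIR}/wheels" &&
        run python3 -m pip install --user --no-index \
            --find-links "${CACHE_DIR}/wheels" virtualenv
}
```

A base job could call `install_python_tools` early in its pre-run phase.
Setting `DRY_RUN=1` prints the two commands instead of running them.
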
Alternatives
------------

We could use distro packages for python development tools. These tend to end
up out of date, and will result in different behaviors across platforms.

We could bootstrap everything at job runtime. This would put unwanted pressure
on our mirrors and caches, as the vast majority of jobs would then install
the same set of tools.

We could replace glean with a non python project. Unfortunately glean encodes
so many random cloud behaviors that rewriting it would be a fairly significant
effort that we don't have time for.

We could continue with the current image build processes, but provide a zuul
job role that cleans up python development tools for jobs that expect system
packages.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  TBD

Gerrit Topic
------------

Use Gerrit topic "cleanup-test-image-python" for all patches related to this spec.

.. code-block:: bash

  git-review -t cleanup-test-image-python

Work Items
----------

* Communicate this spec and its changes broadly as it has the chance to impact
  a number of projects, teams, and jobs.
* Make the following changes on a new test image and label:

  * Remove pip-and-virtualenv from our image element dependency list.
  * Install python3 and python3-venv in all image builds.
  * Replace inconsistent system level virtualenvs with `python3 -m venv`
    virtualenvs.
  * Add new system level virtualenvs for glean and tox.

* Apply the above changes to our production images and labels once tested
  and working.

Repositories
------------

openstack/project-config will have its nodepool elements as well as
nodepool-builder configs updated.

Servers
-------

This will affect all of our single use test nodes.

DNS Entries
-----------

None

Documentation
-------------

We will update the OpenDev Test Environment docs:
https://docs.openstack.org/infra/manual/testing.html

Security
--------

N/A

Testing
-------

We will apply these changes to a new image and label so that production
images are unaffected. Once this new image/label is available in Zuul we
can run a representative set of jobs against it to ensure the expected
behavior.

Dependencies
============

None