Commit Graph

68 Commits

Author SHA1 Message Date
Zuul c1d1954e91 Merge "Add option to install everything in global venvs" 2023-08-11 18:04:15 +00:00
Dan Smith c3b0b9034e Disable waiting forever for connpool workers
This will cause apache to no longer wait forever for a connection
pool member to become available before returning 503 to the client.
This may help us determine if some of the timeouts we see when
talking to the services come from an overloaded apache.

Change-Id: Ibc19fc9a53e2330f9aca45f5a10a59c576cb22e6
2023-08-04 07:16:27 -07:00
Clark Boylan a40f9cb91f Add option to install everything in global venvs
Since we are python3 only for openstack we create a single python3
virtualenv to install all the packages into. This gives us the benefits
of installing into a virtualenv while still ensuring coinstallability.
This is a major change and will likely break many things.

There are several reasons for this. The change that started this effort
was that pip stopped uninstalling packages which used distutils to generate
their package installation. Many distro packages do this which meant
that pip installed packages and distro packages could not coexist in the
global install space. More recently git has made pip installing repos as
root more difficult due to file ownership concerns.

Currently the switch to the global venv is optional, but if we go down
this path we should very quickly remove the old global installation
method as it has only caused us problems.

A major hurdle we have to get over is convincing rootwrap to trust
binaries in the virtualenvs (so you'll notice we update rootwrap
configs).

Some distros still have issues; keep them on the old setup for now.

Depends-On: https://review.opendev.org/c/openstack/grenade/+/880266
Co-Authored-By: Dr. Jens Harbott <frickler@offenerstapel.de>
Change-Id: If9bc7ba45522189d03f19b86cb681bb150ee2f25
2023-08-02 07:07:25 +02:00
Zuul b52dceee7b Merge "Switch TLS tests to TLSv1.2+ only" 2023-07-21 16:37:18 +00:00
Martin Kopec ec07b343d2 Remove support for opensuse
We haven't been testing the distro for a while in CI, e.g. in
Tempest, the jobs on opensuse15 haven't been executed for a year
now.
Therefore the patch removes opensuse support from devstack.

Closes-Bug: #2002900
Change-Id: I0f5e4c644e2d14d1b8bb5bc0096d1469febe5fcc
2023-02-16 12:01:39 +01:00
Dan Smith 64d68679d9 Improve API log parsing
Two runs of the same job on the same patch can yield quite different
numbers for API calls if we just count the raw calls. Many of these
are tempest polling for resources, which on a slow worker can require
many more calls than a fast one.

Tempest seems to not change its User-Agent string, but the client
libraries do. So, if we ignore the regular "python-urllib" agent
calls, we get a much more stable count of service-to-service API
calls in the performance report.

Note that we were also logging in a different (less-rich) format for
the tls-proxy.log file, which hampers our ability to parse that
data in the same format. This switches it to "combined" which is used
by the access.log and contains more useful information, like the
user-agent, among other things.

Change-Id: I8889c2e53f85c41150e1245dcbe2a79bac702aad
2022-05-12 07:55:30 -07:00
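A minimal sketch of the filtering idea, in Python. It assumes the Apache "combined" LogFormat (`%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"`) and treats any user-agent starting with "python-urllib" as polling traffic to drop. The function name and the sample log lines are made up for illustration; this is not devstack's own performance-report code.

```python
import re
from collections import Counter

# Apache "combined" format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def count_api_calls(lines, ignore_prefix="python-urllib"):
    """Count (method, user-agent) pairs, skipping agents that start with ignore_prefix."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue  # not a combined-format request line
        agent = match.group("agent")
        if agent.startswith(ignore_prefix):
            continue  # assumed to be polling traffic, which varies with worker speed
        counts[(match.group("method"), agent)] += 1
    return counts

# Made-up log lines for illustration.
sample = [
    '127.0.0.1 - - [12/May/2022:07:55:30 -0700] "GET /compute/v2.1/servers HTTP/1.1" 200 512 "-" "example-client/1.0"',
    '127.0.0.1 - - [12/May/2022:07:55:31 -0700] "GET /compute/v2.1/servers HTTP/1.1" 200 512 "-" "python-urllib3/1.26.9"',
]
counts = count_api_calls(sample)
print(counts)  # Counter({('GET', 'example-client/1.0'): 1})
```

Switching tls-proxy.log to "combined" is what makes a single parser like this usable for both log files, since the user-agent field is only present in that format.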
Michael Johnson 35bc600da1 Fix tls-proxy on newer versions of openssl
Newer versions of openssl (CentOS 9 Stream, for example) do not like using sha1.
Devstack will fail on these systems[1] with the following error:
801B93DCE77F0000:error:03000098:digital envelope routines:do_sigver_init:invalid digest:crypto/evp/m_sigver.c:333:
This patch updates the tls-proxy code in devstack to use sha256 instead of sha1 which allows devstack to complete when tls-proxy is enabled.

[1] https://zuul.opendev.org/t/openstack/build/1d90b22a39c74e24a8390861b3c5f957/log/job-output.txt#5535

Closes-Bug: #1962600

Change-Id: I71e1371affe32f070167037b0109a489d196bd31
2022-03-11 20:28:39 +00:00
Jens Harbott 3f28c272d0 Remove deprecated tail_log function
This function has been deprecated for a long time, let's finally
remove it. It is only generating a warning anyway.

Change-Id: I7bd440adf2ce8283e3ad3d5d09e6b2b877e2b42e
2020-10-28 13:06:52 +00:00
Zuul d3b41b528d Merge "Allow IP-based subject alt names" 2020-07-07 08:43:50 +00:00
Jens Harbott d7a82f41e4 Drop support for python2
python2 is EOL, let's move on and only support python3.

Change-Id: Ieffda4edea9cc19484c04420ed703f7141ef9f15
2020-06-26 15:27:32 +02:00
Ian Wienand 3cd41019b0 lib/tls: use python3 to run inline script
We only need to run this fixup for the active python now that we are python3 only.

Change-Id: I7616e5ee5693b2890fb7f6bd9052890a82904c22
2020-04-22 14:01:53 +10:00
Julia Kreger 0fe25e31a8 Add the IPv6 IP to the TLS cert
For some crazy reason, we've forgotten about trying
to use IPv6 addresses directly with the SSL certificates.

So let's add some logic so clients can connect directly
with the v6 IP.

Change-Id: Ie8b8a2d99945f028bebe805b83bfd863b7b72d57
2019-08-12 08:46:56 +02:00
Dirk Mueller dc01a8ab63 Switch TLS tests to TLSv1.2+ only
This would more likely match a relevant production deployment.

Change-Id: I4ee2ff0c00a8e33fd069a782b32eed5fef62c01b
2019-07-14 22:33:45 +02:00
Clark Boylan e344c97c0e Set apache proxy-initial-not-pooled env var
We've run into what appears to be a race with apache trying to reuse a
pooled connection to a backend when that pooled connection is closing.
This leads to errors like:

  [Fri Dec 07 21:44:10.752362 2018] [proxy_http:error] [pid 19073:tid 139654393218816] (20014)Internal error (specific information not available): [client 104.130.127.213:45408] AH01102: error reading status line from remote server 127.0.0.1:60999
  [Fri Dec 07 21:44:10.752405 2018] [proxy:error] [pid 19073:tid 139654393218816] [client 104.130.127.213:45408] AH00898: Error reading from remote server returned by /image/v2/images/ec31a4fd-e22b-4e97-8c6c-1ef330823fc1/file

According to the internets this can be addressed (at the cost of some
performance) by setting the proxy-initial-not-pooled env var for
mod_proxy. From the mod_proxy docs:

  If this variable is set, no pooled connection will be reused if the client
  request is the initial request on the frontend connection. This avoids the
  "proxy: error reading status line from remote server" error message caused
  by the race condition that the backend server closed the pooled connection
  after the connection check by the proxy and before data sent by the proxy
  reached the backend. It has to be kept in mind that setting this variable
  downgrades performance, especially with HTTP/1.0 clients.

Closes-Bug: #1807518

Change-Id: I374deddefaa033de858b7bc15f893bf731ad7ff2
2018-12-08 18:24:26 +00:00
Tim Burke 0137703825 Allow IP-based subject alt names
... even when no other subject alt names provided

Previously, a non-voting job in barbican's gate would fail with something like

  X509 V3 routines:X509V3_parse_list:invalid null name:v3_utl.c:319:
  X509 V3 routines:DO_EXT_NCONF:invalid extension string:v3_conf.c:140:name=subjectAltName,section=DNS:pykmip-server,,IP:198.72.124.103
  X509 V3 routines:X509V3_EXT_nconf:error in extension:v3_conf.c:95:name=subjectAltName, value=DNS:pykmip-server,,IP:198.72.124.103

because we'd have an invalid empty string.

Change-Id: I5459b8976539924cd6cc6c1e681b6753a76b804c
2018-11-30 14:40:12 -08:00
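Illustrating the failure above: joining SAN entries with a separator while one entry is empty yields the `,,` that OpenSSL rejects. A hypothetical Python helper (devstack itself builds this string in bash) that simply drops empty entries before joining:

```python
def build_subject_alt_names(dns_names, ip_addresses):
    """Build an OpenSSL subjectAltName value, skipping empty entries.

    An empty element (e.g. "DNS:pykmip-server,,IP:...") is what triggers
    OpenSSL's "invalid null name" error.
    """
    entries = [f"DNS:{name}" for name in dns_names if name]
    entries += [f"IP:{ip}" for ip in ip_addresses if ip]
    return ",".join(entries)

# Only an IP is known: still a valid value, with no stray separator.
print(build_subject_alt_names([], ["198.72.124.103"]))
# IP:198.72.124.103

# An empty extra name is dropped instead of producing ",,".
print(build_subject_alt_names(["pykmip-server", ""], ["198.72.124.103"]))
# DNS:pykmip-server,IP:198.72.124.103
```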
aojeagarcia 9a543a81ac Don't use ipv6 for DNS SAN fields with python3
Python2 match routines for x509 fields are broken, so we have to use
the DNS field for IP addresses.

The problem is that if you use ipv6 addresses in the DNS field,
urllib3 fails when trying to encode it.

Since python3 match routines for x509 fields are correct, this patch
disables the hack for python3 and encodes the IP address only in the
corresponding IP field of the certificate.

Partial-Bug: #1794929
Depends-On: https://review.openstack.org/#/c/608468

Change-Id: I7b9cb15ccfa181648afb12be51ee48bed14f9156
Signed-off-by: aojeagarcia <aojeagarcia@suse.com>
2018-10-07 21:21:12 +00:00
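A sketch of the python3 behaviour described above, assuming a hypothetical helper that receives a mixed list of host names and addresses. The stdlib `ipaddress` module decides whether a value is an IP literal; IP literals (IPv4 or IPv6) go only into `IP:` entries and never into `DNS:` ones, which keeps IPv6 addresses out of the DNS field that urllib3 chokes on.

```python
import ipaddress

def san_entries(hosts):
    """Map host names and IP literals to subjectAltName entries.

    IP literals (IPv4 or IPv6) are written only as IP: entries; everything
    else is treated as a DNS name.
    """
    entries = []
    for host in hosts:
        try:
            ip = ipaddress.ip_address(host)
        except ValueError:
            entries.append(f"DNS:{host}")
        else:
            entries.append(f"IP:{ip}")
    return ",".join(entries)

print(san_entries(["devstack.example.org", "127.0.0.1", "2001:db8::1"]))
# DNS:devstack.example.org,IP:127.0.0.1,IP:2001:db8::1
```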
Jens Harbott dc7b429463 Fix running with SERVICE_IP_VERSION=6
- There are some locations where we need the raw IPv6 address instead of the
  url-quoted version enclosed in brackets.
- Make nova-api-metadata service listen on IPv6 when we need that.
- Use SERVICE_HOST instead of HOST_IP for TLS_IP.

Change-Id: Id074be38ee95754e88b7219de7d9beb06f796fad
Partial-Bug: 1656329
2018-03-11 08:53:41 +00:00
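The first bullet distinguishes two spellings of the same IPv6 address. A hypothetical pair of helpers, sketched in Python, showing the bracketed form required inside URLs and the raw form needed elsewhere (for example a listen address). The stdlib `urllib.parse.urlsplit` also strips the brackets when it parses such a URL.

```python
import ipaddress
from urllib.parse import urlsplit

def url_host(address):
    """Host as it must appear in a URL: IPv6 literals are bracketed."""
    ip = ipaddress.ip_address(address)
    return f"[{ip}]" if ip.version == 6 else str(ip)

def raw_host(host):
    """Strip URL brackets to recover the raw address (e.g. for a listen option)."""
    if host.startswith("[") and host.endswith("]"):
        return host[1:-1]
    return host

service_host = "2001:db8::10"            # example address from the documentation prefix
url = f"https://{url_host(service_host)}/identity"
print(url)                               # https://[2001:db8::10]/identity
print(raw_host(url_host(service_host)))  # 2001:db8::10
print(urlsplit(url).hostname)            # 2001:db8::10
```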
Zuul 9f71c4ad4e Merge "nova: add support for TLS between novnc proxy & compute nodes" 2018-02-20 09:39:19 +00:00
Jens Harbott 1db9b5d3ca Remove apache tls-proxy sites when stopping
Currently doing a cycle of

    ./stack.sh; ./unstack.sh; ./stack.sh

fails because the leftover tls-proxy sites cause apache startup to
fail on the second stack.sh run. So we need to disable these sites when
running stop_tls_proxy.

Change-Id: I03e6879be332289d19ca6a656f5f9f139dffff6f
Closes-Bug: 1718189
2017-11-10 10:43:19 +11:00
Daniel P. Berrange e9870eb18d nova: add support for TLS between novnc proxy & compute nodes
Nova is gaining the ability to run TLS over the connection between the
novnc proxy service and the QEMU/KVM compute node VNC server.

This adds a new config param - 'NOVA_CONSOLE_PROXY_COMPUTE_TLS=True' -
which instructs devstack to configure libvirt/QEMU to enable TLS for the
VNC server, and to configure the novncproxy to use TLS when connecting.
NB this use of TLS is distinct from the use of TLS for the public-facing API
controlled by USE_SSL; they can be enabled independently.

This is done in a generic manner so that it is easy to extend to cover
use of TLS with the SPICE and serial console proxy services too.

Change-Id: Ib29d3f5f18533115b9c51e27b373e92fc0a28d1a
Depends-on: I9cc9a380500715e60bd05aa5c29ee46bc6f8d6c2
Implements bp: websocket-proxy-to-host-security
2017-10-19 18:32:51 +00:00
Jenkins 80021b8f9f Merge "Fix URLs when running with tls-proxy enabled" 2017-09-08 15:27:18 +00:00
Jens Harbott 411c34da69 Fix URLs when running with tls-proxy enabled
Various services are returning broken links when running behind
tls-proxy. These issues can be fixed by setting the X-Forwarded-Proto
header in the apache config and letting oslo_middleware parse it.

Change-Id: Ibe5dbdc4644ec812f0435f59319666fc336c195a
Partial-Bug: 1713731
2017-08-29 14:40:26 +00:00
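Roughly what "letting oslo_middleware parse it" amounts to, as a simplified stand-in rather than oslo_middleware's actual code: a WSGI wrapper that copies the X-Forwarded-Proto header (exposed to WSGI apps as `HTTP_X_FORWARDED_PROTO`) into `wsgi.url_scheme`, so links the application builds use https even though Apache talks to it over plain http. The demo app and environ are made up; `wsgiref.util.application_uri` is the stdlib helper that rebuilds the base URL.

```python
from wsgiref.util import application_uri

def forwarded_proto(app):
    """Trust X-Forwarded-Proto from the proxy when it names http or https."""
    def wrapper(environ, start_response):
        proto = environ.get("HTTP_X_FORWARDED_PROTO", "").lower()
        if proto in ("http", "https"):
            environ["wsgi.url_scheme"] = proto
        return app(environ, start_response)
    return wrapper

def demo_app(environ, start_response):
    """Echo the base URL the app would use when generating links."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [application_uri(environ).encode()]

def call(app, environ):
    return b"".join(app(environ, lambda status, headers: None))

# Behind the TLS proxy the backend itself is reached over plain http.
environ = {"wsgi.url_scheme": "http", "HTTP_HOST": "203.0.113.5", "SCRIPT_NAME": ""}
print(call(demo_app, dict(environ)))  # b'http://203.0.113.5/'
print(call(forwarded_proto(demo_app),
           dict(environ, HTTP_X_FORWARDED_PROTO="https")))  # b'https://203.0.113.5/'
```

Honouring this header is only safe behind a proxy you control, since otherwise any client could send it.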
Jens Harbott 4639984b96 Update function description for start_tls_proxy
In [1] the definition of the function was changed, adding the service
name as first parameter. Since this seems to have caused failures in
some plugins, at least update the function template accordingly.

[1] Ifcba410f5969521e8b3d30f02795541c1661f83a

Change-Id: I4d03957f8d3a18625f06379fb21aa7ba55e32797
2017-08-28 11:43:37 +00:00
Ian Wienand 139837d69d Make TLS logs more readable
After looking at these for I9881f2e7d51fdd9fc0f7fb3e37179aa53171b531 I
found them not as useful as they could be.

Fix the CustomLog command, which wants the logfile then the format
string (or a nickname, which the LogFormat line wasn't setting). Use
standard microsecond timestamps, and trim the access log to have more
relevant info.

Change-Id: I9f4c8ef38ab9e08aeced7b309d4a5276de07af4b
2017-08-09 06:30:22 +10:00
Jenkins 8f314400d8 Merge "Set specified header size when enabling tls-proxy" 2017-06-29 23:00:35 +00:00
Clark Boylan f4dbd12f78 Set specified header size when enabling tls-proxy
As part of getting swift's functional testing to work properly through
the tls-proxy we need to increase the allowed request header size in
apache. This was a non-issue without tls proxy as requests hit the
eventlet webserver directly which was configured via the swift config
which sets this relatively large limit (by default devstack configures
swift to have a header size limit of 16384).

Now we pass in an optional parameter to start_tls_proxy that includes
the desired header size. lib/swift then passes in the value it also
configures in its swift.conf.

If not explicitly set we default to 8190 which is apache2's default.

Change-Id: Ib2811c8d3cbb49cf94b70294788526b15a798edd
2017-06-05 12:47:50 -07:00
Jenkins dc9ef55fc6 Merge "Make stack.sh work on SUSE" 2017-05-31 20:48:10 +00:00
Clark Boylan 35649ae0d2 Make stack.sh work on SUSE
This adds packages to suse for systemd python linkages as well as
apache2 and which. It also configures mod_proxy and mod_proxy_uwsgi with
a2enmod.

We also properly query if apache mods are enabled to avoid running
into systemd service restart limits. Enable mod_version across the board
as we use it and it may not be enabled by default (like in SUSE).

Also in addition to enabling mod_ssl we enable the SSL flag so that TLS
will work...

Finally we tell the system to trust the devstack CA.

Change-Id: I3442cebfb2e7c2550733eb95a12fab42e1229ce7
2017-05-28 09:58:51 -07:00
Clark Boylan 4baac65725 Use proper python when configuring certs
We have to do silly overrides of cert locations for requests for
reasons. If we are running under python3 then we were previously looking
in the wrong location for the requests certs. Update the cert fixing
function to properly use python3 to find the certs if python3 is
enabled.

Change-Id: Id1369da0d812edcf9b1204e9c567f8bfe77c48b2
2017-05-27 20:57:56 -07:00
Clark Boylan faffde1f97 Use strong cert CA defaults
Switch from sha1 to sha256 and from 1024 bits to 2048 bits. Do this
because things don't like the old insecure sha1+1024bits combo.

Change-Id: Iae2958969aed0cd880844e19e8055c8bdc7d064d
2017-04-27 09:54:27 -07:00
Ian Wienand f6a2d2cd4e Always restart apache
As described in [1], it seems that mod_wsgi is not "graceful" reload
safe.  Upon re-init, it can end up in a segfault loop.

The "reload" (not *restart*) after setting up uwsgi was added with
I1d89be1f1b36f26eaf543b99bde6fdc5701474fe but did not cause an issue
until uwsgi was enabled.

We do not notice in the gate, because the TLS setup ends up doing a
restart after this setup.  In the period between the
write_uwsgi_config and that restart, Apache is sitting in a segfault
loop, but we never noticed because we don't try talking to it.  Other
jobs that don't do any further apache configuration have started
failing, however.

Looking at the original comments around "reload_apache_server" I'm not
sure if it is still necessary.  [2] shows it is not used outside these
two calls.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1445540
[2] http://codesearch.openstack.org/?q=reload_apache_server&i=nope&files=&repos=

Closes-Bug: #1686210
Change-Id: I5234bae0595efdcd30305a32bf9c121072a3625e
2017-04-26 11:09:59 +10:00
Sean Dague a1446b960f always retry proxy errors
When an apache worker gets a proxy error, it will not retry talking to
the backend server until the retry timeout expires. We bring up the
proxy server *before* the backend server, and poll it. If we are
running a small number of workers, there is a likely chance that we're
going to hit one that errored before the backend was up, thus failing
for no real reason.

Set this to 0 instead to mean always retry failed connections.

Change-Id: I9e584f087bd375f71ddf0c70f83205c425094a17
Ref: https://httpd.apache.org/docs/2.4/mod/mod_proxy.html#proxypass
2017-04-17 14:31:21 -04:00
Sean Dague f3b2f4c853 Remove USE_SSL support
tls-proxy is the way we're now doing a standard install using https
between services. There is a lot more work to make services directly
handle https, and having python daemons do that directly is a bit of
an anti pattern. Nothing currently tests this in project-config from
my recent grepping, so in the interest of long term maintenance,
delete it all.

Change-Id: I910df4ceab6f24f3d9c484e0433c93b06f17d6e1
2017-04-17 07:27:32 -04:00
Clark Boylan 8cf9acd577 Tune apache connection limits down
We are facing memory pressure in gate testing. Apache is fairly large so
tune its connection limits down to try and squeeze out more usable
memory. This should be fine for dev envs. Also, tlsproxy is not enabled
by default, so we can check that this tuning works well on a subset of
jobs before making it the default everywhere.

Data comparisons done with gate-tempest-dsvm-neutron-full-ubuntu-xenial
jobs.

Old: http://logs.openstack.org/37/447037/2/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/721fc6f/logs/screen-peakmem_tracker.txt.gz
       PID   %MEM             RSS       PPID       TIME     NLWP WCHAN                     COMMAND
     20504    0.2           16660      19589   00:00:00       34 -                         /usr/sbin/apache2 -k start
     20505    0.2           16600      19589   00:00:00       34 -                         /usr/sbin/apache2 -k start
     20672    0.2           16600      19589   00:00:00       34 -                         /usr/sbin/apache2 -k start
     20503    0.1           14388      19589   00:00:00       34 -                         /usr/sbin/apache2 -k start
     19589    0.1            9964          1   00:00:00        1 -                         /usr/sbin/apache2 -k start
Total RSS: 74212

New: http://logs.openstack.org/41/446741/1/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/fa4d2e6/logs/screen-peakmem_tracker.txt.gz
       PID   %MEM             RSS       PPID       TIME     NLWP WCHAN                     COMMAND
      8036    0.1           15316       8018   00:00:01       34 -                         /usr/sbin/apache2 -k start
      8037    0.1           15228       8018   00:00:01       34 -                         /usr/sbin/apache2 -k start
      8018    0.1            8584          1   00:00:00        1 -                         /usr/sbin/apache2 -k start
Total RSS: 39128

Note RSS here is in KB. Total difference is 35084KB or about
34MB. Not the biggest change, but we seem to be functional and it
almost halves the apache overhead.

Change-Id: If82fa347db140021197a215113df4ce38fb4fd17
2017-03-17 11:42:41 -07:00
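The totals and the savings quoted above can be rechecked from the RSS column (KB) of the two ps excerpts:

```python
# RSS values (KB) copied from the Old and New ps excerpts above.
old_rss_kb = [16660, 16600, 16600, 14388, 9964]
new_rss_kb = [15316, 15228, 8584]

old_total = sum(old_rss_kb)
new_total = sum(new_rss_kb)
saved_kb = old_total - new_total

print(old_total, new_total, saved_kb)            # 74212 39128 35084
print(f"{saved_kb / 1024:.1f} MiB saved")        # 34.3 MiB saved
print(f"new/old = {new_total / old_total:.2f}")  # new/old = 0.53
```

A new/old ratio of about 0.53 is what "almost halves the apache overhead" refers to.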
Jenkins 42a914cadf Merge "Revert "tls proxy: immediately close a connection to the backend"" 2017-02-21 21:02:03 +00:00
Jordan Pittier 4370925181 TLS proxy: disable HTTP KeepAlive
There's a race condition when a client makes a request "at the same
time" the HTTP connection is being closed by Apache because the
`KeepAliveTimeout` is expired.

This is explained in detail and can be reproduced using
https://github.com/mikem23/keepalive-race or
https://github.com/JordanP/openstack-snippets/blob/master/keepalive-race/keep-alive-race.py

Just disable KeepAlive to fix the
('Connection aborted.', BadStatusLine("''",)) error we are seeing.

Change-Id: I46e9f70ee740ec7996c98d386d5289c1491e9436
2017-02-14 16:59:07 +01:00
Jordan Pittier bc3d01c8ec Revert "tls proxy: immediately close a connection to the backend"
This reverts commit e0a37cf21e.

This didn't help fix bug #1630664. The issue seems to be between
client<--->Apache2, not between Apache2<--->eventlet.

Change-Id: I092c1bbf0c5848b50fc9e491d1e9211451208a89
2017-02-14 15:46:03 +00:00
Jordan Pittier e0a37cf21e tls proxy: immediately close a connection to the backend
Force mod_proxy to immediately close a connection to the backend
after being used, and thus, disable its persistent connection and
pool for that backend.

Let's see if that helps fix bug #1630664 (the
Connection aborted/ BadStatusLine thing).

We already have an ER query (in queries/1630664.yaml) that should show
whether this is effective.

Change-Id: I03b09f7df5c6e134ec4091a2f8dfe8ef614d1951
2017-02-10 15:04:52 +01:00
Clark Boylan cfb9f057ea Tune apache connections for tls proxy
We are seeing connection errors to the proxy occasionally. These errors
do not result in a logged http request or error to the backends,
resulting in a theory that the proxy itself may just not be able to
handle the number of connections. More than double the total number of
connections that will be accepted by the proxy in an attempt to fix
this.

Change-Id: Iefa6c43451dd1f95927528d2ce0003c84248847f
Related-bug: 1630664
2016-11-29 10:43:05 -08:00
Daniel P. Berrange c30b8def82 Move certificate setup earlier in deployment
Currently the x509 certificate setup is done after all the
openstack services have been deployed. This is OK because
none of the services require that the x509 certs exist
when they are being deployed. With the integration of TLS
into the nova novnc proxy (and later spice & serial proxy)
service, x509 certs will need to exist before Nova is
deployed.

The CA setup must thus be moved earlier in the devstack
deployment flow, prior to the setup of any services. One
part of the CA setup, however, fixes up the global cert
bundle locations and this can only be done after the
python requests module is installed, so it must remain in
its current location.

Change-Id: Idcd264fb73bb88dc2f4280c53c013dfe4364afff
2016-11-15 11:24:04 +00:00
Sean Dague f06455e1b5 Add a screen session for tls logs
When tls is enabled, we aren't bringing the logs to the forefront,
which makes it hard to debug when things go wrong. This does that.

Change-Id: I7c6c7e324e16da6b9bfa44f4bad17401ca4ed7e3
2016-10-07 06:57:03 -04:00
Clark Boylan 66ce5c257a Update apache tls proxy logs
This creates log files per proxy vhost and sets the log level to info to
help debug potential issues with tls proxying.

Change-Id: I02a62224662b021b35c293909ba045b4b74e1df8
2016-10-05 16:25:53 -07:00
Jenkins e75d5044f4 Merge "Update certificate creation for urllib3" 2016-09-27 11:26:47 +00:00
Ian Cordasco 69e3c0aac9 Update certificate creation for urllib3
urllib3 1.18 was released today and contains new, more correct hostname
matching that takes into account the ipAddress portion of a certificate
and disallows matching an IP Address against a DNS hostname.

Change-Id: I37d247b68911dc85f55adec6a7952ed321c1b1d8
2016-09-26 12:21:41 -07:00
Clark Boylan 323b726783 Don't make root CA if it exists
To support multinode testing where we just copy the CA to all the
instances don't remake the CA if it already exists.

The end result is that you can trust a single chain and all your
clients will be happy regardless of which host they are talking to.

Change-Id: I90892e6828a59fa37af717361a2f1eed15a87ae4
2016-09-26 11:37:18 +00:00
Gregory Haynes 4b49e409f8 Use apache for tls-proxy ssl termination
Stud is now abandonware (see https://github.com/bumptech/stud) and is
not packaged in xenial. Let's use Apache for SSL termination since it's
there already.

Change-Id: Ifcba410f5969521e8b3d30f02795541c1661f83a
2016-09-20 08:14:11 -07:00
Rob Crittenden be00e95da5 Add OS_CACERT to userrc_early and ensure SERVICE_HOST is SAN
OS_CACERT was being added directly to the environment rather
than userrc_early. This caused an untrusted CA error to be
thrown.

Ensure that SERVICE_HOST is in the Subject Alt. Names of the
issued TLS server cert. The gate sets it to 127.0.0.1 which
wasn't being handled. Only the FQDN of the host and actual
IP address of the machine were being added.

Change-Id: I8a91dffe1a5263d2bcc99ea406a8556045b52be2
2016-03-28 10:00:52 -04:00
Ian Wienand ada886dd43 Don't mix declaration and set of locals
Ia0957b47187c3dcadd46154b17022c4213781112 proposes to have bashate
find instances of setting a local value.  The issue is that "local"
always returns 0, thus hiding any failure in the commands running to
set the variable.

This is an automated replacement of such instances.

Depends-On: I676c805e8f0401f75cc5367eee83b3d880cdef81
Change-Id: I9c8912a8fd596535589b207d7fc553b9d951d3fe
2015-10-07 17:03:32 +11:00
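The pitfall is easy to reproduce. The sketch below drives bash from Python (it assumes a `bash` binary on PATH) and prints the `$?` seen right after each form of assignment, using `false` as the failing command inside the command substitution:

```python
import subprocess

def last_status(script):
    """Run a bash snippet and return what it echoes (here, a captured $?)."""
    result = subprocess.run(["bash", "-c", script],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Declaration and assignment combined: $? is the status of 'local' itself.
combined = last_status('f() { local v=$(false); echo $?; }; f')
# Declaration first, assignment second: $? is the command substitution's status.
separate = last_status('f() { local v; v=$(false); echo $?; }; f')

print(combined, separate)  # 0 1
```

The failure only becomes visible (and catchable by `set -e` or an explicit check) in the second form.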
Rob Crittenden 1987fcc8a3 Replace pip-installed requests CA bundle with link
If the version of python-requests required is higher than
that provided by the operating system, pip will install
it from upstream.

The upstream version provides its own CA certificate bundle
based on the Mozilla bundle, and defaults to that in case
a CA certificate file is not specified for a request.

The distribution-specific packages point to the system-wide
CA bundle that can be managed by tools such as
update-ca-trust (Fedora/RHEL) and update-ca-certificates
(Debian/Ubuntu).

When installing in SSL/TLS mode, either with SSL=True or by
adding tls-proxy to ENABLED_SERVICES, if a non-systemwide
CA bundle is used, then the CA generated by devstack will
not be used causing the installation to fail.

Replace the upstream-provided bundle with a link to the
system bundle when possible.

Change-Id: I651aec93398d583dcdc8323503792df7ca05a7e7
Closes-Bug: #1459789
2015-06-16 17:57:09 -04:00
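A minimal Python sketch of the replacement step, assuming the caller already knows both paths (how devstack locates the pip-installed bundle is not shown here). The helper name is hypothetical; typical system bundles are `/etc/pki/tls/certs/ca-bundle.crt` on Fedora/RHEL and `/etc/ssl/certs/ca-certificates.crt` on Debian/Ubuntu. The demo uses temporary files so it touches nothing real.

```python
import os
import tempfile

def link_to_system_bundle(bundled_ca, system_ca):
    """Replace a package's private CA bundle with a symlink to the system one.

    Returns False and leaves everything untouched when the system bundle is
    missing, mirroring the "when possible" wording above.
    """
    if not os.path.isfile(system_ca):
        return False
    if os.path.lexists(bundled_ca):
        os.remove(bundled_ca)
    os.symlink(system_ca, bundled_ca)
    return True

with tempfile.TemporaryDirectory() as tmp:
    system_ca = os.path.join(tmp, "system-ca-bundle.crt")
    bundled_ca = os.path.join(tmp, "bundled-ca.pem")
    with open(system_ca, "w") as f:
        f.write("system CA bundle (includes the devstack CA)\n")
    with open(bundled_ca, "w") as f:
        f.write("upstream Mozilla-based bundle\n")

    replaced = link_to_system_bundle(bundled_ca, system_ca)
    print(replaced, os.path.islink(bundled_ca))  # True True
    with open(bundled_ca) as f:
        print(f.read().strip())  # system CA bundle (includes the devstack CA)
```

Linking rather than copying means later changes made with update-ca-trust or update-ca-certificates are picked up automatically.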
Dean Troyer dc97cb71e8 Mostly docs cleanups
Fix documentation build errors and RST formatting

Change-Id: Id93153400c5b069dd9d772381558c7085f64c207
2015-03-28 14:35:12 -05:00