277 Commits

Author SHA1 Message Date
James E. Blair 9105ffe00b Add script to generate openapi spec
The existing openapi spec document (used to generate the swagger
ui page in the web app as well as the rst documentation) is
both incomplete and wrong due to bitrot.

This change adds a script which automatically generates much of
the api documentation from the code.  The output is still incomplete,
but it does include at least the same endpoints currently documented,
and of those, all of the inputs and outputs.

Due to its automatic generation, all of the endpoints and their
inputs are now documented.  Only some outputs are missing (as well
as explanatory text, which was pretty thin before).

It does the following:

* Inspects the cherrypy router object to determine the endpoints to
  include, and identifies their HTTP methods and the python functions
  that implement them.
* Inspects the function's Python docstring to get summary documentation
  for the endpoint.
* Inspects the function arguments and compares them to the
  router path to determine if each is a path or query parameter,
  as well as whether each is required.
* Merges type and descriptive information from the Python docstring
  about each parameter.
* For output, a schema system similar to voluptuous is used to describe
  the output names and types, as well as optional descriptive information.
  One of two function decorators is used to describe the output.
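
The argument inspection described in the list above could look roughly like the following. This is a minimal sketch, not the script from this change: the `{name}` placeholder syntax, the `classify_parameters` helper, the `builds` handler and the example route are all assumptions made for illustration.

```python
import inspect
import re


def classify_parameters(route, handler):
    """Return OpenAPI-style parameter dicts for a handler bound to a route."""
    # Names that appear as {placeholders} in the route are path parameters.
    path_names = set(re.findall(r"{(\w+)}", route))
    params = []
    for name, param in inspect.signature(handler).parameters.items():
        if name == "self":
            continue
        in_path = name in path_names
        params.append({
            "name": name,
            "in": "path" if in_path else "query",
            # Path parameters are always required; query parameters are
            # required only when the function gives them no default.
            "required": in_path or param.default is inspect.Parameter.empty,
        })
    return params


def builds(tenant_name, project=None, limit=50):
    """List builds for a tenant."""  # hypothetical handler
    return []


# The first docstring line serves as the endpoint summary.
summary = inspect.getdoc(builds).splitlines()[0]
parameters = classify_parameters("/api/tenant/{tenant_name}/builds", builds)
```

Here `tenant_name` is classified as a required path parameter, while `project` and `limit` become optional query parameters because they have defaults.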

It removes the documentation for the status page output format.  This API
is specially optimized for the Zuul status page, is very complex, and we
should therefore not encourage end-users to develop against it.  The
endpoint itself is documented as such, but the response value is
undocumented.

Future work:

More descriptive text and output formats can be documented.

Change-Id: Ib1a2aad728c4a7900841a8e3b617c146f2224953
2024-03-09 11:25:40 -08:00
Zuul 617bbb229c Merge "Fix validate-tenants isolation" 2024-02-28 02:46:55 +00:00
James E. Blair 1cc276687a Change status json to use "refs" instead of "changes"
This is mostly an internal API change to replace the use of the
word "change" with "ref" in the status json.  This matches the
database and build/buildsets records.

Change-Id: Id468d16d6deb0af3d1c0f74beb1b25630455b8f9
2024-02-09 07:39:52 -08:00
James E. Blair 1f026bd49c Finish circular dependency refactor
This change completes the circular dependency refactor.

The principal change is that queue items may now include
more than one change simultaneously in the case of circular
dependencies.

In dependent pipelines, the two-phase reporting process is
simplified because it happens during processing of a single
item.

In independent pipelines, non-live items are still used for
linear dependencies, but multi-change items are used for
circular dependencies.

Previously changes were enqueued recursively and then
bundles were made out of the resulting items.  Since we now
need to enqueue entire cycles in one queue item, the
dependency graph generation is performed at the start of
enqueuing the first change in a cycle.

Some tests exercise situations where Zuul is processing
events for old patchsets of changes.  The new change query
sequence mentioned in the previous paragraph necessitates
more accurate information about out-of-date patchsets than
the previous sequence, therefore the Gerrit driver has been
updated to query and return more data about non-current
patchsets.

This change is not backwards compatible with the existing
ZK schema, and will require that Zuul systems delete all pipeline
states during the upgrade.  A later change will implement
a helper command for this.

All backwards compatibility handling for the last several
model_api versions which were added to prepare for this
upgrade have been removed.  In general, all model data
structures involving frozen jobs are now indexed by the
frozen job's uuid and no longer include the job name since
a job name no longer uniquely identifies a job in a buildset
(either the uuid or the (job name, change) tuple must be
used to identify it).

Job deduplication is simplified and now only needs to
consider jobs within the same buildset.

The fake github driver had a bug (fakegithub.py line 694) where
it did not correctly increment the check run counter, so our
tests that verified that we closed out obsolete check runs
when re-enqueuing were not valid.  This has been corrected, and
in doing so, has necessitated some changes around quiet dequeuing
when we re-enqueue a change.

The reporting in several drivers has been updated to support
reporting information about multiple changes in a queue item.

Change-Id: I0b9e4d3f9936b1e66a08142fc36866269dc287f1
Depends-On: https://review.opendev.org/907627
2024-02-09 07:39:40 -08:00
James E. Blair fb7d24b245 Fix validate-tenants isolation
The validate-tenants scheduler subcommand is supposed to perform
complete tenant validation, and in doing so, it interacts with zk.
It is supposed to isolate itself from the production data, but
it appears to accidentally use the same unparsed config cache
as the production system.  This is mostly okay, but if the loading
paths are different, it could lead to writing cache errors into
the production file cache.

The error is caused because the ConfigLoader creates an internal
reference to the unparsed config cache and therefore ignores the
temporary/isolated unparsed config cache created by the scheduler.

To correct this, we will always pass the unparsed config cache
into the configloader.

Change-Id: I40bdbef4b767e19e99f58cbb3aa690bcb840fcd7
2024-01-31 14:58:45 -08:00
Zuul d12ec11321 Merge "Improve support for web enqueue/dequeue" 2024-01-14 16:31:05 +00:00
Zuul 0ffe18a930 Merge "Index job map by uuid" 2024-01-13 17:27:33 +00:00
James E. Blair 50f068ee6d Add a build-times web endpoint
This endpoint runs an optimized query for returning information
suitable for displaying a graph of build times.

This includes a schema migration to add some indexes to aid
the query.

Change-Id: I56e8422a599c1ee51216f26fcae5a39013066e6b
2024-01-03 13:06:07 -08:00
James E. Blair a9ff6b3410 Improve support for web enqueue/dequeue
This change:

* Returns the build/buildset oldrev through the REST API (this
  field was missing).
* Updates the web UI so that when enqueuing or dequeuing a ref it will
  send exactly the oldrev/newrev values it received, including None/null.
* No longer translates None to 40*'0' when creating internal management
  events.

In concert, these changes allow a user to re-enqueue exactly as
originally enqueued buildsets for branch tips (periodic pipeline) as
well as ref updates (tag/post pipelines).

Additionally, the re-enqueue method in the web UI is updated to support
re-enqueuing tag and branch heads (it only worked on change and
ref-updates before).

Finally, the buildset page is updated to show the old and new revs
if they are non-null.

Change-Id: I9886cd44f8b4bae6f4a5ce3644f0598a73ecfe0a
2023-12-14 10:18:33 -08:00
James E. Blair cb3c4883f2 Index job map by uuid
This is part of the circular dependency refactor.  It changes the
job map (a dictionary shared by the BuildSet and JobGraph classes
(BuildSet.jobs is JobGraph._job_map -- this is because JobGraph
is really just a class to encapsulate some logic for BuildSet))
to be indexed by FrozenJob.uuid instead of job name.  This helps
prepare for supporting multiple jobs with the same name in a
buildset.
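
To illustrate why the index changes, here is a minimal sketch with a hypothetical, heavily simplified `FrozenJob` (the real class has many more fields). Keying by name would collapse two jobs that share a name; keying by uuid keeps both.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class FrozenJob:
    # Simplified stand-in for illustration only.
    name: str
    change: str
    uuid: str = field(default_factory=lambda: uuid4().hex)


jobs = [FrozenJob("unit-tests", "change-A"), FrozenJob("unit-tests", "change-B")]

# Indexing by name silently drops one of the two jobs.
by_name = {job.name: job for job in jobs}
# Indexing by uuid keeps both, even though the names collide.
by_uuid = {job.uuid: job for job in jobs}
```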

Change-Id: Ie17dcf2dd0d086bd18bb3471592e32dcbb8b8bda
2023-12-12 10:22:25 -08:00
Zuul 50e06b4e74 Merge "Tidy some auth exceptions" 2023-12-01 20:04:14 +00:00
Matthieu Huin 2ba13b9575 zuul-web: ensure HTTPError is invoked with 4xx status code
The _getTenantOrRaise helper function would raise an HTTP error if a tenant
isn't ready, and attempt to return a 204 status code along with the error. Cherrypy
however doesn't allow for HTTP errors with status codes under 400.

Instead, return status code 422 ("Unprocessable Entity") when the tenant
isn't ready, which seems more appropriate: the configuration isn't
loaded so the query cannot be processed.

Change-Id: I41547df26a04698627c8f0697557e1e5039c0e1e
2023-11-22 14:59:26 +01:00
James E. Blair 00949554c9 Implement server-side filtering and pagination of config errors
In order to support pagination of config errors (so that when a user
does decide to look at the config errors page, we don't necessarily
need to transfer all of the data each time), implement server-side
filtering and pagination.

A later change can implement the same in the web ui.
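
As a rough sketch of what server-side filtering and pagination means here: filter first, then return only the requested slice. The function name, the `project`/`severity` filter fields and the `skip`/`limit` parameters are assumptions for illustration, not necessarily the names used by the endpoint.

```python
def page_of_errors(errors, project=None, severity=None, skip=0, limit=50):
    """Filter config errors on the server, then return a single page."""
    matching = [
        error for error in errors
        if (project is None or error["project"] == project)
        and (severity is None or error["severity"] == severity)
    ]
    # Only the requested window is sent to the client.
    return matching[skip:skip + limit]


sample_errors = [
    {"project": "org/a", "severity": "error", "name": "Unknown job"},
    {"project": "org/a", "severity": "warning", "name": "Deprecated syntax"},
    {"project": "org/b", "severity": "error", "name": "Unknown nodeset"},
]
first_page = page_of_errors(sample_errors, severity="error", limit=1)
second_page = page_of_errors(sample_errors, severity="error", skip=1, limit=1)
```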

Change-Id: I0c6cb8a10cd4d807ed92cad438ef592b1cdaf19b
2023-11-20 14:17:16 -08:00
Zuul 138b6a1379 Merge "Refactor bundle in sql connection" 2023-11-17 01:03:36 +00:00
Simon Westphahl 68d7a99cee Send job parent data + artifacts via build request
With job parents that supply data we might end up updating the (secret)
parent data and artifacts of a job multiple times in addition to also
storing duplicate data as most of this information is part of the
parent's build result.

Instead we will collect the parent data and artifacts before scheduling
a build request and send it as part of the request parameters.

If those parameters are part of the build request the executor will use
them, otherwise it falls back on using the data from the job for
backward compatibility.

This change affects the behavior of job deduplication in that input data
from parent jobs is no longer considered when deciding if a job can be
deduplicated or not.

Change-Id: Ic4a85a57983d38f033cf63947a3b276c1ecc70dc
2023-11-15 07:24:52 +01:00
James E. Blair c3efd73b11 Tidy some auth exceptions
These two web auth exceptions produce output that's a little different
than the others, in that they duplicate the error and description
json fields.  Use an error field that looks like the other auth
errors (which are unique short strings derived from class names).

This lets clients concatenate the two and produce a reasonable
output.

Change-Id: I1703444c19bfa0a06e11c3c521b7f46b31053d7b
2023-10-25 13:08:46 -07:00
James E. Blair 18fb324f1e Add auth token to websocket
When making a websocket request, browsers do not send the
"Authorization" header.  Therefore if a Zuul tenant is run in
a configuration where authz is required for read-only access,
the websocket-based log streaming will always fail.

To correct this, we will remove the http request authz check
from the console-stream endpoint, and add an optional token
parameter to the websocket message payload.  The JS web app
will be responsible for sending the auth token in the payload,
and the web server will validate it if it is required for the
tenant.  Thanks to Andrei Dmitriev for this suggestion.

Since we essentially have two different authz code paths in
zuul-web now, in order to share as much code as possible, the
authz sequence is refactored in such a way that the final authz
check can be deferred.  First we create an AuthContext at the
start of the request which stores tenant and header information,
then the actual validation is performed in a separate step where
the token can optionally be provided.

In the http code path, we create the AuthContext and validate
immediately, using the Authorization header, and we do all of that
in the cherrypy tool at the start of the request.

In the websocket code path, we create the AuthContext as the
websocket handler is being created by the cherrypy request handler,
then we perform validation after receiving a message on the
websocket.  We use the token supplied from the request.
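
The two code paths above can be pictured with a small sketch in which context creation and validation are separate steps. This is an illustration under assumptions, not the zuul-web implementation: the constructor arguments, the `validate` signature, the `check_token` callback and the `AuthError` exception are hypothetical.

```python
class AuthError(Exception):
    """Raised when a required token is missing or invalid."""


class AuthContext:
    def __init__(self, tenant, headers, auth_required):
        # Created at the start of the request; nothing is validated yet.
        self.tenant = tenant
        self.headers = headers
        self.auth_required = auth_required

    def validate(self, check_token, token=None):
        # HTTP path: no explicit token, so fall back to the header.
        if token is None:
            header = self.headers.get("Authorization", "")
            if header.startswith("Bearer "):
                token = header[len("Bearer "):]
        if self.auth_required:
            if token is None or not check_token(self.tenant, token):
                raise AuthError("unauthorized")
        return token


def check_token(tenant, token):
    return token == "good-token"  # placeholder for real token validation


# HTTP path: create and validate immediately from the header.
http_context = AuthContext("example", {"Authorization": "Bearer good-token"}, True)
http_token = http_context.validate(check_token)

# Websocket path: create early, validate once a message supplies the token.
ws_context = AuthContext("example", {}, True)
ws_token = ws_context.validate(check_token, token="good-token")
```

In the websocket path a caller would catch `AuthError` and close the socket instead of returning an HTTP error.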

Error handling is adjusted so in the http code path, exceptions
that return appropriate http errors are raised, but in the
websocket path, these are caught and translated into websocket
close calls.

A related issue is that we perform no validation that the
streaming build log being requested belongs to the tenant via
which the request is being sent.  This was unnecessary before
read-only access was an option, but now that it is, we should
check that a streaming build request arrives via the correct
tenant URL.  This change adjusts that as well.

During testing, it was noted that the tenant configuration syntax
allows admin-rules and access-rules to use the scalar-or-list
pattern, however some parts of the code assumed only lists.  The
configloader is updated to use scalar-or-list for both of those
values.
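
The scalar-or-list pattern amounts to normalizing the configured value before use. A minimal sketch, with a hypothetical helper name and example rule names:

```python
def as_list(value):
    """Normalize a scalar-or-list config value to a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# Both spellings of the tenant config yield the same rule list.
single = as_list("my-rule")
several = as_list(["my-rule", "other-rule"])
```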

Change-Id: Ifd4c21bb1fe962bf23acb5b4f10b3bbaba61e63a
Co-Authored-By: Andrei Dmitriev <andrei.dmitriev@nokia.com>
2023-10-24 07:29:55 -07:00
James E. Blair 0a08299b5f Refactor bundle in sql connection
This refactors the sql connection to accommodate multiple
simultaneous changes in a buildset.

The change information is removed from the buildset table and
placed in a ref table.  Buildsets are associated with refs
many-to-many via the zuul_buildset_ref table.  Builds are also
associated with refs, many-to-one, so that we can support
multiple builds with the same job name in a buildset, but we
still know which change they are for.

In order to maintain a unique index in the new zuul_ref table (so that
we only have one entry for a given ref-like object (change, branch,
tag, ref)) we need to shorten the sha fields to 40 characters (to
accommodate mysql's index size limit) and also avoid nulls (to
accommodate postgres's inability to use null-safe comparison operators
on indexes).  So that we can continue to use change=None,
patchset=None, etc, values in Python, we add a sqlalchemy
TypeDecorator to coerce None to and from null-safe values such as 0
or the empty string.
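
The coercion can be sketched in plain Python as follows. The method names mirror the `process_bind_param` and `process_result_value` hooks of a SQLAlchemy TypeDecorator, but these classes deliberately do not subclass it so the example stays dependency-free; the class names are hypothetical and this is not the code from this change.

```python
class NullSafeInteger:
    """Plain-Python stand-in for a type that stores None as 0."""

    SENTINEL = 0

    def process_bind_param(self, value, dialect):
        # Python -> database: never write NULL into an indexed column.
        return self.SENTINEL if value is None else value

    def process_result_value(self, value, dialect):
        # Database -> Python: turn the sentinel back into None.
        return None if value == self.SENTINEL else value


class NullSafeString(NullSafeInteger):
    """Same idea for string columns, using the empty string."""

    SENTINEL = ""


change_type = NullSafeInteger()
stored = change_type.process_bind_param(None, dialect=None)
loaded = change_type.process_result_value(stored, dialect=None)
```

One consequence of this approach is that the sentinel itself can no longer be stored as a distinct value, which is acceptable only when 0 or the empty string is never a legitimate value for the column.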

Some previous schema migration tests inserted data with null projects,
which should never have actually happened, so these tests are updated
to be more realistic since the new data migration requires non-null
project fields.

The migration itself has been tested with a data set consisting of
about 3 million buildsets with 22 million builds.  The runtime on one
ssd-based test system in mysql is about 22 minutes and in postgres
about 8 minutes.

Change-Id: I21f3f3dfc8f93a23744856e5b82b3c948c118dc2
2023-10-19 17:42:09 -07:00
James E. Blair 4ebf9296f3 Add tenant_status web endpoint
The config error list is getting longer, for everyone, especially
now that we are including warnings.  To avoid loading the entirety
when it's not necessary, add an API endpoint that simply returns
the number of config errors.  This can later be used by the web app
to highlight the blue bell without actually fetching the errors.

This endpoint is named tenant_status so that if we find more details
like this that we want to add in the future, we can extend the returned
dictionary with them.

It is not added to the tenant info endpoint because that is
unauthenticated and should not leak information about the tenant.

Change-Id: Ie11eb26dc38e28922ddabbca39d89cda7e763d13
2023-09-22 07:10:55 -07:00
James E. Blair eb803984a0 Use tenant-level layout locks
The current "layout_lock" in the scheduler is really an "abide" lock.
We lock it every time we change something in the abide (including
tenant layouts).  The name is inherited from pre-multi-tenant Zuul.

This can cause some less-than-optimal behavior when we need to wait to
acquire the "layout_lock" for a tenant reconfiguration event in one
thread while another thread holds the same lock because it is
reloading the configuration for a different tenant.  Ideally we should
be able to have finer-grained tenant-level locking instead, allowing
for less time waiting to reconfigure.

The following sections describe the layout lock use prior to this
commit and how this commit adjusts the code to make it safe for
finer-grained locking.

1) Tenant iteration

The layout lock is used in some places (notably some cleanup methods)
to avoid having the tenant list change during the method.  However,
the configloader already performs an atomic replacement of the tenant
list making it safe for iteration.  This change adds a lock around
updates to the tenant list to prevent corruption if two threads update
it at the same time.

The semaphore cleanup method indirectly references the abide and
layout for use in global and local semaphores.  This is just for path
construction, and the semaphores exist apart from the abide and layout
configurations and so should not be affected by either changing while
the cleanup method is running.

The node request cleanup method could end up running with outdated
layout objects, including pipelines, however it should not be a
problem if these orphaned objects end up refreshing data from ZK right
before they are removed.

In these cases, we can simply remove the layout lock.

2) Protecting the unparsed project branch cache

The config cache cleanup method uses the unparsed project branch cache
(that is, the in-memory cache of the contents of zuul config files) to
determine what the active projects are.

Within the configloader, the cache is updated and then also used while
loading tenant configuration.  The layout lock would have made sure
all of these actions were mutually exclusive.  In order to remove the
layout lock here, we need to make sure the Abide's
unparsed_project_branch_cache is safe for concurrent updates.

The unparsed_project_branch_cache attribute is a dictionary that
contains references to UnparsedBranchCache objects.  Previously, the
configloader would delete each UnparsedBranchCache object from the
dictionary, reinitialize it, then incrementally add to it.

This current process has a benign flaw.  The branch cache is cleared,
and then loaded with data based on the tenant project config (TPC)
currently being processed.  Because the cache is loaded based on data
from the TPC, it is really only valid for one tenant at a time despite
our intention that it be valid for the entire abide.  However, since
we do check whether it is valid for a given TPC, and then clear and
reload it if it is not, there is no error in data, merely an
incomplete utilization of the cache.

In order to make the cache safe for use by different tenants at the
same time, we address this problem (and effectively make it so that it
is also *effective* for different tenants, even at different times).
The cache is updated to store the ltime for each entry in the cache,
and also to store null entries (with ltimes) for files and paths that
have been checked but are not present in the project-cache.  This
means that at any given time we can determine whether the cache is
valid for a given TPC, and support multiple TPCs (i.e., multiple
tenants).

It's okay for the cache to be updated simultaneously by two tenants
since we don't allow the cache contents to go backwards in ltime.  The
cache will either have the data with at least the ltime required, or
if not, that particular tenant load will spawn cat jobs and update it.
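
A minimal sketch of a cache with these properties, assuming a hypothetical `BranchFileCache` class keyed by file path (the real UnparsedBranchCache differs): every entry carries an ltime, absent files are stored as None, and an update with an older ltime is ignored.

```python
class BranchFileCache:
    def __init__(self):
        # path -> (ltime, content); content is None for a file that was
        # checked and found to be absent.
        self._entries = {}

    def put(self, path, content, ltime):
        """Store an entry unless a same-or-newer one is already present."""
        current = self._entries.get(path)
        if current is not None and current[0] >= ltime:
            return False  # never go backwards in ltime
        self._entries[path] = (ltime, content)
        return True

    def get(self, path):
        return self._entries[path][1]

    def is_valid_for(self, paths, min_ltime):
        """True if every path a tenant needs is cached at min_ltime or later."""
        return all(
            path in self._entries and self._entries[path][0] >= min_ltime
            for path in paths
        )


cache = BranchFileCache()
cache.put("zuul.yaml", "- job: ...", ltime=10)
cache.put(".zuul.yaml", None, ltime=10)  # checked, not present
stale_write_applied = cache.put("zuul.yaml", "older content", ltime=5)
```

Because a stale write is a no-op, two tenants updating the cache concurrently can at worst do redundant work; neither can replace newer data with older data.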

3) Protecting Tenant Project Configs (TPCs)

The loadTPC method on the ConfigLoader would similarly clear the TPCs
for a tenant, then add them back.  This could be problematic for any
other thread which might be referencing or iterating over TPCs.  To
correct this, we take a similar approach of atomic replacement.

Because there are two flavors of TPCs (config and untrusted) and they
are stored in two separate dictionaries, in order to atomically update
a complete tenant at once, the storage hierarchy is restructured as
"tenant -> {config/untrusted} -> project" rather than
"{config/untrusted} -> tenant -> project".  A new class named
TenantTPCRegistry holds both flavors of TPCs for a given tenant, and
it is this object that is atomically replaced.
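
The atomic replacement can be sketched like this. Only the TenantTPCRegistry name comes from the description above; its attributes, the `AbideSketch` container and the `replace_tpcs` method are assumptions for illustration.

```python
import threading


class TenantTPCRegistry:
    """Holds both flavors of TPCs for one tenant."""

    def __init__(self):
        self.config_tpcs = {}     # project name -> TPC
        self.untrusted_tpcs = {}  # project name -> TPC


class AbideSketch:
    def __init__(self):
        self.tpc_registries = {}  # tenant name -> TenantTPCRegistry
        self._lock = threading.Lock()

    def replace_tpcs(self, tenant_name, config, untrusted):
        # Build the complete replacement before publishing it.
        registry = TenantTPCRegistry()
        registry.config_tpcs.update(config)
        registry.untrusted_tpcs.update(untrusted)
        with self._lock:
            # A single assignment swaps both flavors at once.
            self.tpc_registries[tenant_name] = registry


abide = AbideSketch()
abide.replace_tpcs("example", {"org/config": "tpc-1"}, {"org/app": "tpc-2"})
held_by_reader = abide.tpc_registries["example"]
abide.replace_tpcs("example", {"org/config": "tpc-3"}, {})
```

A thread that obtained `held_by_reader` before the second call keeps iterating over a complete, unchanged registry rather than a half-cleared one.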

Now that these issues are dealt with, we can implement a tenant-level
thread lock that is used simply to ensure that two threads don't
update the configuration for the same tenant at the same time.

The scheduler's unparsed abide is updated in two places: upon full
reconfiguration, or when another scheduler has performed a full
reconfiguration and updated the copy in ZK.  To prevent these two
methods from performing the same update simultaneously, we add an
"unparsed_abide_lock" and mutually exclude them.

Change-Id: Ifba261b206db85611c16bab6157f8d1f4349535d
2023-08-24 17:32:25 -07:00
James E. Blair 1b042ba4ab Add job failure output detection regexes
This allows users to trigger the new early failure detection by
matching regexes in the streaming job output.

For example, if a unit test job outputs something sufficiently
unique on failure, one could write a regex that matches that and
triggers the early failure detection before the playbook completes.

For hour-long unit test jobs, this could save a considerable amount
of time.

Note that this adds the google-re2 library to the Ansible venvs.  It
has manylinux wheels available, so is easy to install with
zuul-manage-ansible.  In Zuul itself, we use the fb-re2 library which
requires compilation and is therefore more difficult to use with
zuul-manage-ansible.  Presumably using fb-re2 to validate the syntax
and then later actually using google-re2 to run the regexes is
sufficient.  We may want to switch Zuul to use google-re2 later for
consistency.
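
The idea of matching regexes against streamed output can be sketched as below. This uses the standard library `re` module purely as a stand-in for the re2 libraries mentioned above, and the function name, sample output and pattern are made up for illustration.

```python
import re


def first_failure_line(lines, patterns):
    """Return the 1-based number of the first line matching any pattern."""
    compiled = [re.compile(pattern) for pattern in patterns]
    for number, line in enumerate(lines, start=1):
        if any(regex.search(line) for regex in compiled):
            return number  # failure can be flagged before the job ends
    return None


output = [
    "running tests",
    "test_foo ... ok",
    "FAILED (failures=1)",
    "cleaning up",
]
failure_at = first_failure_line(output, [r"^FAILED \("])
```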

Change-Id: Ifc9454767385de4c96e6da6d6f41bcb936aa24cd
2023-08-21 16:41:21 -07:00
Simon Westphahl 3b011296e6 Keep task stdout/stderr separate in result object
Combining stdout/stderr in the result can lead to problems when e.g.
the stdout of a task is used as an input for another task.

This is also different from the normal Ansible behavior and can be
surprising and hard to debug for users.

The new behavior is configurable and off by default to retain backward
compatibility.

Change-Id: Icaced970650913f9632a8db75a5970a38d3a6bc4
Co-Authored-By: James E. Blair <jim@acmegating.com>
2023-08-17 16:22:41 -07:00
James E. Blair 9e0f2b5694 Fix setting autoholds through API with change supplied
The set autohold api endpoint incorrectly handled supplied values
such that if the user supplied a change without a ref it would always
use the default ref (.*).  This corrects the case handling and
adds tests.

Change-Id: I1ae14c327fd8fd2b866013d4d5078a9fbd85f843
2023-06-01 15:45:59 -07:00
James E. Blair 84e0e76e2f Add error information to config-errors API endpoint
This is the first in a series of changes to improve the usability
of the web view of config errors.  The end goal is to be able to
display them in a more structured manner.  A secondary goal is to
eventually add warnings (eg, deprecation warnings) which is
really only feasible if we have structured presentation of
errors.

This change does the following:

* Adds severity and error names to existing configuration errors
* And makes them available via the config-errors API endpoint
* Reduces the call sites for the error accumulator
  (LoadingErrors.addError)
* Unifies the calling convention for the accumulator
  (we stop passing in Exception objects)

Change-Id: Ia17dd3e7ad8cdfa8a07bb03b871078415d0c145e
2023-05-25 15:41:37 -07:00
Zuul e812ce6a3d Merge "Add missing event id to management events" 2023-05-22 12:07:51 +00:00
Simon Westphahl 711e1e5c98 Add missing event id to management events
The change management events via Zuul web and the command socket did not
have an event ID assigned. This made it harder to debug issues where we
need to find the logs related to a certain action.

Change-Id: I05ccbc13c7f906f91e13fb66e4a01a51fc822676
2023-04-14 08:27:29 +02:00
James E. Blair 84c0420792 Add statement timeouts to some web sql queries
The SQL queries are designed to be highly optimized and should return
in milliseconds even with millions of rows.  However, sometimes
query planners are misled by certain characteristics and can end
up performing suboptimally.

To protect the web server in case that happens, set a statement or
query timeout for the queries which list builds or buildsets.  This
will instruct mysql or postgresql to limit execution of the buildset
or build listing queries to 30 seconds -- but only if these queries
originate in zuul-web.  Other users (such as the admin tools) may
still run these queries without an explicit time limit (though the
server may still have one).

Unfortunately (or perhaps fortunately) the RDBMSs can occasionally
satisfy the queries we use in testing in less than 1ms, making a
functional test of this feature impractical (we are unable to set
the timeout to 0ms).

Change-Id: If2f01b33dc679ab7cf952a4fbf095a1f3b6e4faf
2023-03-13 14:57:29 -07:00
Clark Boylan 2747ea6f56 Fix DeprecationWarning: ssl.PROTOCOL_TLS is deprecated
Since python 3.10 ssl.PROTOCOL_TLS has been deprecated. We are expected
to use ssl.PROTOCOL_TLS_CLIENT and ssl.PROTOCOL_TLS_SERVER depending on
how the sockets are to be used. Switch over to these new constants to
avoid the DeprecationWarning.

One thing to note is that PROTOCOL_TLS_CLIENT has default behaviors
around cert verification and hostname checking. Zuul is already
explicitly setting those options the way it wants to and I've left that
alone to avoid trouble if the defaults change later.
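
For reference, a minimal sketch of the newer constants with verification options set explicitly rather than relying on defaults. The specific option values are illustrative and not necessarily the ones Zuul uses.

```python
import ssl

# Client-side context: PROTOCOL_TLS_CLIENT enables hostname checking and
# certificate verification by default; set them explicitly anyway so the
# behavior does not depend on those defaults.
client_context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
client_context.check_hostname = True
client_context.verify_mode = ssl.CERT_REQUIRED

# Server-side context: PROTOCOL_TLS_SERVER does not verify peers by
# default; state the chosen mode explicitly here as well.
server_context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
server_context.verify_mode = ssl.CERT_NONE
```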

Finally, this doesn't fix the occurrence of this error that happens
within kazoo. A separate PR has been made upstream to kazoo and this
should be fixed in the next kazoo release.

Change-Id: Ib41640f1d33d60503066464c8c98f865a74f003a
2023-02-07 16:37:20 -08:00
James E. Blair fe04739c78 Reuse queue items after reconfiguration
When we reconfigure, we create new Pipeline objects, empty the values
in the PipelineState and then reload all the objects from ZK.  We then
re-enqueue all the QueueItems to adjust and correct the object
pointers between them (item_ahead and items_behind).  We can avoid
reloading all the objects from ZK if we keep queue items from the
previous layout and rely on the re-enqueue method correctly resetting
any relevant object pointers.

We already defer this re-enqueue work to the next pipeline processing
after a reconfiguration (so the reconfiguration itself doesn't take
very long, but now the first pipeline run after a reconfiguration must
perform a complete refresh).  With this change, that first refresh
will no longer be a complete refresh but a normal refresh, so we will get
the benefits of previous reductions in refresh times.

The main risk of this change is that it could introduce a memory leak.
During development, additional debugging was performed to verify that
after a re-enqueue, there are no obsolete layout or pipeline objects
reachable from the pipeline state object.

On schedulers where a re-enqueue does not take place (these schedulers
would simply see the layout update and re-create their PipelineState
python objects and refresh them after another scheduler has already
performed the re-enqueue), we need to ensure that we update any
internal references to Pipeline objects (which then lead to Layout
objects and can cause memory leaks).

To address that, we update the pipeline references in the ChangeQueue
instances underneath a given PipelineState when that state is being
reset after a reconfiguration.

This change also removes the pipeline reference from the QueueItem,
replacing it with a property that uses the pipeline reference
on the ChangeQueue instead.  This removes one extra place where
an incorrect reference could cause a memory leak.
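
A minimal sketch of that last point, using heavily simplified stand-ins for the ChangeQueue and QueueItem classes: the item holds no pipeline reference of its own, so updating the queue is enough.

```python
class ChangeQueue:
    def __init__(self, pipeline):
        self.pipeline = pipeline


class QueueItem:
    def __init__(self, queue):
        self.queue = queue

    @property
    def pipeline(self):
        # Always resolved through the queue, so there is no second
        # reference that could keep an old pipeline alive.
        return self.queue.pipeline


queue = ChangeQueue(pipeline="pipeline-from-old-layout")
item = QueueItem(queue)
queue.pipeline = "pipeline-from-new-layout"  # reset after reconfiguration
```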

Change-Id: I7fa99cd83a857216321f8d946fd42abd9ec427a3
2022-12-13 13:19:48 -08:00
James E. Blair 1245d100ca Refactor merge mode name lookup
This is repeated in a few places, centralize it.

Change-Id: I7bbed1f5f9faad31affa71ef17fbfc1740c54db8
2022-11-10 15:52:46 -08:00
James E. Blair 3a981b89a8 Parallelize some pipeline refresh ops
We may be able to speed up pipeline refreshes in cases where there
are large numbers of items or jobs/builds by parallelizing ZK reads.

Quick refresher: the ZK protocol is async, and kazoo uses a queue to
send operations to a single thread which manages IO.  We typically
call synchronous kazoo client methods which wait for the async result
before returning.  Since this is all thread-safe, we can attempt to
fill the kazoo pipe by having multiple threads call the synchronous
kazoo methods.  If kazoo is waiting on IO for an earlier call, it
will be able to start a later request simultaneously.

Quick aside: it would be difficult for us to use the async methods
directly since our overall code structure is still ordered and
effectively single threaded (we need to load a QueueItem before we
can load the BuildSet and the Builds, etc).

Thus it makes the most sense for us to retain our ordering by using
a ThreadPoolExecutor to run some operations in parallel.

This change parallelizes loading QueueItems within a ChangeQueue,
and also Builds/Jobs within a BuildSet.  These are the points in
a pipeline refresh tree which potentially have the largest number
of children and could benefit the most from the change, especially
if the ZK server has some measurable latency.
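
The approach can be sketched with a blocking function standing in for a synchronous kazoo read. The function names, the paths and the sleep-based latency are assumptions for illustration; no ZooKeeper client is involved.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_item(path):
    """Stand-in for a synchronous ZK read that blocks on network IO."""
    time.sleep(0.05)
    return {"path": path}


def load_items(paths, max_workers=4):
    # Each worker thread issues one blocking call; the waits overlap,
    # while map() still returns results in the original order.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(load_item, paths))


item_paths = [f"/queue/item-{n}" for n in range(8)]
items = load_items(item_paths)
```

Since `executor.map` preserves input order, the calling code can remain sequential: load the items in parallel, then process them in order.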

Change-Id: I0871cc05a2d13e4ddc4ac284bd67e5e3003200ad
2022-11-09 10:51:29 -08:00
Zuul 7606304159 Merge "Change merge mode default based on driver" 2022-10-27 02:25:37 +00:00
James E. Blair 9d2e1339ff Support authz for read-only web access
This updates the web UI to support the requirement for authn/z
for read-only access.

If authz is required for read access, we will automatically redirect.
If we return and still aren't authorized, we will display an
"Authorization required" page (rather than continuing and popping up
API error notifications).

The API methods are updated to send an authorization token whenever
one is present.

Change-Id: I31c13c943d05819b4122fcbcf2eaf41515c5b1d9
2022-10-25 20:22:42 -07:00
James E. Blair 95ec2c45e5 Set Access-Control-Allow-Origin headers in check_auth tool
Since we check authorization in every method except info now,
set the headers in the check_auth tool instead of the individual
methods; that way they are set even in the case of a 401.

Change-Id: I397180122e03915694ba6e59b4bd3a743120ee6e
2022-10-25 20:22:40 -07:00
James E. Blair c22f2c98e0 Add access-rules configuration and documentation
This allows configuration of read-only access rules, and corresponding
documentation.  It wraps every API method in an auth check (other than
info endpoints).

It exposes information in the info endpoints that the web UI can use
to decide whether it should send authentication information for all
requests.  A later change will update the web UI to use that.

Change-Id: I3985c3d0b9f831fd004b2bb010ab621c00486e05
2022-10-25 20:22:33 -07:00
James E. Blair 8c47d9ce4e Add api-root tenant config object
In order to allow for authenticated read-only access to zuul-web,
we need to be able to control the authz of the API root.  Currently,
we can only specify auth info for tenants.  But if we want to control
access to the tenant list itself, we need to be able to specify auth
rules.

To that end, add a new "api-root" tenant configuration object which,
like tenants themselves, will allow attaching authz rules to it.

We don't have any admin-level API endpoints at the root, so this change
does not add "admin-rules" to the api-root object, but if we do develop
those in the future, it could be added.

A later change will add "access-rules" to the api-root in order to
allow configuration of authenticated read-only access.

This change does add an "authentication-realm" to the api-root object
since that already exists for tenants and it will make sense to have
that in the future as well.  Currently the /info endpoint uses the
system default authentication realm, but this will override it if
set.

In general, the approach here is that the "api-root" object should
mirror the "tenant" object for all attributes that make sense.

Change-Id: I4efc6fbd64f266e7a10e101db3350837adce371f
2022-10-25 20:19:39 -07:00
James E. Blair 8f2dd91cbf Add check_auth tool to zuul-web
Authentication checking in the admin methods of zuul-web is very
duplicative.  Consolidate all of the auth checks into a cherrypy
tool that we can use to decorate methods.

This tool also anticipates that we will have read-only checks in
the future, but for now, it is still only used for admin checks.

This tool also populates some additional parameters (like tenant
and auth info) so that we don't need to call "getTenantOrRaise"
multiple times in a request.

Several methods performed HTTP method checks inside the method,
which inhibits our ability to wrap an entire method with an
auth_check.  To resolve this, we now use method conditions on
the routes dispatcher.  As a convention, I have put the
options handling on the "GET" methods since they are most likely
to be universal.
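
As a rough illustration of the consolidation described above, the
following is a minimal sketch in plain Python.  It is not the cherrypy
tool itself: the TENANTS table, the Forbidden exception, the wrapper
signature, and the dequeue handler are all hypothetical names chosen
for the example.  It only shows the idea of doing the tenant lookup
and the admin check once, then handing the results to the handler.

```python
import functools

# Hypothetical in-memory tenant table, for illustration only.
TENANTS = {"example": {"admins": {"alice"}}}


class Forbidden(Exception):
    """The caller is not allowed to use this endpoint."""


def check_auth(require_admin=False):
    """Wrap a handler so the tenant lookup and auth check happen once."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(tenant_name, user, *args, **kwargs):
            tenant = TENANTS.get(tenant_name)
            if tenant is None:
                raise LookupError(f"unknown tenant: {tenant_name}")
            is_admin = user in tenant["admins"]
            if require_admin and not is_admin:
                raise Forbidden(f"{user} is not an admin of {tenant_name}")
            # Pass the resolved tenant and auth info on so the handler
            # does not need to look them up again.
            auth = {"user": user, "admin": is_admin}
            return handler(tenant, auth, *args, **kwargs)
        return wrapper
    return decorator


@check_auth(require_admin=True)
def dequeue(tenant, auth, change):
    # Hypothetical admin-only handler.
    return f"{auth['user']} dequeued {change}"
```

With this shape, a read-only check could later reuse the same wrapper
by calling check_auth() without require_admin.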

Change-Id: Id815efd9337cbed621509bb0f914bdb552379bc7
2022-10-25 20:19:25 -07:00
Zuul 99959a3fa3 Merge "Simplify tenant_authorizatons check" 2022-10-26 02:21:25 +00:00
Zuul 75573b7aec Merge "Remove unused /api/user/authorizations REST endpoint" 2022-10-25 23:31:51 +00:00
Zuul dcc1c9194a Merge "Rename admin-rule to authorization-rule" 2022-10-25 23:31:48 +00:00
Zuul b70d8de85b Merge "Include skipped builds in database and web ui" 2022-10-25 04:10:18 +00:00
James E. Blair e2a472bc97 Change merge mode default based on driver
The default merge mode is 'merge-resolve' because it has been observed
that it more closely matches the behavior of jgit in Gerrit (or, at
least it did the last time we looked into this).  The other drivers
are unlikely to use jgit and more likely to use the default git
merge strategy.

This change allows the default to differ based on the driver, and
changes the default for all non-gerrit drivers to 'merge'.

The implementation anticipates that we may want to add more granularity
in the future, so the API accepts a project as an argument, and in
the future, drivers could provide a per-project default (which they
may obtain from the remote code review system).  That is not implemented
yet.
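
The behavior described above can be sketched as follows.  This is an
assumption-laden illustration, not the code from this change: the
function name and the driver name strings are hypothetical, and the
project argument is accepted but deliberately unused, mirroring the
forward-looking API shape described in the previous paragraph.

```python
def default_merge_mode(driver_name, project=None):
    """Return the default merge mode for a connection driver.

    The project argument is unused here; it is kept only so that a
    per-project default could be added later without changing callers.
    """
    if driver_name == "gerrit":
        # Matches the observed jgit behavior more closely.
        return "merge-resolve"
    # All other drivers use the default git merge strategy.
    return "merge"
```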

This adds some extra data to the /projects endpoint in the REST api.
It is currently not easy (and perhaps not possible) to determine what a
project's merge mode is through the api.  This change adds a metadata
field to the output which will show the resulting value computed from
all of the project stanzas.  The project stanzas themselves may have
null values for the merge modes now, so the web app now protects against
that.

Change-Id: I9ddb79988ca08aba4662cd82124bd91e49fd053c
2022-10-13 10:31:19 -07:00
James E. Blair 55ec721fa8 Simplify tenant_authorizatons check
This method iterates over all tenants but only needs to return
information about a single tenant.  Simplify the calculation for
efficiency.

This includes a change in behavior for unknown tenants.  Currently,
a request to /api/tenant/{name}/authorizations will always succeed
even if the tenant does not exist (it will return an authorization
entry indicating the user is not an admin of the unknown tenant).
This is unnecessary and confusing.  It will now return a 404 for
the unknown tenant.
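
A minimal sketch of the new behavior, assuming tenants are held in a
dict keyed by name.  The NotFound exception, the function signature,
and the shape of the returned entry are hypothetical and stand in for
whatever zuul-web actually uses to produce the 404 and the response.

```python
class NotFound(Exception):
    """Stands in for an HTTP 404 response."""


def tenant_authorizations(tenants, tenant_name, user):
    """Return the authorization entry for one tenant only."""
    # Look up the single tenant directly instead of iterating over all.
    tenant = tenants.get(tenant_name)
    if tenant is None:
        raise NotFound(f"tenant {tenant_name} not found")
    return {"tenant": tenant_name, "admin": user in tenant["admins"]}
```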

In the updated unit test, tenant-two was an unknown tenant; its name
has been updated to 'unknown' to make that clear.

(Since the test asserted that data were returned either way, it is
unclear whether the original author of the unit test expected
tenant-two to be unknown or known.)

Change-Id: I545575fb73ef555b34c207f8a5f2e70935c049aa
2022-10-06 15:38:24 -07:00
James E. Blair 5e6dbf2001 Remove unused /api/user/authorizations REST endpoint
This has not been used for a while and can be removed.  This will
simplify the authorization code in zuul-web.

Change-Id: I0fa6c4fb87672c44d3f97db0be558737b4f102bc
2022-10-06 15:38:24 -07:00
James E. Blair 3a0eaa1ffe Rename admin-rule to authorization-rule
This is a preparatory step to add access-control for read-level
access to the API and web UI.  Because we will likely end up with
tenant config that looks like:

- tenant:
    name: example
    admin-rules: ['my-admin-rule']
    access-rules: ['my-read-only-rule']

It does not make sense for 'my-read-only-rule' to be defined as:

- admin-rule:
    name: read-only-rule

In other words, the current nomenclature conflates (new word:
nomenconflature) the idea of an abstract authorization rule and
what it authorizes.  The new name makes it clearer that an
authorization-rule can be used to authorize more than just admin
access.

Change-Id: I44da8060a804bc789720bd207c34d802a52b6975
2022-10-06 15:38:24 -07:00
James E. Blair 0738d31b08 Include skipped builds in database and web ui
We have had an on-and-off relationship with skipped builds in the
database.  Generally we have attempted to exclude them from the db,
but we have occasionally (accidentally?) included them.  The status
quo is that builds with a result of SKIPPED (as well as several
other results which don't come from the executor) are not recorded
in the database.

With a greater interest in being able to determine which jobs ran
or did not run for a change after the fact, this change deliberately
adds all builds (whether they touch an executor or not, whether
real or not) to the database.  This means that anything that could
potentially show up on the status page or in a code-review report
will be in the database, and can therefore be seen in the web UI.

It is still the case that we are not actually interested in seeing
a page full of SKIPPED builds when we visit the "Builds" tab in
the web ui (which is the principal reason we have not included them
in the database so far).  To address this, we set the default query
in the builds tab to exclude skipped builds (it is easy to add other
types of builds to exclude in the future if we wish).  If a user
then specifies a query filter to *include* specific results, we drop
the exclusion from the query string.  This allows for the expected
behavior of not showing SKIPPED by default, then as specific results
are added to the filter, we show only those, and if the user selects
that they want to see SKIPPED, they will then be included.
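
The filter logic described above can be sketched like this.  The web
UI itself is not written in Python, and the query parameter names
"result" and "exclude_result" are assumptions made for the example;
only the rule (exclude SKIPPED by default, drop the exclusion once the
user selects specific results) comes from the description.

```python
# Results hidden from the builds tab unless the user asks for them.
DEFAULT_EXCLUDED_RESULTS = ("SKIPPED",)


def build_query(included_results=()):
    """Build the query parameters for the builds tab."""
    if included_results:
        # An explicit include filter replaces the default exclusion,
        # so selecting SKIPPED shows skipped builds.
        return {"result": list(included_results)}
    return {"exclude_result": list(DEFAULT_EXCLUDED_RESULTS)}
```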

On the buildset page, we add a switch similar to the current "show
retried jobs" switch that selects whether skipped builds in a buildset
should be displayed (again, it hides them by default).

Change-Id: I1835965101299bc7a95c952e99f6b0b095def085
2022-10-06 13:28:02 -07:00
James E. Blair 1eda9ccf96 Correct exit routine in web, merger
Change I216b76d6aaf7ebd01fa8cca843f03fd7a3eea16d unified the
service stop sequence but omitted changes to zuul-web.  Update
zuul-web to match and make its sequence more robust.

Also remove unnecessary sys.exit calls from the merger.

Change-Id: Ifdebc17878aa44d57996e4bdd46e49e6144b406b
2022-10-05 13:25:07 -07:00
Zuul ac9958ada5 Merge "Trace received Github events" 2022-10-04 03:34:14 +00:00
Simon Westphahl 7d52b98373 Trace received Github events
We'll create a span when zuul-web receives a Github webhook event which
is then linked to the span for the event pre-processing step.

The pre-processing span context will be added to the trigger events, and
with Icd240712b86cc22e55fb67f6787a0974d5308043 this will complete tracing
of the whole chain from receiving a Github event until a change is enqueued.

Change-Id: I1734a3a9e44f0ae01f5ed3453f8218945c90db58
2022-09-30 09:50:37 +02:00
James E. Blair 06cfe2cacd Add semaphores to REST API
This adds information about semaphores to the REST API.

It allows for inspection of the known semaphores in a tenant, the
current number of jobs holding the semaphore, and information about
each holder iff that holder is in the current tenant.
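
As a sketch of the visibility rule above, assuming each holder is a
dict with hypothetical "tenant" and "job" keys.  The function name and
the shape of the returned data are illustrative only and are not the
response format of the API.

```python
def describe_semaphore(name, holders, tenant_name):
    """Summarize one semaphore as seen from tenant_name."""
    # Every holder counts toward the total, but details are shown
    # only for holders that belong to the requesting tenant.
    visible = [h for h in holders if h["tenant"] == tenant_name]
    return {
        "name": name,
        "holder_count": len(holders),
        "holders": [{"job": h["job"]} for h in visible],
    }
```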

Followup changes will add zuul-client and zuul-web support for the
API, along with docs and release notes.

Change-Id: I6ff57ca8db11add2429eefcc8b560abc9c074f4a
2022-09-07 14:28:12 -07:00