Commit Graph

10187 Commits

Author SHA1 Message Date
Monty Taylor c8edccc60a Remove iso8601
As of Python 3.11, the stdlib datetime.fromisoformat() supports nearly
all ISO 8601 formats in common use.  We were only using this library to
parse inbound GitHub timestamp strings, so we should be able to drop it.
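The following is a minimal sketch of the replacement. It assumes GitHub timestamps arrive in the `2024-06-05T21:34:53Z` form. `parse_github_timestamp` is a hypothetical helper name, not Zuul's code. The version check exists only so the sketch also runs on interpreters older than 3.11, where `fromisoformat()` rejects the trailing `Z`.

```python
import sys
from datetime import datetime


def parse_github_timestamp(value: str) -> datetime:
    """Parse a GitHub API timestamp such as '2024-06-05T21:34:53Z'."""
    if sys.version_info < (3, 11) and value.endswith("Z"):
        # Pre-3.11 fromisoformat() rejects the 'Z' suffix; spell it as an offset.
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


ts = parse_github_timestamp("2024-06-05T21:34:53Z")
print(ts.isoformat())  # 2024-06-05T21:34:53+00:00
```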

Change-Id: Iff684dc1701298cd5ba760c8059a901ed0c5d522
2024-06-05 14:34:53 -07:00
Zuul bbe24cbf8c Merge "Sacrifice Ansible procs when OOM" 2024-06-05 10:28:18 +00:00
Zuul a8fe188e40 Merge "Add buildset events to buildset summary page" 2024-06-03 18:43:27 +00:00
Zuul a573f29b22 Merge "Fix change cache upgrade" 2024-06-03 17:00:27 +00:00
James E. Blair c0484c9d7c Sacrifice Ansible procs when OOM
When Linux runs out of memory and activates the OOM killer, it
scores processes based on how much memory they are using[1].  If
a job triggers an OOM by causing ansible-playbook to use a lot
of RAM, normally we would expect the OOM killer to kill Ansible.
However, if the executor is busy, it may be using a lot of RAM
as well, and its score may exceed the score of the smaller
Ansible process.  Nonetheless, we would still rather kill the
Ansible process.

This adjusts the score for the bubblewrap and ansible processes
so that they will have a score increased by an amount equal to
about 20% of system RAM.  This effectively means that as long
as the executor uses less than 20% of system RAM, it is guaranteed
to score lower than Ansible (and likely will continue to score
lower for some significant amount over that as well, depending
on how much RAM Ansible is using).

We read the executor's oom_score_adj when we initialize the bwrap
driver and add 200 to it in order to accommodate the situation where
the executor has its own oom_score_adj.  We always want the bwrap
children to have a higher score than the executor.

The choom program adjusts the OOM score for the command that it
executes, and this is inherited by child processes.  So we adjust
bwrap and expect ansible-playbook to inherit it.
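Here is a rough sketch of the adjustment, assuming util-linux `choom` is on the PATH. `read_oom_score_adj` and `wrap_with_choom` are hypothetical names. The clamp to 1000 reflects the kernel's maximum `oom_score_adj`. The real bwrap driver's command construction is not shown here.

```python
OOM_SCORE_ADJ_MAX = 1000  # kernel upper bound for oom_score_adj
BWRAP_OOM_BOOST = 200     # +200 of 1000 is roughly 20% of system memory


def read_oom_score_adj(path: str = "/proc/self/oom_score_adj") -> int:
    """Return this process's oom_score_adj, or 0 if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return 0


def wrap_with_choom(command: list[str], executor_adj: int) -> list[str]:
    """Prefix a command with choom so it (and its children) score higher."""
    adj = min(executor_adj + BWRAP_OOM_BOOST, OOM_SCORE_ADJ_MAX)
    return ["choom", "-n", str(adj), "--"] + command


# Sandbox options omitted; ansible-playbook inherits the adjusted score.
cmd = wrap_with_choom(["bwrap", "ansible-playbook"], read_oom_score_adj())
```

Raising a process's own score needs no privileges, which is why this direction is practical while lowering the executor's score is not.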

It is also possible to adjust the score of the executor process
lower (so the executor could be made less likely to be a target)
but that requires root privileges, so is not implemented in this
change.

[1] https://lxr.linux.no/#linux+v6.7.1/mm/oom_kill.c#L201

Change-Id: I3a3d116cf68b84b8a6f9ec13808d1d2c2008008f
2024-06-03 09:12:57 -07:00
Zuul 1dab1df57b Merge "fix(web): stop toggling builds on clicks" 2024-05-31 22:17:44 +00:00
James E. Blair 62a47ef78c Add buildset events to buildset summary page
This exposes the recently added buildset events on the buildset
summary page.

Change-Id: Icd22461961f991cb0f50a19427c2182c19902d27
2024-05-28 11:58:05 -07:00
James E. Blair 46e702de6c Fix change cache upgrade
The backwards-compat code for the recent updates to the change
cache did not handle some edge cases.  This corrects the issue
and adds a test for all of the possible combinations of queries
in the old change cache format.

Change-Id: I9521d877ff516ff31be2354951a2b8dc3f8c9671
2024-05-27 18:01:55 -07:00
Zuul 1c72b68bae Merge "Fix race in test_timer_multi_scheduler" 2024-05-24 09:13:30 +00:00
Zuul 1fc917cd66 Merge "Fix race in test_live_reconfiguration_command_socket" 2024-05-23 23:33:29 +00:00
James E. Blair d49c3a51dd Fix missing unsafe marker in new build_refs var
The recent change to add build_refs added a new place where we include
the commit message in a zuul variable, but did not mark it unsafe,
which means ansible may try to template it.  Commit messages frequently
have character sequences in them that look like jinja templates, so
for convenience, we mark them !unsafe to stop ansible from templating
them.

Correct the missing !unsafe marker, and add a test that verifies the
commit message does not appear in the vars unmarked.
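The new test can be pictured as a recursive scan of the job variables. This is a hypothetical sketch. `Unsafe` stands in for whatever type Zuul uses to emit the `!unsafe` tag, and `find_unmarked` is not the actual test helper.

```python
class Unsafe(str):
    """Hypothetical stand-in for a string that will be emitted with !unsafe."""


def find_unmarked(value, needle, path="vars"):
    """Yield paths to plain strings that contain needle without the marker."""
    if isinstance(value, Unsafe):
        return
    if isinstance(value, str):
        if needle in value:
            yield path
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from find_unmarked(item, needle, f"{path}.{key}")
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            yield from find_unmarked(item, needle, f"{path}[{index}]")


message = "Fix {{ templating }} in docs"
zuul_vars = {
    "message": Unsafe(message),
    "build_refs": [{"message": message}],  # the missing marker
}
print(list(find_unmarked(zuul_vars, message)))  # ['vars.build_refs[0].message']
```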

Change-Id: I45ebed779884cab32718064f40716581854ea03a
2024-05-23 13:21:34 -07:00
James E. Blair 5d19cc3deb Fix race in test_timer_multi_scheduler
This test was failing to settle on a busy/slow system.  We don't
actually need to run the timer-triggered job in this test, we only
check the job registration.  So configure it to run less often so
that it is more likely that the test framework will be able to
settle.

(The only thing changed in the new test fixture is:

        - time: '* * * * * */1 1'
to
        - time: '* * * * * 1 1'
)

Change-Id: I8a69f3bfabf8faab32597e5304c3f681b15c95cd
2024-05-21 10:18:01 -07:00
James E. Blair 06e06a40ed Fix race in test_live_reconfiguration_command_socket
We check that the reconfiguration time has advanced, but it has
a resolution of 1 second, so make sure this test takes at least
one second.
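The sketch below shows one way to make a test span at least one second. It assumes the recorded reconfiguration time is a whole-second timestamp. `run_for_at_least` is a hypothetical helper, not necessarily how the test was changed.

```python
import time


def run_for_at_least(func, min_seconds=1.0):
    """Call func, then sleep so the whole call spans at least min_seconds."""
    start = time.monotonic()
    result = func()
    remaining = min_seconds - (time.monotonic() - start)
    if remaining > 0:
        time.sleep(remaining)
    return result


before = int(time.time())  # second resolution, like the reconfiguration time
run_for_at_least(lambda: None)
after = int(time.time())
print(after > before)  # True
```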

Change-Id: Ie90358ca921a1e96c959ac2d8ee1ae6995b8f2fb
2024-05-21 10:02:57 -07:00
Zuul 00bd1a99cb Merge "Add image debug build option" 2024-05-20 20:12:33 +00:00
Zuul 0489801751 Merge "Add --disable-pipelines option" 2024-05-20 20:06:40 +00:00
Zuul 888015b16c Merge "Add support for excluding locked branches" 2024-05-20 20:01:36 +00:00
Zuul fbbce513ff Merge "Gitlab: make change.files persistent" 2024-05-17 21:29:40 +00:00
Zuul cac0e8bb5d Merge "Ignore /COMMIT_MSG in files matchers even more" 2024-05-17 21:16:34 +00:00
Zuul d4ddb1eb1f Merge "Support negated regexes in files/irrelevant-files" 2024-05-17 21:14:13 +00:00
Zuul 01608f1e94 Merge "Add buildset event db table" 2024-05-17 16:29:38 +00:00
Zuul cf0dfd34c6 Merge "Fix loop when api-root authentication is configured" 2024-05-15 14:55:00 +00:00
James E. Blair 762a96e571 Add buildset event db table
This adds a new database table, zuul_buildset_event, which stores
event information related to the buildset.  At the moment, the
only event information stored is a description of the trigger event
which originally caused the item to be enqueued.  Later changes may
add other events such as the reason the item was dequeued or
superseded, or the reason a buildset was canceled due to a gate reset.

A particular goal is to help users who see unsolicited reports on
their changes (which can happen when a different change in a
dependency cycle is enqueued) understand why the report was made.

The new event information is exposed via the REST API, and a later
change will add it to the web UI.

Change-Id: I9bcbb64faa3d499a26b90d20932009b6ce226061
2024-05-14 17:57:54 -07:00
James E. Blair c592e7bcd5 Add support for excluding locked branches
This adds support for excluding locked (read-only) branches.  This is
currently only supported by the Github driver.

Change-Id: I360edeb04c9734189396e8c5ddbed17e7f7464a8
2024-05-14 10:53:31 -07:00
James E. Blair 750ff54717 Add a github graphql query for branch protection
The only way to find out if a branch is locked is by using a graphql
query for branch protection rules.  This change adds such a query.
Because neither our graphql client nor our fake graphql server fully
implemented pagination, and we expect to routinely have more than 100
branches, this change also handles pagination.

And since this is GraphQL, it also handles nested pagination.
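The general shape of cursor-based pagination over a GraphQL connection (`nodes` plus `pageInfo { hasNextPage endCursor }`, as GitHub's API uses) can be sketched as below. `paginate` and `fake_connection` are hypothetical. The sketch also simplifies nesting. A real query receives the first page of each inner connection (for example a rule's matching refs) inside the outer response, and only fetches later inner pages separately.

```python
from typing import Callable, Iterator, Optional


def paginate(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield nodes from a connection, following pageInfo cursors."""
    cursor = None
    while True:
        connection = fetch_page(cursor)
        yield from connection["nodes"]
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            return
        cursor = page_info["endCursor"]


def fake_connection(items, page_size):
    """Build a fetch_page function over an in-memory list (a stand-in server)."""
    def fetch_page(cursor):
        start = int(cursor) if cursor else 0
        end = start + page_size
        return {
            "nodes": items[start:end],
            "pageInfo": {"hasNextPage": end < len(items), "endCursor": str(end)},
        }
    return fetch_page


# Outer connection: protection rules; inner connection per rule: matching refs.
rules = [
    {"pattern": "stable/*", "lockBranch": True,
     "refs": fake_connection([f"stable/{n}" for n in range(250)], 100)},
    {"pattern": "main", "lockBranch": False,
     "refs": fake_connection(["main"], 100)},
]
locked = [ref for rule in paginate(fake_connection(rules, 1)) if rule["lockBranch"]
          for ref in paginate(rule["refs"])]
print(len(locked))  # 250
```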

The word "Fake" is removed from the fake graphql server classes because
graphene introspects them to determine their object type names.  Using
their real names makes the object types match, which is now important
since one of our new queries specifies an object type.

The github connection class is updated to use this query for protected
branches.  Locking support will come in a future change.

Change-Id: I2633332396b79280984f0ebfa64a955d24fb7bae
2024-05-14 10:53:29 -07:00
James E. Blair 306f640672 Refactor branch cache to support more queries
The current branch cache is hyper-optimized to support exactly two
types of branch queries: all branches for a project, or unprotected
branches for a project.  GitHub provides another axis: "locked"
branches.  Conceivably, other code review systems could as well, and
there may be even more axes in the future.

In order to support locked branches in a future change, we must first
refactor the branch cache to support more than two queries.  This
change implements that with the following scheme:

The branch cache will be a dictionary of project_name -> ProjectInfo,
and ProjectInfo will hold general information about the project such
as supported merge modes and default branch, as well as another
dictionary branch_name -> BranchInfo.  The BranchInfo record will hold
boolean flags indicating whether the branch is protected or locked.

Additionally, the ProjectInfo record will hold a set of which queries
for these flags have been performed and whether they were successful
or not.  This allows us to determine whether the BranchInfo flags
are valid.  For example, if we have only performed a query to
get all the branches, and a caller requests a list of protected
branches, we know that the BranchInfo.protected bool is not valid,
so we raise a LookupError to the caller.  This triggers another
query to get the protected branch list, which is then used to
update the branch cache, setting the protected bool to true where
appropriate on existing BranchInfo objects and setting the protected
query flag on ProjectInfo.  The result matches the current behavior,
but is extensible to support more flags.
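The sketch below shows the lookup scheme in memory. It keeps the `ProjectInfo`/`BranchInfo` names from the description, but its fields and methods are hypothetical. It tracks only completed queries and omits the success/failure distinction and the ZooKeeper storage.

```python
from dataclasses import dataclass, field


@dataclass
class BranchInfo:
    protected: bool = False
    locked: bool = False


@dataclass
class ProjectInfo:
    branches: dict = field(default_factory=dict)     # name -> BranchInfo
    completed_queries: set = field(default_factory=set)

    def get_branches(self, query: str) -> list:
        """Return branch names for a query, or raise LookupError if unknown."""
        if query not in self.completed_queries:
            raise LookupError(query)
        if query == "all":
            return sorted(self.branches)
        return sorted(n for n, b in self.branches.items() if getattr(b, query))

    def update(self, query: str, names: list) -> None:
        """Merge the result of a remote query into the cached flags."""
        for name in names:
            info = self.branches.setdefault(name, BranchInfo())
            if query != "all":
                setattr(info, query, True)
        self.completed_queries.add(query)


def cached_branches(project, query, fetch):
    """Serve from the cache, querying the remote only on a LookupError."""
    try:
        return project.get_branches(query)
    except LookupError:
        project.update(query, fetch(query))
        return project.get_branches(query)


remote = {"all": ["main", "stable/1", "feature"], "protected": ["main", "stable/1"]}
project = ProjectInfo()
project.update("all", remote["all"])
print(cached_branches(project, "protected", remote.get))  # ['main', 'stable/1']
```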

In order to minimize the size of the branch cache in ZooKeeper, the
BranchInfo object is serialized as a simple integer with a bitmap of
the associated booleans.  Likewise, the query flags are stored in
the serialized ProjectInfo as two bitmaps (one for successful queries
and one for failed queries).
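The bitmap encoding might look like the following. The bit assignments, query names, and record layout are assumptions for illustration, not the actual serialized format.

```python
import json

PROTECTED = 1 << 0  # hypothetical bit assignments
LOCKED = 1 << 1

QUERY_BITS = {"all": 1 << 0, "protected": 1 << 1, "locked": 1 << 2}


def encode_branch(protected: bool, locked: bool) -> int:
    return (PROTECTED if protected else 0) | (LOCKED if locked else 0)


def decode_branch(value: int) -> tuple[bool, bool]:
    return bool(value & PROTECTED), bool(value & LOCKED)


def encode_queries(queries) -> int:
    bits = 0
    for q in queries:
        bits |= QUERY_BITS[q]
    return bits


def decode_queries(bits: int) -> set:
    return {name for name, bit in QUERY_BITS.items() if bits & bit}


# One int per branch, plus success and failure bitmaps for the queries.
record = {
    "branches": {"main": encode_branch(True, False), "old": encode_branch(True, True)},
    "queries_ok": encode_queries({"all", "protected"}),
    "queries_failed": encode_queries({"locked"}),
}
print(json.dumps(record, sort_keys=True))
# {"branches": {"main": 1, "old": 3}, "queries_failed": 4, "queries_ok": 3}
```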

This change stubs out the "locked" flag and query in some places, just
to demonstrate sufficiency for future use, but it does not implement
support for locked branches yet.  A future change will do that.
As long as we don't actually add any locked branches, we can still
serialize to the old branch cache data structure, so this change does
so to enable rolling upgrades.  Tests of the upgrade path and
continued operation on only the old data path are included.

Change-Id: I8841e675295f15e5d6dd004f9e34836b8bbbdb63
2024-05-14 10:53:27 -07:00
James E. Blair f1eda4e94f Run the upgrade test job
This runs the recently added upgrade test job.

Change-Id: I3751a46eea20f1e0ec80ec39611720215fbb27b7
2024-05-14 10:53:24 -07:00
James E. Blair 8bf6add186 Add an upgrade test
This adds a framework for upgrade testing where we split a
functional test in half, running the first half on the previous
commit and the second half on the current commit.  This may allow
us to catch upgrade errors which are otherwise difficult to find
in tests because they require data generated by old/removed code.

This change does not run the test: since the job operates on the
current and prior commits, only a commit that follows this one can
run it successfully.

Change-Id: I9d4d4af42fb1f684a88ec5a7e747b132423696f1
2024-05-14 10:53:22 -07:00
James E. Blair ea933f6b3f Make the test change database serializable
This adds methods to save and restore the test change database.
This can be used in a future change to perform an upgrade test
where we save and restore the fake gerrit (and other drivers)
state across an upgrade.

Some minor changes are made to FakeGerritChange to avoid storing
non-pickleable values.  Similarly, the Gerrit reporter was using
ZuulConfigKey objects as review categories.  That's fine since
they behave like strings everywhere we use them, but they aren't
pickleable, and we don't need the extra line context information
after loading the config, so we discard it and convert them to
strings.
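The sketch below shows how a string subclass can break pickling. `ConfigKey` is a hypothetical stand-in for `ZuulConfigKey`, and a `threading.Lock` stands in for whatever unpicklable context the real object carries. Converting with `str()` yields a plain `str`, which pickles fine.

```python
import pickle
import threading


class ConfigKey(str):
    """Hypothetical str subclass carrying unpicklable source context."""

    def __new__(cls, value, context):
        obj = super().__new__(cls, value)
        obj.context = context
        return obj


key = ConfigKey("Verified", context={"lock": threading.Lock()})
assert key == "Verified"  # behaves like a string in comparisons and lookups

try:
    pickle.dumps({key: 1})
except TypeError as exc:
    print("not pickleable:", exc)

categories = {str(key): 1}  # a plain str drops the extra context
restored = pickle.loads(pickle.dumps(categories))
print(restored)  # {'Verified': 1}
```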

Change-Id: Ifa404203e414932f50811291ad56a661f2875af0
2024-05-14 10:53:20 -07:00
James E. Blair 629f48e291 Move fake gerrit and pagure into dedicated files
This change is merely a reorganization to move the fake gerrit and
pagure classes into their own files to match github and gitlab.

The Fake*Connection classes for all 4 drivers are also moved into
their respective files.  This is accomplished by moving some symbols
from base.py into a new tests/util.py to resolve the import cycle
(which is likely why they were not there in the first place).

Change-Id: I274b9e5abf6086656f8ceb5a16dab2f8393deead
2024-05-14 10:51:21 -07:00
Albin Vass a94768c645 Fix unbound variable in call to check_config_path
Change-Id: Ib7e90a5d5b3a2ed108d9ed7672b73974ea13048d
2024-05-08 18:20:27 +02:00
Dong Zhang f10ce32b23 Fix loop when api-root authentication is configured
When api-root authentication is configured, accessing zuul/api/tenants
fails with a 401 until authentication is done.  Before this fix,
however, the browser would continuously resend the request before
it had a chance to redirect to the authentication page.  This resulted
in an endless loop in some browsers (as tested in Firefox), or in a loop
lasting a random few seconds before the browser finally redirected to
the authentication page (as tested in Chrome).

The fix is to call fetchTenantsIfNeeded() only when authentication
is done or not needed (api-root authentication is not configured).

Change-Id: I2cc67c791d694f329cd48c09d81cdda452eff12c
2024-05-08 14:27:22 +02:00
Zuul 9fb90a4348 Merge "Check pre run failure cases with only 2 retry attempts" 2024-05-07 07:21:02 +00:00
Zuul 9ce5c5e471 Merge "Limit bytes when reading Ansible output" 2024-05-06 20:15:39 +00:00
Zuul 1198e3cb32 Merge "gerrit: Add `approval-change` trigger" 2024-05-06 19:32:55 +00:00
Clark Boylan 1b869c8237 Check pre run failure cases with only 2 retry attempts
test_pre_run_failure_retry has been hitting timeouts semi-regularly.
This test checks that other work continues after all retry attempts
are exhausted.  Rather than increasing the timeout, we reduce the
number of retry attempts from 3 to 2, which should make the test run
faster and finish within the timeout.  This should be safe from a
testing perspective because we're still doing at least one retry, which
ensures that the same code paths are still exercised.

Change-Id: If78c7cdfac63c30f9e52a1a5984a662ab969c2ee
2024-05-06 11:46:23 -07:00
Joshua Watt ffb615e6c7 gerrit: Add `approval-change` trigger
Adds a new type of trigger to the Gerrit driver that only triggers if
the approval value was changed by the user in the comment. This is
useful if Zuul is configured to allow many different scores to trigger a
pipeline (with an additional requirement on all of them), but arbitrary
comments made while the scores are present should _not_ trigger (or
potentially re-trigger) the pipeline. This can happen because Gerrit
sends all approvals by a user on all comments, regardless of whether they
were changed by the comment.

The new `approval-change` trigger requirement inspects the `oldValue`
field in the Gerrit event. The pipeline will only trigger if this value
is present and not equal to the new approval value (thus, only when the
user actually changed it).

`oldValue` has been present since at least Gerrit 3.4.
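Below is a sketch of the `approval-change` check on a parsed Gerrit `comment-added` event. `approval_changed` is a hypothetical name. The event dicts are trimmed to the `approvals` fields that matter here, and the sketch assumes values arrive as strings and compares them as such.

```python
def approval_changed(event: dict, category: str) -> bool:
    """True only if oldValue is present and differs from the new value."""
    for approval in event.get("approvals", []):
        if approval.get("type") != category:
            continue
        old = approval.get("oldValue")
        return old is not None and old != approval.get("value")
    return False


vote = {"type": "comment-added",
        "approvals": [{"type": "Code-Review", "value": "2", "oldValue": "0"}]}
chatter = {"type": "comment-added",
           "approvals": [{"type": "Code-Review", "value": "2"}]}
print(approval_changed(vote, "Code-Review"), approval_changed(chatter, "Code-Review"))
# True False
```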

Change-Id: I88cf840ae8b4e63c77f10ee68b6901e85f7c5fb1
2024-05-03 15:39:46 -07:00
James E. Blair 47591f086d Ignore /COMMIT_MSG in files matchers even more
We have mostly managed to ignore the /COMMIT_MSG in files matchers
(because it is unintuitive that it would be considered), but the recent
change to allow negated regexes in irrelevant-files exposed the fact
that it can still have an effect in that case.  The new feature hasn't
been used yet, so we could silently correct this, but a very similar
construction would be possible with the deprecated style of regex with
a negative lookahead.  Just in case someone wrote that, this change
includes a release note letting them know they can drop the /COMMIT_MSG
from their regex.
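To see why the pseudo-file matters, consider a `files` matcher written with a negative lookahead to mean "anything outside docs/". The `files_match` function below is hypothetical and uses Python's `re`; it is not Zuul's matcher. It shows that `/COMMIT_MSG` satisfies such a regex and would trigger the job on a docs-only change unless it is skipped.

```python
import re

COMMIT_MSG = "/COMMIT_MSG"  # Gerrit's pseudo-file holding the commit message


def files_match(changed_files, patterns, ignore_commit_msg=True):
    """Return True if any changed file matches any `files` regex."""
    regexes = [re.compile(p) for p in patterns]
    for path in changed_files:
        if ignore_commit_msg and path == COMMIT_MSG:
            continue
        if any(r.match(path) for r in regexes):
            return True
    return False


# "Run for anything outside docs/", written with a negative lookahead.
patterns = [r"^(?!docs/).*$"]
docs_only = [COMMIT_MSG, "docs/index.rst"]
print(files_match(docs_only, patterns, ignore_commit_msg=False))  # True
print(files_match(docs_only, patterns))                           # False
```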

Change-Id: Ide04ed01224b5c0c48ab8d3c15ea7aef324cc42d
2024-04-30 16:09:41 -07:00
James E. Blair 3589762367 Speed up merger git resets
The merger starts every operation by resetting the repository.
That means clearing out any failed previous merges and updating
and restoring the branch state to match the upstream origin
repo.

For repos with a very large number of branches (10k), this can take
some time (minutes).  This is mostly due to the inefficiency of
looking up the origin ref one at a time (gitpython reads the
packed-refs file for each lookup, ironically negating the benefit
of packed-refs).  To bypass this, use our previously developed
method for getting all the refs efficiently and do that once at
the start of the reset method.
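The function below is a hypothetical stand-in for the bulk ref lookup. It parses `packed-refs` once into a dict, then answers every per-branch lookup from it. Zuul's existing helper is not shown here. A complete version would also need to read loose refs under `.git/refs`, which take precedence over packed entries.

```python
def parse_packed_refs(text: str) -> dict:
    """Map ref name -> sha from the contents of a .git/packed-refs file."""
    refs = {}
    for line in text.splitlines():
        if not line or line.startswith(("#", "^")):
            continue  # header comment or peeled-tag line
        sha, name = line.split(" ", 1)
        refs[name] = sha
    return refs


packed = """# pack-refs with: peeled fully-peeled sorted
1111111111111111111111111111111111111111 refs/remotes/origin/main
2222222222222222222222222222222222222222 refs/remotes/origin/stable/2024.1
3333333333333333333333333333333333333333 refs/tags/1.0
^4444444444444444444444444444444444444444
"""
origin = parse_packed_refs(packed)  # one pass instead of one read per lookup
for branch in ("main", "stable/2024.1"):
    print(branch, origin[f"refs/remotes/origin/{branch}"][:7])
```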

Change-Id: If21245cd562c6499378c4f3353332d87c4ca4b47
2024-04-30 15:47:17 -07:00
Zuul 25cc922116 Merge "Fix issue with reopened PR dependencies" 2024-04-29 22:12:19 +00:00
Zuul e81f063df2 Merge "Replace status_url with item_url in pipeline reporter templates" 2024-04-29 22:02:22 +00:00
Zuul e12a98b905 Merge "Temporarily pin urllib3 != 2.1.0" 2024-04-29 21:37:31 +00:00
Zuul de6bd67e1f Merge "Add zuul.build_refs variable" 2024-04-29 21:37:27 +00:00
Simon Westphahl 0349628249 Fix issue with reopened PR dependencies
Consider two PRs enqueued in gate, with B depending on A, where A is
closed and then immediately reopened.

This sequence of events will currently dequeue A and then immediately
enqueue it behind B.  Since the check for whether a dependency is already
in the queue doesn't care whether it's ahead of or behind the current
change, we won't dequeue B, and the builds executed for B will not
include A.

This change updates the check to determine if a change is already in
the queue to only check for changes ahead of it.  This causes B to
be correctly dequeued in the next pipeline pass.

This behavior is correct, but isn't always intuitive or consistent.
If the time between closing and reopening a change is long enough for
a pipeline processing run to happen in between, then both changes will
be enqueued by the reopening (because we check for changes that need
the enqueued change and enqueue them behind it).  But if both events are
processed in a single pipeline run, then the removal of B happens after
the re-enqueue of A, which means that B won't be re-added.

To correct this, whenever we remove abandoned changes, we will also remove
changes behind them that depend on the removed abandoned changes at the
same time.  This means that in our scenario above, the re-enqueue happens
under the same conditions as the original enqueue, and both A and B are
re-enqueued.
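Both fixes can be modeled on a plain list. `Item`, `dependency_ahead`, and `remove_abandoned` are hypothetical simplifications of Zuul's queue items and pipeline logic.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    needs: set = field(default_factory=set)


def dependency_ahead(queue, index, dep):
    """True only if dep is enqueued ahead of queue[index]."""
    return any(item.name == dep for item in queue[:index])


def remove_abandoned(queue, abandoned):
    """Remove abandoned items and, in the same pass, dependents behind them."""
    removed, kept = set(), []
    for item in queue:  # front to back, so removal propagates down the queue
        if item.name in abandoned or item.needs & removed:
            removed.add(item.name)
        else:
            kept.append(item)
    return kept, removed


queue = [Item("B", {"A"}), Item("A")]  # A re-enqueued behind B
print(dependency_ahead(queue, 0, "A"))  # False: B must be dequeued
kept, removed = remove_abandoned([Item("A"), Item("B", {"A"})], {"A"})
print([i.name for i in kept], sorted(removed))  # [] ['A', 'B']
```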

Co-Authored-By: James E. Blair <jim@acmegating.com>
Change-Id: Ia1d79bccb9ea39e486483283611601aa23903000
2024-04-26 14:20:07 -07:00
Christian von Schultz 5933704a6a Catch ZeroDivisionError when f_files=0
On BTRFS, f_files and f_ffree are always 0. For now, assume there is
no limit by setting files_percent_avail to 100%.
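Here is a minimal sketch of the guard. `files_percent_avail` mirrors the variable named above, but the function itself is hypothetical. Any object with `f_files` and `f_ffree` attributes works, including the result of `os.statvfs()` on Unix.

```python
import os
from collections import namedtuple


def files_percent_avail(st) -> float:
    """Percentage of free inodes; report 100% when there is no inode limit."""
    if st.f_files == 0:
        # e.g. Btrfs reports f_files == f_ffree == 0 (no fixed inode count)
        return 100.0
    return st.f_ffree / st.f_files * 100


FakeStat = namedtuple("FakeStat", "f_files f_ffree")
print(files_percent_avail(FakeStat(0, 0)))       # 100.0
print(files_percent_avail(FakeStat(1000, 250)))  # 25.0
if hasattr(os, "statvfs"):
    print(files_percent_avail(os.statvfs("/")))
```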

Change-Id: I53455e46101130596ae178a5933fe51ebaee206f
2024-04-26 17:56:34 +02:00
Simon Westphahl 6e163780e3 Temporarily pin urllib3 != 2.1.0
It looks like urllib3 version 2.1.0 causes problems when connecting to
Windows nodes.

This is fixed in 2.2.0, but ibm-cos-sdk prevents that version from
being installed, so exclude 2.1.0 for now.

https://github.com/urllib3/urllib3/pull/3326

Change-Id: I5d4a33c477d6872389c1d4197e926991b70f06ec
2024-04-24 16:11:20 -07:00
Zuul 2c2a2d61a5 Merge "Gerrit: skip ref-updated /meta events" 2024-04-24 19:50:03 +00:00
James E. Blair 7028745bbe Add --disable-pipelines option
This facilitates the creation of a Zuul system with a running config
(that will be kept up to date as long as it receives events) but does
not run any jobs or make any reports.  This can be used in conjunction
with zuul-web to serve REST API requests for introspection, or to create
a standby Zuul system with a warmed config cache, or to support other
debugging techniques.

This change adds an extra assertion to the wait-for-init test since it
would be too similar otherwise.  It also adds some documentation for
wait-for-init (so that both similar options are documented) and support
for setting both options by environment variables for ease of use
in k8s environments.

Change-Id: I3ee83b08c8280066cfa0744f2e30e41edd0f364c
2024-04-23 09:42:39 -07:00
James E. Blair 1b9e2c0d83 Gitlab: make change.files persistent
Gitlab does not include files lists with merge request events, so
we always have Zuul perform that calculation itself.  But if a
merge request is updated after the fileschanges job completes, we
would overwrite our files list with None again.

Instead, recognize that for a given change+patchset, files are
immutable and don't update it if it's set already.

This matches the logic in the GitHub driver: although GitHub does
provide files with PR events, it only returns the first 300, so this
logic is needed in those cases.
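The sketch below illustrates the rule, assuming a cache keyed by change number and patchset. `Change` and `update_change` are hypothetical simplifications of the driver code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    number: int
    patchset: str
    files: Optional[list] = None


def update_change(cache, number, patchset, files=None):
    """Update a cached change without discarding an already-known files list."""
    key = (number, patchset)
    change = cache.get(key)
    if change is None:
        change = cache[key] = Change(number, patchset)
    if change.files is None and files is not None:
        change.files = files  # files never change for a given change+patchset
    return change


cache = {}
update_change(cache, 42, "abc123", files=["README.rst"])  # computed by a merger job
change = update_change(cache, 42, "abc123")  # later event without a files list
print(change.files)  # ['README.rst']
```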

Change-Id: I115c69b97e17cd4b01e3fa6c70140add6254283d
2024-04-22 13:34:31 -07:00
Zuul 4d6db3602f Merge "Update docs for job.dependencies and provides/requires" 2024-04-19 12:17:10 +00:00
James E. Blair 239fe205ec Gerrit: skip ref-updated /meta events
Approximately 40% of all Gerrit events that OpenDev processes are
ref-updated events for refs/changes/.../meta refs.  This is likely
due to the increased use of notedb for storing data in Gerrit.  Since
Zuul users are not likely to need to trigger off of ref-updates to
the meta ref, let's avoid enqueuing them into Zuul's event queue.
This will reduce ZK traffic.
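Below is a sketch of the filter on parsed Gerrit stream events. `is_meta_ref_update` is a hypothetical helper, and the exact matching Zuul applies may differ. It assumes meta refs look like `refs/changes/45/12345/meta`.

```python
def is_meta_ref_update(event: dict) -> bool:
    """True for ref-updated events on refs/changes/.../meta refs."""
    if event.get("type") != "ref-updated":
        return False
    ref = event.get("refUpdate", {}).get("refName", "")
    return ref.startswith("refs/changes/") and ref.endswith("/meta")


events = [
    {"type": "ref-updated", "refUpdate": {"refName": "refs/changes/45/12345/meta"}},
    {"type": "ref-updated", "refUpdate": {"refName": "refs/heads/master"}},
    {"type": "patchset-created"},
]
to_enqueue = [e for e in events if not is_meta_ref_update(e)]
print(len(to_enqueue))  # 2
```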

Change-Id: I724f5b20790d1ad32e72b1ce642355c2257026c1
2024-04-18 15:02:13 -07:00