Commit Graph

1255 Commits

Author SHA1 Message Date
Simon Westphahl 8223a336af
Don't consider type of dependencies to process
So far, when getting parent jobs recursively we've kept a set of tuples
with the job UUID and the dependency type (hard/soft) that needed to be
processed.

This worked as long as all relevant jobs used the same type of
dependency on a job. However, when hard and soft dependencies are
mixed, this could lead to a job being processed twice, which Zuul
wrongly detected as a job dependency cycle.

This problem also led to exceptions in pipeline processing, as some
schedulers detected a dependency cycle whereas others did not. The
reason is the non-deterministic ordering of the set data type.

Since we don't need the information about the dependency type, we'll
just keep a set of job UUIDs to be processed, which prevents Zuul from
processing a job twice when it is reached via different dependency types.

Note: The provided regression test did not fail consistently due to the
set non-determinism mentioned above.
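The fix can be sketched as follows; the graph structure and helper name are illustrative stand-ins, not Zuul's actual model code.

```python
# Sketch of the fix: track jobs to process by UUID only, so a job
# reachable via both a hard and a soft dependency is handled once.
# Graph shape and names are illustrative, not Zuul's actual classes.

def get_parent_jobs_recursively(job_uuid, dependencies):
    """dependencies maps a job uuid to a list of (parent_uuid, dep_type)."""
    to_process = {job_uuid}  # before the fix: {(uuid, dep_type), ...}
    seen = {job_uuid}
    parents = []
    while to_process:
        current = to_process.pop()
        for parent_uuid, _dep_type in dependencies.get(current, []):
            if parent_uuid not in seen:  # the UUID alone decides "new"
                seen.add(parent_uuid)
                to_process.add(parent_uuid)
                parents.append(parent_uuid)
    return parents

# "child" depends on "base" directly (hard) and via "mid" (soft);
# with (uuid, dep_type) tuples, "base" could be queued twice.
deps = {
    "child": [("mid", "soft"), ("base", "hard")],
    "mid": [("base", "soft")],
}
assert sorted(get_parent_jobs_recursively("child", deps)) == ["base", "mid"]
```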

Change-Id: I6459cbc90bf745ddda2da9dbc25bcee97005d933
2024-04-15 14:54:54 +02:00
Clark Boylan 505253bed2 Handle artifacts created by non change refs
Artifacts processing assumed it was always operating on Change objects
as speculative artifact state only makes sense for speculative Changes.
However, this was not enforced and we would attempt to process artifacts
for refs/branches/tags/etc, leading to this exception:

  Traceback (most recent call last):
    File "/usr/local/lib/python3.11/site-packages/zuul/manager/__init__.py", line 1065, in _executeJobs
      self.sched.executor.execute(
    File "/usr/local/lib/python3.11/site-packages/zuul/executor/client.py", line 63, in execute
      params = zuul.executor.common.construct_build_params(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/zuul/executor/common.py", line 97, in construct_build_params
      ) = item.getJobParentData(job)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/zuul/model.py", line 5431, in getJobParentData
      artifact_data = job.artifact_data or self.getArtifactData(job)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/zuul/model.py", line 5414, in getArtifactData
      self.providesRequirements(job, data)
    File "/usr/local/lib/python3.11/site-packages/zuul/model.py", line 5377, in providesRequirements
      'change': change.number,
                ^^^^^^^^^^^^^
  AttributeError: 'Tag' object has no attribute 'number'

Address this by checking that the object type is Change before we
handle artifacts for it. If it isn't a Change, we report that none of
this dependency's provides/requires are satisfied by the currently
processed queue item.
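A minimal sketch of the guard, with simplified stand-in classes (not Zuul's model objects):

```python
# Only Change objects carry speculative artifact state; for any other
# ref type, report that no provides/requires are satisfied instead of
# dereferencing attributes (like .number) that only Change has.
class Ref:
    pass

class Tag(Ref):
    pass

class Change(Ref):
    def __init__(self, number):
        self.number = number

def provides_requirements(change, data):
    if not isinstance(change, Change):
        # Not a speculative change: nothing can be satisfied here.
        return False
    data.append({'change': change.number})
    return True

data = []
assert provides_requirements(Change(42), data) is True
assert provides_requirements(Tag(), data) is False  # no AttributeError now
```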

Change-Id: I5ba7f30e56fc890847b86ee776e74272ec04046d
2024-04-08 09:00:44 -07:00
Zuul b2c0d69cbc Merge "Add a zuul.buildset_refs variable" 2024-03-25 10:09:16 +00:00
Zuul b0a7ed2899 Merge "Attempt to preserve triggering event across re-enqueues" 2024-03-25 10:09:13 +00:00
Zuul 51ec0d9159 Merge "Use the triggering change as the zuul change" 2024-03-25 09:56:06 +00:00
James E. Blair 632839804c Add a zuul.buildset_refs variable
This adds information about the changes associated with a
circular dependency queue item.  Currently the bundle_id can be
used to identify which of the items in zuul.items is related to
the current dependency cycle.  That variable is deprecated, so
zuul.buildset_refs can be used to replace that functionality.

Since it repeats some of the information at the top level (eg
zuul.change, zuul.project, etc), the code is refactored so they
can share the dictionary construction.  That is also used by
zuul.items.  This results in a few extra fields in zuul.items
now, such as the change message, but that is relatively
inconsequential, so is not called out in the release notes.

The src_dir is similarly included in all of these places.  In
writing this change it was discovered that
zuul.items.project.src_dir always used the golang scheme, but
zuul.project.src_dir used the correct per-job workspace scheme.
This has been corrected so that they both use the per-job scheme
now.

A significant reorganization of the job variable documentation is
included.  Previously we had a section with additional variables
for each item type, but since many of these are duplicated at the
top level, in the item list, and now in the refs list, that
structure became difficult to work with.  Instead, the
documentation for each of these sections now exhaustively lists
all of the possible variables.  This makes for some repetition,
but it also means that a user can see at a glance what variables
are available, and we can deep-link to each one.  To address the
variation between different item types, the variables that mutate
based on item type now contain a definition list indicating what
types they are valid for and their respective meanings.

Change-Id: Iab8f99d4c4f40c44d630120c458539060cc725b5
2024-03-22 06:41:36 -07:00
James E. Blair 1350ce8ad6 Use NodesetNotFoundError class
This exception class went unused, likely due to complications
from circular imports.

To resolve this, move all of the configuration error exceptions
into the exceptions.py file so they can be imported in both
model.py and configloader.py.

Change-Id: I19b0f078f4d215a2e14c2c7ed893ab225d1e1084
2024-03-18 15:03:58 -07:00
James E. Blair 6ccbdacdf2 Attempt to preserve triggering event across re-enqueues
When a dependency cycle is updated, we will re-enqueue the changes
in the cycle so that each of the changes goes through the process
of being added to the queue with the updated contents of the cycle.
That may mean omitting changes from the cycle, or adding new ones,
or even splitting into two cycles.

In that case, in order to preserve the idea of the
"triggering change", carry over the triggering change from when the
cycle was originally enqueued.

Note that this now exposes us to the novel idea that the triggering
change may not be part of the queue item.

Change-Id: I9e00009040f91d7edc31f4928e632edde4b2745f
2024-03-13 13:07:08 -07:00
James E. Blair 802c5a8ca6 Use the triggering change as the zuul change
In the case of circular dependencies with job deduplication, we
arbitrarily pick one of the changes as the zuul change (to use
when setting the zuul.change job variable and friends).  In
theory, it shouldn't matter which change we use, but in practice,
users may be surprised if it is something other than the triggering
change. Since it doesn't really matter to Zuul, let's set the zuul
change to the triggering change when possible.  It still needs to
be one of the changes for the job, so if the triggering change
itself doesn't actually run the job (easily possible if the job is
only run on dependent changes), then we will fall back to the
current behavior.  And of course the change must be one of the
item's changes, so in the case of linear dependencies, we're not
going to start setting it to some other queue item's change.

If we are unable to set it to the triggering change, then the
behavior remains undefined beyond setting it to one of the job's
changes arbitrarily.
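The selection logic described above can be sketched like this; the function and values are illustrative, not Zuul's actual code:

```python
# Prefer the triggering change as the zuul change when it is among the
# changes the job actually runs for; otherwise fall back to the
# previous behavior of picking one of the job's changes arbitrarily.

def pick_zuul_change(triggering_change, job_changes):
    if triggering_change in job_changes:
        return triggering_change
    # The triggering change does not run this job: behavior remains
    # undefined beyond "one of the job's changes".
    return job_changes[0]

assert pick_zuul_change("I2", ["I1", "I2", "I3"]) == "I2"
assert pick_zuul_change("I9", ["I1", "I2"]) == "I1"  # fallback
```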

Included in this change is a cleanup of a no-longer-needed api
migration from 12->13 related to EventInfo objects that was
missed due to a missing MODEL_API tag.

Information about the triggering change is added to the EventInfo
object to implement this feature.

Because the fallback behavior and the model upgrade behavior are
the same, we don't need to add any conditional api behavior or
upgrade testing -- in both cases we will simply use the current
behavior.

Change-Id: Iee5a7d975fea1f7491b652c406c24d73ada7a1a1
2024-03-13 13:07:08 -07:00
Zuul 5e7c2f2ef6 Merge "Add job name back to node request data" 2024-03-11 10:12:16 +00:00
Zuul 2354b8a631 Merge "Only use latest proposed config for project-branch" 2024-03-11 10:09:13 +00:00
Simon Westphahl e41af7c312
Add job name back to node request data
With the circular dependency refactoring we also removed the job name
from the requestor data in the node request. However, this could
previously be used as part of the dynamic-tags in Nodepool which might
be useful for billing and cost calculations.

Add back the job name so those use-cases start working again.

Change-Id: Ie3be39819bf84d05a7427cd0e859f485de90835d
2024-03-07 08:02:30 +01:00
Zuul a56c9c0ea9 Merge "Produce consistent merge commit shas" 2024-03-06 09:47:14 +00:00
Simon Westphahl c52a059821
Revise decision to not deduplicate noop jobs
Since circular dependencies are now modelled as multiple changes in a
single item, there is no longer a good reason not to deduplicate noop
jobs.

So far, the noop job was listed separately for each change that was
part of a cycle, which cluttered the UI, especially for large
dependency cycles.

Change-Id: Ic8d447df7d9040a767c88127ba361badc9ef016a
2024-03-04 15:35:18 +01:00
Simon Westphahl 1b3649e754
Only use latest proposed config for project-branch
When an item updates config we will schedule a merge for the proposed
change and its dependencies.

The merger will return a list of config files for each merged change.
The scheduler upon receiving the merge result will combine the collected
config files for a project-branch from all involved changes.

This led to the problem that the old content of renamed config files
was still used when building the dynamic layout.

Since the config we receive from the merger is always exhaustive, we
just need to keep the latest config files.

Another (or additional) fix would be to only return the latest config
files for a project-branch from the mergers. However, in the case of
circular dependencies it could make sense in the future to get the
updated config per change in order to report errors more precisely.

Change-Id: Iebf49a9efe193788199197bf7846e336d96edf19
2024-03-01 14:42:33 +01:00
James E. Blair 3e4caaac4b Produce consistent merge commit shas
Use a fixed timestamp and merge message so that zuul mergers
produce the exact same commit sha each time they perform a merge
for a queue item.  This can help correlate git repo states for
different jobs in the same change as well as across different
changes in the case of a dependent change series.

The timestamp used is the "configuration time" of the queue item
(ie, the time the buildset was created or reset).  This means
that it will change on gate resets (which could be useful for
distinguishing one run of a build from another).

Change-Id: I3379b19d77badbe2a2ec8347ddacc50a2551e505
2024-02-26 16:32:46 -08:00
James E. Blair 1bec2014bc Remove updateJobParentData method
This method was added as part of the initial deduplication work in
959a0b9834.  Since we now collect
parent data at the time that we run the job, this method doesn't
actually do anything other than decide when jobs are ready to run.

This change moves that logic back into the findJobsToRun method
and removes the unnecessary updateJobParentData method.

Change-Id: Iac744a24ee3902360eeaef371808657a8eeb2080
2024-02-09 10:19:08 -08:00
James E. Blair fa274fcf25 Remove most model_api backwards-compat handling
Since we require deleting the state (except config cache) for the
circular dependency upgrade, we can now remove any zookeeper
backwards compatability handling not in the config cache.  This
change does so.  It leaves upgrade handling for two attributes
in the branch cache (even though they are old) because it is
plausible that even systems that have been upgrading regularly
may have non-upgraded data in the branch cache for infrequently
used branches, and the cost for retaining the upgrade handling
is small.

Change-Id: I881578159b06c8c7b38a9fa0980aa0626dddad31
2024-02-09 07:39:55 -08:00
James E. Blair 1cc276687a Change status json to use "refs" instead of "changes"
This is mostly an internal API change to replace the use of the
word "change" with "ref" in the status json.  This matches the
database and build/buildsets records.

Change-Id: Id468d16d6deb0af3d1c0f74beb1b25630455b8f9
2024-02-09 07:39:52 -08:00
James E. Blair 1f026bd49c Finish circular dependency refactor
This change completes the circular dependency refactor.

The principal change is that queue items may now include
more than one change simultaneously in the case of circular
dependencies.

In dependent pipelines, the two-phase reporting process is
simplified because it happens during processing of a single
item.

In independent pipelines, non-live items are still used for
linear dependencies, but multi-change items are used for
circular dependencies.

Previously changes were enqueued recursively and then
bundles were made out of the resulting items.  Since we now
need to enqueue entire cycles in one queue item, the
dependency graph generation is performed at the start of
enqueuing the first change in a cycle.

Some tests exercise situations where Zuul is processing
events for old patchsets of changes.  The new change query
sequence mentioned in the previous paragraph necessitates
more accurate information about out-of-date patchsets than
the previous sequence, therefore the Gerrit driver has been
updated to query and return more data about non-current
patchsets.

This change is not backwards compatible with the existing
ZK schema, and will require Zuul systems delete all pipeline
states during the upgrade.  A later change will implement
a helper command for this.

All backwards compatibility handling for the last several
model_api versions, added to prepare for this upgrade, has
been removed.  In general, all model data
structures involving frozen jobs are now indexed by the
frozen job's uuid and no longer include the job name since
a job name no longer uniquely identifies a job in a buildset
(either the uuid or the (job name, change) tuple must be
used to identify it).

Job deduplication is simplified and now only needs to
consider jobs within the same buildset.

The fake github driver had a bug (fakegithub.py line 694) where
it did not correctly increment the check run counter, so our
tests that verified that we closed out obsolete check runs
when re-enqueing were not valid.  This has been corrected, and
in doing so, has necessitated some changes around quiet dequeuing
when we re-enqueue a change.

The reporting in several drivers has been updated to support
reporting information about multiple changes in a queue item.

Change-Id: I0b9e4d3f9936b1e66a08142fc36866269dc287f1
Depends-On: https://review.opendev.org/907627
2024-02-09 07:39:40 -08:00
Simon Westphahl 5b75d407b5
Fix bug in legacy code for getting parent jobs
ERROR zuul.Scheduler: Exception in pipeline processing:
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.11/site-packages/zuul/scheduler.py", line 2362, in _process_pipeline
    while not self._stopped and pipeline.manager.processQueue():
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 1882, in processQueue
    item_changed, nnfi = self._processOneItem(
                         ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 1740, in _processOneItem
    ready = self.prepareItem(item)
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 1625, in prepareItem
    while item.deduplicateJobs(log):
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/model.py", line 5889, in deduplicateJobs
    self.updateJobParentData()
  File "/opt/zuul/lib/python3.11/site-packages/zuul/model.py", line 5793, in updateJobParentData
    for parent_job in job_graph.getParentJobsRecursively(job):
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/model.py", line 3845, in getParentJobsRecursively
    self._legacyGetParentJobNamesRecursively(
  File "/opt/zuul/lib/python3.11/site-packages/zuul/model.py", line 3883, in _legacyGetParentJobNamesRecursively
    current_job.name)
    ^^^^^^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'name'

Change-Id: Icf0e719ba50d2e69833d39cd04a67e4b9ac93686
2024-01-22 08:18:06 +01:00
Simon Westphahl 80d57497db
Ignore soft dependencies when getting parent jobs
This fixes a bug in the model API <= 20 legacy implementation for
getting parent jobs recursively. The bug caused pipeline processing
to be aborted when hitting this edge case.

Traceback (most recent call last):
  File "/opt/zuul/lib/python3.11/site-packages/zuul/scheduler.py", line 2362, in _process_pipeline
    while not self._stopped and pipeline.manager.processQueue():
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 1882, in processQueue
    item_changed, nnfi = self._processOneItem(
                         ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 1740, in _processOneItem
    ready = self.prepareItem(item)
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 1625, in prepareItem
    while item.deduplicateJobs(log):
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/model.py", line 5886, in deduplicateJobs
    self.updateJobParentData()
  File "/opt/zuul/lib/python3.11/site-packages/zuul/model.py", line 5790, in updateJobParentData
    for parent_job in job_graph.getParentJobsRecursively(job):
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/model.py", line 3845, in getParentJobsRecursively
    self._legacyGetParentJobNamesRecursively(
  File "/opt/zuul/lib/python3.11/site-packages/zuul/model.py", line 3891, in _legacyGetParentJobNamesRecursively
    raise Exception("Job %s depends on %s which was not run." %
Exception: Job child depends on soft-parent which was not run.

Change-Id: Ia60e7f95fbde775fcc8dfe7d60f720753473d6e3
2024-01-17 11:45:15 +01:00
Zuul d12ec11321 Merge "Improve support for web enqueue/dequeue" 2024-01-14 16:31:05 +00:00
James E. Blair dc49013bb6 Remove unused variable in job graph
This is a minor code cleanup.

Change-Id: Ic5e57118d1ec5db112f5bebbcca1b3481b546c94
2024-01-10 16:10:55 -08:00
James E. Blair 663931cba3 Include job_uuid in BuildRequests/Events
This is part of the circular dependency refactor.

It updates BuildRequests to include the job uuid (similar to the
previous change with NodeRequests).  We also use the job uuid when
sending BuildResultEvents.

Change-Id: If7a55138d0cb5a358c62e8cd97ee322087b09a6b
2024-01-08 08:54:56 -08:00
James E. Blair 7262ef7f6f Include job_uuid in NodeRequests
This is part of the circular dependency refactor.  It updates the
NodeRequest object to include the job_uuid in addition to the job_name
(which is temporarily kept for backwards compatability).  When node
requests are completed, we now look up the job by uuid if supplied.

Change-Id: I57d4ab6c241b03f76f80346b5567600e1692947a
2023-12-20 10:44:04 -08:00
James E. Blair a9ff6b3410 Improve support for web enqueue/dequeue
This change:

* Returns the build/buildset oldrev through the REST API (this
  field was missing).
* Updates the web UI so that when enqueuing or dequeueing a ref it will
  send exactly the oldrev/newrev values it received, including None/null.
* No longer translates None to 40*'0' when creating internal management
  events.

In concert, these changes allow a user to re-enqueue exactly as
originally enqueued buildsets for branch tips (periodic pipeline) as
well as ref updates (tag/post pipelines).

Additionally, the re-enqueue method in the web UI is updated to support
re-enqueuing tag and branch heads (it only worked on change and
ref-updates before).

Finally, the buildset page is updated to show the old and new revs
if they are non-null.

Change-Id: I9886cd44f8b4bae6f4a5ce3644f0598a73ecfe0a
2023-12-14 10:18:33 -08:00
James E. Blair 9201f9ee28 Store builds on buildset by uuid
This is part of the circular dependency refactor.

This updates the buildset object in memory (and zk) to store builds
indexed by frozen job uuid rather than job name.  This also updates
several related fields and also temporary dictionaries to do the same.

This will allow us, in the future, to have more than one job/build
in a buildset with the same name (for different changes/refs).

Change-Id: I70865ec8d70fb9105633f0d03ba7c7e3e6cd147d
2023-12-12 11:58:21 -08:00
James E. Blair cb3c4883f2 Index job map by uuid
This is part of the circular dependency refactor.  It changes the
job map (a dictionary shared by the BuildSet and JobGraph classes
(BuildSet.jobs is JobGraph._job_map -- this is because JobGraph
is really just a class to encapsulate some logic for BuildSet))
to be indexed by FrozenJob.uuid instead of job name.  This helps
prepare for supporting multiple jobs with the same name in a
buildset.

Change-Id: Ie17dcf2dd0d086bd18bb3471592e32dcbb8b8bda
2023-12-12 10:22:25 -08:00
James E. Blair 071c48c5ae Freeze job dependencies with job graph
This is part of the circular dependency refactor.

Update the job graph to record job dependencies when it is frozen,
and store these dependencies by uuid.  This means our dependency
graph points to actual frozen jobs rather than mere job names.

This is a pre-requisite to being able to disambiguate dependencies
later when a queue item supports multiple jobs with the same name.
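The disambiguation this enables can be sketched with illustrative classes (not Zuul's actual FrozenJob):

```python
import uuid

# Dependencies recorded at freeze time point at frozen-job uuids, not
# names, so two same-named jobs (for different changes in one queue
# item) remain distinct entries in the dependency graph.

class FrozenJob:
    def __init__(self, name):
        self.name = name
        self.uuid = uuid.uuid4().hex

parent = FrozenJob("build")
jobs = [FrozenJob("test"), FrozenJob("test")]  # same name, two changes

dependencies = {j.uuid: {parent.uuid} for j in jobs}

# A name-keyed map would collapse the two "test" jobs; uuid keys don't.
assert jobs[0].uuid != jobs[1].uuid
assert len(dependencies) == 2
```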

The behavior where we would try to unwind an addition to the job
graph if it failed is removed.  This was originally written with the
idea that we would try to run as many jobs as possible if there was
a config error.  That was pre-zuul-v3 behavior.  Long since, in all
cases when we actually encounter an error adding to the job graph,
we bail and report that to the user.  No longer handling that
case simplifies the code somewhat and makes it more future-proof
(while complicating one of the tests that relied on that behavior
as a shortcut).

This attempts to handle upgrades by emulating the old behavior
if a job graph was created on an older model version.  Since it
relies on frozen job uuids, it also attempts to handle the case
where a frozenjob does not have a uuid (which is a very recent
model change and likely to end up in the same upgrade for some
users) by emulating the old behavior.

Change-Id: I0070a07fcb5af950651404fa8ae66ea18c6ca006
2023-12-06 16:41:18 -08:00
James E. Blair e6856c8f98 Further fixes to check deduplication
This fixes a similar case as I86c159c82b858e67433bdaa1e479471b21ea8b86
but at a different point in the pipeline lifecycle.

In both cases, a queue item in check has completed and been
dequeued, while a second item that deduplicated one of its jobs
remains.

We store both the node request and nodeset information on the buildset
object.  When builds are finished, we clear the node request from the
buildset object.  Depending on timing, we may have information about the
nodeset while we may or may not have the request id.

We use the presence of nodeset information to decide whether to run a
job, but we pass the node request to the executor for it to lock the
nodeset.

This patch handles the case where we have deduplicated nodeset information
but have already cleared the node request id.  In that case, we would
attempt to run the job a second time while passing in None as the
node request id.

To correct this, if we learn that we are unable to restore information
about a deduplicated build, clear all of the nodeset and node request
information about that job from the secondary buildset so that when it
attempts to re-run the build, it starts from a clean slate.

This case may be a superset of the other case which means we won't hit
the failsafe code added in I86c159c82b858e67433bdaa1e479471b21ea8b86.
That should be fine, and we will remove code related to both cases once
the circular dependency refactor is complete.

Change-Id: I95511763236d5a99c14477989d573bd5c1726b76
2023-11-21 15:05:58 -08:00
Zuul 7fae80f574 Merge "Fix flaky deduplication test" 2023-11-20 15:34:01 +00:00
Zuul 138b6a1379 Merge "Refactor bundle in sql connection" 2023-11-17 01:03:36 +00:00
Simon Westphahl 68d7a99cee
Send job parent data + artifacts via build request
With job parents that supply data we might end up updating the (secret)
parent data and artifacts of a job multiple times in addition to also
storing duplicate data as most of this information is part of the
parent's build result.

Instead we will collect the parent data and artifacts before scheduling
a build request and send it as part of the request parameters.

If those parameters are part of the build request the executor will use
them, otherwise it falls back on using the data from the job for
backward compatibility.

This change affects the behavior of job deduplication in that input data
from parent jobs is no longer considered when deciding if a job can be
deduplicated or not.
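The executor-side fallback described above can be sketched like this; the parameter keys and data are illustrative:

```python
# Prefer parent data shipped in the build request parameters; fall
# back to data stored on the job only for requests from older
# schedulers that don't include it.

def resolve_parent_data(params, job):
    if "parent_data" in params:
        return params["parent_data"], params.get("artifact_data", [])
    return job.get("parent_data", {}), job.get("artifact_data", [])

job = {"parent_data": {"stale": True}, "artifact_data": []}
new_request = {"parent_data": {"image_id": "abc123"}, "artifact_data": []}
old_request = {}  # pre-change scheduler: no parent data in the request

assert resolve_parent_data(new_request, job)[0] == {"image_id": "abc123"}
assert resolve_parent_data(old_request, job)[0] == {"stale": True}
```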

Change-Id: Ic4a85a57983d38f033cf63947a3b276c1ecc70dc
2023-11-15 07:24:52 +01:00
James E. Blair f63a580012 Fix flaky deduplication test
The test test_job_deduplication_child_of_diff_parent has been flaky
because the following sequence can happen (some steps omitted):

* Two changes enqueued with a deduplicated job
* Pipeline processor runs and issues node request for deduplicated build
* Pipeline processor runs again and processes events related to
  first change
* Node request completes (after processing started processing first change)
* Pipeline processor continues and processes second change
* Pipeline processor attempts to start job for deduplicated build
  (an exception is raised)

The processor should ignore the completed node request on the second change
and allow the next pass to handle it when processing the first change.

To ensure this, we set the in-memory-only "ready to run" flag on the job
to False for any deduplicated job.  This way we will only start builds
when processing their actual queue item.

Because of the sequencing of this race condition (needing to return a build
completed event between processing two queue items in the same pipeline run),
it is impractical to add a test.

Change-Id: I8aa2a8adab273304bf518ca3def429dda7f8c544
2023-11-14 15:53:43 -08:00
Simon Westphahl 93b4f71d8e
Store frozen jobs using UUID instead of name
Change the frozen job storage in ZK from being identified by name to
UUID. This allows us to handle multiple frozen jobs in a buildset with
the same name.

The job graph will get a new field that is a sorted list of job UUIDs
with the index being the same as for the job name list. Jobs that are
created with the old name-based path will have their UUID set to None.

This is done in preparation of the circular dependency refactoring as
detailed in I8c754689ef73ae20fd97ac34ffc75c983d4797b0.

Change-Id: Ic4df16e8e1ec6908234ecdf91fe08408182d05bb
2023-11-10 07:24:35 +01:00
Zuul a0a53ef49f Merge "Refactor configuration error handling" 2023-11-08 15:54:06 +00:00
Zuul f291e52850 Merge "Reduce config error context" 2023-11-08 15:50:58 +00:00
Zuul 952733340b Merge "Don't schedule initial merge for branch/ref items" 2023-11-08 13:59:01 +00:00
Simon Westphahl e4d07a77cb
Check for None values when comparing ZuulMark
ERROR zuul.Pipeline.foobar.check: [e: ...] Error in dynamic layout
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 1230, in _loadDynamicLayout
    relevant_errors, severity_error = self._findRelevantErrors(
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 1128, in _findRelevantErrors
    if ((err.key not in parent_error_keys) or
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/model.py", line 275, in __eq__
    self.mark == other.mark and
    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/model.py", line 170, in __eq__
    return (self.line == other.line and
                         ^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'line'
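A minimal sketch of the fix, using a simplified stand-in for Zuul's ZuulMark:

```python
# Make the comparison None-safe so comparing a mark against an error
# that has no mark cannot raise AttributeError.

class ZuulMark:
    def __init__(self, line, column):
        self.line = line
        self.column = column

    def __eq__(self, other):
        if not isinstance(other, ZuulMark):
            return False  # covers "other is None"
        return (self.line == other.line and
                self.column == other.column)

mark = ZuulMark(84, 2)
assert (mark == None) is False  # previously: AttributeError
assert mark == ZuulMark(84, 2)
```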

Change-Id: I75c465bab1c8de6b1d48919272c5889987db4450
2023-10-31 11:16:15 +01:00
James E. Blair 0ab44e153c Refactor configuration error handling
This refactors the config error handling based on nested
context managers that add increasing amounts of information
about error locations.  In other words, when we start processing
a Job object, we will:

  with ParseContext.errorContext(info about job):
    do some stuff
    with ParseContext.errorContext(info about job attr):
      do some stuff regarding the job attribute
    with ParseContext.errorContext(next job attr):
      do stuff with a different attr

We store a stack of error contexts on the parse context, and at any
point we can access the accumulator for the most recent one with
ParseContext.accumulator in order to add a warning or error.  If we
have an attribute line number, we'll use it, otherwise we'll just
use the object-level information.

We also collapse the exception handlers into a single context
manager which catches exceptions and adds them to the accumulator.
This lets us decide when to catch an exception and skip to the next
phase of processing separately from where we narrow our focus to
a new object or attribute.  These two actions often happen together,
but not always.

This should serve to simplify the configloader code and make it
easier to have consistent error handling within.
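A runnable sketch of the stacked error-context idea; the class is a heavily simplified stand-in for Zuul's ParseContext:

```python
import contextlib

class ParseContext:
    def __init__(self):
        self._stack = []
        self.errors = []

    @contextlib.contextmanager
    def errorContext(self, info):
        # Narrow the focus: push location info for the duration of
        # the block, pop it on the way out.
        self._stack.append(info)
        try:
            yield
        finally:
            self._stack.pop()

    def addError(self, msg):
        # The accumulator is whatever context is innermost right now.
        self.errors.append((self._stack[-1], msg))

pcontext = ParseContext()
with pcontext.errorContext("job: check-job"):
    with pcontext.errorContext("attribute: branches"):
        pcontext.addError("invalid regex")

assert pcontext.errors == [("attribute: branches", "invalid regex")]
```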

Change-Id: I180f9b271acd4b62039003fa8b38900f9863bad8
2023-10-30 16:19:45 -07:00
Simon Westphahl 6c6872841b
Don't schedule initial merge for branch/ref items
Currently we schedule a merge/repo-state for every item that is added to
a pipeline. For changes and tags we need the initial merge in order to
build a dynamic layout or to determine if a given job variant on a
branch should match for a tag.

For other change-types (branches/refs) we don't need the initial
merge/repo-state before we can freeze the job graph. The overhead of
those operations can become quite substantial for projects with a lot of
branches that also have a periodic pipeline config, but only want to
execute jobs for a small subset of those branches.

With this change, branch/ref changes that don't execute any jobs will
be removed without triggering any merge/repo state requests.

In addition we will reduce the number of merge requests for branch/ref
changes as the initial merge is skipped in all cases.
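The decision can be sketched as a simple predicate; item-type names here are illustrative:

```python
# Changes and tags still need the initial merge/repo-state (dynamic
# layouts, tag/branch variant matching); plain branch/ref items can
# freeze the job graph without one.

def needs_initial_merge(item_type):
    return item_type in ("change", "tag")

assert needs_initial_merge("change")
assert needs_initial_merge("tag")
assert not needs_initial_merge("branch")
assert not needs_initial_merge("ref")
```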

Change-Id: I157ed52dba8f4e197b35798217b23ec7f035b2d9
2023-10-27 12:20:57 +02:00
James E. Blair a29b7e75af Reduce config error context
We include the yaml text of the entire configuration object in
many of our configuration error reports.  This can be quite large
which is not always helpful to the user, and can cause operational
problems with large numbers of errors of large objects.

To address that, let's only include a few lines.  But let's also
try to make those lines useful by centering the actual attribute
that caused the error in our snippet.

To do this, we need to keep the line number of all of our configuration
attribute keys.  This is accomplished by subclassing str and adding
a "line" attribute so that it behaves exactly like a string most
of the time, but when we go to assemble a configuration error, we can
check to see if we have a line number, and if so, zoom in on that line.
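The str subclass described above can be sketched as follows; the class name is illustrative:

```python
# Behaves exactly like a normal string, but carries the source line
# number of the configuration key it came from.

class LineStr(str):
    def __new__(cls, value, line=None):
        obj = super().__new__(cls, value)
        obj.line = line
        return obj

key = LineStr("branches", line=84)
assert key == "branches"                 # behaves like a plain str
assert key.upper() == "BRANCHES"
assert getattr(key, "line", None) == 84  # but remembers where it was
```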

Example:

Zuul encountered a deprecated syntax while parsing its configuration
in the repo org/common-config on branch master.  The problem was:

  All regular expressions must conform to RE2 syntax, but an
  expression using the deprecated Perl-style syntax has been detected.
  Adjust the configuration to conform to RE2 syntax.

  The RE2 syntax error is: invalid perl operator: (?!

The problem appears in the "branches" attribute
of the "org/project" project stanza:

  ...
      name: org/project
      check:
        jobs:
          - check-job:
              branches: ^(?!invalid).*$
      gate:
        jobs:
          - check-job
      post:
  ...

  in "org/common-config/zuul.yaml@master", line 84

This reduction is currently only performed for deprecation warnings but
can be extended to the rest of the configuration errors in subsequent
changes.

Change-Id: I9d9cb286dacee86d54ea854df48d7a4dd37f5f12
2023-10-26 08:50:20 -07:00
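The str-subclass trick from the commit above can be sketched like this (a minimal illustration with assumed names, not Zuul's actual implementation):

```python
# Subclass str so the value behaves like an ordinary string everywhere,
# but carries the source line number for later use in error reporting.

class LocatedStr(str):
    def __new__(cls, value, line=None):
        obj = super().__new__(cls, value)
        obj.line = line
        return obj


def snippet(yaml_lines, key, context=2):
    """Zoom in on the line that caused the error, keeping a few
    lines of surrounding context."""
    line = getattr(key, 'line', None)
    if line is None:
        return yaml_lines  # no location info; fall back to the full text
    start = max(0, line - context)
    return yaml_lines[start:line + context + 1]
```

Because only the attribute key carries the location, a plain string key simply yields the full text, so existing call sites keep working.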
Simon Westphahl 810191b60e
Select correct merge method for Github
Starting with Github Enterprise 3.8[0] and github.com from September
2022 on[1], the merge strategy changed from using merge-recursive to
merge-ort[0].

The merge-ort strategy is available in the Git client since version
2.33.0 and became the default in 2.34.0[2].

If not configured otherwise, we've so far used the default merge
strategy of the Git client (which varies depending on the client
version). With this change, we are now explicitly choosing the default
merge strategy based on the Github version. This way, we can reduce
errors resulting from the use of different merge strategies in Zuul and
Github.

Since the newly added merge strategies must be understood by the mergers
we also need to bump the model API version.

[0] https://docs.github.com/en/enterprise-server@3.8/admin/release-notes
[1] https://github.blog/changelog/2022-09-12-merge-commits-now-created-using-the-merge-ort-strategy/
[2] https://git-scm.com/docs/merge-strategies#Documentation/merge-strategies.txt-recursive

Change-Id: I354a76fa8985426312344818320980c67171d774
2023-10-24 07:15:39 +02:00
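The version-based selection described above could be sketched like this (the helper name and the version-tuple representation are assumptions, not Zuul's API; the 3.8 threshold and the 2022 github.com switch are from the commit message):

```python
def merge_strategy_for(github_version):
    """Return the git merge strategy matching what Github uses.

    github_version is a (major, minor) tuple for Github Enterprise,
    or None for github.com (which switched to merge-ort in
    September 2022).
    """
    if github_version is None or github_version >= (3, 8):
        return 'ort'        # passed to git as: git merge -s ort
    return 'recursive'      # older GHE still uses merge-recursive
```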
James E. Blair 0a08299b5f Refactor bundle in sql connection
This refactors the sql connection to accommodate multiple
simultaneous changes in a buildset.

The change information is removed from the buildset table and
placed in a ref table.  Buildsets are associated with refs
many-to-many via the zuul_buildset_ref table.  Builds are also
associated with refs, many-to-one, so that we can support
multiple builds with the same job name in a buildset, but we
still know which change they are for.

In order to maintain a unique index in the new zuul_ref table (so that
we only have one entry for a given ref-like object (change, branch,
tag, ref)) we need to shorten the sha fields to 40 characters (to
accommodate mysql's index size limit) and also avoid nulls (to
accommodate postgres's inability to use null-safe comparison operators
on indexes).  So that we can continue to use change=None,
patchset=None, etc, values in Python, we add a sqlalchemy
TypeDecorator to coerce None to and from null-safe values such as 0
or the empty string.

Some previous schema migration tests inserted data with null projects,
which should never have actually happened, so these tests are updated
to be more realistic since the new data migration requires non-null
project fields.

The migration itself has been tested with a data set consisting of
about 3 million buildsets with 22 million builds.  The runtime on one
ssd-based test system in mysql is about 22 minutes and in postgres
about 8 minutes.

Change-Id: I21f3f3dfc8f93a23744856e5b82b3c948c118dc2
2023-10-19 17:42:09 -07:00
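The None-coercion described above can be sketched with a sqlalchemy TypeDecorator (a minimal illustration, not Zuul's actual column types; it assumes the empty string is the null-safe stand-in for a string column):

```python
import sqlalchemy as sa
from sqlalchemy.types import TypeDecorator


class NoneAsEmptyString(TypeDecorator):
    """Store None as '' so the column can participate in a unique
    index, and map it back to None when reading."""
    impl = sa.String
    cache_ok = True

    def process_bind_param(self, value, dialect):
        # Python -> database: replace None with the null-safe sentinel
        return '' if value is None else value

    def process_result_value(self, value, dialect):
        # database -> Python: restore None for the sentinel value
        return None if value == '' else value
```

An integer column would use the same pattern with 0 as the sentinel; the application code keeps seeing None either way.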
James E. Blair 1a226acbd0 Emit stats for more build results
We currently typically only emit build stats for success and failure,
but do not emit stats for NODE_FAILURE, CANCELED, SKIPPED, etc.

To round out the feature so that all build results are emitted, this
includes a small refactor to report build stats at the same time we
report build results to the database (it almost all cases).  One
exception to that is when we receive a non-current build result --
that generally happens after a build is canceled, so we don't need
to emit the stats again.

Change-Id: I3bdf4e2643e151e2eae9f204f62cdc106df876b4
2023-09-26 11:32:03 -07:00
Zuul a75d640b8e Merge "Add a bundle-changed safety check" 2023-09-19 17:05:03 +00:00
Zuul d44b9875b0 Merge "Fix deduplicating child jobs in check" 2023-09-15 22:22:02 +00:00
Zuul 5294c582b1 Merge "Fix deduplication of child jobs in check" 2023-09-15 18:24:14 +00:00
Zuul 2d76e35129 Merge "Add more deduplication tests" 2023-09-15 18:17:40 +00:00