When neutron-server is down, ovs-agent waits for it to become available
during agent startup. When neutron-server is up but cannot reach the
DB, it is just as unable to serve requests. However, ovs-agent
reacted differently to this failure. With this patch it reacts the same
way and delays its startup until neutron-server is up and can reach its
DB.
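The startup behaviour described above can be sketched as a simple retry
loop. This is only an illustration, not the agent's actual code:
wait_for_server, ServerNotReady and the probe callable are hypothetical
names, and it assumes that a server without DB access surfaces as the same
kind of error as an unreachable server.

```python
import time


class ServerNotReady(Exception):
    """Hypothetical error: neutron-server is down or cannot reach its DB."""


def wait_for_server(probe, retry_interval=1.0, sleep=time.sleep):
    """Call probe() until it succeeds and return the number of attempts."""
    attempts = 0
    while True:
        attempts += 1
        try:
            probe()
            return attempts
        except ServerNotReady:
            # Both failure modes are handled identically: wait and retry.
            sleep(retry_interval)


# Example: the server is down, then up without a DB, then fully ready.
outcomes = iter([ServerNotReady("server down"),
                 ServerNotReady("DB unreachable"),
                 None])


def fake_probe():
    outcome = next(outcomes)
    if outcome is not None:
        raise outcome


attempts_needed = wait_for_server(fake_probe, sleep=lambda _: None)
```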
Change-Id: Ia55e82540aedc236e9b016bb58047d0b437eeb99
Closes-Bug: #2025341
Now that we use setproctitle for neutron-server workers (and
neutron-keepalived-state-change), this has the side effect of changing
the process name for agents, impacting some monitoring systems. See the
Launchpad bug for more details.
This patch fixes it by setting the name with setproctitle to:
agent name (original process name).
Also use the newly introduced name constants to replace existing
hardcoded uses.
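A minimal sketch of the title format described above, "agent name
(original process name)". The helper build_process_title and the sample
values are hypothetical; in a real agent the resulting string would
presumably be handed to setproctitle.setproctitle(), which is not imported
here so the example stays self-contained.

```python
import os
import sys


def build_process_title(agent_name, original_name=None):
    """Return '<agent name> (<original process name>)'."""
    if original_name is None:
        # Assumption: fall back to the script name when none is supplied.
        original_name = os.path.basename(sys.argv[0])
    return "%s (%s)" % (agent_name, original_name)


# Sample values are illustrative only.
title = build_process_title(
    "neutron-openvswitch-agent",
    "/usr/bin/python3 /usr/bin/neutron-openvswitch-agent")
```

Keeping the original process name in parentheses lets monitoring tools
that match on the old name continue to find the process.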
Change-Id: I74c3a4d3e9f833752571a75f196560cd45529385
Closes-Bug: #1881297
Now that we are python3 only, we should move to using the built-in
version of mock, which supports all of our testing needs, and
remove the dependency on the "mock" package.
This patch moves all references to "import mock" to
"from unittest import mock". It also cleans up some newline
inconsistencies.
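For illustration, the import change looks like this in a test. The Greeter
class is a hypothetical stand-in for code under test; only the import line
reflects the change described above.

```python
from unittest import mock  # replaces the third-party "import mock"


class Greeter:
    """Hypothetical class standing in for code under test."""

    def greet(self):
        return "hello"


# patch.object swaps the method for a mock inside the with block only.
with mock.patch.object(Greeter, "greet", return_value="mocked") as patched:
    result = Greeter().greet()

call_count = patched.call_count
restored = Greeter().greet()
```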
Fixed an inconsistency in the OVSBridge.deferred() definition
as it needs to also have an *args argument.
Fixed an issue where an l3-agent test was mocking
functools.partial, causing a python3.8 failure.
This patch covers unit tests only; removing mock from tests/base.py
affects functional tests, which need additional work.
Change-Id: I40e8a8410840c3774c72ae1a8054574445d66ece
When an agent reports its state, the timestamp is sent along with the
agent status. This timestamp is now logged if "log_agent_heartbeats" is
enabled.
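A rough sketch of the idea, assuming the heartbeat arrives as a dict plus
a report timestamp. The function name, message wording and dict keys are
illustrative assumptions; only the "log_agent_heartbeats" switch comes
from the description above.

```python
import logging

LOG = logging.getLogger(__name__)


def log_heartbeat(agent_state, report_time, log_agent_heartbeats=False):
    """Return the message logged for a heartbeat, or None if logging is off."""
    if not log_agent_heartbeats:
        return None
    message = "Heartbeat received from %s agent on host %s at %s" % (
        agent_state["agent_type"], agent_state["host"], report_time)
    LOG.debug(message)
    return message


state = {"agent_type": "Open vSwitch", "host": "node-1"}
silent = log_heartbeat(state, "2018-10-24T10:00:00")
logged = log_heartbeat(state, "2018-10-24T10:00:00",
                       log_agent_heartbeats=True)
```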
Change-Id: Ifc88dfb3041aa07b197f395172b69399796ba46a
Related-Bug: #1799555
This patch switches callbacks over to the payload object style events
for AGENT AFTER_CREATE and AFTER_UPDATE based notifications. To do
so a DBEventPayload object is used with the publish() method to
pass along the API related data.
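The payload style can be sketched with simplified stand-ins. The classes
and functions below are NOT the neutron-lib implementations; they are
minimal assumptions written only to show the shape of a publish() call
that carries a payload object instead of loose keyword arguments.

```python
class DBEventPayload:
    """Simplified stand-in for a DB event payload (assumption)."""

    def __init__(self, context, states=(), resource_id=None):
        self.context = context
        self.states = tuple(states)
        self.resource_id = resource_id

    @property
    def latest_state(self):
        return self.states[-1] if self.states else None


_subscribers = {}


def subscribe(callback, resource, event):
    """Register callback for a (resource, event) pair."""
    _subscribers.setdefault((resource, event), []).append(callback)


def publish(resource, event, trigger, payload=None):
    """Deliver the payload object to every subscribed callback."""
    for callback in _subscribers.get((resource, event), []):
        callback(resource, event, trigger, payload=payload)


seen = []


def on_agent_created(resource, event, trigger, payload=None):
    # Everything the callback needs is read from the payload object.
    seen.append((payload.resource_id, payload.latest_state["host"]))


subscribe(on_agent_created, "agent", "after_create")
publish("agent", "after_create", None,
        payload=DBEventPayload(context=None,
                               states=({"id": "a1", "host": "node-1"},),
                               resource_id="a1"))
```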
Change-Id: Ibefa495be41c91957c2e8d797130e569bccc3765
This reverts commit a75014792a.
This is the second attempt to merge the patch after the previous one was
reverted due to multiple gate breakages in dependent projects (neutron-lbaas,
vmware-nsx, heat, networking-odl). This second attempt is validated with a set
of depends-on patches for all projects that were affected during the first
failed attempt.
The original commit message for the patch is included below for context.
===
Listen for foreign key changes and expire related relationships.
With this, we can remove OVO code that refreshes / detaches models on
each fetch. The patch also removes a bunch of expunge calls in plugin
code.
The writer.using context manager is added to _get_subnets so that the
segment plugin's _notify_subnet_updated handler, which calls _get_subnets,
doesn't use the facade-less context.session. In specific cases that
session may cache old models from previous sessions when used in a mixed
facade/facade-less environment.
This patch bumps the SQLAlchemy minimum requirement to >= 1.2.0 because the
pending_to_persistent event didn't exist before this version. It could be >=
1.1.0 if not for the fact that all 1.1.x releases have a bug that breaks
test_update_with_none_and_own_mac_for_duplicate_ip due to an obscure
import ordering issue in the library.
(The issue is fixed by https://github.com/zzzeek/sqlalchemy/commit/
63ff0140705207198545e3a0d7868a5ba8486e93)
Partially-Implements: blueprint enginefacade-switch
Partially-Implements: blueprint adopt-oslo-versioned-objects-for-db
Co-Authored-By: Michael Bayer <mike_mp@zzzcomputing.com>
Depends-On: If4b28110f460f6ac77ace1bbb02967ea986d4cab
Depends-On: I9f1e76cb24838533572b5fbe269ff96a24ce4af1
Change-Id: I0d65d19204da8ce30addfa5faff68544534b7853
Listen for foreign key changes and expire related relationships.
With this, we can remove OVO code that refreshes / detaches models on
each fetch. The patch also removes a bunch of expunge calls in plugin
code.
The writer.using context manager is added to _get_subnets so that the
segment plugin's _notify_subnet_updated handler, which calls _get_subnets,
doesn't use the facade-less context.session. In specific cases that
session may cache old models from previous sessions when used in a mixed
facade/facade-less environment.
This patch bumps the SQLAlchemy minimum requirement to >= 1.2.0 because the
pending_to_persistent event didn't exist before this version. It could be >=
1.1.0 if not for the fact that all 1.1.x releases have a bug that breaks
test_update_with_none_and_own_mac_for_duplicate_ip due to an obscure
import ordering issue in the library.
(The issue is fixed by https://github.com/zzzeek/sqlalchemy/commit/
63ff0140705207198545e3a0d7868a5ba8486e93)
Partially-Implements: blueprint enginefacade-switch
Partially-Implements: blueprint adopt-oslo-versioned-objects-for-db
Co-Authored-By: Michael Bayer <mike_mp@zzzcomputing.com>
Change-Id: I18c6794f99d2847c208dfd6e9eb187d53b657a05
Agent object has been merged [1].
This patch uses Agent object in agents_db and test_agents_db.
We also introduce a new function (get_agents_object) and keep
the old function (get_agents_db) for backward compatibility.
[1] https://review.openstack.org/#/c/297887/
Co-Authored-By: Nguyen Phuong An <AnNP@vn.fujitsu.com>
Change-Id: I4c4283cb1aa05d52dca00cc249e094ea7d55b1d3
Partially-Implements: blueprint adopt-oslo-versioned-objects-for-db
In reviews we usually check import grouping by hand, which is tedious.
The flake8-import-order plugin can do this check for us.
Its checking is loose, so it is a good fit.
This flake8 plugin is already used in tempest.
Note that the flake8-import-order version is pinned to avoid unexpected
breakage of the pep8 job.
Setup for unit tests of hacking rules is tweaked to disable
flake8-import-order checks. This extension assumes an actual file exists
and otherwise causes hacking rule unit tests to fail.
Change-Id: Ib51bd97dc4394ef2b46d4dbb7fb36a9aa9f8fe3d
This patch set breaks the circular dependency between
Agent/AgentVersionedObject.
See https://review.openstack.org/#/c/297887/ for details.
Change-Id: I7be4ce2513e49e6da46a7bdffb8538613f0be7c7
Partial-Bug: #1597913
Co-Authored-By: Victor Morales <victor.morales@intel.com>
Co-Authored-By: Sindhu Devale <sindhu.devale@intel.com>
The previous version depended on the AgentDbMixin being loaded by
any plugin, and also introduced an __init__ on the mixin, which
was problematic: mixins are expected to be classes that add methods
to another class without implementing a constructor. One of the plugins
had an element of its MRO that did not call super().__init__ and
hence did not trigger this __init__ method.
This change requires the plugins using the rpc callback mechanism
to provide the AgentDbMixin, which is used to refresh the cache of known
resource consumers (agents) and versions on demand. This makes it
clearer that the rpc_callback api is currently designed
to be used with agents only, despite its extensibility to other
areas.
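The constructor problem described above is easy to reproduce in isolation.
All class names below are hypothetical; the sketch only shows how one
class in the MRO that skips super().__init__() silently prevents a mixin's
__init__ from running.

```python
class Base:
    def __init__(self):
        self.base_ready = True


class CacheMixin:
    """A mixin that (problematically) relies on its own constructor."""

    def __init__(self):
        super().__init__()
        self.cache_ready = True


class Uncooperative:
    def __init__(self):
        # Does not call super().__init__(), so the MRO chain stops here.
        self.plugin_ready = True


class Plugin(Uncooperative, CacheMixin, Base):
    pass


plugin = Plugin()
# CacheMixin.__init__ never ran, so its attribute is missing.
mixin_initialized = hasattr(plugin, "cache_ready")
```

Refreshing the cache on demand avoids depending on every class in the
hierarchy being cooperative.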
Change-Id: Ie96b52dbe3a1f32cd4c11de8d8a5eff663fbf7f6
Related-Bug: #1584204
Since Mitaka, agents can send a new report about the resource versions
they know about, and subscribe via rpc callback push mechanisms.
Some agents don't depend on versioned objects via push, and therefore
don't need to update neutron-server about such details. Nevertheless, the
server side was complaining when that dictionary was missing from the state
report.
This patch avoids the warning log for missing 'resource_versions'
field in the agent status report.
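A minimal sketch of the tolerant handling, assuming the state report is a
plain dict. The function name and the shape of the versions cache are
illustrative assumptions; only the 'resource_versions' key comes from the
description above.

```python
def update_versions_from_report(agent_state, known_versions):
    """Record resource_versions if the report has them; otherwise no-op."""
    versions = agent_state.get("resource_versions")
    if versions is None:
        # Not every agent consumes versioned objects via push, so a
        # missing field is expected and not worth a warning.
        return False
    key = (agent_state["agent_type"], agent_state["host"])
    known_versions[key] = dict(versions)
    return True


cache = {}
with_versions = update_versions_from_report(
    {"agent_type": "Open vSwitch agent", "host": "node-1",
     "resource_versions": {"QosPolicy": "1.0"}}, cache)
without_versions = update_versions_from_report(
    {"agent_type": "Metadata agent", "host": "node-1"}, cache)
```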
Change-Id: Ief5186871515a5700afb56ac5e3fe493b4a05e8e
Closes-Bug: 1571544
resource_versions were recently included in agent state reports to
support rolling upgrades (commit 97a272a892).
The downside is that this brought additional processing when handling state
reports on the server side: an update of the local resource versions cache
and, more seriously, rpc casts to all other servers to do the same.
All this led to a visible performance degradation at scale with hundreds
of agents constantly sending reports. Under load (rally test) agents
may start "blinking", which makes the cluster very unstable.
In fact there is no need to send and update resource_versions in each state
report. I see two cases when it should be done:
1) agent was restarted (after it was upgraded);
2) agent revived - which means that the server was not receiving or was not
able to process state reports for some time (agent_down_time). During that
time the agent might have been upgraded and restarted.
So this patch makes agents include resource_versions info only on startup.
After agent revival the server itself will update version_manager with
resource_versions taken from the agent DB record - this is to avoid
version_manager being outdated.
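The agent-side half of this can be sketched as follows. The function name,
the first_report argument and the start_flag key are assumptions made for
illustration; the point is only that resource_versions travels with the
first report after startup and is left out afterwards.

```python
def build_state_report(base_state, resource_versions, first_report):
    """Return the report dict an agent would send (illustrative only)."""
    report = dict(base_state)
    if first_report:
        # Only the first report after (re)start carries the versions.
        report["start_flag"] = True
        report["resource_versions"] = dict(resource_versions)
    return report


base = {"agent_type": "Open vSwitch agent", "host": "node-1"}
versions = {"QosPolicy": "1.0"}
startup_report = build_state_report(base, versions, first_report=True)
periodic_report = build_state_report(base, versions, first_report=False)
```

Periodic reports stay small, so the server no longer has to refresh its
cache or cast to other servers on every heartbeat.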
Closes-Bug: #1567497
Change-Id: I47a9869801f4e8f8af2a656749166b6fb49bcd3b
Python 3 deprecated the logger.warn method, see:
https://docs.python.org/3/library/logging.html#logging.warning
so we prefer to use warning to avoid DeprecationWarning.
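For illustration, the preferred spelling in use. The logger name, the
ListHandler helper and the message are hypothetical; the handler exists
only so the emitted record can be inspected.

```python
import logging


class ListHandler(logging.Handler):
    """Collect emitted records in a list for inspection."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


handler = ListHandler()
logger = logging.getLogger("warn_example")
logger.addHandler(handler)
logger.setLevel(logging.WARNING)

# Preferred spelling; logger.warn() is the deprecated alias.
logger.warning("agent %s has not reported state", "dhcp-agent")
logged = [(r.levelname, r.getMessage()) for r in handler.records]
```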
Closes-Bug: #1529913
Change-Id: Icc01ce5fbd10880440cf75a2e0833394783464a0
Co-Authored-By: Gary Kotton <gkotton@vmware.com>
This is the second patch to allow upgrades on RPC versioned
objects callbacks.
This enables resource version notifications from agents to all
neutron servers via fanout for updating the version sets in
memory, and via agent status updates for DB storage, so any
neutron server can retrieve such information at boot.
Closes-Bug: #1535247
Change-Id: I67c1323267aaf7e49f4a359ff50b94e52dba4380
In addition to periodic checks of L3 and DHCP agents
add periodic checks of overall health of registered agents.
Log total count of agents at debug level so it can be
seen in logs of neutron-server.
If some agents are found dead, log detailed info about them:
agent type, last heartbeat, host.
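A rough sketch of such a health check, assuming agent records are dicts
with a heartbeat timestamp. The function name, dict keys, output format
and the agent_down_time value of 75 seconds are illustrative assumptions,
not the server's actual code.

```python
from datetime import datetime, timedelta


def check_agents_health(agents, now, agent_down_time=75):
    """Return (total agent count, description lines for dead agents)."""
    cutoff = now - timedelta(seconds=agent_down_time)
    dead = [a for a in agents if a["heartbeat_timestamp"] < cutoff]
    dead_lines = [
        "%s %s %s" % (a["agent_type"],
                      a["heartbeat_timestamp"].isoformat(),
                      a["host"])
        for a in dead
    ]
    return len(agents), dead_lines


now = datetime(2015, 5, 1, 12, 0, 0)
agents = [
    {"agent_type": "L3 agent", "host": "node-1",
     "heartbeat_timestamp": now - timedelta(seconds=10)},
    {"agent_type": "DHCP agent", "host": "node-2",
     "heartbeat_timestamp": now - timedelta(seconds=300)},
]
total, dead_lines = check_agents_health(agents, now)
```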
Change-Id: I5db81dad4e9e8325ad3fa3a3e6d5d2d0deb297dd
Closes-Bug: #1453320
Neutron doesn't have a way to test a newly added network node
by deploying a test resource before any customer resource is deployed
on the node. Nova and Cinder have the "enable_new_services" setting
in their conf files, which disables the initial service status, to
achieve this.
This proposal adds enable_new_agents config.
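The intended effect can be sketched like this, assuming a newly registered
agent gets its initial admin state from the option. The function name and
record layout are hypothetical; only enable_new_agents comes from the
proposal above.

```python
def create_agent_record(agent_state, enable_new_agents=True):
    """Build the record for a newly registered agent (illustrative only)."""
    record = dict(agent_state)
    # With enable_new_agents=False the agent starts disabled, so an
    # operator can test the node before enabling it.
    record["admin_state_up"] = enable_new_agents
    return record


new_agent = {"agent_type": "L3 agent", "host": "new-node"}
default_record = create_agent_record(new_agent)
staged_record = create_agent_record(new_agent, enable_new_agents=False)
```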
DocImpact
Change-Id: Ie0d0b2dd4d95de95f3839d1c35f24b708e893801
Implements: blueprint enable-new-agents
Related-Bug: 1472076
When troubleshooting problems with a cluster it would be
very convenient to have information about agent heartbeats
logged with some searchable identifier which could create
a 1-to-1 mapping between events in the agent's logs and the server's logs.
Currently agent heartbeats are not logged at all on the server side.
Since on a large cluster that could create too much logging
(even for troubleshooting cases), it might make sense to make
this configurable both on neutron-server side and on agent-side.
DocImpact
Change-Id: I0a127ef274a84bba5de47395d47b62f48bd4be16
Closes-Bug: #1452582
This change ensures that the structure of the unit test tree matches
that of the code tree to make it obvious where to find tests for a
given module. A check is added to the pep8 job to protect against
regressions.
The plugin test paths are relocated to neutron/tests/unit/plugins
but are otherwise ignored for now.
Change-Id: If307593259139171be21a71c58e3a34bf148cc7f
Partial-Bug: #1440834