The ML2 and OVN metadata agents have almost identical
code, as the former was copied to the latter and modified.
Instead, combine all the common parts and just have
each do any driver-specific operations separately.
Change-Id: Iff8bc8de16a8afc7c0195bf301d1b0643e17d7c6
In both the neutron-metadata and neutron-ovn-metadata agents we should
ensure that the haproxy service spawned for a network or router is
actually active before moving on.
This patch adds that check, similar to what was already implemented
some time ago for the dnsmasq process spawned by the dhcp agent.
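A minimal sketch of such an "is it active yet" check, assuming a
caller-supplied probe. The helper name, timeout and interval are
hypothetical and are not taken from the agents' code:

```python
import time


def wait_until_active(is_active, timeout=5.0, interval=0.1):
    """Poll is_active() until it returns True or the timeout expires.

    is_active is any zero-argument callable (hypothetical probe, e.g.
    "does the PID in the pidfile belong to a live process").
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_active():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The caller decides what to do when the helper returns False, for
example log the failure or raise.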
Related-Bug: #2052787
Change-Id: Ic58640d89952fa03bd1059608ee6c9072fbaabf5
Both metadata agents (OVN and non-OVN) should handle
process exceptions when spawning haproxy processes
such that the agent can continue its operation for
other haproxy processes.
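The idea can be sketched as a per-resource try/except, so that one
failing spawn does not abort the rest. The exception class and the
spawn callable below are stand-ins for illustration, not the agents'
real names:

```python
import logging

LOG = logging.getLogger(__name__)


class ProcessSpawnError(Exception):
    """Hypothetical stand-in for a failure raised while starting haproxy."""


def spawn_all(resource_ids, spawn):
    """Call spawn(resource_id) for each id; log failures and keep going."""
    failed = []
    for resource_id in resource_ids:
        try:
            spawn(resource_id)
        except ProcessSpawnError:
            # Log with traceback, remember the failure, continue the loop.
            LOG.exception("Could not start haproxy for %s", resource_id)
            failed.append(resource_id)
    return failed
```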
Closes-Bug: #2033305
Change-Id: I6da1b135c83ecfc41ec91e907ebf8500325a7a80
External processes, such as radvd, can refuse to start
and throw an exception such as:
"Unable to convert value in $pidfile"
because the given pidfile has more than one PID in it.
The situation can happen when the neutron node is reset
and the obsolete PID files are not cleaned before neutron
is started.
This commit adds PID file cleanup before external
process start.
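A rough sketch of the cleanup step, assuming the stale file can simply
be deleted before the process is started. The function name is
hypothetical:

```python
import os


def remove_stale_pid_file(pid_file):
    """Delete pid_file if it exists; return True when a file was removed."""
    try:
        os.unlink(pid_file)
    except FileNotFoundError:
        # Nothing left over from a previous run.
        return False
    return True
```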
Closes-bug: #2033980
Change-Id: Id62bf18067d0b144c3e8825c7603cc1e51dca052
A recent change suppressed the IPv6 DAD failure and
removed the address when multiple DHCP agents were
configured on the same network,
https://review.opendev.org/c/openstack/neutron/+/880957
But it also changed the behavior to not enable IPv4
metadata in this case. Restore the old behavior by
not returning early in the DAD failure case. The callback
that builds the config file was moved to after the point where
the address is bound, to make the two steps more obvious.
Related-bug: #1953165
Change-Id: I8436c6c9da9a2533ca27ff7312f5b2c7ea41e94f
Requests handled by the metadata-agents can now be rate-limited by
source-ip. This is done to protect the OpenStack control plane against
VMs querying the metadata endpoint in an overly enthusiastic way.
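The message does not say how the limit is enforced. Purely to
illustrate the concept of limiting by source IP, here is a
sliding-window counter; the class and its parameters are hypothetical
and this is not the mechanism the agents use:

```python
import time
from collections import defaultdict, deque


class SourceIpRateLimiter:
    """Allow at most max_requests per window seconds for each source IP."""

    def __init__(self, max_requests, window, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self._hits = defaultdict(deque)

    def allow(self, source_ip):
        now = self.clock()
        hits = self._hits[source_ip]
        # Drop timestamps that have fallen out of the sliding window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

Each source IP gets its own window, so one noisy VM does not consume
the budget of the others.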
Co-authored-by: Miguel Lavalle <mlavalle@redhat.com>
Related-Bug: #1989199
Change-Id: I748ccfa8b50496dcbcbe41fd22f84249a4d46b11
IPv4 DAD is non-existent in Linux or its failure is silent, so we
never needed to catch and ignore it. IPv6 DAD failure, on the other
hand, is explicit, hence this change.
This of course leaves the metadata service dead on hosts where
duplicate address detection failed. But if we catch the
DADFailed exception and delete the address, at least other
functions of the dhcp-agent should not be affected.
With this the IPv6 isolated metadata service is not redundant, which
is the best we can do without a redesign.
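The catch-and-delete behaviour can be sketched as follows. DADFailed
here is a local stand-in for the exception named above, and the three
callables are hypothetical placeholders for the real address
operations:

```python
class DADFailed(Exception):
    """Local stand-in for the exception named in the commit message."""


def bind_metadata_address(add_address, wait_for_dad, delete_address, cidr):
    """Return True if cidr is usable, False if DAD failed and it was removed."""
    add_address(cidr)
    try:
        wait_for_dad(cidr)
    except DADFailed:
        # Remove the duplicate address so the rest of the agent keeps working.
        delete_address(cidr)
        return False
    return True
```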
Also document the promised service level of isolated metadata.
Added additional tests for the metadata driver as well.
Change-Id: I6b544c5528cb22e5e8846fc47dfb8b05f70f975c
Partial-Bug: #1953165
Added a new environment variable "PROCESS_TAG" in ``ProcessManager``.
The spawned process can read this environment variable, and its value
is unique per process. It can be used to tag the running process; for
example, a container manager can use this tag to mark a container.
This feature will be used by TripleO to identify the running containers
with a unique tag. This will make the "kill" process easier; it will
only be necessary to find the container running with this tag.
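A small sketch of passing such a tag through the environment, using
only the standard library. The helper name and the tag value are made
up for illustration; only the variable name PROCESS_TAG comes from the
text above:

```python
import os
import subprocess
import sys


def run_with_tag(cmd, tag):
    """Run cmd with PROCESS_TAG set in its environment and return stdout."""
    env = dict(os.environ, PROCESS_TAG=tag)
    result = subprocess.run(cmd, env=env, capture_output=True, text=True,
                            check=True)
    return result.stdout.strip()


# The child process reads the tag back from its own environment.
child = [sys.executable, "-c", "import os; print(os.environ['PROCESS_TAG'])"]
tag_seen_by_child = run_with_tag(child, "haproxy-router-1")
```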
Closes-Bug: #1991000
Change-Id: I234c661720a8b1ceadb5333181890806f79dc21a
Move the common Exception class to one place. Move the shared haproxy
configuration to one place.
Partially-Implements: blueprint distributed-metadata-datapath
Change-Id: I3a0fc72da4520d6bc7193fb32a1bcf9a5585fbf4
Because of the fix for bug [1] and an issue with the linux_utils
get_process_count_by_name() function, the L3 agent puts all its HA
ports down during the initialization phase. Unfortunately, this can
break L3 communication that is already working. Bringing an ha-* port
from down back to up can take a few seconds, and some VRRP packets may
be lost in the meantime. That triggers keepalived on another node, so
a router HA state change may follow.
This change prevents putting HA ports down when, during the
initialization phase, the L3 agent finds its own network namespaces
already configured. The existence of such a namespace is good evidence
that the network configuration is in place and the host was not
rebooted, so most probably the agent was just restarted.
[1] https://bugs.launchpad.net/neutron/+bug/1597461
Closes-Bug: #1959151
Change-Id: Id9c906b2d141c3bedd80fb5f868190f8a4b66f54
This patch switches over to callback payloads for ROUTER
AFTER_CREATE, AFTER_UPDATE and AFTER_DELETE events.
Change-Id: Ie818ffbb1a291faa80501157b46ff6671d5c26ba
HAProxy supports hard stop [1] via SIGTERM signal. From the
documentation:
"""
... when the SIGTERM signal is sent to the haproxy process,
it immediately quits and all established connections are
closed.
"""
In case the process does not finish, the SIGKILL signal is sent.
The PID file created by the process is deleted.
[1] https://cbonte.github.io/haproxy-dconv/2.0/management.html#4
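The stop sequence described above (SIGTERM, then SIGKILL if the process
does not finish, then PID file removal) can be sketched with the
standard library. The function name and grace period are hypothetical,
and the sketch assumes a POSIX host:

```python
import os
import signal
import subprocess


def hard_stop(proc, pid_file=None, grace=2.0):
    """Send SIGTERM, fall back to SIGKILL after grace seconds.

    proc is a subprocess.Popen object. The PID file, if given, is
    removed once the process is gone.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        proc.wait()
    if pid_file and os.path.exists(pid_file):
        os.unlink(pid_file)
    return proc.returncode
```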
Closes-Bug: #1910691
Change-Id: Ifa3734e8eb4e52b1a132c3351ecc2e15463298bb
We push a v6 host route to make the guest send its metadata requests
in the direction of our router. We redirect it to haproxy which
mangles the headers and sends the request along to metadata-agent.
Apparently the supported list of dhcp options for dhcpv6 is quite
short in dnsmasq (cf. dnsmasq --help dhcp6) - not including anything
like classless-static-route for dhcpv4. So we must rely solely on
radvd to push host routes to the guest.
Metadata access over IPv6 is supposed to work both on dual-stack and
v6-only networks.
The following v6 subnet modes are supposed to work:
--ipv6-ra-mode slaac --ipv6-address-mode slaac
--ipv6-ra-mode dhcpv6-stateless --ipv6-address-mode dhcpv6-stateless
--ipv6-ra-mode dhcpv6-stateful --ipv6-address-mode dhcpv6-stateful
Change-Id: I28f2914b1b67659af2db7240eae730ac43daccd2
Partial-Bug: #1460177
Send IPv6 metadata traffic (dst=fe80::a9fe:a9fe) to the metadata-agent.
When running on IPv6 enabled system bind haproxy (i.e. the
metadata-proxy) to 169.254.169.254 and to fe80::a9fe:a9fe also.
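As a quick check of how the two addresses relate: the low 32 bits of
fe80::a9fe:a9fe are the bytes of 169.254.169.254 (0xa9 = 169,
0xfe = 254). The constant names below are illustrative only:

```python
import ipaddress

METADATA_V4 = ipaddress.IPv4Address("169.254.169.254")
METADATA_V6 = ipaddress.IPv6Address("fe80::a9fe:a9fe")

# The low 32 bits of the IPv6 address are the IPv4 address.
low_32_bits = int(METADATA_V6) & 0xFFFFFFFF
assert ipaddress.IPv4Address(low_32_bits) == METADATA_V4

# Both addresses are link-local in their respective families.
assert METADATA_V4.is_link_local and METADATA_V6.is_link_local
```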
We do not introduce new config options. The usual config options
(enable_isolated_metadata, force_metadata, enable_metadata_proxy)
now control the metadata service over both IPv4 and IPv6.
This change series only affects the guests' access to the metadata
service (over tenant networks). They change nothing about how the
metadata-agent talks to Nova's metadata service.
Metadata access over IPv6 is supposed to work both on dual-stack and
v6-only networks.
In order to enable the metadata service on pre-existing isolated
networks during an upgrade, this change makes each dhcp-agent restart
trigger a quick restart of dhcp-agent-controlled metadata-proxies,
so they can pick up their new config making them also bind to
fe80::a9fe:a9fe.
Change-Id: If35f00d1fc9e4ab7e232660362410ce7320c45ba
Partial-Bug: #1460177
Now that we are python3 only, we should move to using the built
in version of mock that supports all of our testing needs and
remove the dependency on the "mock" package.
This patch moves all references to "import mock" to
"from unittest import mock". It also cleans up some new line
inconsistency.
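For reference, the import change looks like this in a test. The
function under test and the path string are invented for the example:

```python
# Before: import mock
from unittest import mock


def fetch(client):
    """Hypothetical function under test."""
    return client.get("/example/path")


client = mock.Mock()
client.get.return_value = "instance-id"
result = fetch(client)
client.get.assert_called_once_with("/example/path")
```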
Fixed an inconsistency in the OVSBridge.deferred() definition
as it needs to also have an *args argument.
Fixed an issue where an l3-agent test was mocking
functools.partial, causing a python3.8 failure.
Unit tests only, removing from tests/base.py affects
functional tests which need additional work.
Change-Id: I40e8a8410840c3774c72ae1a8054574445d66ece
If a user specifies a header in their request for metadata,
it could override what the proxy would have inserted on their
behalf. Make sure to remove any headers we don't want, and
override something that might be present in the request.
If the agent somehow gets a request with both headers it will
silently drop it.
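A sketch of the two rules, under stated assumptions: the message does
not name the headers, so the names below are placeholders, and "both
headers" is assumed to mean the two proxy-inserted identifiers:

```python
# Hypothetical header names; the commit message does not list the real ones.
PROXY_OWNED = ("x-example-network-id", "x-example-router-id")


def sanitize_headers(client_headers, proxy_headers):
    """Drop client copies of proxy-owned headers, then apply the proxy's."""
    clean = {k: v for k, v in client_headers.items()
             if k.lower() not in PROXY_OWNED}
    clean.update(proxy_headers)
    return clean


def should_drop(headers):
    """Agent-side rule: reject a request carrying both identifiers."""
    present = {k.lower() for k in headers}
    return all(name in present for name in PROXY_OWNED)
```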
Change-Id: Id6c103b7bcebe441c27c6049d349d84ba7fd15a6
Closes-bug: #1865036
This patch switches the code over to use neutron-lib's test tools module
where appropriate rather than using neutron's.
This includes removing the following functions/classes from neutron and
using them from lib instead:
- get_random_EUI
- get_random_ip_network
- reset_random_seed
- OpenFixture
Change-Id: I0fbfcc7919f1b17b6bb0026fa9b98f157168255e
To fix bug 1722584 we inserted a checksum-fill rule for
metadata proxy replies. Recent kernels have disabled
this support for TCP because it was invalid, and
supposedly not doing anything, so let's get ahead of
things and remove the code.
Kernel mailing list discussion is at
https://lore.kernel.org/patchwork/patch/824819/
Partially reverts ed1c3b0217
Change-Id: Ib7cc8f82a91972f17987fb95130edc4069d9423f
Related-bug: #1722584
All of the externally consumed variables from neutron.common.constants
now live in neutron-lib. This patch removes neutron.common.constants
and switches all uses over to lib.
NeutronLibImpact
Depends-On: https://review.openstack.org/#/c/647836/
Change-Id: I3c2f28ecd18996a1cee1ae3af399166defe9da87
Currently the metadata proxy binds to default 0.0.0.0, which does not
add any advantage (metadata requests are not sent to random IP
addresses), and may allow access to cloud information from
third parties.
This changes the generated configuration to bind to METADATA_DEFAULT_IP
address instead.
This is not enabled in other metadata proxy configuration (in the L3
agent), as this would require net.ipv4.ip_nonlocal_bind everywhere
(currently only enabled for DVR) or transparent mode in haproxy (which
requires net.ipv4.ip_nonlocal_bind anyway)
Changed set_ip_nonlocal_bind_for_namespace() to support setting the
value in both the given and root namespace correctly, since it was
only used from inside the neutron codebase according to codesearch.
Change-Id: I388391cf697dade1a163d15ab568b33134f7b2d9
Co-Authored-By: Andrey Arapov <andrey.arapov@nixaid.com>
Closes-Bug: #1745618
It was added as a temporary helper during the migration process
and was marked for deletion in the Queens cycle.
Now that we are in Rocky, it should be fine to finally remove it.
Change-Id: Iacf592841559d392b59864d507dc89ef028cbf05
By adding a log-tag line to the haproxy config file that contains
the network or router id, we will be able to differentiate which
proxy is logging what. This should help with debugging.
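A sketch of rendering such a line into the global section of a
generated config. log-tag is a real haproxy directive, but the tag
format, the log target and the function name here are assumptions for
illustration:

```python
def render_global_section(resource_id):
    """Return a haproxy 'global' section tagged with the resource id."""
    lines = [
        "global",
        "    log /dev/log local0 info",
        # Hypothetical tag format; it embeds the network or router id.
        f"    log-tag haproxy-metadata-proxy-{resource_id}",
    ]
    return "\n".join(lines) + "\n"
```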
Change-Id: I5bb57b7682c00645e20cce69847dbb3b72165aa8
Partial-bug: #1744359
Move the iptables metadata marking rule earlier in
router init, that way any stray metadata requests
that arrive before the filter metadata redirect rule is
installed will just be dropped. We do this regardless
of whether we will be running the metadata proxy.
Partial-bug: #1735724
Change-Id: I8982523dbb94a7c5b8a4db88a196fabc4dd2873f
Sometimes a proxied metadata reply can be dropped by
the hypervisor because of an invalid checksum. Always
fill-in the checksum just like we do for DHCP replies.
Change-Id: I46987da3bf05577ff0a51a490f26cf2be3c3c266
Closes-bug: #1722584
Without this commit, the run_as_root parameter is always True when
stopping a process, which leads to the usage of unnecessary sudo such as
in some functional tests, like the keepalived ones.
This commit fixes the aforementioned problem by taking run_as_root into
account when stopping a process. However, run_as_root will still always
be True if the process is spawned in a netns.
Closes-Bug: #1491581
Change-Id: Ib40e1e3357b9a38e760f4e552bf615cdfd54ee5a
Signed-off-by: Hunt Xu <mhuntxu@gmail.com>
Refactoring Neutron configuration options for agent common config to be
in neutron/conf/agent/common. This will allow centralization of all
configuration options and provide an easy way to import.
Partial-Bug: #1563069
Change-Id: Iebac0cdd3bcfd0135349128921b7ad7a1a939ab8
Needed-By: Ib676003bbe909b5a9013a3178b12dbe291d936af
When an HA router is updated, the standby l3 agents always call the
after_router_added method, so duplicate metadata rules are added to
the iptables table. Although these rules are not applied to the
system because of the _weed_out_duplicates method, their number grows
linearly with router update operations.
Because these metadata rules are added once router is added to the agent
and will not be cleaned until router is removed, calling the add_rule
method in after_router_updated is a waste.
This patch removes adding metadata rules in after_router_updated.
Change-Id: I6650f1071499ed6cabd936bb0fb36b32a4b60bca
Closes-Bug: #1658460
Due to the high memory footprint of current Python ns-metadata-proxy,
it has to be replaced with a lighter process to avoid OOM conditions in
large environments.
This patch spawns haproxy through a process monitor using a pidfile.
This allows tracking the process and respawning it if necessary, as
was done before. It also implements an upgrade path that consists of
detecting any running Python instances of ns-metadata-proxy and
replacing them with haproxy. Upgrades will therefore take place by
simply restarting neutron-l3-agent and neutron-dhcp-agent.
According to /proc/<pid>/smaps, memory footprint goes down from ~50MB
to ~1.5MB.
Also, haproxy is added to bindep in order to ensure that it's installed.
UpgradeImpact
Depends-On: I36a5531cacc21c0d4bb7f20d4bec6da65d04c262
Depends-On: Ia37368a7ff38ea48c683a7bad76f87697e194b04
Closes-Bug: #1524916
Change-Id: I5a75cc582dca48defafb440207d10e2f7b4f218b
The agent object is a member of some subclasses of RouterInfo, such as
HaRouter. This changeset makes it a member of the RouterInfo class
itself.
Prior to the change, the agent object was passed in to the RouterInfo
methods that need access to its member information. The bugs in
question require calling the PD object, which is a member of the agent
object, to get the IPs that need to be preserved on the gateway port.
Without this change, the signatures of the methods
external_gateway_added() and external_gateway_updated() would have to
be modified to pass in the agent object, and any subclass of
RouterInfo that overrides or uses those methods would have to change
as well. That makes little sense given that subclasses such as
HaRouter already have the agent object as one of their members.
The changeset fixes the bugs by preserving the LLAs for prefix
delegation when the gateway port is being updated.
Closes-Bug: #1639042
Closes-Bug: #1640271
Change-Id: I61c6128ed1973deb8440c54234e77a66987d7e28
Refactoring neutron agent metadata config opts to be in
neutron/conf/agent/metadata so that all the configurations options
reside in a centralized location. This simplifies the process of looking
up the config opts and provides an easy way to import.
Change-Id: I8bae1facc58a4f9e21196f625478532403651545
Partial-Bug: #1563069
Refactoring l3 ha agent options to be in neutron/conf/agent/l3.
This would allow centralization of all configuration options and
provide an easy way to import.
Partial-Bug: #1563069
Change-Id: I2d6bd6beb0d1658baf88c49b954d2db3136e0c8d
This patch implements the callback handler for router update events.
It checks whether the proxy process monitor is active and, if not,
starts the proxy.
This is particularly important if the metadata driver fails to receive
a create notification due to failures, which in turn generates an
update event because of a resync step.
Closes-bug: #1623732
Change-Id: I296a37daff1e5f018ae11eb8661c77ad346b8075
Refactoring neutron configuration options for l3 agent to be in
neutron/conf/agent/l3. This would allow centralization of all
configuration options in neutron/conf and provide an easy way to import.
Change-Id: Ie7533ea55eaa4d0f2c1919131a75f56e027c4d6e
Partial-Bug: #1563069
The option was deprecated a long time ago, and will be removed in one of
the next library releases, which will render neutron broken if we keep
using the option.
More details:
http://lists.openstack.org/pipermail/openstack-dev/2016-May/095166.html
Closes-Bug: #1586066
Change-Id: I884b4cc3ed04e4b5489e265c146666e04eb1bc27
This fixes the iptables rules generated by the L3 agent
(SNAT, DNAT, set-mark and metadata), and the DHCP agent
(checksum-fill) to match the format that will be returned
by iptables-save to prevent excessive extra replacement
work done by the iptables manager.
It also fixes the iptables test that was not passing the
expected arguments (-p PROTO -m PROTO) for block rules.
A simple test was added to the L3 agent to ensure that the
rules have converged during the normal lifecycle tests.
Closes-Bug: #1566007
Change-Id: I5e8e27cdbf0d0448011881614671efe53bb1b6a1
The use_namespaces option was defined as a workaround for kernels not
properly supporting namespaces. This limitation is behind us, so it is
time to remove use_namespaces after its deprecation in Kilo, in order
to simplify code and remove a poorly tested case
(use_namespaces=False).
This change prepares for the removal of the pullup_route method [1],
which was only used when use_namespaces=False.
[1] neutron.agent.linux.ip_lib
DocImpact
UpgradeImpact
Closes-Bug: #1508188
Related-Bug: #1435382
Depends-On: I303038eec560a6d99421140c2822aed8b518470b
Depends-On: I4feb2a15c7e1e4bfdbed2531b18b8e7d798ab3cc
Change-Id: I2fbf65df1250d9f9f1656b3964ee3b6de1ef1118
Neutron [1] uses the option --metadata_proxy_watch_log=false to disable
log watch [2] in neutron-ns-metadata-proxy instances, but it should use
the option --nometadata_proxy_watch_log. As a result,
neutron-ns-metadata-proxy instances fail to start.
This change updates neutron [1] to use the correct option.
The change also corrects the associated functional tests [3]: the
metadata_proxy_watch_log option has no effect if a log_file/dir is
defined for the agent running the neutron-ns-metadata-proxy.
[1] neutron.agent.common.config
[2] could be done by setting metadata_proxy_watch_log = false
[3] neutron.tests.functional.agent.test_l3_agent
Change-Id: Iaec4a78847d802234c99514313440fd7c14bc554
Currently the iptables rules set on an L3 agent with metadata_proxy
enabled mark all packets coming from all interfaces, including external
interfaces. This change updates the PREROUTING rules in the MANGLE
table to mark packets only from internal interfaces.
Change-Id: I01549df7b99be84cd46b6f97a5fd62aec1f43275
Closes-Bug: #1477553
Since a packet can only have one mark, and we will need to mark a
packet for multiple purposes, we need to use a coordinated bitmask for
the two cases of simple marking that we currently do in Neutron
leaving the other bits for address scopes.
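The coordinated-bitmask idea can be sketched as below. The mask layout
and mark values are hypothetical and chosen only to show how two
purposes can share one 32-bit mark without clobbering each other:

```python
# Hypothetical split of a 32-bit packet mark.
SIMPLE_MARK_MASK = 0x0000FFFF    # low bits: the simple marks
ADDRESS_SCOPE_MASK = 0xFFFF0000  # high bits: left for address scopes


def set_bits(mark, value, mask):
    """Replace only the bits of mark selected by mask with value."""
    return (mark & ~mask & 0xFFFFFFFF) | (value & mask)


mark = set_bits(0, 0x1, SIMPLE_MARK_MASK)             # first simple mark
mark = set_bits(mark, 0x00050000, ADDRESS_SCOPE_MASK)  # scope bits
mark = set_bits(mark, 0x2, SIMPLE_MARK_MASK)           # scope bits survive
```

Writing through a mask is what lets each user of the mark update its
own bits while leaving the others intact.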
DocImpact
Change-Id: Id0517758d06e036a36dc8b8772e41af55d986b4e
Partially-Implements: blueprint address-scopes
Removed use of contextlib.nested call from codebase, as it has been
deprecated since Python 2.7.
There are also known issues with contextlib.nested that were addressed
by the native support for multiple "with" variables. For instance, if
the first object is created but the second one throws an exception,
the first object's __exit__ is never called. For more information see
https://docs.python.org/2/library/contextlib.html#contextlib.nested
contextlib.nested is also not compatible with Python 3.
This is the first patch in a series for removing use of
contextlib.nested.
Added hacking check to catch if any new instances are added to
the codebase.
Line continuation markers (i.e. '\') had to be used, otherwise syntax
errors were thrown. Although parentheses are the preferred way to
write multi-line statements, backslashes are acceptable for long with
statements.
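A small demonstration of the native form, including the case described
above where creating the second object fails. The resource helpers are
invented for the example:

```python
import contextlib

events = []


@contextlib.contextmanager
def resource(name):
    events.append("enter " + name)
    try:
        yield name
    finally:
        events.append("exit " + name)


def broken_resource():
    """Fails while the second context manager is being created."""
    raise RuntimeError("cannot create second resource")


try:
    # Native multi-item "with", continued with a backslash.
    with resource("first") as first, \
            broken_resource() as second:
        events.append("body")
except RuntimeError:
    events.append("error handled")
```

Here the first resource is entered before the second one is created,
so its exit handler still runs when the second one fails.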
Partial-Bug: 1428424
Change-Id: I171fbdb89892a3d4548bf2ca52f4a7dd9ef8dccb
Currently the metadata proxy cannot run as the nobody user/group,
because it needs to connect to metadata_proxy_socket when queried.
This change allows running the metadata proxy as the nobody user/group
by letting the operator choose the metadata_proxy_socket mode with the
new option metadata_proxy_socket_mode (4 choices), so that socket
permissions can be adapted to the metadata proxy user/group.
This change also refactors where options are defined, in order to
enable the metadata_proxy_user/group options in the metadata agent.
In practice:
* if metadata_proxy_user is agent effective user or root, then:
  * metadata proxy is allowed to use rootwrap (insecure)
  * set metadata_proxy_socket_mode = user (0o644)
* else if metadata_proxy_group is agent effective group, then:
  * metadata proxy is not allowed to use rootwrap (secure)
  * set metadata_proxy_socket_mode = group (0o664)
  * set metadata_proxy_log_watch = false
* else:
  * metadata proxy has lowest permissions (most secure) but metadata
    proxy socket can be opened by everyone
  * set metadata_proxy_socket_mode = all (0o666)
  * set metadata_proxy_log_watch = false
An alternative is to set metadata_proxy_socket_mode = deduce, in which
case the metadata agent uses the rules above to choose the correct mode.
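The deduction rules can be sketched as a small function. Its name and
signature are hypothetical, and it compares plain names rather than
resolving user or group ids as a real agent would need to:

```python
def deduce_socket_mode(proxy_user, proxy_group, agent_user, agent_group):
    """Pick the socket permission bits following the rules listed above."""
    if proxy_user in (agent_user, "root"):
        return 0o644  # mode "user"
    if proxy_group == agent_group:
        return 0o664  # mode "group"
    return 0o666      # mode "all"
```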
DocImpact
Closes-Bug: #1427228
Change-Id: I235a0cc4f0cbd55ae4ec1570daf2ebbb6a72441d
Regarding https://review.openstack.org/#/c/145829/
The old DnsMasq code always got root_helper from
neutron.agent.dhcp.agent.
The new code, however, only sets run_as_root when a namespace
is used. That causes a permission error when namespaces
are disabled and dnsmasq needs to be started.
Change-Id: Ib00d6e54dba44dbbbec158b9e0518e6e42baceec
Closes-Bug: #1428007
Currently metadata proxy cannot run with nobody user/group as
metadata proxy (as other services) uses WatchedFileHandler handler to
log to file which does not support permissions drop (the process must
be able to r/w after permissions drop to "watch" the file).
This change allows enabling or disabling log watch in metadata proxies
with the new option metadata_proxy_log_watch. It should be disabled
when metadata_proxy_user/group is not allowed to read/write metadata
proxy log files. The option's default value is deduced from
metadata_proxy_user:
* True if metadata_proxy_user is agent effective user id/name,
* False otherwise.
When log watch is disabled and logrotate is enabled on metadata proxy
logging files, 'copytruncate' logrotate option must be used otherwise
metadata proxy logs will be lost after the first log rotation.
DocImpact
Change-Id: I40a7bd82a2c60d9198312fdb52e3010c60db3511
Partial-Bug: #1427228
The L3 agent gets keepalived state change notifications via
a unix domain socket. These events are now batched and
sent out as a single RPC to the server. If the same
router is updated multiple times during the batch period,
only the latest state is sent.
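The "only the latest state" rule can be sketched as a dictionary keyed
by router id. The function name, the event shape and the state strings
are illustrative assumptions:

```python
def coalesce(events):
    """Collapse (router_id, state) events, keeping the last state per router."""
    latest = {}
    for router_id, state in events:
        latest[router_id] = state
    return latest


# Three events collected during one batch period become two entries.
batch = coalesce([("r1", "standby"), ("r2", "active"), ("r1", "active")])
```

The resulting mapping is what a single batched notification would
carry.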
Partially-Implements: blueprint report-ha-router-master
Change-Id: I36834ad3d9e8a49a702f01acc29c7c38f2d48833