Some fullstack tests expect an agent to be DOWN in the Neutron DB.
Occasionally the test's client called GET /v2.0/agents/{agent_id} and
got a result with "alive=False" at almost the same moment that an RPC
worker in another thread was processing a heartbeat from the agent, so
the agent was revived just after the API request finished.
That caused test failures in some cases.
This patch adds a second API call that fetches the agent again after
2 seconds if it was already marked as DEAD, just to make sure that it
is really dead ;)
Closes-Bug: #2045757
Change-Id: I1c20c90b8abd760f3a53b24024f19ef2bd189b5a
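The double check described in the message above could be sketched roughly as follows. This is only an illustration, not the code from the patch: `is_agent_really_dead`, the `get_agent` callable and the injectable `sleep` parameter are hypothetical names, and the agent is assumed to be a dict with an "alive" key as returned by the agents API.

```python
import time


def is_agent_really_dead(get_agent, agent_id, recheck_delay=2.0,
                         sleep=time.sleep):
    """Return True only if the agent is reported dead twice in a row.

    get_agent is a hypothetical callable that wraps
    GET /v2.0/agents/{agent_id} and returns the agent as a dict.
    """
    if get_agent(agent_id)["alive"]:
        return False
    # A heartbeat may be processed right after the first API call and
    # revive the agent, so wait a moment and ask once more.
    sleep(recheck_delay)
    return not get_agent(agent_id)["alive"]
```

Passing `sleep` as a parameter keeps the sketch testable without really waiting 2 seconds.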
Until now, neutron fullstack tests run by the same worker shared one
DB, and the content of the DB was cleaned after each test.
This could cause problems, e.g. default security group rules
weren't created properly in the second test run by the same worker.
Patch [1] was proposed and merged some time ago to fix that issue, but
it didn't solve the problem. This patch effectively reverts [1]
and proposes another solution: each fullstack test uses its
own DB and runs the DB migration script.
As running the DB migration before every test makes these jobs run a
bit longer than before, this patch also increases the timeout for the
fullstack job(s) to 3h (10800 seconds).
[1] https://review.opendev.org/c/openstack/neutron/+/891040
Related-bug: #1983053
Change-Id: Ia261b4c62db9a99ef6eb161acb4609520e45d101
Fullstack tests like test_connectivity or test_securitygroup don't
really need the dhcp agent. DHCP agents and the configuration of IP
addresses using DHCP are tested in the test_dhcp_agent module.
So this patch disables the dhcp agent where it isn't really
needed, to save some resources (fewer agents running during tests) and
to minimize potential failures in the tests.
Additionally, this patch adds the method
block_until_all_dhcp_config_done() to the FakeFullstackMachinesList
class and uses it in tests where fake VMs use DHCP. That will
hopefully help to better understand where the issue is in case of
failures like the one described in the related bug.
Related-Bug: #1962854
Change-Id: Ib6ca18b5a0ae101ad6424637abff3d992737f6f4
This patch adds a new fullstack test which spawns 2 "hosts" and 2 "VMs"
on those hosts. Both VMs are plugged into a vlan network with some
segmentation id. Next, the segmentation id of the network is updated and
the test ensures that the new vlan id is configured in the physical
bridge on both "hosts" and that connectivity between the VMs still works.
The test runs only with Openvswitch agents, as Linuxbridge doesn't
support live update of the segmentation_id in the network.
Change-Id: I459aac7f4e9afe679d8ece1c27d0be49cec8e4ff
_find_available_ips tried to find available IPs based on the given
subnet's cidr field, which can be misleading if the random selection
falls outside the allocation pool. This patch changes this behaviour to
use the subnet's allocation_pool field.
Closes-Bug: #1850292
Change-Id: Ied2ffb5ed58007789b0f5157731687dc2e0b9bb1
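To illustrate the idea in the message above, here is a minimal sketch of picking a random address from allocation pools rather than from the whole CIDR. It is not the real _find_available_ips: `pick_ip_from_pools` is a hypothetical helper, it handles IPv4 only, and the pools are assumed to be a list of dicts with "start" and "end" keys.

```python
import ipaddress
import random


def pick_ip_from_pools(allocation_pools, used_ips=(), rng=random):
    """Pick a random unused IPv4 address inside the allocation pools."""
    used = {ipaddress.ip_address(ip) for ip in used_ips}
    candidates = []
    for pool in allocation_pools:
        start = int(ipaddress.ip_address(pool["start"]))
        end = int(ipaddress.ip_address(pool["end"]))
        # Only addresses between start and end (inclusive) qualify, so
        # addresses of the CIDR outside the pools are never chosen.
        for value in range(start, end + 1):
            address = ipaddress.ip_address(value)
            if address not in used:
                candidates.append(address)
    if not candidates:
        raise ValueError("no free address left in the allocation pools")
    return str(rng.choice(candidates))
```

Enumerating all candidates is fine for the small pools used in tests, but it would be wasteful for very large pools.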
Instead of retrieving an IP address of the CIDR network in
ascending order, this patch randomizes the IP address selection.
Change-Id: I5971d078bd68c3c0b104eea0d443511e741540c4
Related-Bug: #1808595
This patch switches the code over to use neutron-lib's test tools module
where appropriate rather than using neutron's.
This includes removing the following functions/classes from neutron and
using them from lib instead:
- get_random_EUI
- get_random_ip_network
- reset_random_seed
- OpenFixture
Change-Id: I0fbfcc7919f1b17b6bb0026fa9b98f157168255e
This option was deprecated and marked for removal in Ocata. As we
are now in the Stein development cycle, I think it's a good time
to remove it.
Change-Id: I07474713206c218710544ad98c08caaa37dbf53a
Unfortunately it still sometimes fails because a restart still happened
during the very short pause between agents.
I will need to figure out some other possible solution for that issue.
This reverts commit bdd3540554.
Change-Id: Iaf9d1be3255e941c5fe227943535ab7c6905253c
In the fullstack test
test_l3_agent.test_ha_router_restart_agents_no_packet_lost
restarts of L3 agents were done in 2 steps:
1. restart of all standby agents,
2. restart of all active agents.
It was done like that because of bugs [1] and [2].
Now that those bugs are fixed, let's change this test to a
"more probable" scenario: agents are restarted
without checking which one is master and which is standby.
However, agents are restarted one by one instead of
(almost) exactly at the same time.
Restarting all agents at the same time still caused some issues
in my local testing environment, but I suspect that this is
a problem related to the nature of fullstack tests and to the fact
that the 2 different "nodes" are in fact simulated by namespaces only.
[1] https://bugs.launchpad.net/neutron/+bug/1776459
[2] https://bugs.launchpad.net/neutron/+bug/1798475
Change-Id: I731211b56a57d44636e741009721522f67c12368
If the update_port call fails with the IpAddressAlreadyAllocatedClient
error, retry a few more times in order to find IP addresses that are
available.
Change-Id: I7c5d51b01fa56083b1a689fa629a9a34c8b77012
Closes-Bug: #1808595
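The retry described in the message above might look roughly like this. It is a sketch under assumptions: `IpAddressAlreadyAllocated` is a local stand-in for the client's IpAddressAlreadyAllocatedClient exception, and `update_port_with_retries`, `update_port`, `pick_ip` and `max_attempts` are hypothetical names, not the ones used in the patch.

```python
class IpAddressAlreadyAllocated(Exception):
    """Local stand-in for the client's IpAddressAlreadyAllocatedClient."""


def update_port_with_retries(update_port, pick_ip, max_attempts=5):
    """Try update_port with fresh candidate IPs until one is accepted.

    update_port and pick_ip are hypothetical callables: the first one
    performs the API call for a given IP, the second one proposes a
    new candidate address.
    """
    for _ in range(max_attempts):
        candidate = pick_ip()
        try:
            return update_port(candidate)
        except IpAddressAlreadyAllocated:
            # Somebody else took this address in the meantime, so try
            # again with another candidate.
            continue
    raise RuntimeError(
        "no free IP address found after %d attempts" % max_attempts)
```

Only the "already allocated" error is retried, so any other failure of the call still surfaces immediately.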
In some cases a port may end up in the "binding_failed" state
because the L2 agent running on the destination host was down,
even though this is only a "temporary" issue.
This happens, for example, when using L3 HA and the
master and backup network nodes were e.g. rebooted.
The L3 agent might start running before the L2 agent on the host in
such a case, and if it's the new master node, router ports will have
the "binding_failed" state.
When the agent sends a heartbeat and comes back to life, the
ML2 plugin will try to bind all ports with "binding_failed"
from this host.
Change-Id: I3bedb7c22312884cc28aa78aa0f8fbe418f97090
Closes-Bug: #1794809
_assert_ping_during_agents_restart is used in tests of L2 and L3
agents. However, when it raises an exception due to a timeout, the
associated message assumes the agent under test is L2. This patch
fixes that.
Change-Id: I3568c97a621e97699fcd93f09897e132d4db402a
During agent restarts an async ping is running, and a
function is called to wait until all async ping workers
finish their job.
In TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost
60 pings are sent with a 1 second timeout, so the default
wait_until_true timeout of 60 seconds might not
be enough in some cases.
Because of that, the wait_until_true timeout is now set to
twice the time needed to send the given number of packets
with ping_timeout.
This should give all workers enough time to finish.
Change-Id: Ia7c3755c2ba5029bdab3c1dd30b305f3bde19740
Closes-Bug: #1775183
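The timeout arithmetic from the message above can be written down as a small helper. This is only a sketch: `ping_wait_timeout` and its `safety_factor` parameter are hypothetical names; the message only states that the timeout is twice the time needed for the given number of packets with ping_timeout.

```python
def ping_wait_timeout(count, ping_timeout, safety_factor=2):
    """Return how long to wait for all async ping workers, in seconds.

    count pings with ping_timeout seconds each need at most
    count * ping_timeout seconds; the result adds headroom on top.
    """
    return safety_factor * count * ping_timeout
```

For the 60 pings with a 1 second timeout mentioned above, this gives 2 * 60 * 1 = 120 seconds instead of the default 60.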
For HA routers, IPv6 forwarding is now disabled by default and
then enabled only on the master node.
Before this patch it was done the opposite way: forwarding was
enabled by default and then disabled on backup nodes.
When forwarding is enabled/disabled for the qg- port, MLDv2 packets are
sent, and that might lead to temporary packet loss as packets to the
FIP were sent to this backup node instead of the master one.
Related-Bug: #1771841
Change-Id: Ia6b772e91c1f94612ca29d7082eca999372e60d6
Change the network namespace add/delete/list code to use the
pyroute2 library instead of calling /sbin/ip.
Also change all in-tree callers to use the new calls.
Closes-bug: #1717582
Related-bug: #1492714
Change-Id: Id802e77543177fbb95ff15c2c7361172e8824633
We plan to switch to devstack-gate for the fullstack job, and it revokes
direct sudo calls before executing tests, so we can't rely on sudo
working anymore.
This also moves functional-testing.filters to a more generic filename
(testing.filters) because the filters are now deployed and used by
the fullstack target too.
Related-Bug: #1557168
Related-Bug: #1693689
Change-Id: I1718ea51836adbb8ef8dea79822a722dcf111127
It turned out that dhcp tests worked only because agents are considered
dead after 10 seconds while they report to the server every 60 seconds.
This led to a network resync being called after agent revival, hiding
the fact that the dhcp agent was not capable of receiving any amqp
messages.
This patch sets the report interval of agents to half of
agent_down_time on the server side and uses the eventlet dhcp agent in
order to trigger the eventlet monkey patching code.
Eventlet was behind the failure with messages not getting processed. As
[1] notes: "Note: If the “eventlet” executor is used, the threading and
time library need to be monkeypatched."
Because each port calls dhclient to obtain an IP address and each
dhclient instance overwrites /etc/resolv.conf, a script was added that
generates fullstack-dhclient-script from an existing dhclient-script
before starting fullstack tests. This generated script is passed to
each dhclient process running in a fake fullstack machine using the -sf
parameter.
[1] https://docs.openstack.org/developer/oslo.messaging/server.html
Related-bug: 1453350
Change-Id: I0336176b9c364fe3a95be5cef9e7a3af1ef9d7e9
The oslo.db opportunistic test fixtures were not being
used effectively and the MySQL / PG databases were not
being used. This patch restores working patterns against
oslo.db. Additionally, the migration level tests have also
been updated to make use of oslo.db provisioning functionality
and unused methods have been removed.
The current approach makes use of oslo.db provisioning constructs
directly, as well as some semi-private attribute access within
oslo.db enginefacade, in order to work around some issues
that have arisen in oslo.db's test_base.
A new release of oslo.db will provide
public API points to resolve pending issues, and to
allow neutron's use cases here which will
also be of general applicability to openstack projects.
Closes-bug: #1594898
Change-Id: Ie27cf174fa24c2f479af47335d9ae139fb7d159a
Log files as .txt files, don't zip them, and put them where
they need to be instead of copying them there in the post gate
hook. The benefit to doing this is that we'll get logs
for tests even if the job timed out.
Change-Id: I4bfd27534c827aed3cbd7b43d7d1289480ea4806
Related-Bug: #1567668
We currently don't log everything being output from the test
runner, only what the processes themselves are logging.
Change-Id: Id5fb9cd44a0ed677a03da1d153ee3079fd5b7975
This will hopefully fix fullstack failures where different process
fixtures running in parallel test processes and relying on the same
random.choice() generator seeded by the same initial value could pick up
the same value as a service free port, and spawn their respective
resources using the same port.
This made one of those unlucky services fail.
Change-Id: I13cfa9392fd138c5e1b1b7d397b9ea91b2a47ed2
Closes-Bug: #1551288
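The collision described in the message above is easy to reproduce with plain Python. The sketch below is illustrative only: `pick_free_port`, `ports_for_seed` and the port range are hypothetical and do not mirror the real fixtures. It shows that two generators seeded with the same value make identical "random" choices, and that a generator created without an explicit seed is seeded from OS entropy or the current time instead.

```python
import random


def pick_free_port(rng, low=10000, high=60000):
    """Pick a candidate port using the given random generator."""
    return rng.randint(low, high)


def ports_for_seed(seed, count=3):
    """Return the ports a process seeded with `seed` would pick."""
    rng = random.Random(seed)
    return [pick_free_port(rng) for _ in range(count)]


def fresh_generator():
    """Return a generator that is not tied to a shared fixed seed.

    random.Random() without arguments seeds itself from OS entropy
    (or the current time), so parallel processes are very unlikely
    to produce the same sequence.
    """
    return random.Random()
```

Calling `ports_for_seed(42)` in two different test processes yields the same list both times, which is exactly how two services could end up with the same port.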
Previously, we used create_all() based on models. We don't use
create_all() in production code and there is no guarantee models and
scripts are in sync even though we have a good functional test that
validates that. There are still pieces that can't be compared by
alembic.
Change-Id: I72fa67811f0763298416e6e084a8b9b86619795b
Closes-Bug: 1486528
* The EnvironmentDescription class describes an entire fullstack
environment (as opposed to the currently implemented host-only
descriptions). This will allow future patches to signify that a test
should set up an environment that supports tunneling, l2pop, QoS and
more.
* Now, most fullstack fixtures (config and process ones, at least),
expect both the EnvironmentDescription for the current test and the
HostDescription for the 'host' the config/process is on. This allows
  for easier and more robust future changes, as now adding a new
parameter to one of the description objects doesn't mean adding that
argument to a number of other objects which are using it.
* Changed HostDescription's default argument of l3_agent to False, since
  adding new configurations and defaulting them to True forces the
author to go through ALL the tests and explicitly turn them on/off.
However, defaulting new configurations to False only requires
explicitly turning them on, which we ought to do anyway.
Change-Id: Ib2f12016ba4371bfda76c82e11d0794acc759955
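The split between the two description objects in the message above could be sketched as follows. Only the class names EnvironmentDescription and HostDescription and the l3_agent default of False come from the message; every other field, the AgentConfigSketch class and the service names are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentDescription:
    """Settings shared by the whole environment (hypothetical fields)."""
    network_type: str = "vlan"
    l2_pop: bool = False
    qos: bool = False


@dataclass(frozen=True)
class HostDescription:
    """Settings of a single simulated host."""
    l3_agent: bool = False  # off by default, tests opt in explicitly


class AgentConfigSketch:
    """Hypothetical fixture that receives both description objects."""

    def __init__(self, env_desc, host_desc):
        self.env_desc = env_desc
        self.host_desc = host_desc

    def enabled_services(self):
        services = ["ovs-agent"]
        if self.host_desc.l3_agent:
            services.append("l3-agent")
        if self.env_desc.qos:
            services.append("qos-extension")
        return services
```

Because fixtures take the whole description objects, a new flag such as `qos` only touches the description class and the fixtures that care about it, not every constructor signature in between.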
The mysql-python driver has been replaced by the PyMySQL driver[1] in
neutron code, but MySQL related functional/fullstack tests try to use
the mysql-python driver because of MySQLOpportunisticTestCase[2], and
the tests are skipped because mysql-python is no longer available.
This change provides a backend implementation for mysql+pymysql, a
base testcase MySQLTestCase[2] using the mysql+pymysql implementation
(currently oslo.db provides neither of them but will in the future) and
replaces MySQLOpportunisticTestCase with MySQLTestCase.
[1] I73e0fdb6eca70e7d029a40a2f6f17a7c0797a21d
[2] neutron.tests.common.base
Closes-Bug: #1463980
Change-Id: Ic5c1d12ab75443e1cc290a7447eeb4b452b4a9dd
As discussed in the Liberty Design Summit "Moving apps to Python 3"
cross-project workshop, the way forward in the near future is to
switch to the pure-python PyMySQL library as a default.
https://etherpad.openstack.org/p/liberty-cross-project-python3
Change-Id: I73e0fdb6eca70e7d029a40a2f6f17a7c0797a21d
Currently, it's up to the developer who wants to run full-stack on his
machine to make the directory in question (/opt/stack/logs). However,
this also means that the files don't get compressed at the end of a gate
run. Now, each full-stack test will have its own log directory in /tmp.
Once the logs are there, post_test_hook.sh can run 'gzip' on all the log
files before moving them to /opt/stack/logs on its own.
Change-Id: I5c04d0af0b9858722ae0c4baf0ee478ffb078e02
Currently, the full-stack framework has only one test which only uses
the neutron-server. This patch adds an actual test which makes sure that
once a router is created, an actual namespace is created for it. Since
this test requires 3 processes (neutron-server, l3-agent, ovs-agent),
existing full-stack code is modified to add more streamlined support for
such code.
Partially-Implements: blueprint integration-tests
Change-Id: Id5a8852d38543590b90e4bbed261a7a458071a9a
The full-stack framework overrides the database connection string before
every test is started, but after the test it doesn't revert the string
back to what it was originally. Since after the test the database is
deleted, the string is not actually valid once the test has finished,
and this conflicts with tests which are run on the same job (specifically
the retargetable tests - see associated bug). The proposed patch saves
the original connection string and reverts it after the test finishes.
Change-Id: I96c01483009084cbc2b81588a1283e84e6bcb4c4
Closes-bug: #1440797
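The save-and-restore pattern from the message above can be sketched with a context manager. This is an illustration under assumptions: `override_connection` is a hypothetical helper and a plain dict stands in for the real configuration object.

```python
import contextlib


@contextlib.contextmanager
def override_connection(conf, connection):
    """Temporarily replace conf["connection"], restoring it afterwards.

    conf is a plain dict here, standing in for the real config object.
    """
    original = conf["connection"]
    conf["connection"] = connection
    try:
        yield conf
    finally:
        # Runs even if the test body raises, so later tests in the
        # same job never see the stale per-test connection string.
        conf["connection"] = original
```

A test would wrap its body in `with override_connection(conf, per_test_url):` so that the original string is back in place as soon as the block exits.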
This patch introduces the full-stack tests framework, as specified in
the blueprint. In short, this adds the neutron.tests.fullstack module,
which supports test-managed neutron daemons. Currently only
neutron-server is supported and follow-up patches will add support for
multiple agents.
Implements: blueprint integration-tests
Co-Authored-By: Maru Newby <marun@redhat.com>
Change-Id: Iff24fc7cd428488e918c5f06bc7f923095760b07