Service VM Orchestration and Management
Astara Orchestrator
astara-orchestrator is a multi-process, multithreaded Python process composed of three primary subsystems, each of which is spawned as a subprocess of the main astara-orchestrator process:
L3 and DHCP Event Consumption
astara.notifications uses kombu and a Python multiprocessing.Queue to listen for specific Neutron service events (e.g., router.interface.create, subnet.create.end, port.create.end, port.delete.end) and normalize them into one of several event types:
- CREATE - a router creation was requested
- UPDATE - services on a router need to be reconfigured
- DELETE - a router was deleted
- POLL - used by the health monitor for checking aliveness of a Service VM
- REBUILD - a Service VM should be destroyed and recreated
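A minimal sketch of the normalization step, assuming a simplified, flat notification shape. The mapping table, the Event record, the enqueue() helper, and the dict keys event_type, tenant_id and router_id are all hypothetical: real Neutron payloads nest these fields differently, and the actual rules in astara.notifications may not match this table (in particular, treating interface, subnet and port events as UPDATE is an assumption).

```python
import queue
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EventType(Enum):
    """Normalized event types consumed by the workers."""
    CREATE = "create"
    UPDATE = "update"
    DELETE = "delete"
    POLL = "poll"
    REBUILD = "rebuild"


@dataclass(frozen=True)
class Event:
    """Hypothetical normalized event record."""
    tenant_id: str
    router_id: str
    kind: EventType


# Hypothetical mapping from Neutron event_type strings to normalized types.
# POLL and REBUILD are left out: per the text, they originate from the
# health monitor and from operator commands rather than from Neutron.
_NEUTRON_EVENT_MAP = {
    "router.create.end": EventType.CREATE,
    "router.delete.end": EventType.DELETE,
    "router.interface.create": EventType.UPDATE,
    "subnet.create.end": EventType.UPDATE,
    "port.create.end": EventType.UPDATE,
    "port.delete.end": EventType.UPDATE,
}


def normalize(event_type: str) -> Optional[EventType]:
    """Return the normalized type, or None if the event is not of interest."""
    return _NEUTRON_EVENT_MAP.get(event_type)


def enqueue(notification: dict, out: "queue.Queue[Event]") -> bool:
    """Normalize one notification and queue it; return False if it was ignored."""
    kind = normalize(notification.get("event_type", ""))
    if kind is None:
        return False
    out.put(Event(notification["tenant_id"], notification["router_id"], kind))
    return True
```

Event types with no entry in the table normalize to None and are dropped, which is one simple way to listen for specific events while ignoring the rest of the notification stream.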
As events are normalized and shuttled onto the multiprocessing.Queue, astara.scheduler shards them (by tenant ID, by default) and distributes them amongst a pool of worker processes it manages.
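One way to shard by tenant is a stable hash of the tenant ID modulo the number of workers. The sketch below assumes that approach; pick_worker() and Dispatcher are hypothetical names, SHA-1 is an arbitrary choice of stable hash, and astara.scheduler may compute the index differently.

```python
import hashlib
import queue
from typing import List


def pick_worker(tenant_id: str, num_workers: int) -> int:
    """Map a tenant ID to a stable worker index.

    hashlib is used instead of the built-in hash(), whose output for str
    is randomized per interpreter process (PYTHONHASHSEED).
    """
    if num_workers < 1:
        raise ValueError("need at least one worker")
    digest = hashlib.sha1(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


class Dispatcher:
    """Hypothetical dispatcher that shards events across per-worker queues."""

    def __init__(self, num_workers: int) -> None:
        # queue.Queue keeps the sketch single-process; a multi-process
        # deployment would give each worker its own multiprocessing.Queue.
        self.queues: List[queue.Queue] = [queue.Queue() for _ in range(num_workers)]

    def dispatch(self, tenant_id: str, event: object) -> int:
        """Put the event on its tenant's worker queue and return the index."""
        idx = pick_worker(tenant_id, len(self.queues))
        self.queues[idx].put(event)
        return idx
```

Because the index depends only on the tenant ID, every event for a given tenant lands on the same worker and is consumed in the order it was queued.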
This system also consumes and distributes special astara.command events, which are published by the rug-ctl operator tools.
State Machine Workers and Router Lifecycle
Each multithreaded worker process manages a pool of state machines (one per virtual router), each of which represents the lifecycle of an individual router. As the scheduler distributes events for a specific router, logic in the worker (dependent on the router's current state) determines which action to take next:
[Diagram: worker_diagram.dot]
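The per-router bookkeeping described above can be sketched with a dict of state machines shared by a small thread pool. Everything here is hypothetical (Worker, RouterStateMachine, handle(), run_one()); the sketch only shows how a per-machine lock lets several threads share one worker while each router's events are still processed one at a time, in arrival order.

```python
import collections
import queue
import threading
from typing import Deque, Dict, List, Optional


class RouterStateMachine:
    """Hypothetical stand-in for one router's state machine."""

    def __init__(self, router_id: str) -> None:
        self.router_id = router_id
        self.lock = threading.Lock()  # at most one thread runs this machine
        self.pending: Deque[str] = collections.deque()
        self.handled: List[str] = []

    def run_one(self) -> None:
        # A real machine would start at CalcAction and walk its states here;
        # this stub only records which event it consumed.
        self.handled.append(self.pending.popleft())


class Worker:
    """Hypothetical worker: one state machine per router, shared by N threads."""

    def __init__(self, num_threads: int = 2) -> None:
        self._machines: Dict[str, RouterStateMachine] = {}
        self._machines_lock = threading.Lock()
        self._work: "queue.Queue[Optional[RouterStateMachine]]" = queue.Queue()
        self._threads = [
            threading.Thread(target=self._loop, daemon=True)
            for _ in range(num_threads)
        ]
        for t in self._threads:
            t.start()

    def machine(self, router_id: str) -> RouterStateMachine:
        """Return the router's state machine, creating it on first use."""
        with self._machines_lock:
            sm = self._machines.get(router_id)
            if sm is None:
                sm = self._machines[router_id] = RouterStateMachine(router_id)
            return sm

    def handle(self, router_id: str, event: str) -> None:
        sm = self.machine(router_id)
        sm.pending.append(event)  # queue the event on its router first...
        self._work.put(sm)        # ...then tell a thread there is work

    def _loop(self) -> None:
        while True:
            sm = self._work.get()
            if sm is None:  # shutdown sentinel
                self._work.task_done()
                return
            with sm.lock:  # serialize work on the same router
                sm.run_one()
            self._work.task_done()

    def close(self) -> None:
        """Wait until all queued work is done, then stop the threads."""
        self._work.join()
        for _ in self._threads:
            self._work.put(None)
        for t in self._threads:
            t.join()
```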
For example, let's say a user creates a new Neutron network, subnet, and router. In this scenario, a router-interface-create event would be handled by the appropriate worker (based on tenant ID), and a transition through the state machine might look something like this:
[Diagram: sample_boot.dot]
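A toy walk through the states for this kind of event, assuming a simulated VM (FakeVM) and simplified transition rules inferred from the state descriptions below. The function names, the FakeVM statuses, and the choice to exit when CheckBoot runs out of checks are assumptions, not Astara's actual logic.

```python
from typing import Callable, Dict, List


class FakeVM:
    """Hypothetical stand-in for a Service VM, used only to drive the sketch."""

    def __init__(self, polls_to_boot: int = 2) -> None:
        self.status = "DOWN"  # DOWN -> BOOTING -> UP -> CONFIGURED
        self._polls_left = polls_to_boot

    def boot(self) -> None:
        self.status = "BOOTING"

    def poll(self) -> bool:
        """Return True once the VM answers; each call advances a booting VM."""
        if self.status == "BOOTING":
            self._polls_left -= 1
            if self._polls_left <= 0:
                self.status = "UP"
        return self.status in ("UP", "CONFIGURED")

    def configure(self) -> None:
        self.status = "CONFIGURED"


def run(action: str, vm: FakeVM, max_boot_checks: int = 5) -> List[str]:
    """Walk the machine for one event and return the states visited."""

    def calc_action() -> str:
        if vm.status == "DOWN":
            return "CreateVM"
        if vm.status == "BOOTING":
            return "CheckBoot"
        return "Alive"

    def alive() -> str:
        if not vm.poll():
            return "CreateVM"
        return "Exit" if action == "POLL" else "ConfigureVM"

    def create_vm() -> str:
        vm.boot()
        return "CheckBoot"

    def check_boot() -> str:
        for _ in range(max_boot_checks):
            if vm.poll():
                return "ConfigureVM"
        return "Exit"  # assumption: give up until the next event arrives

    def configure_vm() -> str:
        vm.configure()
        return "Exit"

    states: Dict[str, Callable[[], str]] = {
        "CalcAction": calc_action,
        "Alive": alive,
        "CreateVM": create_vm,
        "CheckBoot": check_boot,
        "ConfigureVM": configure_vm,
    }
    trace = ["CalcAction"]
    state = "CalcAction"
    while state != "Exit":
        state = states[state]()
        trace.append(state)
    return trace
```

For an UPDATE against a router with no running VM, run() visits CalcAction, CreateVM, CheckBoot and ConfigureVM before exiting; a later POLL against the same, now configured, VM goes CalcAction, Alive, Exit.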
State Machine Flow
The supported states in the state machine are:
- CalcAction
The entry point of the state machine. Depending on the current status of the Service VM (e.g., ACTIVE, BUILD, SHUTDOWN) and the current event, determine the first step in the state machine to transition to.
- Alive
Check aliveness of the Service VM by attempting to communicate with it via its REST HTTP API.
- CreateVM
Call nova boot to boot a new Service VM. This will attempt to boot a Service VM up to a (configurable) number of times before placing the router into the ERROR state.
- CheckBoot
Check aliveness (up to a configurable number of seconds) of the router until the VM is responsive and ready for initial configuration.
- ConfigureVM
Configure the Service VM and its services. This is generally the final step in the process of booting and configuring a router. This step communicates with the Neutron API to generate a comprehensive network configuration for the router (which is pushed to the router via its REST API). On success, the state machine yields control back to the worker thread and that thread handles the next event in its queue (likely for a different Service VM and its state machine).
- ReplugVM
Attempt to hot-plug/unplug a network from the router via nova interface-attach or nova interface-detach.
- StopVM
Terminate a running Service VM. This is generally performed when a Neutron router is deleted or via explicit operator tools.
- ClearError
After a (configurable) number of nova boot failures, Neutron routers are automatically transitioned into a cool-down ERROR state, so that astara will not continue to boot them forever; this prevents further exacerbation of failing hypervisors. This state transition is used to add routers back into management after issues are resolved and to signal to astara-orchestrator that it should attempt to manage them again.
- STATS
Reads traffic data from the router.
- CONFIG
Configures the VM and its services.
- EXIT
Processing stops.
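The boot-retry limit shared by CreateVM and ClearError amounts to a per-router failure counter. A minimal sketch, assuming the counter resets on a successful boot; BootTracker and its fields are hypothetical, and max_attempts stands in for the configurable limit.

```python
from dataclasses import dataclass


@dataclass
class BootTracker:
    """Hypothetical per-router bookkeeping for CreateVM failures."""
    max_attempts: int = 3  # stands in for the configurable retry limit
    failures: int = 0
    error: bool = False

    def record_boot(self, succeeded: bool) -> None:
        """Count a boot attempt; enter ERROR after max_attempts failures."""
        if self.error:
            raise RuntimeError("router is in ERROR; clear it first")
        if succeeded:
            self.failures = 0  # assumption: success resets the counter
            return
        self.failures += 1
        if self.failures >= self.max_attempts:
            self.error = True  # stop booting this router

    def can_boot(self) -> bool:
        return not self.error

    def clear_error(self) -> None:
        """Analogue of the ClearError transition: resume managing the router."""
        self.failures = 0
        self.error = False
```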
ACT(ion) Variables are:
- Create
Create router was requested.
- Read
Read router traffic stats.
- Update
Update router configuration.
- Delete
Delete router.
- Poll
Poll router alive status.
- rEbuild
Recreate a router from scratch.
VM Variables are:
- Down
VM is known to be down.
- Booting
VM is booting.
- Up
VM is known to be up (pingable).
- Configured
VM is known to be configured.
- Restart Needed
VM needs to be rebooted.
- Hotplug Needed
VM needs to be replugged.
- Gone
The router definition has been removed from Neutron.
- Error
The router has been rebooted too many times, or has had some other error.
[Diagram: state_machine.dot]
Health Monitoring
astara.health is a subprocess which (at a configurable interval) periodically delivers POLL events to every known virtual router. This event transitions the state machine into the Alive state, which (depending on the availability of the router) may simply exit the state machine (because the router's status API replies with an HTTP 200) or transition to the CreateVM state (because the router is unresponsive and must be recreated).
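The periodic delivery can be sketched as a loop over the currently known routers. This assumes caller-supplied routers() and deliver() callables, both hypothetical, and runs in a thread rather than in a separate subprocess as astara.health does.

```python
import threading
from typing import Callable, Iterable


def health_loop(
    routers: Callable[[], Iterable[str]],
    deliver: Callable[[str, str], None],
    interval: float,
    stop: threading.Event,
) -> None:
    """Send a POLL event to every known router each interval until stop is set."""
    while not stop.is_set():
        # Re-read the router list each round so new routers get polled too.
        for router_id in routers():
            deliver(router_id, "POLL")
        # Event.wait returns early as soon as stop is set.
        stop.wait(interval)
```

A caller would typically start it with threading.Thread(target=health_loop, args=(routers, deliver, 60.0, stop), daemon=True).start() and set stop to shut it down; waiting on the Event instead of calling time.sleep() lets shutdown take effect without sitting out the full interval.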
High Availability
Astara supports high-availability (HA) on both the control plane and data plane.
The astara-orchestrator service may be deployed in a configuration that allows multiple service processes to span nodes for load distribution and HA. For more information on clustering, see the install docs.
Astara also supports orchestrating pairs of virtual appliances to provide HA of the data path, allowing pairs of virtual routers to be clustered among themselves using VRRP and connection tracking. To enable this, create Neutron routers with the ha=True parameter, or set this property on existing routers and issue a rebuild command via astara-ctl for each such router.