Add spec on refactor the inventory
A very hard work with very little rewards. Change-Id: If484020420ff17df8ea682c0ceb7450acd0e4325
This commit is contained in:
parent
eb670d5e25
commit
88fe4478a8
|
@ -0,0 +1,378 @@
|
|||
Refactoring OSA inventory
|
||||
#########################
|
||||
:date: 2018-04-12 22:00
|
||||
:tags: osa, inventory
|
||||
|
||||
The inventory as it stands today has been growing in complexity
|
||||
and has only grown organically since its first implementation
|
||||
in icehouse. Given that Ansible has changed a lot and has added
|
||||
capabilities which were not available in those early versions,
|
||||
it is time to take a step back and look at how it can be re-worked
|
||||
to reduce technical debt and make it easier to maintain.
|
||||
|
||||
Problem description
|
||||
===================
|
||||
|
||||
The current OpenStack-Ansible inventory provides the following
|
||||
features:
|
||||
|
||||
* Assignment of hosts into groups
|
||||
* Generating the group structure
|
||||
* Assigning host variables
|
||||
* Generating container inventory_hostnames
|
||||
* Assigning and tracking container IPs based on cidr_networks,
|
||||
reserved IPs, and already allocated IPs.
|
||||
|
||||
All these features are included into a single dynamic inventory script,
|
||||
because at the time of its creation, only one inventory was allowed
|
||||
at a time in an ansible cli call.
|
||||
|
||||
The dynamic inventory shipped by OSA is core of the functionality of
|
||||
OpenStack-Ansible, yet it is not well understood, neither by the core
|
||||
maintainers nor by new contributors.
|
||||
|
||||
As a result, the inventory has grown organically, both in code and
|
||||
in memory usage (changes in the way we deploy things, adding new
|
||||
groups, adding edge cases), and has not seen much maintenance
|
||||
to reduce its scope or the technical debt.
|
||||
|
||||
At this point, due to a lack of tests and the complexity of the code,
|
||||
it is difficult to work on without causing hidden breakages
|
||||
which are often only found months later. Adding tests is
|
||||
unrealisticly hard for this legacy code.
|
||||
|
||||
The problems can therefore be summarized in a few points:
|
||||
|
||||
* The inventory needs to be cleaned up of unnecessary groups and
|
||||
assignments, but it is difficult to clean up effectively
|
||||
without causing hidden breakages.
|
||||
* We have to carry code in openstack-ansible that is not actively
|
||||
maintained
|
||||
* We have to execute code that's not actively audited, while
|
||||
it would be technically possible to avoid the execution of
|
||||
code with very few limitations for the end-user.
|
||||
* Introducing tests to verify regressions was attempted during
|
||||
the Newton, Ocata and Pike development cycles - but that
|
||||
has done nothing more than increase the code complexity
|
||||
and has done nothing to improve the reliability.
|
||||
|
||||
Proposed change
|
||||
===============
|
||||
|
||||
Now that we are using Ansible 2.4, we can:
|
||||
|
||||
* Stack inventories together, and therefore we can split inventories
|
||||
into smaller inventories if necessary
|
||||
* Import, and convert inventories to a more readable format.
|
||||
|
||||
What I am proposing is to use static files for inventory.
|
||||
It is easier for people to edit the inventory, and review it.
|
||||
It's easier to manipulate, and doesn't require our code to
|
||||
run or edit it.
|
||||
|
||||
Host vars, group vars, and inventory structure would be
|
||||
static files, and slimmed down to the minimum.
|
||||
|
||||
Here are two example of slimming down (hosts vars, and inventory):
|
||||
|
||||
* For me, the features to track proper IP assignment is the
|
||||
scope of a CMDB/IPAM. We shouldn't reinvent the wheel there.
|
||||
Instead this should be spun out of the inventory.
|
||||
People should either:
|
||||
|
||||
* use the old inventory to keep the same features, but
|
||||
we add a warning that the code is deprecated
|
||||
* provide their own IP addresses in a static file
|
||||
* provide their own dynamic inventory script or use a lookup
|
||||
to fetch data from their IPAM.
|
||||
|
||||
With the generation of IPs outside the scope of the inventory,
|
||||
we could simplify the dynamic inventory further.
|
||||
|
||||
* For me, the groups like haproxy, haproxy_all, haproxy_hosts
|
||||
or haproxy_containers are all confusing. Some are used
|
||||
interchangeably, which led to bugs. The proliferation of
|
||||
groups is only due to our inventory.
|
||||
These can all be consolidated into a single
|
||||
group, by changing the playbooks and roles. This is
|
||||
not only restricted to haproxy, and this pattern of
|
||||
group reduction should be extended to all our inventory.
|
||||
|
||||
So, at first we need to keep the same configuration style
|
||||
(conf.d/env.d/openstack_user_config). The generated json
|
||||
would then go through a script to generate and clean
|
||||
the static files.
|
||||
|
||||
That script would be part of the deploy and upgrade
|
||||
process.
|
||||
|
||||
Later, we could re-think the conf.d/env.d/openstack_user_config,
|
||||
or keep it the same but completely change the underlying code.
|
||||
That wouldn't be a problems, because it could be done on the
|
||||
side, as a different inventory system. We would have, on the
|
||||
way, documented the input and outputs of the inventory,
|
||||
which could then be used for building test cases.
|
||||
|
||||
Alternatives
|
||||
------------
|
||||
|
||||
Do nothing
|
||||
|
||||
Playbook/Role impact
|
||||
--------------------
|
||||
|
||||
Removing references to old inventory data like old groups.
|
||||
Use lookups or ansible_facts better to reduce the amount of hostvars.
|
||||
|
||||
Upgrade impact
|
||||
--------------
|
||||
|
||||
Because our inventories are already in a bad state, we already have
|
||||
hosts in the wrong groups.
|
||||
|
||||
Upgrade would need to run the tool to migrate the groups to the new
|
||||
groups presented in the playbooks.
|
||||
|
||||
Security impact
|
||||
---------------
|
||||
|
||||
By ultimately shipping less code, we would marginally
|
||||
improve our security.
|
||||
|
||||
Performance impact
|
||||
------------------
|
||||
|
||||
* Moving from dynamic to static file with the same format doesn't
|
||||
change performance
|
||||
* Moving from static json to static yaml may or may not improve
|
||||
performance in your deployment by reducing memory usage.
|
||||
It fully depends on the inventory.
|
||||
Large inventories are more likely to lose performance
|
||||
by switching to yaml for the same input.
|
||||
* Cleaning up the inventory have a positive performance impact.
|
||||
|
||||
End user impact
|
||||
---------------
|
||||
|
||||
The end users will not notice any change.
|
||||
|
||||
Deployer impact
|
||||
---------------
|
||||
|
||||
The deployer will have a different user configuration to deal with
|
||||
(static files)
|
||||
|
||||
Hopefully it shouldn't be too hard to understand for an existing
|
||||
openstack-ansible user, or a experienced ansible user.
|
||||
|
||||
Developer impact
|
||||
----------------
|
||||
|
||||
No change for the development of roles or playbooks.
|
||||
|
||||
At the same time we are removing technical debt, we are adding new
|
||||
technical debt by adding these new tools.
|
||||
|
||||
With the hope this tools would be easier to understand, read, review,
|
||||
and having more tests, it would overall reduce risks for the project.
|
||||
|
||||
Dependencies
|
||||
------------
|
||||
|
||||
None
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
Assignee(s)
|
||||
-----------
|
||||
|
||||
Primary assignee:
|
||||
evrardjp
|
||||
|
||||
Other contributors:
|
||||
None for now.
|
||||
|
||||
Work items
|
||||
----------
|
||||
|
||||
Use static files is not without downsides:
|
||||
We are losing some key features if we "just use" a
|
||||
static inventory which is created by the user, like the
|
||||
dynamic hostname generation, the dynamic IP allocations.
|
||||
|
||||
So I propose the following path:
|
||||
|
||||
#. We list the groups required for a successful ansible deploy,
|
||||
and document those in the reference guide.
|
||||
|
||||
Positive improvements:
|
||||
|
||||
* For deployers that don't want to use our inventory, we
|
||||
would now have an "explicit" contract of what they should
|
||||
do to run openstack-ansible with their own inventory groups
|
||||
|
||||
Drawbacks:
|
||||
|
||||
* All changes in groups now needs proper documentation
|
||||
* That's not enough to come with your own inventory
|
||||
|
||||
#. Keep the conf.d/env.d, and dynamic inventory script for now.
|
||||
We use it for generating a json that stays static during the
|
||||
lifecycle of the cloud, or until re-generated manually. The
|
||||
env.d/conf.d/openstack_user_config.yml are used as input
|
||||
for this "one-off" run of the dynamic inventory.
|
||||
|
||||
To make sure deployers don't misunderstand the "static" json
|
||||
file or confuse it with the current openstack_inventory.json,
|
||||
we should move the current files to a "cache" folder, and
|
||||
generated the "static" inventory into a ``inventory`` folder.
|
||||
|
||||
Positive improvements:
|
||||
|
||||
* No hidden failures, the generation of the inventory becomes
|
||||
a part of the deploy. We can add health checks easily.
|
||||
* Our code run only once, during the generation. Therefore we
|
||||
are not vulnerable to issues appearing when running
|
||||
multiple ansible simulatenously, or other side effects.
|
||||
* We keep the container name generation, provider networks,
|
||||
and IP assignments for free.
|
||||
|
||||
Drawbacks:
|
||||
|
||||
* Edition of static file will not be in sync with
|
||||
conf.d/env.d, but that was already the case with a manual
|
||||
change to openstack_inventory.json
|
||||
* The inventory_manage script becomes useless
|
||||
|
||||
#. We provide default child mapping: we create the x_all groups
|
||||
in an easy to read .ini file in the openstack-ansible repo.
|
||||
|
||||
Positive improvements:
|
||||
|
||||
* All our users with their own inventory won't have to
|
||||
create EXACTLY the same code to do child group mapping.
|
||||
Sharing is caring.
|
||||
* We would cary a lot of empty groups, and maybe people don't
|
||||
need them.
|
||||
* The mapping could then be used to partially replace the
|
||||
documentation of step 1, and will fully replace the
|
||||
step 1 documentation when the groups will be cleaned
|
||||
in the playbooks and roles.
|
||||
|
||||
#. We export the host vars into a static files inside the
|
||||
userspace inventory folder.
|
||||
|
||||
Positive improvements:
|
||||
|
||||
* Having static yaml files will make it easy to
|
||||
see repetitions, and things that can move to
|
||||
group vars
|
||||
|
||||
Drawbacks:
|
||||
* More static files to maintain by the deployer.
|
||||
If we change a host var, we could change the
|
||||
inventory and it was applied everywhere.
|
||||
It would not be the case anymore.
|
||||
|
||||
#. We write a tool manipulating the inventory json.
|
||||
By default, that tool would:
|
||||
|
||||
* discard all the groups that aren't listed
|
||||
in the reference guide
|
||||
* discard all the _all groups from the inventory,
|
||||
as they would not be required in the json anymore
|
||||
(handled at a previous step)
|
||||
* discard all the host variables (handled at a previous step)
|
||||
* discard groups that can be generated from facts/host
|
||||
variables, like all_containers
|
||||
(using group_by would provide the same result).
|
||||
|
||||
Positive improvements:
|
||||
|
||||
* The inventory would be lighter, and therefore require
|
||||
less memory to run. It would also run faster and require
|
||||
less computing power.
|
||||
|
||||
Drawbacks:
|
||||
|
||||
* All the changes in groups now require a modification of said
|
||||
tool, so a good design is necessary to make it easy to change.
|
||||
|
||||
#. We document a list of the expected and required
|
||||
host/groups variables.
|
||||
|
||||
#. We remove all the unnecessary group and host variables
|
||||
that were part of the inventory but aren't important anymore
|
||||
by using/providing a tool manipulating variable files (yaml),
|
||||
or by providing release notes.
|
||||
|
||||
#. We document how to export the cleaned up inventory into
|
||||
a new YAML file.
|
||||
|
||||
#. The generation of conf.d, env.d, and
|
||||
openstack_user_config becomes totally optional at
|
||||
this point: We know what is required in a build, and
|
||||
ask deployers to provide their own group/host mapping.
|
||||
|
||||
At this point it's optional because:
|
||||
|
||||
#. Assignment of hosts into groups can be done by the user
|
||||
with a simple .ini/.yaml file + documentation
|
||||
#. Standard group structure is provided by default
|
||||
#. We have documented the list of host variables, so they
|
||||
can be provided by the user
|
||||
#. Generating container with their inventory_hostnames
|
||||
can be done by the user.
|
||||
It's just a series of host variables:
|
||||
ansible_host, container_name, container_tech, physical_host.
|
||||
It can even be done with a add_hosts and a loop based
|
||||
on a new variable like container_names (property of the host).
|
||||
#. Assigning and tracking container IPs based on
|
||||
cidr_networks, reserved IPs, and already allocated IPs are
|
||||
also host variables. Deployers are responsible to
|
||||
provide an IP for their containers.
|
||||
Example, the lxc_container_create role creates
|
||||
IP, network, and interfaces configuration based on
|
||||
lxc_container_networks_combined, which a variable taking
|
||||
information from the inventory, by combining default
|
||||
lxc_container_networks with the "container_networks"
|
||||
variable, which is part of the inventory.
|
||||
Note: this part can be later replaced by a lookup.
|
||||
By using a lookup, we would simplify the inventory,
|
||||
by completely removing its container networks of
|
||||
the host vars.
|
||||
|
||||
#. We provide a script that runs all these actions for the
|
||||
user, but also allow step by step editions and manipulations.
|
||||
|
||||
#. We provide a new tool to generate a new kind of
|
||||
inventory based on what we learned from users, which
|
||||
won't necessary use the openstack_user_config, conf.d, or
|
||||
env.d. But we have all the time we need to do it better,
|
||||
because the expected inventory is not the same as the
|
||||
one we did the past.
|
||||
|
||||
#. We spin the old inventory out.
|
||||
|
||||
Testing
|
||||
=======
|
||||
|
||||
All the work items would be separately tested in the integrated gates.
|
||||
|
||||
|
||||
Documentation impact
|
||||
====================
|
||||
|
||||
Large. The inventory would need a refactor to explain the expectations for
|
||||
people coming with their inventory, and for people that will use our generation
|
||||
tool. At the last step, if another tool is provided, it would also require
|
||||
documenting.
|
||||
|
||||
Each step would require modifying the reference, and maybe the operations
|
||||
guide.
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
None
|
Loading…
Reference in New Issue