Refactoring OSA inventory
#########################
:date: 2018-04-12 22:00
:tags: osa, inventory

The inventory as it stands today has grown in complexity, and only
organically, since its first implementation in Icehouse. Given that
Ansible has changed a lot and has added capabilities which were not
available in those early versions, it is time to take a step back
and look at how the inventory can be reworked to reduce technical
debt and make it easier to maintain.

Problem description
===================

The current OpenStack-Ansible inventory provides the following
features:

* Assignment of hosts into groups
* Generation of the group structure
* Assignment of host variables
* Generation of container ``inventory_hostname`` values
* Assignment and tracking of container IPs based on ``cidr_networks``,
  reserved IPs, and already allocated IPs.

All these features are included in a single dynamic inventory script,
because at the time of its creation, only one inventory could be
passed to an ``ansible`` CLI call.
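
Ansible runs such a script with ``--list`` and reads a JSON object
from its standard output: each group maps to its ``hosts``,
``children`` and ``vars``, and an optional ``_meta.hostvars`` key
carries the per-host variables. The sketch below prints a minimal
document of that shape, to make the features listed above concrete.
The group, host and address values are hypothetical and are not
taken from the OSA script.

.. code-block:: python

   import json
   import sys


   def build_inventory():
       """Return a minimal dynamic-inventory document (hypothetical data)."""
       return {
           # A group whose hosts are assigned directly.
           "haproxy_hosts": {"hosts": ["infra1"]},
           # A group made of other groups.
           "haproxy_all": {"children": ["haproxy_hosts"]},
           # Per-host variables; serving them here means Ansible does
           # not need to call the script once per host with --host.
           "_meta": {
               "hostvars": {
                   "infra1": {"ansible_host": "172.29.236.11"},
               },
           },
       }


   if __name__ == "__main__":
       if len(sys.argv) > 1 and sys.argv[1] == "--list":
           print(json.dumps(build_inventory(), indent=2))
       else:
           # "--host <name>": all host variables are already in _meta.
           print(json.dumps({}))
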

The dynamic inventory shipped by OSA is at the core of
OpenStack-Ansible, yet it is not well understood, either by the core
maintainers or by new contributors.

As a result, the inventory has grown organically, both in code and
in memory usage (changes in the way we deploy things, new
groups, new edge cases), and has seen little maintenance
to reduce its scope or its technical debt.

At this point, due to a lack of tests and the complexity of the code,
it is difficult to work on without causing hidden breakages,
which are often only found months later. Adding tests to this
legacy code is unrealistically hard.

The problems can therefore be summarized in a few points:

* The inventory needs to be cleaned of unnecessary groups and
  assignments, but it is difficult to clean up effectively
  without causing hidden breakages.
* We have to carry code in openstack-ansible that is not actively
  maintained.
* We have to execute code that is not actively audited, while
  it would be technically possible to avoid executing it,
  with very few limitations for the end user.
* Introducing tests to catch regressions was attempted during
  the Newton, Ocata and Pike development cycles, but that
  only increased the code complexity and did nothing to
  improve reliability.

Proposed change
===============

Now that we are using Ansible 2.4, we can:

* Stack inventories together, and therefore split the inventory
  into smaller inventories if necessary.
* Import inventories, and convert them to a more readable format.

What I am proposing is to use static files for the inventory.
Static files are easier for people to edit and review, easier to
manipulate, and do not require our code to run in order to read
or edit them.

Host vars, group vars, and the inventory structure would be
static files, slimmed down to the minimum.
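
As an illustration of the conversion step, the following sketch
turns the ``--list`` JSON of a dynamic inventory into the nested
``all``/``children``/``hosts`` layout understood by Ansible's YAML
inventory plugin. It is a sketch under simplifying assumptions, not
the proposed tool: host variables are written once under ``all``,
group membership is listed without variables, and serializing the
result to YAML is left to a YAML library.

.. code-block:: python

   def to_yaml_inventory(dynamic):
       """Convert parsed ``--list`` JSON into the nested layout of
       Ansible's YAML inventory format."""
       hostvars = dynamic.get("_meta", {}).get("hostvars", {})
       result = {
           "all": {
               # Host variables are written once, under "all".
               "hosts": {host: dict(v) for host, v in hostvars.items()},
               "children": {},
           }
       }
       for name, group in dynamic.items():
           if name == "_meta":
               continue
           # Some scripts return a bare list of hosts for a group.
           if isinstance(group, list):
               group = {"hosts": group}
           if name == "all":
               # The implicit "all" group only contributes vars and hosts.
               if group.get("vars"):
                   result["all"]["vars"] = dict(group["vars"])
               for host in group.get("hosts", []):
                   result["all"]["hosts"].setdefault(host, None)
               continue
           entry = {}
           if group.get("hosts"):
               # Membership only: a null value means "no extra variables".
               entry["hosts"] = {host: None for host in group["hosts"]}
           if group.get("children"):
               entry["children"] = {child: {} for child in group["children"]}
           if group.get("vars"):
               entry["vars"] = dict(group["vars"])
           result["all"]["children"][name] = entry
       return result

Dumping the returned dictionary with a YAML library (PyYAML's
``yaml.safe_dump``, for instance) gives a file that can be passed to
``-i``, and stacked with other inventory sources.
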

Here are two examples of slimming down (host vars, and inventory):

* In my view, tracking proper IP assignment is the scope of a
  CMDB/IPAM, and we should not reinvent the wheel there.
  Instead, this feature should be spun out of the inventory.
  People should either:

  * use the old inventory to keep the same features, with an
    added warning that the code is deprecated;
  * provide their own IP addresses in a static file;
  * provide their own dynamic inventory script, or use a lookup
    to fetch data from their IPAM.

  With the generation of IPs outside the scope of the inventory,
  we could simplify the dynamic inventory further.
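
  Deployers providing their own addresses take over what the
  inventory does today: picking container IPs inside
  ``cidr_networks`` while avoiding reserved and already allocated
  addresses. A minimal sketch of the checks such an external tool
  would need, using Python's standard ``ipaddress`` module; the
  function name and the input shapes are hypothetical.

  .. code-block:: python

     import ipaddress


     def validate_container_ips(assignments, cidr, reserved):
         """Return a list of problems found in user-provided IPs.

         assignments: container name -> IP string (hypothetical shape)
         cidr: network the containers must live in
         reserved: IP strings that must not be handed out
         """
         network = ipaddress.ip_network(cidr)
         reserved_ips = {ipaddress.ip_address(ip) for ip in reserved}
         allocated = {}
         problems = []
         for name, raw in sorted(assignments.items()):
             ip = ipaddress.ip_address(raw)
             if ip not in network:
                 problems.append(f"{name}: {ip} is outside {network}")
             if ip in reserved_ips:
                 problems.append(f"{name}: {ip} is reserved")
             if ip in allocated:
                 problems.append(f"{name}: {ip} already allocated to {allocated[ip]}")
             else:
                 allocated[ip] = name
         return problems
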

* In my view, groups like ``haproxy``, ``haproxy_all``,
  ``haproxy_hosts`` or ``haproxy_containers`` are confusing. Some
  are used interchangeably, which has led to bugs. This
  proliferation of groups exists only because of our inventory.
  These can all be consolidated into a single group by changing
  the playbooks and roles. This is not restricted to haproxy:
  the same group reduction should be applied to the whole
  inventory.
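
What the consolidation means for the inventory data can be sketched
as folding the hosts of several aliased groups into one target group,
and repointing any parent group at the target. The function and the
group names used with it are hypothetical: the real mapping would
follow from the playbook and role changes.

.. code-block:: python

   def consolidate_groups(inventory, aliases, target):
       """Fold the groups listed in ``aliases`` into one ``target`` group.

       ``inventory`` maps group name -> {"hosts": [...], "children": [...]}.
       Child groups of the aliases are expanded, so ``target`` lists real
       hosts. A new dict is returned; the input is left untouched.
       """
       aliases = set(aliases)

       def hosts_of(name, seen):
           # Depth-first walk; `seen` protects against cycles and repeats.
           if name in seen or name not in inventory:
               return []
           seen.add(name)
           group = inventory[name]
           found = list(group.get("hosts", []))
           for child in group.get("children", []):
               found.extend(hosts_of(child, seen))
           return found

       merged, seen = [], set()
       for name in [target, *sorted(aliases)]:
           for host in hosts_of(name, seen):
               if host not in merged:
                   merged.append(host)

       result = {}
       for name, group in inventory.items():
           if name in aliases or name == target:
               continue
           group = dict(group)
           if "children" in group:
               # Repoint references to removed aliases at the target group.
               children = []
               for child in group["children"]:
                   child = target if child in aliases else child
                   if child not in children:
                       children.append(child)
               group["children"] = children
           result[name] = group
       result[target] = {"hosts": merged}
       return result

Expanding child groups keeps the target usable on its own; any
distinction that still matters (hosts versus containers, for
example) could then come from host variables rather than from
extra groups.
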

So, at first, we need to keep the same configuration style
(conf.d/env.d/openstack_user_config). The generated JSON
would then go through a script to generate and clean up
the static files. That script would be part of the deploy
and upgrade process.

Later, we could rethink conf.d/env.d/openstack_user_config,
or keep them the same but completely change the underlying code.
That would not be a problem, because it could be done on the
side, as a different inventory system. Along the way, we would
have documented the inputs and outputs of the inventory,
which could then be used for building test cases.

Alternatives
------------

Do nothing.

Playbook/Role impact
--------------------

Remove references to old inventory data, like old groups.
Make better use of lookups or ``ansible_facts`` to reduce the
number of hostvars.

Upgrade impact
--------------

Because our inventories are already in a bad state, some hosts are
already in the wrong groups.
The upgrade would need to run the tool to migrate hosts from the old
groups to the new groups used by the playbooks.

Security impact
---------------

By ultimately shipping less code, we would marginally
improve our security.

Performance impact
------------------

* Moving from a dynamic inventory to a static file with the same
  format does not change performance.
* Moving from static JSON to static YAML may or may not improve
  performance in your deployment by reducing memory usage;
  it fully depends on the inventory.
  Large inventories are more likely to lose performance
  by switching to YAML for the same input.
* Cleaning up the inventory has a positive performance impact.

End user impact
---------------

The end users will not notice any change.

Deployer impact
---------------

The deployer will have a different user configuration to deal with
(static files).
Hopefully it should not be too hard to understand for an existing
openstack-ansible user, or an experienced Ansible user.

Developer impact
----------------

No change for the development of roles or playbooks.

While we are removing technical debt, we are also adding new
technical debt with these new tools.
The hope is that these tools would be easier to understand, read,
and review, and would have more tests, so that overall they
reduce risks for the project.

Dependencies
------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  evrardjp

Other contributors:
  None for now.

Work items
----------

Using static files is not without downsides:
we lose some key features if we "just use" a
static inventory created by the user, like the
dynamic hostname generation and the dynamic IP allocation.

So I propose the following path:

#. We list the groups required for a successful Ansible deployment,
   and document them in the reference guide.

   Positive improvements:

   * For deployers who do not want to use our inventory, we
     would now have an explicit contract of what they need to
     provide to run openstack-ansible with their own inventory
     groups.

   Drawbacks:

   * All changes in groups now need proper documentation.
   * That is not enough to come with your own inventory.
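
   With the group list documented, checking a custom inventory
   against it becomes mechanical. A minimal sketch, where
   ``REQUIRED_GROUPS`` is a hypothetical excerpt standing in for the
   documented list, and the inventory is the parsed ``--list`` JSON:

   .. code-block:: python

      # Hypothetical excerpt of the group list documented in step 1.
      REQUIRED_GROUPS = ("galera_all", "rabbitmq_all", "utility_all")


      def missing_groups(inventory, required=REQUIRED_GROUPS):
          """Return the required groups that the inventory does not
          define, sorted for stable output."""
          defined = {name for name in inventory if name != "_meta"}
          return sorted(set(required) - defined)
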

#. Keep conf.d/env.d and the dynamic inventory script for now.
   We use them to generate a JSON file that stays static during the
   lifecycle of the cloud, or until it is regenerated manually. The
   env.d/conf.d/openstack_user_config.yml files are used as input
   for this "one-off" run of the dynamic inventory.

   To make sure deployers do not misunderstand the "static" JSON
   file or confuse it with the current openstack_inventory.json,
   we should move the current files to a "cache" folder, and
   generate the "static" inventory into an ``inventory`` folder.

   Positive improvements:

   * No hidden failures: the generation of the inventory becomes
     a part of the deployment. We can easily add health checks.
   * Our code runs only once, during the generation. Therefore we
     are not vulnerable to issues appearing when running multiple
     Ansible processes simultaneously, or to other side effects.
   * We keep the container name generation, provider networks,
     and IP assignments for free.

   Drawbacks:

   * Edits of the static file will not be in sync with
     conf.d/env.d, but that was already the case with a manual
     change to openstack_inventory.json.
   * The inventory_manage script becomes useless.
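
   A sketch of this "one-off" run, assuming the generator is invoked
   as a command that prints ``--list`` style JSON (for the current
   script, its path followed by ``--list``); the command and output
   path are placeholders, not actual OSA paths. It runs the command
   once, performs a couple of basic health checks, and writes the
   result through a temporary file so that a failed run never leaves
   a half-written static inventory behind.

   .. code-block:: python

      import json
      import os
      import subprocess
      import sys
      import tempfile


      def freeze_inventory(command, output_path):
          """Run an inventory generator once and store its ``--list``
          output as a static JSON file. ``command`` is a placeholder argv."""
          completed = subprocess.run(
              command, check=True, capture_output=True, text=True
          )
          inventory = json.loads(completed.stdout)
          # Basic health checks before anything is written to disk.
          if not isinstance(inventory, dict):
              raise ValueError("inventory output is not a JSON object")
          for name, group in inventory.items():
              if name != "_meta" and not isinstance(group, (dict, list)):
                  raise ValueError(f"group {name!r} has an unexpected shape")
          os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
          # Write to a temporary file, then rename it over the target.
          tmp_path = output_path + ".tmp"
          with open(tmp_path, "w") as handle:
              json.dump(inventory, handle, indent=2, sort_keys=True)
          os.replace(tmp_path, output_path)
          return inventory


      if __name__ == "__main__":
          # Stand-in generator printing a tiny inventory (hypothetical data).
          fake_generator = [
              sys.executable,
              "-c",
              "import json; print(json.dumps({'g': {'hosts': ['h1']}}))",
          ]
          with tempfile.TemporaryDirectory() as tmp_dir:
              path = os.path.join(tmp_dir, "inventory", "static.json")
              print(sorted(freeze_inventory(fake_generator, path)))
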

#. We provide a default child mapping: we create the ``x_all`` groups
   in an easy-to-read .ini file in the openstack-ansible repo.

   Positive improvements:

   * None of our users with their own inventory will have to
     recreate EXACTLY the same child group mapping.
     Sharing is caring.
   * The mapping could then be used to partially replace the
     documentation of step 1, and will fully replace the
     step 1 documentation once the groups are cleaned up
     in the playbooks and roles.

   Drawbacks:

   * We would carry a lot of empty groups, which people may not
     need.
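
   Ansible's INI inventory format declares child groups with
   ``[parent:children]`` sections, so the default mapping could be
   rendered from a plain dictionary. The sketch below assumes such a
   dictionary exists; the group names in a real mapping would come
   from the step 1 documentation.

   .. code-block:: python

      def render_child_mapping(mapping):
          """Render {parent_group: [child_groups]} as INI
          ``[parent:children]`` sections, sorted for stable diffs."""
          sections = []
          for parent in sorted(mapping):
              lines = [f"[{parent}:children]"]
              lines.extend(sorted(mapping[parent]))
              sections.append("\n".join(lines))
          return "\n\n".join(sections) + "\n"


      # Hypothetical mapping, for illustration only.
      print(render_child_mapping({"galera_all": ["galera_container", "galera_hosts"]}))
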

#. We export the host vars into static files inside the
   userspace inventory folder.

   Positive improvements:

   * Having static YAML files will make it easy to
     spot repetitions, and variables that can move to
     group vars.

   Drawbacks:

   * More static files to maintain for the deployer.
     Today, a host var changed in the inventory generation
     is applied everywhere; this would no longer be the case.
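
   Spotting these repetitions can be automated. A minimal sketch,
   assuming host variables are available as a
   ``{host: {var: value}}`` dictionary (the shape of
   ``_meta.hostvars``): it returns the variables that have the same
   value on every host of a group, which are candidates for group
   vars. The function name is hypothetical.

   .. code-block:: python

      def common_host_vars(hostvars, hosts):
          """Return the variables sharing one value across all ``hosts``."""
          hosts = list(hosts)
          if not hosts:
              return {}
          first = hostvars.get(hosts[0], {})
          common = {}
          for key, value in first.items():
              # Keep the variable only if every other host has it too,
              # with exactly the same value.
              if all(
                  key in hostvars.get(host, {}) and hostvars[host][key] == value
                  for host in hosts[1:]
              ):
                  common[key] = value
          return common
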

#. We write a tool manipulating the inventory JSON.
   By default, that tool would:

   * discard all the groups that are not listed
     in the reference guide;
   * discard all the ``_all`` groups from the inventory,
     as they would not be required in the JSON anymore
     (handled at a previous step);
   * discard all the host variables (handled at a previous step);
   * discard groups that can be generated from facts or host
     variables, like ``all_containers``
     (using ``group_by`` would provide the same result).

   Positive improvements:

   * The inventory would be lighter, and would therefore require
     less memory to run. It would also run faster and require
     less computing power.

   Drawbacks:

   * All changes in groups now require a modification of said
     tool, so a good design is necessary to make it easy to change.
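
   A sketch of these default rules, operating on the parsed
   ``--list`` JSON. The documented and derivable group lists are
   passed in, since their content would come from the reference
   guide; the function name is hypothetical.

   .. code-block:: python

      def prune_inventory(inventory, documented_groups, derivable_groups=()):
          """Apply the default clean-up rules to a ``--list`` style
          inventory and return a new, smaller dict."""
          keep = set(documented_groups) - set(derivable_groups)
          pruned = {}
          for name, group in inventory.items():
              if name == "_meta":
                  continue  # host variables were exported at a previous step
              if name.endswith("_all"):
                  continue  # provided by the static child mapping instead
              if name not in keep:
                  continue  # undocumented, or can be rebuilt with group_by
              if isinstance(group, list):
                  group = {"hosts": group}
              entry = {"hosts": list(group.get("hosts", []))}
              # Only keep references to child groups that survive pruning.
              children = [
                  child
                  for child in group.get("children", [])
                  if child in keep and child in inventory
                  and not child.endswith("_all")
              ]
              if children:
                  entry["children"] = children
              if group.get("vars"):
                  entry["vars"] = dict(group["vars"])
              pruned[name] = entry
          # An empty _meta tells Ansible it does not need to call a
          # script with --host for every host.
          pruned["_meta"] = {"hostvars": {}}
          return pruned
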

#. We document a list of the expected and required
   host/group variables.

#. We remove all the unnecessary group and host variables
   that were part of the inventory but are no longer important,
   by using/providing a tool manipulating variable files (YAML),
   or by providing release notes.

#. We document how to export the cleaned-up inventory into
   a new YAML file.

#. The generation of conf.d, env.d, and
   openstack_user_config becomes totally optional at
   this point: we know what is required in a build, and
   ask deployers to provide their own group/host mapping.
   It becomes optional because:

   #. Assignment of hosts into groups can be done by the user
      with a simple .ini/.yaml file and documentation.
   #. The standard group structure is provided by default.
   #. We have documented the list of host variables, so they
      can be provided by the user.
   #. Generating containers with their ``inventory_hostname``
      can be done by the user.
      It is just a series of host variables:
      ``ansible_host``, ``container_name``, ``container_tech``,
      ``physical_host``.
      It can even be done with ``add_host`` and a loop based
      on a new variable like ``container_names`` (a property of
      the host).
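
      Expressed in Python rather than as an ``add_host`` loop, the
      expansion could look like the sketch below. It assumes a
      hypothetical ``container_names`` list and a hypothetical
      ``container_ips`` mapping on each physical host, and defaults
      ``container_tech`` to ``lxc``; none of these come from OSA.

      .. code-block:: python

         def expand_containers(physical_hosts):
             """Build per-container host variables from the hypothetical
             ``container_names`` property of each physical host."""
             containers = {}
             for host, hostvars in physical_hosts.items():
                 # IPs are provided by the deployer, not allocated here.
                 ips = hostvars.get("container_ips", {})
                 for name in hostvars.get("container_names", []):
                     containers[name] = {
                         "ansible_host": ips.get(name),
                         "container_name": name,
                         "container_tech": hostvars.get("container_tech", "lxc"),
                         "physical_host": host,
                     }
             return containers
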

   #. Assigning and tracking container IPs based on
      ``cidr_networks``, reserved IPs, and already allocated IPs
      also comes down to host variables. Deployers are responsible
      for providing an IP for each of their containers.

      For example, the lxc_container_create role creates the
      IP, network, and interface configuration based on
      ``lxc_container_networks_combined``, a variable taking
      information from the inventory by combining the default
      ``lxc_container_networks`` with the ``container_networks``
      variable, which is part of the inventory.

      Note: this part can later be replaced by a lookup.
      By using a lookup, we would simplify the inventory
      by completely removing the container networks from
      the host vars.
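
      Assuming ``lxc_container_networks_combined`` is built with
      Ansible's ``combine`` filter without ``recursive=True``, the
      merge is shallow: an entry in ``container_networks`` replaces
      the default entry of the same name entirely. A Python
      equivalent of that merge, with hypothetical network names and
      values:

      .. code-block:: python

         def combine_networks(defaults, container_networks):
             """Shallow merge mirroring ``defaults | combine(container_networks)``:
             entries from ``container_networks`` replace same-named defaults."""
             combined = dict(defaults)
             combined.update(container_networks)
             return combined


         defaults = {"lxcbr0_address": {"bridge": "lxcbr0", "interface": "eth0"}}
         # Hypothetical deployer-provided entry carrying the container IP.
         container_networks = {
             "container_address": {
                 "address": "172.29.236.20",
                 "bridge": "br-mgmt",
                 "interface": "eth1",
             }
         }
         networks = combine_networks(defaults, container_networks)

      Because the merge is shallow, a deployer overriding a single key
      of a default network would have to repeat the whole entry.
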

#. We provide a script that runs all these actions for the
   user, but also allows step-by-step edits and manipulations.

#. We provide a new tool to generate a new kind of
   inventory, based on what we learned from users, which
   will not necessarily use openstack_user_config, conf.d, or
   env.d. We would have all the time we need to do it better,
   because the expected inventory would no longer be the same
   as the one we produced in the past.

#. We spin the old inventory out.

Testing
=======

Each work item would be tested separately in the integrated gates.

Documentation impact
====================

Large. The inventory documentation would need a refactor to explain
the expectations for people coming with their own inventory, and for
people who will use our generation tool. At the last step, if another
tool is provided, it would also require documentation.

Each step would require modifying the reference guide, and maybe the
operations guide.

References
==========

None