Added lessons learnt document

The lessons learnt document was created on etherpad. This patch
took all the content from that document, edited and reformatted it,
and added a few more items I learnt from my own experience of
developing these workload scripts.

Change-Id: I3ab9cacd6369c3cbcfc912707a9e844ae7c4fc2d
Tong Li 2016-12-12 15:04:27 -05:00
parent 4e8f7cb7e2
commit bdaa492657
1 changed files with 112 additions and 0 deletions

doc/source/lessonslearnt.rst Executable file

@@ -0,0 +1,112 @@
Tooling
-------
For interoperable automated deployment, Ansible together with the Ansible
OpenStack cloud modules (based on OpenStack Shade) provided the best
results, as in the sketch below.
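
A minimal sketch of this approach using the os_server module; the play
structure and variable names here are hypothetical::

    ---
    # Boot one workload instance through the Ansible OpenStack modules
    # (which talk to the cloud via the Shade library).
    - hosts: localhost
      tasks:
        - name: Boot a server for the workload
          os_server:
            name: workload-vm          # hypothetical instance name
            image: "{{ image_name }}"
            flavor: "{{ flavor_name }}"
            key_name: "{{ key_name }}"
            state: present
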
Terraform and its OpenStack cloud modules (also based on OpenStack Shade)
were tried as well, but several issues led to failed deployments. For
example, Terraform does not support multiple endpoints for the same
service, so it fails on clouds that expose multiple endpoints for
different Nova or Neutron versions; supporting multiple endpoints is
necessary for different versions of the OpenStack client applications.
Terraform also does not allow the apply (create) and destroy (remove)
actions to be used in the same run, yet this is often needed: during a
deployment you may need a floating IP (the apply action in Terraform),
but at the end of the deployment you may want to remove that floating IP
(the destroy action in Terraform). The floating IP, a resource in
Terraform, thus exists only for a short period of time, a situation
Terraform cannot really handle. This is probably the most unforgiving
restriction of Terraform, and the Interop Challenge working group could
not find a workaround for it. It also appears that these issues have been
identified but are not being actively addressed by the Terraform
community.
OpenStack Heat was also discussed, but since Heat adoption is still not
widespread, it was not used; similar reasoning applies to other tools
such as Murano and Juju.
It's perhaps worth noting that both the Ansible OpenStack cloud modules
and the Terraform OpenStack cloud modules are based on OpenStack Shade,
a library that was written explicitly to work around some interop
problems. In other words, we only achieve some degree of interop as long
as there is an interop layer between us and the cloud (the aim should be
not to need such a library), so tooling is a very important subject for
the Interop Challenge.
Shade seems to be missing an availability zone (AZ) parameter for
create_keypair (Ansible's os_keypair) and for other functions, which can
cause problems on clouds with multiple AZs per region.
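
For illustration, a keypair task offers no place to express an AZ; the
names below are hypothetical::

    - name: Upload a keypair (no availability zone can be specified)
      os_keypair:
        name: workload-key
        public_key_file: "{{ ssh_public_key_path }}"
        state: present
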
Networking
----------
Network virtualization features are where most interoperability issues
become visible. OpenStack Neutron supports a very large number of
plugins, and these plugins can behave very differently. For example,
private IP and floating IP support can vary: when returning addresses
from the client library, some clouds report a publicly accessible IP
address as the private IP address, while others report it as the public
IP address. The latter seems to be the right behavior, but clouds
implement it differently. Layer 2 and layer 3 functions can also be a
challenge; some clouds do not expose the functions for customers to
create routers or networks.

Releasing allocated floating IPs is completely missing from the
OpenStack cloud modules of tools like Ansible and Terraform. This
results in allocated floating IPs hanging around, which is especially
bad for clouds that only have a small public IP address segment. A
workaround is sketched below.
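
Leftover floating IPs can be released through the openstack CLI from a
task; the leftover_fips variable here is hypothetical::

    - name: Release leftover floating IPs (module support is missing)
      command: openstack floating ip delete {{ item }}
      with_items: "{{ leftover_fips }}"
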
Not all clouds provide tenant networks by default. Be prepared to
configure your own if no tenant network is provided.
Do not assume that the first NIC on the guest is going to be eth0 (this
is common on older guest OSes, prior to the arrival of Predictable
Network Interface Names and systemd, and is likely not true on newer
guest OSes). Instead, let the user set the NIC name as a parameter to
the workload, or detect it in the workload when it is needed, as in the
sketch below.
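
A minimal detection sketch using Ansible facts; nic_override is a
hypothetical user-supplied parameter::

    - name: Discover the primary NIC name instead of assuming eth0
      set_fact:
        primary_nic: "{{ nic_override | default(ansible_default_ipv4.interface) }}"
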
Not all clouds support floating IPs or private IPs. You may want to
structure your workload so that it can either attach instances to a
routable network or use floating IPs, based on the parameters it is
given.
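
One way to express this choice, assuming a hypothetical
use_floating_ip parameter::

    - name: Boot the instance, attaching a floating IP only when requested
      os_server:
        name: workload-vm
        image: "{{ image_name }}"
        flavor: "{{ flavor_name }}"
        network: "{{ network_name }}"
        auto_ip: "{{ use_floating_ip | default(true) }}"
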
The tenant network has its advantages when the communication is server
to server on the same network. For example, when your deployment
scenario involves multiple backend servers, such as database and
application servers, the communication between these servers can be
placed on the tenant network to improve security and performance.
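
A sketch of creating such a backend network (the names and CIDR are
hypothetical)::

    - name: Create a tenant network for server-to-server traffic
      os_network:
        name: backend-net
        state: present

    - name: Add a subnet for the backend servers
      os_subnet:
        name: backend-subnet
        network_name: backend-net
        cidr: 10.0.10.0/24
        state: present
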
Provisioning
------------
It makes a real difference not only what hardware the cloud is running
on, but also whether the storage backend is Ceph or something else,
whether it is co-located, whether the images go through any sort of
overhead checks, and so on.
If you don't assume a particular guest OS image, be careful with
storage and networking. We encountered one example in which a particular
guest OS/virtual adapter pair needed to rescan the SCSI bus before it
would recognize a newly attached Cinder volume. Rescanning the bus is
generally harmless if not needed and ensures that images built with
adapter types that need it run successfully, so it is an example of
something you can do to make your workloads more interoperable.
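
A minimal rescan sketch as an Ansible task, run on the guest with root
privileges::

    - name: Rescan the SCSI bus so a newly attached volume is recognized
      shell: |
        for host in /sys/class/scsi_host/host*/scan; do
          echo "- - -" > "$host"
        done
      become: yes
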
Parameterize things that are likely to change across different cloud/guest
OS setups. For example: don't assume the first volume attached to a guest
will always be /dev/vdb (this is common but not guaranteed on libvirt, often
untrue on other hypervisors).
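
For instance, the device path can be a parameter with a sensible
default; volume_device is a hypothetical variable::

    - name: Create a filesystem on the attached volume
      filesystem:
        fstype: ext4
        dev: "{{ volume_device | default('/dev/vdb') }}"
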
Metadata
--------
Not all clouds support cloud-init. Workloads that rely heavily on the
metadata service will fail on clouds without metadata support.
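
Booting with a config drive can serve as a fallback where the metadata
service is unavailable; the user-data file name here is hypothetical::

    - name: Boot with a config drive as a fallback to the metadata service
      os_server:
        name: workload-vm
        image: "{{ image_name }}"
        flavor: "{{ flavor_name }}"
        config_drive: yes
        userdata: "{{ lookup('file', 'cloud-init.yaml') }}"
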
Conclusion
----------
With these best practices it is possible to create enterprise
applications (with enterprise characteristics such as a load balancer,
multiple web application servers, a distributed database, security
groups, and block storage, providing enterprise-level networking
safeguards) that are portable across numerous (over 18) private and
public OpenStack clouds.