From bdaa492657a7c41857dea05e20d1996f26b2a523 Mon Sep 17 00:00:00 2001 From: Tong Li Date: Mon, 12 Dec 2016 15:04:27 -0500 Subject: [PATCH] Added lessons learnt document The lessions learnt document has been created on etherpad, this patch took all the content from that document, edited, reformatted, and also added more few items I learnt from my own experiences of developing these workload scripts. Change-Id: I3ab9cacd6369c3cbcfc912707a9e844ae7c4fc2d --- doc/source/lessonslearnt.rst | 112 +++++++++++++++++++++++++++++++++++ 1 file changed, 112 insertions(+) create mode 100755 doc/source/lessonslearnt.rst diff --git a/doc/source/lessonslearnt.rst b/doc/source/lessonslearnt.rst new file mode 100755 index 0000000..bf2760c --- /dev/null +++ b/doc/source/lessonslearnt.rst @@ -0,0 +1,112 @@ +Tooling +------- + +For interoperable automated deployment, Ansible + Ansible OpenStack cloud +modules (based on OpenStack Shade) provided the best results. + +Terraform and its OpenStack cloud modules (based on OpenStack Shade) has +also been tried, but various issues such as not supporting multiple same +service endpoints renders many clouds which support multiple endpoints for +Nova, Neutron versions rendered failed deployment on these clouds, but +supporting multiple endpoints is necessary for different versions of +OpenStack client applications. Terraform also does not allow apply (creation) +and destroy (remove) action to be used at the same time. But it is often +needed, for example, during a deployment, you may need a floating IP (that is +the apply action in Terraform) but at the end of the deployment you may want +to remove that floating IP (that is the destroy action in Terraform), so the +floating IP which is a resource in Terraform only existed in a short period +of time, Terraform can not really handle the situation. This is probably the +most unforgiven restriction of the Terraform. The Interop Challenge working +group can not seem find a work around to overcome the restriction. Also it +appears that these issues have been identified but have not been actively +addressed by Terraform community. + +OpenStack Heat has also been discussed but since the adoption of HEAT is +still not wide spread, this tool was not used. Similar reasons for other +tools like Murano and Juju. + +It's perhaps worth noting that both the Ansible OpenStack cloud modules and +Terraform OpenStack cloud modules based on OpenStack Shade, which is +a library that was written explicitlly to work around some Interop +problems. So we can essentially have some degree of interop as long as +there is an interop layer between us and the cloud (the aim should be not +to need this library), tooling in interop challenge is a very important +subject. + +Shade seems to be missing AZ parameter for create_keypair (Ansible's +os_keypair) and other functions which can cause problems on clouds with +multi-AZs per region. + + +Networking +---------- + +Network virtualization features are where most interoperability issues become +visible. OpenStack Neutron support very large number of plugins, these plugins +can behave very differently. For example, private IP and floating IP +supporting can vary, some clouds make public accessable IP address as private +IP address when returned from client library, some clouds make the same thing +as public IP address, the later seems to be the right behavior, but clouds +implement them differently. Layer 2 and layer 3 functions can be also +challenge, some clouds won't expose the functions for customers to create +routers, or networks. Releasing the alocated floating IPs is completely +missing from all OpenStack cloud modules tools like Ansible and Terraform. +This problem results in the alocated floating IPs hanging around, it is +especially bad for clouds which do not have small public IP address segment. + +Not all clouds provide tenant networks by default. Be prepared to have to +configure your own if netnant network can be had. + +Can not assume the first NIC on the guest is going to be eth0 (this is common +on older guest OS's prior to the arrival of Predictable Network Interface +Names and systemd, and likely isn't true on newer guest OS's). Instead, allow +the user to set those as parameters to the workload or try to detect these +names in the workload when the network nic is needed. + +Not all clouds support floating IP or private IP. You may want to structure +your workload so that it can adapt to either attach instances to a routeable +network or use floating IP's based on the parameters it's given. + +The tenant network has its advantages when the communications are server to +server on the same network. For example, when your deployment scenario +involves multiple backend servers such as database and application servers, +the commuincation between these servers can be placed on the tenant network +to improve security and performance. + + +Provisiong +---------- + +It makes a real difference not only the HW that the cloud is running on but +also if the backend is ceph or something else, if it is co-located, if the +images have any sort of overhead checks, etc. + +If you don't assume a particular guest OS image, be careful with +storage/networking. We encountered one example in which a particular +guestOS/virtual adapter pair needed to rescan the SCSI bus before it would +recognize a newly attached Cinder volume. Rescanning the bus is generally +harmless if not needed and ensures that images built with adapter types that +need it run successfully, so it's an example of something you can do to make +your workloads more interoperable. + +Parameterize things that are likely to change across different cloud/guest +OS setups. For example: don't assume the first volume attached to a guest +will always be /dev/vdb (this is common but not guaranteed on libvirt, often +untrue on other hypervisors). + + +Metadata +-------- + +Not all cloud support cloud-init, when develop workloads which heavily rely +on metadata services, the clouds without metadata support will fail. + + +Conclusion +---------- + +With best practices it is possible to create enterprise applications (with +enterprise characteristics such as load balancer, multiple web application +servers, distributed database, security groups, block storage to provide +enterprise level networking safeguards) can be created such that they are +portable to numerous (over 18) private and public OpenStack Clouds. \ No newline at end of file