Add running trove in production guide

Change-Id: I18654091cc39a0a5de17ff9822f14d6c41facf42
2020-01-13 18:37:23 +13:00 · 2020-01-13 18:37:23 +13:00 · d57354942e
parent 8356f849f0
commit d57354942e
3 changed files with 329 additions and 5 deletions
--- a/doc/source/admin/datastore.rst
+++ b/doc/source/admin/datastore.rst
@@ -1,8 +1,8 @@
 .. _database:

-========
-Database
-========
+=========
+Datastore
+=========

 The Database service provides database management features.

--- a/doc/source/admin/index.rst
+++ b/doc/source/admin/index.rst
@@ -5,7 +5,8 @@
 .. toctree::
   :maxdepth: 2

-   basics
+   run_trove_in_production
+   datastore
   building_guest_images
-   database_module_usage
   secure_oslo_messaging
+   database_module_usage
--- a/doc/source/admin/run_trove_in_production.rst
+++ b/doc/source/admin/run_trove_in_production.rst
@@ -0,0 +1,323 @@
+..
+      Copyright (c) 2020 Catalyst Cloud
+
+      Licensed under the Apache License, Version 2.0 (the "License"); you may
+      not use this file except in compliance with the License. You may obtain
+      a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+===========================
+Running Trove in production
+===========================
+
+This document is not a definitive guide for deploying Trove in every production
+environment. There are many ways to deploy Trove depending on the specifics and
+limitations of your situation. We hope this document provides the cloud
+operator or distribution creator with a basic understanding of how the Trove
+components fit together practically. Through this, it should become more
+obvious how components of Trove can be divided or duplicated across physical
+hardware in a production cloud environment to aid in achieving scalability and
+resiliency for the database as a service software.
+
+To keep this guide high-level, and to avoid it becoming obsolete or making
+operator- or distribution-specific assumptions about the environment, we do not
+specify the exact commands needed to accomplish the tasks below. Instead, we
+describe what needs to be done and leave it to the cloud operator or
+distribution creator to "do the right thing" for their environment. If you need
+guidance on the specific commands to run, we recommend reading through the
+``plugin.sh`` script in the ``devstack`` subdirectory of this project. The
+devstack plugin exercises all the essential components of Trove in the right
+order, and this guide is mostly an elaboration of that process.
+
+
+Environment Assumptions
+-----------------------
+The scope of this guide is to provide a basic overview of setting up all
+the components of Trove in a production environment, assuming that the
+default in-tree drivers and components are going to be used.
+
+For the purposes of this guide, we will therefore assume the following core
+components have already been set up for your production OpenStack environment:
+
+* RabbitMQ
+* MySQL
+* Keystone
+* Nova
+* Cinder
+* Neutron
+* Glance
+* Swift
+
+
+Production Deployment Walkthrough
+---------------------------------
+
+
+Create Trove Service User
+~~~~~~~~~~~~~~~~~~~~~~~~~
+By default, Trove uses the ``trove`` user with the ``admin`` role in the
+``service`` tenant (project) both for Keystone authentication and for
+interactions with all other OpenStack services.
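The service user can be created with any Keystone client. The following is a minimal sketch using openstacksdk. It assumes an admin-scoped ``Connection`` and the ``default`` domain. The function name and its defaults are hypothetical, and the identity proxy calls should be checked against the openstacksdk release you use:

```python
def create_trove_service_user(conn, password, user_name="trove",
                              project_name="service", role_name="admin",
                              domain_id="default"):
    """Create the Trove service user (if missing) and grant it its role.

    ``conn`` is assumed to be an admin-scoped openstacksdk Connection.
    """
    identity = conn.identity
    # Reuse an existing user instead of failing on a second run.
    user = identity.find_user(user_name)
    if user is None:
        user = identity.create_user(
            name=user_name, password=password, domain_id=domain_id)
    # Both lookups raise if the project or role does not exist.
    project = identity.find_project(project_name, ignore_missing=False)
    role = identity.find_role(role_name, ignore_missing=False)
    # Grant e.g. the 'admin' role on the 'service' project.
    identity.assign_project_role_to_user(project, user, role)
    return user
```

For example, ``create_trove_service_user(openstack.connect(cloud="admin"), password="...")``, where ``admin`` is a hypothetical ``clouds.yaml`` entry.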
+
+
+Service Tenant Deployment
+~~~~~~~~~~~~~~~~~~~~~~~~~
+In production, almost all of the cloud resources created for a Trove instance
+(except the Swift objects that hold backup data) should be visible only to the
+Trove service user. DBaaS users should only see the Trove instances they have
+created, and know nothing about the Nova VM, Cinder volume, Neutron management
+network and security groups under the hood. The only way to operate a Trove
+instance is through the `Trove API
+<https://docs.openstack.org/api-ref/database/>`_.
+
+Service tenant deployment has been the default configuration in Trove since the
+Ussuri release.
+
+
+Install Trove Controller Software
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Trove controller services should be deployed on hosts that have access to the
+Trove database, the oslo messaging system, and the other OpenStack services.
+Trove is packaged with standard Python setuptools, so installing the software
+itself should be straightforward.
+
+Running multiple instances of the individual Trove controller components on
+separate physical hosts is recommended in order to provide scalability and
+availability of the controller software.
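Before starting the controller services on a host, it can be worth confirming that the host can actually reach the Trove database, RabbitMQ and the Keystone endpoint. The following is a minimal TCP reachability sketch that uses only the Python standard library. It tests the TCP handshake only, not credentials or TLS:

```python
import socket


def check_endpoints(endpoints, timeout=3.0):
    """Return {name: bool} telling whether each (host, port) accepts TCP.

    Only the TCP handshake is tested; credentials and TLS are not checked.
    """
    results = {}
    for name, (host, port) in endpoints.items():
        try:
            # create_connection resolves the name and completes the handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = True
        except OSError:
            # Covers refused connections, timeouts and DNS failures.
            results[name] = False
    return results
```

For example, ``check_endpoints({"database": ("db.example.internal", 3306), "rabbitmq": ("rabbit.example.internal", 5672)})``, where the host names are hypothetical.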
+
+
+Management Network
+~~~~~~~~~~~~~~~~~~
+Trove uses a dedicated "management network" over which the controller talks to
+the guest agent running inside each Trove instance, and vice versa. Every
+instance that Trove deploys has an interface on this network. It is therefore
+important that the subnet on this network be large enough for the maximum
+number of instances and controllers likely to be deployed over the lifespan of
+the cloud installation.
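To size the management subnet, count one address per instance and one per controller host. The following is a rough back-of-the-envelope sketch using the standard ``ipaddress`` module. It assumes an IPv4 subnet where only the network, broadcast and gateway addresses are unavailable. Adjust ``reserved`` for DHCP ports or any other fixed addresses in your deployment:

```python
import ipaddress


def usable_addresses(cidr, reserved=1):
    """Number of host addresses Neutron can hand out on an IPv4 subnet.

    Excludes the network and broadcast addresses plus ``reserved`` extra
    addresses (1 by default, for the gateway). Raise ``reserved`` for DHCP
    ports, routers or any other fixed addresses on the subnet.
    """
    net = ipaddress.IPv4Network(cidr)
    return max(net.num_addresses - 2 - reserved, 0)


def is_large_enough(cidr, max_instances, controllers, reserved=1):
    """True if the subnet can hold every instance plus every controller."""
    return usable_addresses(cidr, reserved) >= max_instances + controllers
```

For example, a ``/24`` leaves 253 addresses under these assumptions, which is enough for 250 instances and 3 controllers but not for 251 instances.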
+
+Usually, a newly created Trove instance VM has two NICs attached: one on the
+user-defined network for database traffic, and one on the management network.
+Trove checks whether the user's subnet conflicts with the management network.
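A conflict here can be understood as an address overlap between the user's subnet and the management subnet, which would leave the instance with two interfaces in the same address range. The following is a minimal sketch of such a check using the standard ``ipaddress`` module. It illustrates the idea only and is not Trove's actual implementation:

```python
import ipaddress


def conflicting_subnets(user_cidrs, mgmt_cidr):
    """Return the user subnets whose address range overlaps the management one."""
    mgmt = ipaddress.ip_network(mgmt_cidr)
    conflicts = []
    for cidr in user_cidrs:
        net = ipaddress.ip_network(cidr)
        # Networks of different IP versions can never overlap.
        if net.version == mgmt.version and net.overlaps(mgmt):
            conflicts.append(cidr)
    return conflicts
```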
+
+You can also create a management Neutron security group to be applied to the
+management port. Essentially nothing needs to be allowed to reach the
+management port, because most of the network communication of a Trove instance
+is egress traffic (e.g. the guest agent initiates the connection to RabbitMQ).
+However, it can be helpful to allow SSH access (i.e. TCP port 22) to the Trove
+instance from the controller for troubleshooting purposes, though this is not
+strictly necessary in production environments.
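The following is a minimal openstacksdk sketch of such a security group. It assumes a ``Connection`` authenticated as the Trove service user in the ``service`` project. Neutron adds default allow-all egress rules to every new security group and denies ingress that no rule allows. The only rule added here is therefore the optional SSH rule from a controller CIDR, where ``controller_cidr`` is a hypothetical parameter. The resulting group ID is what goes into the ``management_security_groups`` option:

```python
def create_mgmt_security_group(conn, controller_cidr=None, name="trove-mgmt"):
    """Create the management security group; optionally allow SSH from a CIDR."""
    sg = conn.network.create_security_group(
        name=name, description="Trove instance management port")
    if controller_cidr:
        # Ingress TCP/22 from the controllers only, for troubleshooting.
        conn.network.create_security_group_rule(
            security_group_id=sg.id,
            direction="ingress",
            ethertype="IPv4",
            protocol="tcp",
            port_range_min=22,
            port_range_max=22,
            remote_ip_prefix=controller_cidr,
        )
    return sg
```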
+
+To SSH into a Trove instance (helpful but, as mentioned above, not necessary),
+the cloud administrator needs to create a Nova keypair and configure Trove to
+use it.
+
+Finally, you need to add routing or interfaces to this network so that the
+Trove guest agent running inside the instance is able to connect with RabbitMQ.
+
+
+RabbitMQ Considerations
+~~~~~~~~~~~~~~~~~~~~~~~
+Both trove-taskmanager and trove-conductor talk to the guest agent inside a
+Trove instance via the messaging system, i.e. RabbitMQ. Once the guest agent is
+up and running, it listens on a message queue named ``guestagent.<guest ID>``
+that is set up specifically for that instance, and receives requests from
+trove-taskmanager for operations such as setting up the database software,
+creating databases and users, and restarting the database service. Meanwhile,
+trove-guestagent periodically sends status updates to trove-conductor through
+the messaging system.
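When troubleshooting, it can help to know which queue belongs to which instance. The following is a minimal sketch of the naming convention described above. It assumes the guest ID is the Trove instance UUID. The helper is hypothetical and only builds the string, for example to search for in ``rabbitmqctl list_queues`` output:

```python
import uuid


def guest_queue_name(instance_id):
    """Return the RabbitMQ queue name a guest agent listens on."""
    # uuid.UUID() rejects malformed IDs and normalises them to lowercase.
    return "guestagent.%s" % uuid.UUID(str(instance_id))
```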
+
+Consequently, a RabbitMQ user name and password must be configured in the
+trove-guestagent config file, which may be a security concern for cloud
+deployers: if a guest instance is compromised, the guest credentials are
+compromised, and so is the messaging system.
+
+As part of the solution, Trove introduced a `security enhancement
+<https://docs.openstack.org/trove/latest/admin/secure_oslo_messaging.html>`_ in
+the Ocata release that uses encryption keys to protect the messages between the
+control plane and the guest instances, so that one compromised guest instance
+does not affect other instances or other cloud users.
+
+
+Configuring Trove
+~~~~~~~~~~~~~~~~~
+The default Trove configuration file location is ``/etc/trove/trove.conf``. The
+typical config options (not a full list) are:
+
+DEFAULT group
+  enable_secure_rpc_messaging
+    Whether RPC messaging traffic should be secured by encryption.
+
+  taskmanager_rpc_encr_key
+    The key (OpenSSL aes_cbc) used to encrypt RPC messages sent to
+    trove-taskmanager, used by trove-api.
+
+  instance_rpc_encr_key
+    The key (OpenSSL aes_cbc) used to encrypt RPC messages sent to the guest
+    instance from trove-taskmanager and the messages sent from the guest
+    instance to trove-conductor. This key is generated automatically by
+    trove-taskmanager and injected into the guest instance when it is created.
+
+  inst_rpc_key_encr_key
+    The key used to encrypt each per-instance RPC encryption key before it is
+    stored in the Trove database.
+
+  management_networks
+    The ID of the management network. Currently only one management network is
+    allowed.
+
+  management_security_groups
+    List of the management security groups that are applied to the management
+    port of the database instance.
+
+  cinder_volume_type
+    Cinder volume type used to create the volume that is attached to a Trove
+    instance.
+
+  nova_keypair
+    Name of a Nova keypair to inject into a database instance to enable SSH
+    access.
+
+  default_datastore
+    The default datastore ID or name to use if one is not provided by the
+    user. If this option is not set (the default is None), the datastore field
+    becomes required in the instance create request.
+
+  max_accepted_volume_size
+    The default maximum volume size (in GB) for an instance.
+
+  max_instances_per_tenant
+    Default maximum number of instances per tenant.
+
+  max_backups_per_tenant
+    Default maximum number of backups per tenant.
+
+  transport_url
+    The messaging server connection URL, e.g.
+    ``rabbit://stackrabbit:password@10.0.119.251:5672/``
+
+  control_exchange
+    The Trove exchange name for the messaging service, could be overridden by
+    an exchange name specified in the transport_url option.
+
+  reboot_time_out
+    Maximum time (in seconds) to wait for a server reboot.
+
+  usage_timeout
+    Maximum time (in seconds) to wait for Trove instance to become ACTIVE for
+    creation.
+
+  restore_usage_timeout
+    Maximum time (in seconds) to wait for Trove instance to become ACTIVE for
+    restore.
+
+  agent_call_high_timeout
+    Maximum time (in seconds) to wait for Guest Agent 'slow' requests (such as
+    restarting the instance server) to complete.
+
+keystone_authtoken group
+  Like most other OpenStack services, Trove uses `Keystone Authentication
+  Middleware
+  <https://docs.openstack.org/keystonemiddleware/latest/middlewarearchitecture.html>`_
+  for authentication and authorization.
+
+service_credentials group
+  Options in this section are similar to those in ``keystone_authtoken``, but
+  they let you configure a separate service user that Trove uses to
+  communicate with other OpenStack services such as Nova, Neutron and Cinder.
+
+  * auth_url
+  * region_name
+  * project_name
+  * username
+  * password
+  * project_domain_name
+  * user_domain_name
+
+database group
+  connection
+    The SQLAlchemy connection string to use to connect to the database, e.g.
+    ``mysql+pymysql://root:password@127.0.0.1/trove?charset=utf8``
+
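As an illustration of how these options fit together, the following is a minimal sketch that renders a skeleton ``trove.conf`` with Python's ``configparser``. All host names, IDs and credentials are hypothetical placeholders. The two encryption keys are random 32-character strings; the exact key format Trove expects is an assumption here, so check the Trove configuration reference. ``instance_rpc_encr_key`` is deliberately absent because trove-taskmanager generates it per instance, and the ``keystone_authtoken`` section is omitted:

```python
import configparser
import secrets
from urllib.parse import quote


def build_trove_conf(rabbit_host, rabbit_user, rabbit_password,
                     mgmt_network_id, db_connection, auth_url,
                     service_password, region="RegionOne"):
    """Return a ConfigParser holding a skeleton trove.conf."""
    # interpolation=None: percent-encoded passwords must not be treated as
    # configparser %(name)s references.
    conf = configparser.ConfigParser(interpolation=None)
    # Percent-encode credentials so characters such as '@' or ':' cannot
    # break the URL.
    transport_url = "rabbit://%s:%s@%s:5672/" % (
        quote(rabbit_user, safe=""), quote(rabbit_password, safe=""),
        rabbit_host)
    conf["DEFAULT"] = {
        "transport_url": transport_url,
        "enable_secure_rpc_messaging": "true",
        "taskmanager_rpc_encr_key": secrets.token_hex(16),  # 32 hex chars
        "inst_rpc_key_encr_key": secrets.token_hex(16),
        "management_networks": mgmt_network_id,
    }
    conf["service_credentials"] = {
        "auth_url": auth_url,
        "region_name": region,
        "project_name": "service",
        "username": "trove",
        "password": service_password,
        "project_domain_name": "Default",
        "user_domain_name": "Default",
    }
    conf["database"] = {"connection": db_connection}
    return conf
```

Writing the result out is then ``with open("/etc/trove/trove.conf", "w") as f: conf.write(f)``.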
+The cloud administrator also needs to provide a policy file
+``/etc/trove/policy.json`` if the default API access policies do not meet
+their requirements. To generate a sample policy file containing all the default
+policies, run ``tox -egenpolicy`` in the repository root; the generated file is
+``etc/trove/policy.yaml.sample``.
+
+
+Initialize Trove Database
+~~~~~~~~~~~~~~~~~~~~~~~~~
+The Trove database schema is managed by `sqlalchemy-migrate
+<https://code.google.com/archive/p/sqlalchemy-migrate/>`_ scripts under the
+``trove/db/sqlalchemy/migrate_repo/versions`` directory in this repository. The
+``trove-manage`` script (which should be installed together with the Trove
+controller software) can be used to initialize the Trove database. Note that
+this tool reads its database credentials from ``/etc/trove/trove.conf``, so the
+database must be initialized after Trove is configured.
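The following is a minimal Python sketch of that ordering. It first checks that ``trove.conf`` already contains a ``[database] connection`` value and then runs the migrations. The ``trove-manage db_sync`` invocation mirrors what the devstack plugin runs. Treat the exact command line as an assumption and confirm it with ``trove-manage --help``:

```python
import configparser
import subprocess
from urllib.parse import urlsplit


def database_target(conf_path="/etc/trove/trove.conf"):
    """Return (driver, host, database name) from trove.conf's [database]."""
    # strict=False tolerates the duplicate keys oslo.config files may contain.
    conf = configparser.ConfigParser(interpolation=None, strict=False)
    if not conf.read(conf_path):
        raise FileNotFoundError(conf_path)
    url = conf.get("database", "connection", fallback="")
    if not url:
        raise ValueError("[database] connection is not set in %s" % conf_path)
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path.lstrip("/")


def sync_trove_db(conf_path="/etc/trove/trove.conf"):
    """Fail fast if trove.conf is not ready, then apply the migrations."""
    database_target(conf_path)
    subprocess.run(
        ["trove-manage", "--config-file", conf_path, "db_sync"], check=True)
```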
+
+
+Launching the Trove Controller
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We recommend using upstart / systemd scripts to ensure the components of the
+Trove controller are all started and kept running.
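With systemd, one unit per controller service is enough. The following is a minimal sketch that renders such units. The binary directory, the ``trove`` system user and the restart policy are assumptions to adapt to your packaging:

```python
TROVE_SERVICES = ("trove-api", "trove-taskmanager", "trove-conductor")


def systemd_unit(service, bin_dir="/usr/local/bin",
                 conf="/etc/trove/trove.conf", user="trove"):
    """Render a simple systemd unit that keeps one Trove service running."""
    return "\n".join([
        "[Unit]",
        "Description=OpenStack Trove %s" % service,
        "After=network-online.target",
        "Wants=network-online.target",
        "",
        "[Service]",
        "Type=simple",
        "User=%s" % user,
        "ExecStart=%s/%s --config-file %s" % (bin_dir, service, conf),
        # Restart the service if it exits with an error or is killed.
        "Restart=on-failure",
        "RestartSec=5",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])
```

Each rendered unit can be written to ``/etc/systemd/system/<service>.service``, followed by ``systemctl daemon-reload`` and ``systemctl enable --now <service>``.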
+
+
+Preparing the Guest Images
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+Now that the Trove system is installed, the next step is to build the guest
+images that the DBaaS needs in order to function. This is possibly the most
+important step, as the image is the gold standard that Trove uses for a
+particular datastore.
+
+.. note::
+
+    For the sake of simplicity, and especially for testing, you can use the
+    prebuilt images that are available from OpenStack itself. These images
+    should be used strictly for testing and development, and never in a
+    production environment. They are available for download at
+    http://tarballs.openstack.org/trove/images/.
+
+For production systems, it is recommended to create and maintain your own
+images so that they conform to the standards set by your company's security
+team. The Trove community uses `Disk Image Builder (DIB)
+<https://docs.openstack.org/diskimage-builder/latest/>`_ to create Trove
+images; all the elements are located in the
+``integration/scripts/files/elements`` folder of the repository.
+
+Trove provides a script named ``trovestack`` to help build the image; refer to
+`Build images using trovestack
+<https://docs.openstack.org/trove/latest/admin/building_guest_images.html#build-images-using-trovestack>`_
+for more information. Make sure to use ``dev_mode=false`` for production
+environments.
+
+After the image is created successfully, the cloud administrator needs to
+upload it to Glance and make it accessible only to the service user.
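The following is a minimal openstacksdk sketch of that step. It assumes ``conn`` is authenticated as the Trove service user in the ``service`` project, so that this project owns the image, and that the image is a qcow2 file. Setting ``visibility`` to ``private`` restricts the image to its owning project. The ``create_image`` and ``update_image`` image proxy calls are present in recent openstacksdk releases; check the release you use:

```python
def upload_guest_image(conn, name, path, disk_format="qcow2"):
    """Upload a guest image to Glance and restrict it to the owning project."""
    image = conn.image.create_image(
        name=name,
        filename=path,
        disk_format=disk_format,
        container_format="bare",
        wait=True,  # block until the upload has finished
    )
    # Private images are visible only to the owning (service) project
    # and to cloud administrators.
    return conn.image.update_image(image, visibility="private")
```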
+
+
+Preparing the Datastore
+~~~~~~~~~~~~~~~~~~~~~~~
+After the image is uploaded, the cloud administrator should create the
+datastores, the datastore versions, and the configuration parameters for each
+version.
+
+It is recommended to configure a default version for each datastore.
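The following is a minimal sketch of those steps. It builds the ``trove-manage`` command lines that the devstack plugin runs for this purpose. The argument order shown, in particular the positional arguments of ``datastore_version_update``, is an assumption based on that plugin. The datastore name, version, image ID and validation-rules path used with it are hypothetical, so confirm everything with ``trove-manage --help`` before use:

```python
import subprocess


def datastore_commands(datastore, version, image_id, rules_file,
                       conf="/etc/trove/trove.conf"):
    """Return the trove-manage invocations that register one datastore version."""
    base = ["trove-manage", "--config-file", conf]
    return [
        # 1. Create the datastore without a default version.
        base + ["datastore_update", datastore, ""],
        # 2. Add the version: datastore, version name, manager (here the same
        #    as the datastore name), Glance image ID, packages, active flag.
        base + ["datastore_version_update", datastore, version, datastore,
                image_id, "", "1"],
        # 3. Make the new version the datastore's default.
        base + ["datastore_update", datastore, version],
        # 4. Load the validation rules for the configuration parameters.
        base + ["db_load_datastore_config_parameters", datastore, version,
                rules_file],
    ]


def prepare_datastore(datastore, version, image_id, rules_file,
                      conf="/etc/trove/trove.conf"):
    """Run the commands in order, stopping at the first failure."""
    for cmd in datastore_commands(datastore, version, image_id, rules_file,
                                  conf):
        subprocess.run(cmd, check=True)
```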
+
+
+Trove Deployment Verfication
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If all of the above instructions have been followed, it should now be possible
+to deploy Trove instances using the OpenStack CLI, communicating with the Trove
+V1 API.
+
+Refer to `Create and access a database
+<https://docs.openstack.org/trove/latest/user/create-db.html>`_ for detailed
+steps.
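A typical verification check is that a new instance leaves ``BUILD`` and reaches ``ACTIVE``. The following is a minimal polling sketch. ``get_status`` is a hypothetical callable that returns the current status string, for example by wrapping ``openstack database instance show <instance> -f value -c status``. The default timeout is an arbitrary choice that should be aligned with ``usage_timeout``:

```python
import time


def wait_for_status(get_status, target="ACTIVE", failure=("ERROR",),
                    timeout=1800, interval=10,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns target, a failure state, or times out."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == target:
            return status
        if status in failure:
            raise RuntimeError("instance entered status %s" % status)
        if clock() >= deadline:
            raise TimeoutError("instance still %s after %s seconds"
                               % (status, timeout))
        sleep(interval)
```

Injecting ``sleep`` and ``clock`` keeps the helper easy to test without real waiting.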