From 7d09c8b7bd7028f397f874c7009ac1cb1920e41d Mon Sep 17 00:00:00 2001 From: d34dh0r53 Date: Wed, 13 Dec 2017 13:32:28 -0600 Subject: [PATCH] Add spec for ELK stack in openstack-ansible * With edits and clarifications * Addional edits, formatting and redo the links Change-Id: Ib08c7a6e3cb229c81f4c65aba856ec01db359c1a --- specs/queens/elk-stack.rst | 200 +++++++++++++++++++++++++++++++++++++ 1 file changed, 200 insertions(+) create mode 100644 specs/queens/elk-stack.rst diff --git a/specs/queens/elk-stack.rst b/specs/queens/elk-stack.rst new file mode 100644 index 0000000..44409d0 --- /dev/null +++ b/specs/queens/elk-stack.rst @@ -0,0 +1,200 @@ +ELK Stack +######### +:date: 2017-12-11 11:00 +:tags: logging, monitoring, operations + +Blueprint on Launchpad: + + * https://blueprints.launchpad.net/openstack-ansible/+spec/elk-stack + +Log file analysis is an important part of maintaining and troubleshooting +OpenStack clouds, but using traditional single server methodology to analyze +the logs on clouds with tens, hundreds or thousands of servers can become +problematic and unwieldy. By leveraging the search, collation and analysis +features of the ELK (Elasticsearch`[1]`_, Logstash`[2]`_ and Kibana`[3]`_) stack +we can provide a cloud level view of all of the log files. The ELK stack also +provides the ability to correlate log messages across various services, perform +detailed log analysis and do trending based on metrics derived from log +messages. + +Problem description +=================== + +For deployers and operators findings specific events in the myriad log files +produced by the various OpenStack, system and ancillary services can be tedious +and error prone. With traditional tools the possibility of missing critical log +entries grows as the size of the cluster increases. Log file analysis provides +vital information about the state of the OpenStack services as well as the +underlying hardware. Currently there are no tools provided by OpenStack-Ansible +to detailed log analysis, correlation and trending. + +Proposed change +=============== + +Utilizing the logging/utility node we install the ELK stack in containers, logs +are shipped from the individual nodes/containers using the Filebeat package. +Using Filebeat to perform the initial log shipping allows us to do initial +multiline parsing distributing the load away from a single Logstash container. +Version requirements of the ELK packages will be maintained in the ELK roles and +barring security fixes the major version of those packages should not change +during the release cycle of Openstack. The ELK roles are consumed via Ansible +Galaxy pointing to specific SHAs. + +Notable changes: + * Create 3 containers on the logging/utility node, one each for Elasticsearch, + Logstash and Kibana. (Additional containers can be created to facilitate HA if + needed.) + * Install the Filebeat package on all nodes/containers + * ELK and Filebeats galaxy role SHAs added to `ansible-requirements.yml` + + +Alternatives +------------ + +Logs are currently shipped to a centralized rsyslog-server container on the +logging/utility server allowing for some sort of centralized log parsing using +command line utilities. There are other 3rd party solutions with various levels +of cost, adoption and support. + +Playbook/Role impact +-------------------- + +The changes required are located in stand alone playbooks. Additional roles +will need to be created for Logstash, Kibana and Filebeat, the +`ansible-elasticsearch``[4]`_ maintained by elastic.co provides Elasticsearch. +Configuration can be stand-alone or integrated into the `user-variables.yml` and +`user-secrets.yml` files. + + +Upgrade impact +-------------- + +As this is the initial implementation there is no upgrade impact. Future +versions will require upgrade planning as it may be necessary to upgrade +versions of the ELK packages, OpenJDK packages and possibly the Elasticsearch +database itself. + + +Security impact +--------------- + +This software provides a web based front end as well as API access to any +information contained in the Openstack, service and system logs that are +shipped to it. As such it will need to be only visible to authenticated users. +All access can be secured through the traditional hardening that is applied to +any standard web service, namely TLS and an authentication mechanism. +Furthermore since the ELK stack is behind a VIP we can limit access to certain +IPs and/or networks via a number of ACLs. + +By default logs are shipped in plaintext, it is possible, however, to enable +SSL encryption on this transport should it be needed. + + +Performance impact +------------------ + +Based on testing and real-world analysis the largest performance impact will be +on the logging/utility server. As this devices original intent was to perform +log processing this is expected and not unusual. The filebeat service running +in each node/container has demonstrated a negligible performance impact, but +certain best practices such as limiting logging levels and eliminating +tracebacks in the logs will help maintain the light footprint. Filebeat should +not impact the operation of any Openstack services as it is simply a log file +processor/shipper, although network utilization could be a concern should debug +logging be enabled on a particularly busy service. + +Elastic.co is the maintainer of all of the software other than Java, which is +maintained by Oracle corporation. Both of these entities provide enterprise +software and thus follow strict release schedules and have reliable upstream +repositories for their software. + + +End user impact +--------------- + +End users should not notice the changes from this work. This is primarily +intended for deployers and operators. This change does give operations teams +more insight into the environment and will hopefully facilitate a more +performant and stable deployment. + + +Deployer impact +--------------- + +The ELK stack is an optional component and does not directly interact with any +Openstack services. All of the ELK packages are provided via apt/yum +repositories. An additional secret will need to be created for the `kibana` +user. The filebeat package will be installed in all containers and on all +nodes but it is extremely lightweight, with configuration stored in +`/etc/filebeat`. Java is required for ELK so the `openjdk` (default) or JDK +implementation of the deployers choosing will need to be installed in three +containers on the logging/utility node. + + +Developer impact +---------------- + +This should be a minimal change for developers, the one thing that they will +need to keep in mind is if additional log files are added they will need to be +added to the filebeat configuration, this can be handled by re-running the +filebeat play against the containers with the new logs. + + +Dependencies +------------ + +There are no dependencies. + +Implementation +============== + +Assignee(s) +----------- + +Primary Assignee: + David Wilde (d34dh0r53) + + +Work items +---------- + +1. Create ELK and filebeats roles in openstack-ansible, these roles will be + generic enough to be published to ansible-galaxy so that they are usable + by the Ansible community at large. +2. Create playbook(s) to install the ELK stack and filebeats, these playbooks + will install the OpenStack specific configuration and parsing files. +3. Create testing procedures for the stack +4. Documentation + +Testing +======= + +The ELK stack should be tested on each commit by ensuring that the services +start and that logs are flowing into the system and being parsed correctly. +This can be acomplished by injecting a line into a services log file and then +using the elasticsearch API via curl to verify that the line was correctly +inserted into the database with the expected fields parsed. + + +Documentation impact +==================== + +Along with the general installation procedures and configuration the key points +of documentation will be: + * Filebeats parsing rules + * Logstash parsing rules + * Kibana dashboard configuration + * The default Kibana dashboard + * Performance impact and tuning of the ELK stack + + +References +========== + +[1] https://elastic.co/products/elasticearch + +[2] https://elastic.co/products/logstash + +[3] https://elastic.co/products/kibana + +[4] https://github.com/elastic/ansible-elasticsearch