Add spec for ELK stack in openstack-ansible

* With edits and clarifications
* Additional edits, formatting and redo the links

Change-Id: Ib08c7a6e3cb229c81f4c65aba856ec01db359c1a

parent 5ec2dde0da, commit 7d09c8b7bd

ELK Stack
#########

:date: 2017-12-11 11:00
:tags: logging, monitoring, operations

Blueprint on Launchpad:

* https://blueprints.launchpad.net/openstack-ansible/+spec/elk-stack

Log file analysis is an important part of maintaining and troubleshooting
OpenStack clouds, but using a traditional single-server methodology to analyze
the logs on clouds with tens, hundreds or thousands of servers can become
problematic and unwieldy. By leveraging the search, collation and analysis
features of the ELK (Elasticsearch [1]_, Logstash [2]_ and Kibana [3]_) stack,
we can provide a cloud-level view of all of the log files. The ELK stack also
provides the ability to correlate log messages across various services, perform
detailed log analysis and do trending based on metrics derived from log
messages.

Problem description
===================

For deployers and operators, finding specific events in the myriad log files
produced by the various OpenStack, system and ancillary services can be tedious
and error-prone. With traditional tools, the likelihood of missing critical log
entries grows as the size of the cluster increases. Log file analysis provides
vital information about the state of the OpenStack services as well as the
underlying hardware. Currently, OpenStack-Ansible provides no tools for
detailed log analysis, correlation and trending.

Proposed change
===============

Using the logging/utility node, we install the ELK stack in containers. Logs
are shipped from the individual nodes/containers using the Filebeat package.
Using Filebeat for the initial log shipping also lets us perform multiline
parsing (for example, joining traceback lines onto the log entry they belong
to) at the source, distributing that load away from a single Logstash
container. Version requirements of the ELK packages will be maintained in the
ELK roles and, barring security fixes, the major version of those packages
should not change during an OpenStack release cycle. The ELK roles are consumed
via Ansible Galaxy, pinned to specific SHAs.
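
To make the multiline step concrete, the following is a minimal Python sketch of
the grouping Filebeat would perform, assuming every new event starts with an
oslo.log style timestamp. It mirrors a Filebeat multiline setup with a
timestamp pattern, ``negate: true`` and ``match: after``; the sample log lines
are invented for illustration and this is not Filebeat's own code.

```python
import re
from typing import Iterable, List

# Assumption: each new log event starts with an oslo.log style timestamp such
# as "2017-12-11 11:00:00.123". Lines that do not match (traceback frames, for
# example) are continuations of the previous event. This mirrors a Filebeat
# multiline setup with a timestamp pattern, negate: true and match: after.
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def group_multiline(lines: Iterable[str]) -> List[str]:
    """Join continuation lines onto the event that precedes them."""
    events: List[str] = []
    for line in lines:
        if EVENT_START.match(line) or not events:
            # A timestamped line (or the very first line) opens a new event.
            events.append(line)
        else:
            # Anything else belongs to the event currently being built.
            events[-1] += "\n" + line
    return events


if __name__ == "__main__":
    sample = [
        "2017-12-11 11:00:00.123 4242 ERROR nova.compute.manager Spawn failed",
        "Traceback (most recent call last):",
        '  File "manager.py", line 10, in spawn',
        "2017-12-11 11:00:01.456 4242 INFO nova.compute.manager Retrying",
    ]
    for event in group_multiline(sample):
        print(repr(event))
```

Run against the sample, this yields two events: the ERROR entry with both
traceback lines attached, and the INFO entry on its own. Doing this on each
node means Logstash receives whole events instead of reassembling them centrally.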

Notable changes:

* Create three containers on the logging/utility node, one each for
  Elasticsearch, Logstash and Kibana. (Additional containers can be created to
  facilitate HA if needed.)
* Install the Filebeat package on all nodes/containers.
* Add the ELK and Filebeat Galaxy role SHAs to ``ansible-requirements.yml``.
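
Because the roles are meant to be pinned to specific SHAs, a simple guard can
flag entries that track a branch instead. The Python sketch below assumes the
requirements file has already been loaded into a list of dictionaries (for
example with PyYAML's ``yaml.safe_load``) and that each entry has ``name`` and
``version`` keys; the role names, URLs and SHA shown are hypothetical.

```python
import re
from typing import Dict, List

# A full git commit SHA-1 is exactly 40 lowercase hexadecimal characters.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def unpinned_roles(requirements: List[Dict[str, str]]) -> List[str]:
    """Return the names of role entries whose ``version`` is not a full SHA."""
    return [
        entry.get("name", "<unnamed>")
        for entry in requirements
        if not FULL_SHA.fullmatch(entry.get("version", ""))
    ]


if __name__ == "__main__":
    # Hypothetical entries; in practice they would be loaded from the
    # requirements file with a YAML parser.
    entries = [
        {"name": "elasticsearch",
         "src": "https://github.com/elastic/ansible-elasticsearch",
         "version": "0123456789abcdef0123456789abcdef01234567"},
        {"name": "filebeat",
         "src": "https://example.com/filebeat-role",
         "version": "master"},
    ]
    print(unpinned_roles(entries))
```

Here only the ``filebeat`` entry is reported, because ``master`` is a moving
branch rather than a fixed commit.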

Alternatives
------------

Logs are currently shipped to a centralized rsyslog-server container on the
logging/utility server, which allows some centralized log parsing using
command-line utilities. There are other third-party solutions with varying
levels of cost, adoption and support.

Playbook/Role impact
--------------------

The required changes are located in stand-alone playbooks. Additional roles
will need to be created for Logstash, Kibana and Filebeat; Elasticsearch is
provided by the ``ansible-elasticsearch`` role [4]_ maintained by elastic.co.
Configuration can be stand-alone or integrated into the ``user-variables.yml``
and ``user-secrets.yml`` files.
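
When configuration is integrated into the user files, values defined there
override the role defaults. The Python sketch below models that layering,
assuming the user files are applied at higher precedence than the role
defaults. Later layers replace earlier values key by key, as with Ansible's
default ``hash_behaviour = replace``. All variable names and values are
hypothetical.

```python
from typing import Any, Dict

# Hypothetical variable names and values, for illustration only.
role_defaults: Dict[str, Any] = {"elk_kibana_port": 5601, "elk_retention_days": 7}
user_variables: Dict[str, Any] = {"elk_retention_days": 30}
user_secrets: Dict[str, Any] = {"elk_kibana_password": "generated-elsewhere"}


def effective_config(*layers: Dict[str, Any]) -> Dict[str, Any]:
    """Merge layers from lowest to highest precedence.

    Later layers replace earlier values key by key; nested dictionaries are
    replaced wholesale rather than deep-merged.
    """
    merged: Dict[str, Any] = {}
    for layer in layers:
        merged.update(layer)
    return merged


if __name__ == "__main__":
    print(effective_config(role_defaults, user_variables, user_secrets))
```

The retention override from the user variables wins, while the port keeps its
role default and the secret is only ever supplied by the secrets file.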

Upgrade impact
--------------

As this is the initial implementation, there is no upgrade impact. Future
versions will require upgrade planning, as it may be necessary to upgrade the
ELK packages, the OpenJDK packages and possibly the data stored in
Elasticsearch itself.
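
Part of that planning is confirming that the components stay compatible with
one another. The Python sketch below uses "same major version as
Elasticsearch" as a minimum pre-upgrade check. That rule is an assumption made
for illustration, and Elastic's published compatibility matrix remains the
authority. The version strings are examples only.

```python
from typing import Dict, List


def major(version: str) -> int:
    """Return the major component of a dotted version string such as '6.1.0'."""
    return int(version.split(".", 1)[0])


def mismatched_components(versions: Dict[str, str]) -> List[str]:
    """List components whose major version differs from Elasticsearch's.

    Assumption: "same major version as Elasticsearch" is used as a minimum
    pre-upgrade check; consult Elastic's compatibility matrix for the exact
    rules of a given release.
    """
    es_major = major(versions["elasticsearch"])
    return sorted(name for name, ver in versions.items() if major(ver) != es_major)


if __name__ == "__main__":
    planned = {"elasticsearch": "6.1.0", "logstash": "5.6.4",
               "kibana": "6.1.0", "filebeat": "6.1.0"}
    print(mismatched_components(planned))
```

With the example versions, only ``logstash`` is flagged, signalling that it
should be upgraded in the same maintenance window as Elasticsearch.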

Security impact
---------------

This software provides a web-based front end as well as API access to any
information contained in the OpenStack, service and system logs that are
shipped to it. As such, it must be visible only to authenticated users. All
access can be secured through the standard hardening applied to any web
service, namely TLS and an authentication mechanism. Furthermore, since the ELK
stack is behind a VIP, access can be limited to specific IPs and/or networks via
ACLs.
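
The ACL check itself amounts to testing whether a client address falls inside
an allowed network. The Python sketch below shows only that matching logic. In
a deployment it would be enforced by whatever load balancer owns the VIP, not
by this code. The networks are hypothetical and taken from the RFC 5737
documentation ranges.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, allowed_networks: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any of the allowed CIDR networks.

    Mixing IPv4 and IPv6 is safe: an address of one family is simply never
    "in" a network of the other family.
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(net) for net in allowed_networks)


if __name__ == "__main__":
    # Hypothetical operator networks (RFC 5737 documentation ranges).
    operators = ["192.0.2.0/24", "198.51.100.0/25"]
    for ip in ("192.0.2.17", "198.51.100.200", "203.0.113.5"):
        print(ip, is_allowed(ip, operators))
```

Only ``192.0.2.17`` is allowed here; ``198.51.100.200`` lies outside the
``/25`` (which covers ``.0`` to ``.127``), and ``203.0.113.5`` matches no entry.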

By default, logs are shipped in plaintext; however, SSL/TLS encryption can be
enabled on this transport should it be needed.

Performance impact
------------------

Based on testing and real-world analysis, the largest performance impact will
be on the logging/utility server. As this device's original purpose was log
processing, this is expected. The Filebeat service running in each
node/container has shown a negligible performance impact, but best practices
such as limiting logging levels and eliminating tracebacks from the logs will
help maintain its light footprint. Filebeat should not impact the operation of
any OpenStack services, as it is simply a log file processor/shipper, although
network utilization could become a concern should debug logging be enabled on a
particularly busy service.
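
The network concern can be estimated with simple arithmetic: raw shipping rate
= lines per second × average bytes per line × number of services. The Python
sketch below computes this before any compression. The line rates, line size
and service count are purely illustrative assumptions, not measurements.

```python
def shipping_rate_kib_s(lines_per_second: float, avg_line_bytes: float,
                        services: int = 1) -> float:
    """Estimate the raw log shipping rate in KiB/s, before any compression.

    rate = lines/s * bytes/line * number of services / 1024
    """
    return lines_per_second * avg_line_bytes * services / 1024


if __name__ == "__main__":
    # Illustrative figures only: busy services at debug level versus info level.
    debug = shipping_rate_kib_s(200, 300, services=50)
    info = shipping_rate_kib_s(5, 300, services=50)
    print(f"debug: {debug:.1f} KiB/s, info: {info:.1f} KiB/s")
```

With these numbers, the debug case works out to 2929.7 KiB/s (roughly
2.9 MiB/s) against 73.2 KiB/s at info level, which is why debug logging on busy
services is the case worth watching.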

Elastic.co maintains all of the software other than Java, which is maintained
by Oracle Corporation. Both organizations provide enterprise software, follow
strict release schedules and have reliable upstream repositories for their
software.

End user impact
---------------

End users should not notice these changes; this work is primarily intended for
deployers and operators. It does, however, give operations teams more insight
into the environment, which should help them maintain a more performant and
stable deployment.

Deployer impact
---------------

The ELK stack is an optional component and does not directly interact with any
OpenStack services. All of the ELK packages are provided via apt/yum
repositories. An additional secret will need to be created for the ``kibana``
user. The Filebeat package will be installed in all containers and on all
nodes, but it is extremely lightweight, with its configuration stored in
``/etc/filebeat``. Java is required for the ELK stack, so ``openjdk`` (the
default) or another JDK implementation of the deployer's choosing will need to
be installed in the three containers on the logging/utility node.
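
Creating the additional ``kibana`` secret can be automated. The Python sketch
below fills only missing or empty entries, so it is safe to re-run. It assumes
the secrets have already been loaded into a dictionary, and the variable name
``kibana_password`` is hypothetical. It uses the standard library ``secrets``
module, which draws from the operating system's cryptographically secure
random source.

```python
import secrets
from typing import Dict, Iterable


def fill_missing_secrets(user_secrets: Dict[str, str], required: Iterable[str],
                         nbytes: int = 32) -> Dict[str, str]:
    """Return a copy of user_secrets with missing or empty entries generated.

    Existing non-empty values are never overwritten, so the function is safe
    to re-run.
    """
    filled = dict(user_secrets)
    for key in required:
        if not filled.get(key):
            # token_urlsafe() returns URL-safe text built from nbytes random bytes.
            filled[key] = secrets.token_urlsafe(nbytes)
    return filled


if __name__ == "__main__":
    # "kibana_password" is a hypothetical variable name.
    current = {"kibana_password": ""}
    updated = fill_missing_secrets(current, ["kibana_password"])
    print(len(updated["kibana_password"]))
```

Returning a copy rather than mutating the input keeps the original secrets
untouched until the caller decides to write the result back.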

Developer impact
----------------

This should be a minimal change for developers. The one thing to keep in mind
is that if additional log files are added, they will need to be added to the
Filebeat configuration; this can be handled by re-running the Filebeat play
against the containers with the new logs.
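
For illustration, the entry that would need to be added for a new log file
could be modelled as the data structure an Ansible template renders into the
Filebeat configuration. The Python sketch below assumes the option names of
Filebeat's ``log`` input (``paths``, ``tags`` and the ``multiline.*``
settings), which should be checked against the deployed Filebeat version. The
log path is hypothetical.

```python
from typing import Dict, List


def filebeat_log_input(paths: List[str], service: str) -> Dict[str, object]:
    """Build one Filebeat log input entry for the given log paths.

    Assumption: option names follow Filebeat's ``log`` input; verify them
    against the Filebeat version actually deployed.
    """
    return {
        "type": "log",
        "paths": list(paths),
        # Tag events so Logstash filters can be selected per service.
        "tags": [service],
        # Lines that do NOT start with a date are appended to the previous
        # event, keeping tracebacks attached to the entry that produced them.
        "multiline.pattern": r"^\d{4}-\d{2}-\d{2}",
        "multiline.negate": True,
        "multiline.match": "after",
    }


if __name__ == "__main__":
    # Hypothetical new log file added by a developer.
    inputs = [filebeat_log_input(["/var/log/nova/nova-example.log"], "nova")]
    print(inputs)
```

Keeping the multiline settings in one helper means every newly added log file
automatically gets the same traceback handling as the existing ones.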

Dependencies
------------

There are no dependencies.

Implementation
==============

Assignee(s)
-----------

Primary Assignee:
  David Wilde (d34dh0r53)

Work items
----------

1. Create ELK and Filebeat roles in openstack-ansible. These roles will be
   generic enough to be published to Ansible Galaxy so that they are usable by
   the Ansible community at large.
2. Create playbook(s) to install the ELK stack and Filebeat. These playbooks
   will install the OpenStack-specific configuration and parsing files.
3. Create testing procedures for the stack.
4. Documentation.

Testing
=======

The ELK stack should be tested on each commit by ensuring that the services
start and that logs are flowing into the system and being parsed correctly.
This can be accomplished by injecting a line into a service's log file and then
using the Elasticsearch API via curl to verify that the line was correctly
inserted into the database with the expected fields parsed.
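
The same check can be sketched in Python with only the standard library,
issuing the HTTP request that curl would send to Elasticsearch's ``_search``
API with a ``match_phrase`` query. Several details here are assumptions: the
endpoint (``http://localhost:9200``), the index pattern (``logstash-*``), the
oslo.log-like test line, the ``message`` field holding the raw line, and any
parsed field names such as ``loglevel``.

```python
import json
import urllib.request
import uuid
from datetime import datetime
from typing import Any, Dict, Iterable


def inject_marker(log_path: str) -> str:
    """Append an oslo.log style line with a unique marker; return the marker."""
    marker = f"elk-test-{uuid.uuid4().hex}"
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S.000")
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{stamp} 0 INFO elk.test [-] {marker}\n")
    return marker


def search_request(es_url: str, index: str, marker: str) -> urllib.request.Request:
    """Build an Elasticsearch _search request matching the marker phrase."""
    body = {"query": {"match_phrase": {"message": marker}}}
    return urllib.request.Request(
        f"{es_url.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def marker_parsed(response: Dict[str, Any], expected_fields: Iterable[str]) -> bool:
    """True if the first hit exists and carries every expected parsed field."""
    hits = response.get("hits", {}).get("hits", [])
    return bool(hits) and all(f in hits[0].get("_source", {}) for f in expected_fields)


def verify(es_url: str, index: str, marker: str, expected_fields: Iterable[str]) -> bool:
    """Query Elasticsearch once and check that the marker line was parsed."""
    with urllib.request.urlopen(search_request(es_url, index, marker), timeout=10) as resp:
        return marker_parsed(json.load(resp), expected_fields)
```

Elasticsearch only makes new documents searchable after an index refresh (every
second by default), and shipping through Filebeat and Logstash adds further
latency. A real test would therefore retry ``verify`` a few times before
declaring failure.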

Documentation impact
====================

Along with the general installation and configuration procedures, the key
points of documentation will be:

* Filebeat parsing rules
* Logstash parsing rules
* Kibana dashboard configuration
* The default Kibana dashboard
* Performance impact and tuning of the ELK stack
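
As an illustration of what documented parsing rules would need to explain, the
Python sketch below uses a regular expression to split an event into named
fields. It stands in for a Logstash grok filter and is not grok syntax. It
assumes the oslo.log default layout (timestamp, PID, level, logger name,
bracketed request context, message), and the field names are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed layout, modelled on oslo.log's default format:
# "<date> <time>.<ms> <pid> <LEVEL> <logger> [<request context>] <message>"
OSLO_LINE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<pid>\d+) "
    r"(?P<loglevel>[A-Z]+) "
    r"(?P<module>\S+) "
    r"\[(?P<context>[^\]]*)\] "
    r"(?P<logmessage>.*)",
    re.DOTALL,  # let the message span joined multiline (traceback) events
)


def parse_event(event: str) -> Optional[Dict[str, str]]:
    """Split one log event into named fields, or return None if it does not match."""
    match = OSLO_LINE.fullmatch(event)
    return match.groupdict() if match else None


if __name__ == "__main__":
    line = ("2017-12-11 11:00:00.123 4242 INFO nova.compute.manager "
            "[req-6f1c - - - - -] Instance spawned successfully.")
    print(parse_event(line))
```

For the sample line, this yields ``loglevel`` ``INFO``, ``module``
``nova.compute.manager`` and ``context`` ``req-6f1c - - - - -``. These are the
kinds of fields the testing procedure would then look for in Elasticsearch.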

References
==========

.. [1] https://elastic.co/products/elasticsearch
.. [2] https://elastic.co/products/logstash
.. [3] https://elastic.co/products/kibana
.. [4] https://github.com/elastic/ansible-elasticsearch