author Rick Aulino <rick.aulino@hp.com> 2016-06-20 16:47:58 -0600
committer Rick Aulino <rick.aulino@hp.com> 2016-08-19 11:04:57 -0600
commit a93189f0f319d3b74256229608246628c4be8af2
tree 6a559152628d928a85e4840a2510076baa4b5059
parent 0f964956c0be8ae19d6fe4b3720f0095b533dd25
Spec for performance enhancements for Searchlight indexing.
Notes (review):
  Code-Review+2: Steve McLellan <steven.j.mclellan@gmail.com>
  Code-Review+2: Travis Tripp <travis.tripp@hpe.com>
  Workflow+1: Travis Tripp <travis.tripp@hpe.com>
  Verified+2: Jenkins
  Submitted-by: Jenkins
  Submitted-at: Mon, 12 Sep 2016 17:15:30 +0000
  Reviewed-on: https://review.openstack.org/331879
  Project: openstack/searchlight-specs
  Branch: refs/heads/master
-rw-r--r-- specs/newton/index-performance-enhancement.rst 187
1 files changed, 187 insertions, 0 deletions
diff --git a/specs/newton/index-performance-enhancement.rst b/specs/newton/index-performance-enhancement.rst
new file mode 100644
index 0000000..203a80c
--- /dev/null
+++ b/specs/newton/index-performance-enhancement.rst
@@ -0,0 +1,187 @@
..
 (c) Copyright 2016 Hewlett-Packard Enterprise Development Company, L.P.

 Licensed under the Apache License, Version 2.0 (the "License"); you may
 not use this file except in compliance with the License. You may obtain
 a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
 WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
 License for the specific language governing permissions and limitations
 under the License.

=============================
Index Performance Enhancement
=============================

https://blueprints.launchpad.net/searchlight/+spec/index-performance-enhancement

This feature will improve the performance of indexing resource types within
Searchlight.

Problem Description
===================

If the above link is too troublesome to follow, please indulge us while we
plagiarize from the blueprint.

When indexing (for the first time or when re-indexing), we index all resource
group types sequentially. We loop through all plugins, indexing each one in
turn. The result is that the time it takes to re-index is equal to the sum of
the times for all plugins. This may take longer than it should; in some cases,
a lot longer.

The time it takes to complete the full index is:

.. math::

   O\left( \sum_{p=1}^{n} T(p) \right)

where n is the number of plugins and T(p) is the time it takes for plugin p to
index.

We should change the algorithm to index in parallel rather than serially. As we
loop through each plugin to re-index, we should spin each indexing task into
its own thread. This way the time it takes to index is the time it takes the
slowest plugin to re-index.

With this enhancement, the time it takes to complete the index is:

.. math::

   O\left( \max_{1 \le p \le n} T(p) \right)

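As a rough illustration of the difference between the two bounds, the sketch
below compares them for a handful of made-up plugin timings. The durations are
hypothetical and only ``OS::Nova::Server`` is taken from this spec; the other
names are placeholders.

.. code-block:: python

    # Hypothetical per-plugin indexing times in seconds (illustrative only).
    plugin_times = {
        "OS::Nova::Server": 120.0,
        "placeholder-plugin-a": 15.0,
        "placeholder-plugin-b": 40.0,
    }

    # Serial indexing: each plugin waits for the previous one to finish.
    serial_time = sum(plugin_times.values())

    # Ideal parallel indexing: bounded by the slowest plugin.
    parallel_time = max(plugin_times.values())

    print("serial: %.0fs, parallel: %.0fs" % (serial_time, parallel_time))

The parallel figure is a best case; it assumes every plugin can run at the same
time and that the plugins do not contend for a shared resource.
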
To provide context for the design, we will review the current design for
re-indexing. A re-index starts when the admin runs the command:

``searchlight-manage index sync``

Under the covers, ``searchlight-manage`` does the following:

 * Determines which resource groups need to be re-indexed.
 * Determines which resource types within each resource group need to be
   re-indexed.
 * For each resource type that *does* need to be re-indexed,
   ``searchlight-manage`` calls the plugin associated with that resource type.
   The plugin makes API calls to that service and re-indexes the information.
 * For each resource type that *does not* need to be re-indexed,
   ``searchlight-manage`` calls Elasticsearch directly and re-indexes from the
   old index into the new index.
 * Once all re-indexing is complete, the ES aliases are adjusted and
   ``searchlight-manage`` returns to the user.

This implies the following:

 * The admin must wait for all of the re-indexing to complete before
   ``searchlight-manage`` finishes.
 * When ``searchlight-manage`` finishes, the admin knows the exact state of the
   re-index: whether it completed successfully or there was an error.

Proposed Change
===============

As described in the blueprint, we would like to reduce the time to complete the
re-index. Based on discussions around the blueprint and this spec, we will
implement only the first enhancement in the blueprint. We will use Python
threads to accomplish this task. First, we need to understand the design issues
associated with a multi-threaded approach.

1. **Are the indexing plugins thread-safe?**

If there are a lot of inter-dependencies within the plugins, it may not pay off
to try to multi-thread the plugins. Reviewing the code and functionality of the
plugins, they appear to be separate enough that they are good candidates to be
moved into their own threads. The plugins are isolated from each other and do
not depend on any internal structures to handle the actual indexing.

**Design Proposal:** The individual plugins can be successfully threaded.

2. **At what level should we create the indexing threads?**

The obvious candidates are the resource type (e.g. OS::Nova::Server) or the
resource type group (e.g. the index "searchlight"). The main reason that we are
considering this enhancement is the large amount of time needed for a
particular resource type, not for a particular resource type group.

Internal to ``searchlight-manage``, this distinction fades rather quickly. We
use the resource type groups only to determine which resource types need to be
re-indexed. We also have an existing enhancement within ``searchlight-manage``
where we re-index through the plugin API only the resource types that were
explicitly requested by the user. All other resource types are re-indexed
directly within Elasticsearch. We need to keep this enhancement.

Keeping the current design intact means we will want to thread at the fine
resource type level and not at the gross resource type group level. Based on
the parent/child relationship that exists between some of the resource types,
this is the "fine" level we will be considering.

Since we are already using bulk commands for Elasticsearch re-indexing, we will
place all of the Elasticsearch re-indexing into a single thread. Considering
that this will be I/O bound on Elasticsearch's side, there does not appear to
be any advantage in running the Elasticsearch re-index for each resource type
in a separate thread.

**Design Proposal:** Whenever the indexing code currently calls the plugin API,
it will create a worker in the thread pool.

**Design Proposal:** All of the calls to Elasticsearch to re-index an existing
index will be placed in a single worker in the thread pool.

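The two proposals above can be sketched as follows. This is a minimal sketch
that assumes the standard-library ``concurrent.futures`` module, which the spec
does not name. The functions ``reindex_via_plugin``, ``reindex_within_es`` and
``submit_reindex_work`` are hypothetical placeholders, not Searchlight code.

.. code-block:: python

    from concurrent.futures import ThreadPoolExecutor


    def reindex_via_plugin(resource_type):
        # Placeholder for the plugin API call that re-indexes one resource type.
        return ("plugin", resource_type)


    def reindex_within_es(resource_types):
        # Placeholder for the Elasticsearch-side re-index of every resource
        # type that does not need to go through its plugin.
        return ("es", tuple(resource_types))


    def submit_reindex_work(pool, plugin_types, es_types):
        # One worker per resource type that is re-indexed through its plugin.
        futures = [pool.submit(reindex_via_plugin, rt) for rt in plugin_types]
        # A single worker for all of the Elasticsearch re-indexing.
        if es_types:
            futures.append(pool.submit(reindex_within_es, es_types))
        return futures


    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = submit_reindex_work(
            pool,
            plugin_types=["OS::Nova::Server", "placeholder-type-a"],
            es_types=["placeholder-type-b", "placeholder-type-c"],
        )
        results = [f.result() for f in futures]

Here two resource types go through their plugins and two are copied within
Elasticsearch, so three workers are submitted in total.
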
3. **Mapping of plugins to threads**

There may be a large number of plugins used with Searchlight. If each plugin
has its own thread, we may be using a lot of threads. Instead of having a single
thread map to a single plugin, we will use a thread pool. This will keep the
number of threads at a manageable level while still allowing for an appropriate
level of asynchronous re-indexing. The size of the thread pool can be changed
through a configuration option.

**Design Proposal:** Use a thread pool.

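To show how a pool caps the number of threads regardless of the number of
plugins, here is a small sketch. It again assumes ``concurrent.futures``;
``POOL_SIZE`` stands in for the configuration option, whose real name is not
given in this spec, and ``fake_plugin_index`` simulates a plugin with a short
sleep.

.. code-block:: python

    import threading
    import time
    from concurrent.futures import ThreadPoolExecutor

    POOL_SIZE = 2  # Hypothetical stand-in for the configured pool size.

    lock = threading.Lock()
    stats = {"running": 0, "peak": 0}


    def fake_plugin_index(name):
        # Record how many workers are active at the same time.
        with lock:
            stats["running"] += 1
            stats["peak"] = max(stats["peak"], stats["running"])
        time.sleep(0.05)  # Simulate the plugin's API calls.
        with lock:
            stats["running"] -= 1
        return name


    plugins = ["plugin-%d" % i for i in range(5)]
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        finished = list(pool.map(fake_plugin_index, plugins))

    print("peak concurrent workers:", stats["peak"])

With five plugins and a pool of two, the remaining plugins queue until a worker
is free, so the total time is no longer bounded by the slowest plugin alone.
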
4. **When will we know to switch the Elasticsearch aliases?**

In the serial model of re-indexing, it is trivial to know when to switch the
Elasticsearch alias to use the new index. It's when the last index finishes!
Switching over to a model of asynchronous threads running in parallel
potentially complicates the alias update.

The indexing code will wait for all the threads to complete. When all threads
have completed, the indexing code can continue with updating the aliases.

**Design Proposal:** The alias switching code will be run after all of the
threads have completed.

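A minimal sketch of this ordering, assuming ``concurrent.futures``:
``index_resource_type`` and ``switch_aliases`` are hypothetical placeholders
that only record what happened, so that the ordering is visible.

.. code-block:: python

    from concurrent.futures import ALL_COMPLETED, ThreadPoolExecutor, wait

    events = []


    def index_resource_type(resource_type):
        # Placeholder for the real indexing work.
        events.append("indexed " + resource_type)


    def switch_aliases():
        # Placeholder for pointing the Elasticsearch aliases at the new index.
        events.append("aliases switched")


    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(index_resource_type, rt)
                   for rt in ("type-a", "type-b", "type-c")]
        # Block until every worker has finished, successfully or not.
        wait(futures, return_when=ALL_COMPLETED)
        # Re-raise any worker exception before touching the aliases.
        for future in futures:
            future.result()
        switch_aliases()

The alias switch is always the last recorded event, whatever order the workers
finish in.
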
5. **How do we clean up from a failed thread?**

The indexing code will need to have the threads communicate if a catastrophic
failure occurred. After all workers have been placed into the thread pool, the
main program will wait for all of the threads to finish. If any thread fails,
it will raise an exception. The exception will be caught and the normal
clean-up call will commence. All workers that are still waiting to run will be
cancelled.

**Design Proposal:** Catch exceptions thrown by a failing thread.

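One way this could look, as a sketch under the assumption that
``concurrent.futures`` is used: the main thread waits until the first failure,
cancels every worker that has not started yet, and then runs its clean-up.
``index_resource_type`` and ``run_all`` are hypothetical, and the
``cleaned_up`` flag stands in for the real clean-up call.

.. code-block:: python

    import time
    from concurrent.futures import FIRST_EXCEPTION, ThreadPoolExecutor, wait


    def index_resource_type(name):
        # Hypothetical worker: one resource type fails, the others are slow.
        if name == "broken":
            raise RuntimeError("indexing failed for " + name)
        time.sleep(0.2)
        return name


    def run_all(names, pool_size):
        cleaned_up = False
        pool = ThreadPoolExecutor(max_workers=pool_size)
        futures = [pool.submit(index_resource_type, n) for n in names]
        try:
            # Returns as soon as one worker raises, or when all have finished.
            done, _ = wait(futures, return_when=FIRST_EXCEPTION)
            for future in done:
                future.result()  # Re-raises the worker's exception, if any.
        except RuntimeError:
            # Only workers that have not started yet can be cancelled.
            for future in futures:
                future.cancel()
            cleaned_up = True  # Stand-in for the normal clean-up call.
        finally:
            pool.shutdown(wait=True)
        cancelled = [n for n, f in zip(names, futures) if f.cancelled()]
        return cleaned_up, cancelled


    cleaned_up, cancelled = run_all(["broken", "slow-1", "slow-2"], pool_size=1)

A worker that is already running cannot be cancelled this way; it runs to
completion before the pool shuts down.
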
For those following along with the code (searchlight/cmd/manage.py::sync), here
is a rough guide to the changes. We will reference the sections as mentioned in
the large comment blocks:

* First pass: No changes.
* Second pass: No changes.
* Step #1: No changes.
* Step #2: No changes.
* Step #3: No changes.
* Step #4: Use threads. Track thread usage.
* Step #5: No changes.
* Step #6: No changes.

Alternatives
------------

We can always choose not to perform any enhancements, or we can go back to the
first draft of this spec.

References
==========
