Commit Graph

136 Commits

Author SHA1 Message Date
Guillaume Chauvel 66ba8442dc Create SSL context using PROTOCOL_TLS, fallback to highest supported version
Zuul test: tests.unit.test_scheduler.TestSchedulerSSL.test_jobs_executed
fails on ubuntu focal with the following exception:

Traceback (most recent call last):
  File "/home/gchauvel/zuul/zuul/.tox/py38/lib/python3.8/site-packages/gear/__init__.py", line 2835, in _doConnectLoop
    self.connectLoop()
  File "/home/gchauvel/zuul/zuul/.tox/py38/lib/python3.8/site-packages/gear/__init__.py", line 2865, in connectLoop
    c = context.wrap_socket(c, server_side=True)
  File "/usr/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/usr/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL] internal error (_ssl.c:1108)

This is due to libssl1.1 being compiled with
"-DOPENSSL_TLS_SECURITY_LEVEL=2" and gear forcing TLSv1.0

Extracted from ubuntu source package:

  The default security level for TLS connections was increased from
  level 1 to level 2. This moves from the 80 bit security level to the
  112 bit security level and will require 2048 bit or larger RSA and
  DHE keys, 224 bit or larger ECC keys, SHA-2, TLSv1.2 or DTLSv1.2.

Allowing to negotiate TLS to the highest available version between
server and client solves the issue, provided that TLSv1.2 is useable.
The option is supported by in the latest version of all pythons >=3.5
[1][2][3]. Unfortunately Xenial doesn't have latest 3.5 and lacks the
ssl.PROTOCOL_TLS definition. We provide a fallback to select the highest
version of TLS supported in that case.

There is some risk using the fallback beacuse both the client and server
need to agree on the version supported in this case. Xenial python 3.5
does support TLSv1_2 which means that for all practical purposes TLS
v1.2 should be available on all platforms that gear runs avoiding this
problem.

Disable TLSv1.3:
According to https://bugs.python.org/issue43622#msg389497, an event on
ssl socket can happen without data being available at application level.
As gear is using a polling loop with multiple file descriptors and ssl
socket used as a blocking one, a blocked state could happen.
This is highlighted by Zuul SSL test: TestSchedulerSSL, where such
blocked state appears consistently.
note: gear tests and zuul tests are ok when using TLSv1.2 but the
previous behavior could also happen

[1] https://docs.python.org/2.7/library/ssl.html?highlight=protocol_tls#ssl.PROTOCOL_TLS
[2] https://docs.python.org/3.5/library/ssl.html?highlight=protocol_tls#ssl.PROTOCOL_TLS
[3] https://docs.python.org/3/library/ssl.html?highlight=protocol_tls#ssl.PROTOCOL_TLS

Change-Id: I5efb6c0576987815c5b93f8bc4020cdee2898d04
2021-03-30 14:56:38 +02:00
Zuul 29e9d1fa99 Merge "wakeConnections: Randomize connections before scanning them" 2021-03-10 19:41:24 +00:00
Zuul edcc55b1e9 Merge "Move handleDisconnect into BaseClientServer" 2021-03-10 19:41:21 +00:00
Zuul 3f6fa2da29 Merge "Modify connection timeout process" 2021-03-10 19:05:27 +00:00
Ahmon Dancy a160d10735 wakeConnections: Randomize connections before scanning them
gear/__init__.py:
 Modified Server.wakeConnections() so that it randomizes the
 list of active connections before sending out NOOP's to them.
 This will hopefully spread workload across machines more evenly
 when there are multiple workers per machine.

Reference: https://phabricator.wikimedia.org/T258630

Change-Id: I05dcb9fa383f3aefc8b5b1bb9dd8b3ff6ff7f37d
2021-03-10 18:15:07 +01:00
Fabien Boucher 79e1c3009e Bump crypto requirement to accomodate security standards
Depends-on: https://review.opendev.org/742165

On Fedora rawhide the gear package no longer build.
https://koschei.fedoraproject.org/package/python-gear?

This patch ensures that the ssl engine does not complains about:
- ssl.SSLError: [SSL: EE_KEY_TOO_SMALL] ee key too small (_ssl.c:2951)
- ssl.SSLError: [SSL: CA_MD_TOO_WEAK] ca md too weak (_ssl.c:2951)

To reproduce the issue:
podman run -it --root fedora:rawhide
dnf install git libffi-devel python-devel tox gcc
git clone https://opendev.org/opendev/gear.git && cd gear
tox -epy39
tox -epy38

Change-Id: I57cd9c4750f27b7b76e92a0eef03e7de70c13dd5
2020-07-21 12:16:31 +00:00
Tobias Henkel 9261c8f4a7
Move handleDisconnect into BaseClientServer
It's called from there using self.handleDisconnect so define it there
as well.

Change-Id: I90fef55a79168e082e69f81a717c08badd4163a9
2020-03-24 17:43:46 +01:00
Tobias Henkel 9933f06821
Revert "Add BSD/Darwin support."
This floods the zuul test logs with poll messages which indicates that
something changed to a busy loop.

This reverts commit 103ad3e8ed.

Change-Id: Id3347136507e7e65ccde937f1c2fd303aa3dfbbe
2020-02-17 22:36:01 +01:00
Tobias Henkel c9f23749c6
Ignore keepalive on unsupported platforms
Gear currently supports keepalive only on linux platforms. On mac the
socket must be configured differently. For now just ignore the
keepalive flag in this case and emit a warning.

Change-Id: I276967b720742fa64e5bc6eb769c75590141275c
2020-01-16 09:11:51 +01:00
Tobias Henkel 103ad3e8ed Add BSD/Darwin support.
Switch between Epoll and Poll depending on the OS capabilities.

Change-Id: Iaf1324d0c9d43c76e3f228f1a176e453a82a71a4
Co-Authored-By: Jan Kubovy <jan.kubovy@bmw.de>
Co-Authored-By: Tobias Henkel <tobias.henkel@bmw.de>
2019-10-15 10:45:22 +02:00
Zuul 483ab492a8 Merge "Add support for server name indication" 2019-06-17 16:17:30 +00:00
Zuul 8d3ea4bcff Merge "add missing str to bytes conversion for Python3" 2019-05-06 15:58:05 +00:00
Tobias Henkel 58b2f277b7
Add support for server name indication
According to the python docs [1] it is recommended to use
SSLContext.wrap_socket to create an ssl connection since Python 3.2
and 2.7.9. This enables us to also leverage server name indication
(SNI).

One use case where SNI is beneficial is an easy and standard way to
route traffic into an Openshift cluster. The most common way to get
traffic into an Openshift cluster is using a routes. The routes in an
openshift cluster work with either HTTP, HTTPS with SNI or TLS with
SNI [2]. TLS with SNI in this case also works with non-http
connections like gearman is using.

[1] https://docs.python.org/3/library/ssl.html#socket-creation
[2] https://docs.okd.io/3.11/dev_guide/expose_service/expose_internal_ip_router.html#overview

Change-Id: I19c1edc4a14a303d2a91894e0065c8d31f89ce24
2019-05-04 12:11:41 +02:00
Zuul 6a7417702d Merge "Prefer ipv6 for listen addrs if available" 2019-01-28 16:21:02 +00:00
Clark Boylan ca733f3e07 Prefer ipv6 for listen addrs if available
If the listen address allows for ipv4 or ipv6 values we want to prefer
ipv6 if the host is configured with working ivp6. We add the ai_flag
AF_ADDRCONFIG to filter out ipv6 (and ipv4) if the host isn't configure
for this AF_INET version. Then we sort based on the family to prefer
ipv6 over ipv4.

The reason for this is clients will prefer ipv6 before falling back to
ipv4 when attempting to connect to a hostname. If the server isn't
listening on ipv6 this makes new connections happen slowly.

Change-Id: I9f7a235b04068856c6cceeb2c230f3b56945572e
2018-10-19 14:33:37 -07:00
Tobias Henkel 71dbac070c
Add support for keepalive to client
A gearman client only waiting for jobs will wait indefinitely if the
gearman server vanishes (e.g. due to a VM crash). In this case there
is no traffic on the connection and the client blocks forever if there
is nothing in between that forcefully terminates the connection.

Adding tcp keepalive can mitigate that and the connection will be
terminated by the kernel in this situation which then triggers a
reconnect.

Change-Id: I8589cd45450245a25539c051355b38d16ee9f4b9
2018-09-04 13:50:04 +02:00
Benoit Bayszczak f75a3c8509 add missing str to bytes conversion for Python3
Using Python3, Gearman unexpectedly closed the socket with a client when typing the 'workers' command.

gearman-debug.log:
    File "/usr/local/lib/python3.5/dist-packages/gear/__init__.py", line 3250, in handleWorkers
      (fd, ip, client_id.decode('utf8'),
  AttributeError: 'str' object has no attribute 'decode'

Change-Id: I610bd44c76a0e52f8d4e8f24c82c636d4ebef0ae
2018-08-01 11:06:00 +02:00
Paul Belanger c00ca944db
Add --listen-address flag to geard
Add the ability for an operator to control which interface to listen
on. By default we use None to maintain backwards compatibility.

Change-Id: I14c13ff500317d5a7b580e1b2a7f798a8db5de1d
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2018-04-16 16:27:49 -04:00
James E. Blair eee504755d Automatically send GRAB_JOB after CAN_DO
After registering a function, if a connection is sleeping (ie,
waiting for a NOOP to wake it up), cause it to send another
grab_job in case the newly registered job is something it can
handle.

Change-Id: Ibea13726f4a451ebc67850b17e168bd4accfbc0b
2018-02-02 11:19:14 -08:00
James E. Blair da802f2828 Make workers admin command py3 safe
Change-Id: I05598ceea20169a625bbb47f15288e6eadbc88d2
2018-02-01 12:02:14 -08:00
shangxdy 8a76de29e1 Modify connection timeout process
When connection to gearman server is timeout, it's necessary to
release lock before raising a exception, otherwise the client may
be dead locked.

Change-Id: Idc9c9c4bd439b122dc7855ca05d962c4e6687829
Signed-off-by: shangxdy <shang.xiaodong@zte.com.cn>
2018-01-26 15:31:37 +08:00
James E. Blair 49d4bef593 server: make stats more efficient
Keep track of the three main queue stats we report, so that we don't
have to iterate over all the queued jobs to determine whether they
are running.

Also, drop the workers stat because it's not very useful and not
entirely trivial to calculate.

Change-Id: Id42a05e5626096d1100a9cb9e8166c8ec5103b41
2017-10-30 11:46:08 -07:00
Jenkins c4a5decdc6 Merge "Add a send lock to the base Connection class" 2017-10-09 15:40:40 +00:00
James E. Blair b24b4d2f6e Add a send lock to the base Connection class
The sendRaw method (and therefore sendPacket) was originally
thread safe by virtue of consisting of a single socket.send()
call, but when we added SSL, we added a loop within the method
to handle the increased likelihood that not all data would be
sent in one call.  Of course, the method should have been written
this way to start with.

However, this means that we can end up with partial packets being
sent before a context switch to another thread which may also
want to send a packet.  To handle that case, place the entire
method in a lock.

Note, this doesn't affect server connections as they use a
non-blocking connection which has a send queue, so only one thread
ever actuall transmits.

Change-Id: I3bda6fda5f762d18f28b56a43b7dc28f37dbc427
2017-09-30 09:18:27 -07:00
James E. Blair ca4666f005 Server: support background jobs
Change-Id: Ic15ab05c16f143f1d557d935aecb6d0afe419d59
2017-09-13 18:21:39 -06:00
James E. Blair bf8d96cb77 Fix exception setter
This method had a typo.

Change-Id: I49b745387626f664235998de52d4c58927149b4d
2017-05-18 17:19:46 -07:00
Jenkins d76e998959 Merge "Replace assertEquals with assertEqual" 2017-05-17 16:59:54 +00:00
Clint Byrum afb9001c28 Provide TextJob and TextWorker for convenience
These classes will provide a smoother path from python2 -> python3 when
using gear, as it will relieve the need for encoding/decoding.

Change-Id: I93bfe33f898294f30a82c0a24a18a081f9752354
2017-05-15 10:49:51 -07:00
Luong Anh Tuan dfd79cdd1f Replace assertEquals with assertEqual
The method assertEquals has been deprecated since python 2.7.
http://docs.python.org/2/library/unittest.html#deprecated-aliases

Also in Python 3, a deprecated warning is raised when using assertEquals
therefore we should use assertEqual instead.

Change-Id: Ic99ea3f57bb6fd986ce2626e800e3d310a2e3df7
Closes-Bug: #1218185
2016-11-22 09:50:33 +07:00
Clint Byrum 6d479a0196 Make Job.name assume utf-8.
This breaks the API in subtle ways, but only really for people who want
to use Python 3 and/or not utf-8 function names.

Change-Id: If6bfc35d916cfb84d630af59f4fde4ccae5187d4
2016-11-16 12:14:16 -08:00
Monty Taylor 616d56bc62
Re-enable flake8
The line in tox.ini "select=H231" did not add H231 to the list of flake8
checks. It made us only run H231.

Remove the H231 line and then fix the flake8 issues. The most
substantial of these was the shadowing of the queue module with local
variables named queue. Made that one import queue as queue_mod because
changing the variable names in the code was yuckier to look at.

Change-Id: I4a26a20889132f7f4525b17392088dda6bd5bbd2
2016-11-08 11:57:04 -06:00
Jenkins 2f7e5b6e61 Merge "Allow setting a timeout for Client.waitForServer()" 2016-07-21 16:20:00 +00:00
Jenkins b604bc9fb6 Merge "Handle SIGINT correctly in Client.waitForServer()" 2016-07-21 16:19:54 +00:00
James E. Blair 1f8889595b Add test for AdminRequest.isComplete
Change-Id: I16fd6b6debc3171ec9a346be079c32018e5b5f64
2016-07-18 10:32:00 -07:00
James E. Blair 2defbeaa1b Parse admin requests 2880 times faster
Both returning the incomplete data from the isComplete method, and
scanning the entire incomplete string for the terminator each time
are O(n^2).

Also, returning the string is *very* slow, accounting for 99.6% of
the speedup; scanning the string each time accounts for 0.6%.

Both changes together find the terminus of a 481MiB response in
about 1.6 seconds compared to 76 minutes prior to this change.

Change-Id: I3614e8f3ff8ad7c3d9ca6da1b520b89cd9d5d603
2016-07-17 08:05:57 -07:00
James E. Blair 550873b76a Geard: Handle connections closed while sending
If a connection is closed while sending a large amount of data,
the send() call may return EAGAIN, even though the socket is in
CLOSE_WAIT.  In that case, just continue queuing the data, and
let the main loop handle the disconnect (which it will detect
via the poll call followed by a null read).

Change-Id: Ib15eae81077b58d2f95fb0989ed1139c6d542f49
2016-07-15 14:56:42 -07:00
Sagi Shnaidman a3ee71cc53 Support TCP keepalives in geard
Support configuration of TCP keepalives in gear server.
When connections are closed not gracefully they stays opened
for a long time, which leads to file descriptors leak.
Default values for interval, probe and count are such as in
most Linux distributions.
By default is disabled to keep compatibility with older versions.

Change-Id: I335f68a24dda409b1a48c6e4e520ec08dfd38078
2016-07-12 15:37:31 +03:00
Morgan Fainberg 59d29104cb Do not encode the type (b) in the job name
Don't encode string type in the job name in handleRequest(). Without
the explicit .decode() before passing to the string formatter it
would end up looking like b"b'<jobname'\t\n...." which made for
working with gear on the other side near impossible because a
decode of the job data would not result in a consumable string
matching expected values.

Now there is an explicit decode prior to formatting the string
data.

Change-Id: Ib2996b84bce719a2f91c166aaa4278c18f89f88f
2016-06-08 13:06:09 -07:00
Jenkins d0520c2754 Merge "Switch to six for configparser" 2016-06-02 07:19:28 +00:00
Paul Belanger 1d772e70ff
Switch to six for configparser
I noticed fedora doesn't yet ship python3-gear so this is the main
reason to add six support.

Change-Id: I3d8d37870a0ab865dc0a32e123bfd16e91fa5b72
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-06-02 10:02:59 +03:00
Jenkins f3d1c06c37 Merge "Add a test for worker termination" 2016-06-02 07:02:10 +00:00
Jenkins 9e98e23c86 Merge "Remove shebang as the script is managed by an entry_point" 2016-06-02 06:55:34 +00:00
Morgan Fainberg 46fcc461b6 Do not change object size when iterating
In `gear.Server._lostConnection()` the jobs list pulls .values() from
`conn.related_jobs`, in `gear.server._removeJob()` it is possible to
change the conn.related_jobs dictionary which will cause a
`RuntimeError: dictionary changed size during iteration` exception to
be raised in python 3 since .values() is an iterator instead of a
rendered list() like in python 2. This RuntimeError is eliminated
by forcing the python 2 behavior by wrapping the `.values()` call
to be wrapped into an explicit list.

Change-Id: I730473f3200a427cce774df534f1ff4958866dac
2016-05-29 23:08:05 +00:00
Sam Thursfield b7107e2e62 Allow setting a timeout for Client.waitForServer()
Change-Id: I614de364a668b1ae01ad361254fd4afcdfe48051
2016-04-12 13:48:20 +01:00
Fabien Boucher c2f861b491 Remove shebang as the script is managed by an entry_point
The shebang is not needed in cmd/geard.py. The file
is not intended to be used directly as it is managed
by an entry_point.

Change-Id: I4e2a8715cca81a8626f6bcd86a368a537305bc0b
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-02-16 13:19:19 -05:00
Gregory Haynes c313be6e18 Guard against double shutdown in client
If we have already performed a shutdown we can prevent exploding when
writing to self.wake_write if we short circuit.

Change-Id: If5352a373d5fa61dd1ee661f4a37976b3447dd9d
2016-01-26 20:15:08 +00:00
James E. Blair f7a2b658c9 Add a test for worker termination
This was an attempt to find a code path that could cause
stopWaitingForJobs to raise an exception.  So far, that has failed,
but add this test anyway to exercise some of that code.

Change-Id: I39955fc7250af9b93f336c7c3e0315528bd8d4a9
2015-12-03 10:48:32 -08:00
James E. Blair 5491e93d95 Do not raise RetryIOError on blocked write
When EAGAIN is returned on a socket write, do not raise RetryIOError.
This exception is used principally to indicate that we have encountered
a blocked read, and a method farther up the call stack may need to
perform some cleanup.  However, within the sendQueuedData method,
all necessary cleanup is performed by the finally handler itself.
Therefore, it is safe to ignore a blocked write, and count on either
a subsequent write (which will append data to the queue and retry
sending the oldest data) or the poll edge trigger to indicate that
we should retry the write.

There is nothing in the call stack above this method that has an
exception handler for RetryIOError.  Therefore, in the current state,
the raised exception was causing connections to be disconnected
due to the error.

Change-Id: I9211fb6365f8f3b6dd0310430cf97926ce1f5d07
2015-06-18 16:55:01 -07:00
Sam Thursfield 0558e173e9 Handle SIGINT correctly in Client.waitForServer()
The Python 2.7 threading.Condition.wait() function will block the
process from responding to any signals until the condition triggers.

This was especially bad for commandline applications. Before this
commit, if an application didn't manage to connect to the Gearman server
it would block forever and ignore CTRL+C.

Adding a 'timeout' works around this issue. Behaviour is unchanged
except that the 'Waiting for at least one active connection' debug
message will now be logged once per second.

This problem is fixed upstream in Python 3.2:
<https://bugs.python.org/issue8844>

Change-Id: Ib1043948b1b37a4a6732176314b8a243aad73397
2015-05-19 15:05:25 +00:00
James E. Blair bb681360fc Fix regression in wakeConnections
In d5e03ac930, an optional argument
to wakeConnections was added so that only connections related to
a job would be awoken.  However, the default value, None, meant
that no connection would be awoken.  This doesn't make much sense,
and indeed, Zuul, in its unit tests, was relying on all connections
being awoken.  Fix this by awakening all connections if the job
argument is None (which never happens within gear itself).

Change-Id: Ib7632ae0353c414cc1b68dbe430cb57292058012
2015-05-07 10:29:26 -07:00