Merge "S3 data source URL format change"

Zuul 2018-08-06 18:03:33 +00:00 committed by Gerrit Code Review
commit ee7964f830
4 changed files with 18 additions and 7 deletions


@@ -135,7 +135,7 @@ share will be automatically mounted to your cluster's nodes as needed to
 access the data source.
 Finally, Sahara supports data sources referring to S3-like object stores. The
-URL should be of the form ``s3a://{bucket}/{path}``. Also, the following
+URL should be of the form ``s3://{bucket}/{path}``. Also, the following
 credentials/configs are understood: ``accesskey``, ``secretkey``,
 ``endpoint``, ``bucket_in_path``, and ``ssl``. These credentials are specified
 through the ``credentials`` attribute of the body of the request when creating
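
As a rough illustration of the documentation hunk above, a request body for an S3 data source might look like the following Python dict. Only the five credential keys and the ``s3://{bucket}/{path}`` URL form come from the documented text. The top-level field names, the ``"type": "s3"`` value, the boolean types for ``bucket_in_path`` and ``ssl``, and every concrete value are assumptions made for the sketch.

```python
# Hypothetical request body for creating an S3 data source.
# Assumed (not confirmed by the hunk): the "name" and "type" fields,
# the "s3" type string, and boolean values for bucket_in_path / ssl.
data_source_body = {
    "name": "example-s3-input",
    "type": "s3",
    "url": "s3://mybucket/path/to/input",
    "credentials": {
        "accesskey": "EXAMPLE_ACCESS_KEY",
        "secretkey": "EXAMPLE_SECRET_KEY",
        "endpoint": "s3.example.com",
        "bucket_in_path": False,
        "ssl": True,
    },
}
```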
@@ -632,13 +632,13 @@ Manila NFS filesystem reference URLS take the form:
 This format should be used when referring to a job binary or a data source
 stored in a manila NFS share.
-For job binaries only, S3 urls take the form:
+For both job binaries and data sources, S3 urls take the form:
 ``s3://bucket/path/to/object``
-For data sources, S3 urls take the standard Hadoop form:
-``s3a://bucket/path/to/object``
+Despite the above URL format, the current implementation of EDP will still
+use the Hadoop ``s3a`` driver to access data sources. Botocore is used to
+access job binaries.
 EDP Requirements
 ================


@@ -0,0 +1,4 @@
+---
+other:
+  - |
+    The URL of an S3 data source may have `s3://` or `s3a://`, equivalently.


@@ -55,8 +55,9 @@ class S3Type(DataSourceType):
             raise ex.InvalidDataException(_("S3 url must not be empty"))
         url = urlparse.urlparse(url)
-        if url.scheme != "s3a":
-            raise ex.InvalidDataException(_("URL scheme must be 's3a'"))
+        if url.scheme not in ["s3", "s3a"]:
+            raise ex.InvalidDataException(
+                _("URL scheme must be 's3' or 's3a'"))
         if not url.hostname:
             raise ex.InvalidDataException(_("Bucket name must be present"))
@@ -80,3 +81,6 @@ class S3Type(DataSourceType):
         if job_conf.get(s3a_cfg_name, None) is None:  # no overwrite
             if creds.get(config_name, None) is not None:
                 job_conf[s3a_cfg_name] = creds[config_name]
+    def get_runtime_url(self, url, cluster):
+        return url.replace("s3://", "s3a://", 1)
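
The combined effect of the two hunks in this file can be sketched outside Sahara as follows. This is a standalone approximation, not the project's code: it uses `urllib.parse` and `ValueError` in place of Sahara's `urlparse` import and `ex.InvalidDataException`, and the function name `validate_s3_url` is made up for the example. The checks and the `replace` call mirror what the patch shows.

```python
from urllib.parse import urlparse


def validate_s3_url(url):
    """Stand-in for the URL checks above; raises ValueError on bad input."""
    if len(url) == 0:
        raise ValueError("S3 url must not be empty")
    parsed = urlparse(url)
    # After this change, both schemes are accepted for data sources.
    if parsed.scheme not in ("s3", "s3a"):
        raise ValueError("URL scheme must be 's3' or 's3a'")
    if not parsed.hostname:
        raise ValueError("Bucket name must be present")


def get_runtime_url(url):
    # Same replace call as the patch: rewrite the first "s3://" to "s3a://"
    # so the Hadoop s3a driver can be used at job run time.
    return url.replace("s3://", "s3a://", 1)


for candidate in ("s3://mybucket/myobject", "s3a://mybucket/myobject"):
    validate_s3_url(candidate)
    print(candidate, "->", get_runtime_url(candidate))
```

A URL that already uses `s3a://` passes through unchanged, because the string `s3://` does not occur in it. That is what lets the two schemes be treated as equivalent for data sources.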


@@ -35,6 +35,9 @@ class TestSwiftType(base.SaharaTestCase):
         }
         self.s_type.validate(data)
+        data["url"] = "s3://mybucket/myobject"
+        self.s_type.validate(data)
         creds = {}
         data["credentials"] = creds
         self.s_type.validate(data)