Merge "S3 data source URL format change"
commit ee7964f830
@@ -135,7 +135,7 @@ share will be automatically mounted to your cluster's nodes as needed to
 access the data source.
 
 Finally, Sahara supports data sources referring to S3-like object stores. The
-URL should be of the form ``s3a://{bucket}/{path}``. Also, the following
+URL should be of the form ``s3://{bucket}/{path}``. Also, the following
 credentials/configs are understood: ``accesskey``, ``secretkey``,
 ``endpoint``, ``bucket_in_path``, and ``ssl``. These credentials are specified
 through the ``credentials`` attribute of the body of the request when creating
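The request body described in the hunk above can be pictured as a plain dictionary. This is a sketch, not Sahara code: the name, bucket, path, and credential values are hypothetical placeholders; only the URL scheme and the credential keys (``accesskey``, ``secretkey``, ``endpoint``, ``bucket_in_path``, ``ssl``) come from the documentation being patched.

```python
# Hypothetical request body for creating an S3 data source.
# All values are placeholders; the keys under "credentials" are the
# ones the patched documentation says Sahara understands.
data_source_request = {
    "name": "my-input",                    # placeholder name
    "type": "s3",
    "url": "s3://mybucket/input/path",     # new-style s3:// URL form
    "credentials": {
        "accesskey": "my-access-key",      # placeholder values
        "secretkey": "my-secret-key",
        "endpoint": "http://s3.example.com",
        "bucket_in_path": True,
        "ssl": False,
    },
}

print(sorted(data_source_request["credentials"]))
```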
@@ -632,13 +632,13 @@ Manila NFS filesystem reference URLS take the form:
 This format should be used when referring to a job binary or a data source
 stored in a manila NFS share.
 
-For job binaries only, S3 urls take the form:
+For both job binaries and data sources, S3 urls take the form:
 
 ``s3://bucket/path/to/object``
 
-For data sources, S3 urls take the standard Hadoop form:
-
-``s3a://bucket/path/to/object``
+Despite the above URL format, the current implementation of EDP will still
+use the Hadoop ``s3a`` driver to access data sources. Botocore is used to
+access job binaries.
 
 EDP Requirements
 ================
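Since botocore addresses an object by a separate bucket and key rather than by a URL, a small helper (a sketch of mine, not Sahara code) shows how the ``s3://bucket/path/to/object`` form splits into those two pieces:

```python
from urllib.parse import urlparse

def split_s3_url(url):
    # Hypothetical helper: a botocore S3 GetObject call takes Bucket
    # and Key separately, so split the URL into (bucket, key).
    parsed = urlparse(url)
    return parsed.netloc, parsed.path.lstrip("/")

bucket, key = split_s3_url("s3://bucket/path/to/object")
print(bucket, key)  # bucket path/to/object
```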
@@ -0,0 +1,4 @@
+---
+other:
+  - |
+    The URL of an S3 data source may have `s3://` or `s3a://`, equivalently.
@@ -55,8 +55,9 @@ class S3Type(DataSourceType):
            raise ex.InvalidDataException(_("S3 url must not be empty"))
 
        url = urlparse.urlparse(url)
-       if url.scheme != "s3a":
-           raise ex.InvalidDataException(_("URL scheme must be 's3a'"))
+       if url.scheme not in ["s3", "s3a"]:
+           raise ex.InvalidDataException(
+               _("URL scheme must be 's3' or 's3a'"))
 
        if not url.hostname:
            raise ex.InvalidDataException(_("Bucket name must be present"))
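The patched scheme check can be sketched as a standalone function. The function name and the plain ``ValueError`` are mine for illustration; Sahara raises ``ex.InvalidDataException`` inside ``S3Type.validate``:

```python
from urllib.parse import urlparse

def validate_s3_url(url):
    # Sketch of the patched validation: after this change both the
    # "s3" and "s3a" schemes are accepted, and a bucket is required.
    if not url:
        raise ValueError("S3 url must not be empty")
    parsed = urlparse(url)
    if parsed.scheme not in ["s3", "s3a"]:
        raise ValueError("URL scheme must be 's3' or 's3a'")
    if not parsed.hostname:
        raise ValueError("Bucket name must be present")
    return parsed.scheme, parsed.hostname

print(validate_s3_url("s3://mybucket/myobject"))   # ('s3', 'mybucket')
print(validate_s3_url("s3a://mybucket/myobject"))  # ('s3a', 'mybucket')
```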
@@ -80,3 +81,6 @@ class S3Type(DataSourceType):
        if job_conf.get(s3a_cfg_name, None) is None:  # no overwrite
            if creds.get(config_name, None) is not None:
                job_conf[s3a_cfg_name] = creds[config_name]
+
+    def get_runtime_url(self, url, cluster):
+        return url.replace("s3://", "s3a://", 1)
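The added ``get_runtime_url`` relies on ``str.replace`` with a count of 1, so only the scheme prefix is rewritten. A minimal sketch (the standalone function name is mine) shows that ``s3://`` URLs are translated to the Hadoop ``s3a://`` form while URLs already using ``s3a://`` pass through unchanged:

```python
def to_runtime_url(url):
    # Same translation as the patched get_runtime_url: rewrite the
    # user-facing "s3://" scheme to Hadoop's "s3a://". count=1 guards
    # against touching an "s3://" substring later in the path, and
    # "s3a://" does not contain "s3://", so s3a URLs are left alone.
    return url.replace("s3://", "s3a://", 1)

print(to_runtime_url("s3://mybucket/myobject"))   # s3a://mybucket/myobject
print(to_runtime_url("s3a://mybucket/myobject"))  # s3a://mybucket/myobject
```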
@@ -35,6 +35,9 @@ class TestSwiftType(base.SaharaTestCase):
        }
        self.s_type.validate(data)
 
+       data["url"] = "s3://mybucket/myobject"
+       self.s_type.validate(data)
+
        creds = {}
        data["credentials"] = creds
        self.s_type.validate(data)