S3 data source URL format change
The old way (s3a://) will still work, but prefer s3:// now.

Change-Id: Ia1f8eba22016044aa5ffe50b2ab898908aef1890
parent b5f7491540
commit a37dfac06b
@@ -135,7 +135,7 @@ share will be automatically mounted to your cluster's nodes as needed to
 access the data source.
 
 Finally, Sahara supports data sources referring to S3-like object stores. The
-URL should be of the form ``s3a://{bucket}/{path}``. Also, the following
+URL should be of the form ``s3://{bucket}/{path}``. Also, the following
 credentials/configs are understood: ``accesskey``, ``secretkey``,
 ``endpoint``, ``bucket_in_path``, and ``ssl``. These credentials are specified
 through the ``credentials`` attribute of the body of the request when creating
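The documentation lists the S3 URL form and the accepted `credentials` keys, but it does not show a full request. The following is a minimal sketch of a plausible request body built as a Python dict. The `name` field, the `"type": "s3"` value, treating `bucket_in_path` and `ssl` as booleans, and the helper `build_s3_data_source_body` are all assumptions for illustration. None of them is taken from Sahara's API reference.

```python
import json

# Credential keys listed in the documentation for S3 data sources.
ALLOWED_CREDENTIALS = {"accesskey", "secretkey", "endpoint",
                       "bucket_in_path", "ssl"}
# Assumption: these two keys carry boolean values.
BOOLEAN_CREDENTIALS = {"bucket_in_path", "ssl"}


def build_s3_data_source_body(name, url, **credentials):
    """Build a hypothetical request body for creating an S3 data source."""
    unknown = set(credentials) - ALLOWED_CREDENTIALS
    if unknown:
        raise ValueError("unknown credentials: %s" % sorted(unknown))
    for key in BOOLEAN_CREDENTIALS & set(credentials):
        if not isinstance(credentials[key], bool):
            raise ValueError("%s must be a boolean" % key)
    return {
        "name": name,        # hypothetical field name
        "type": "s3",        # assumed type identifier
        "url": url,          # s3://{bucket}/{path} is now the preferred form
        "credentials": credentials,
    }


body = build_s3_data_source_body(
    "example-input", "s3://mybucket/input/data.csv",
    accesskey="AKIAEXAMPLE", secretkey="example-secret",
    endpoint="https://object-store.example.com", ssl=True)
print(json.dumps(body, indent=2, sort_keys=True))
```

Unknown keys are rejected up front, so a misspelled credential such as `access_key` fails loudly and is never silently ignored.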
@@ -632,13 +632,13 @@ Manila NFS filesystem reference URLS take the form:
 This format should be used when referring to a job binary or a data source
 stored in a manila NFS share.
 
-For job binaries only, S3 urls take the form:
+For both job binaries and data sources, S3 urls take the form:
 
 ``s3://bucket/path/to/object``
 
-For data sources, S3 urls take the standard Hadoop form:
-
-``s3a://bucket/path/to/object``
+Despite the above URL format, the current implementation of EDP will still
+use the Hadoop ``s3a`` driver to access data sources. Botocore is used to
+access job binaries.
 
 EDP Requirements
 ================
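The docs now give a single URL form for both job binaries and data sources. As a rough illustration (not Sahara code), Python's standard `urllib.parse.urlparse` splits `s3://` and `s3a://` URLs the same way. The bucket ends up in the network-location part and the object key in the path, so the choice of scheme does not change the bucket/path structure. The helper `split_s3_url` is hypothetical.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Return (bucket, object_key) for an s3:// or s3a:// URL."""
    parsed = urlparse(url)
    # The bucket is the network-location part; the object key is the path
    # with its leading slash removed.
    return parsed.netloc, parsed.path.lstrip("/")


for url in ("s3://bucket/path/to/object", "s3a://bucket/path/to/object"):
    print(url, "->", split_s3_url(url))
```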
@@ -0,0 +1,4 @@
+---
+other:
+  - |
+    The URL of an S3 data source may have `s3://` or `s3a://`, equivalently.
@@ -55,8 +55,9 @@ class S3Type(DataSourceType):
             raise ex.InvalidDataException(_("S3 url must not be empty"))
 
         url = urlparse.urlparse(url)
-        if url.scheme != "s3a":
-            raise ex.InvalidDataException(_("URL scheme must be 's3a'"))
+        if url.scheme not in ["s3", "s3a"]:
+            raise ex.InvalidDataException(
+                _("URL scheme must be 's3' or 's3a'"))
 
         if not url.hostname:
             raise ex.InvalidDataException(_("Bucket name must be present"))
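The sketch below shows a standalone version of the URL checks in this hunk so their effect can be tried in isolation. Several parts are stand-ins or assumptions. `InvalidDataException` replaces Sahara's `ex.InvalidDataException`. The `_()` translation wrapper is dropped. The emptiness check is written as `if not url` because its exact condition lies outside the hunk. The helpers `validate_s3_url` and `is_valid` are hypothetical.

```python
from urllib.parse import urlparse


class InvalidDataException(Exception):
    """Stand-in for Sahara's ex.InvalidDataException."""


ALLOWED_SCHEMES = ("s3", "s3a")


def validate_s3_url(url):
    """Sketch of the scheme and bucket checks from the hunk above."""
    if not url:  # assumed form of the emptiness check
        raise InvalidDataException("S3 url must not be empty")

    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise InvalidDataException("URL scheme must be 's3' or 's3a'")

    # urlparse fills in the hostname for any scheme followed by "//",
    # so this check behaves the same for s3:// and s3a:// URLs.
    if not parsed.hostname:
        raise InvalidDataException("Bucket name must be present")


def is_valid(url):
    """Return True if validate_s3_url accepts the URL."""
    try:
        validate_s3_url(url)
    except InvalidDataException:
        return False
    return True


for candidate in ("s3://mybucket/myobject", "s3a://mybucket/myobject",
                  "swift://container/object", "s3:///myobject", ""):
    print(repr(candidate), is_valid(candidate))
```

Because the bucket check relies on `hostname` and not on the scheme, widening the scheme test was enough. The rest of the validation did not need to change.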
@@ -80,3 +81,6 @@ class S3Type(DataSourceType):
                 if job_conf.get(s3a_cfg_name, None) is None:  # no overwrite
                     if creds.get(config_name, None) is not None:
                         job_conf[s3a_cfg_name] = creds[config_name]
+
+    def get_runtime_url(self, url, cluster):
+        return url.replace("s3://", "s3a://", 1)
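This hunk shows two pieces in action. The context lines copy data-source credentials into the job's s3a configs without overwriting values the user already set. The new `get_runtime_url` maps the user-facing `s3://` URL back to `s3a://`, so the Hadoop driver still receives the scheme it expects.

A minimal standalone sketch of both follows. The `fs.s3a.*` property names are real Hadoop settings, but mapping each Sahara credential to one of them is an assumption for illustration. `S3A_CONFIGS`, `merge_credentials` and `to_runtime_url` are hypothetical names.

```python
# Assumed mapping from data-source credential names to Hadoop s3a
# properties (property names are real Hadoop settings; the pairing is
# an illustrative assumption).
S3A_CONFIGS = {
    "accesskey": "fs.s3a.access.key",
    "secretkey": "fs.s3a.secret.key",
    "endpoint": "fs.s3a.endpoint",
    "bucket_in_path": "fs.s3a.path.style.access",
    "ssl": "fs.s3a.connection.ssl.enabled",
}


def merge_credentials(job_conf, creds):
    """Copy credentials into job_conf without overwriting existing values."""
    for config_name, s3a_cfg_name in S3A_CONFIGS.items():
        if job_conf.get(s3a_cfg_name) is None:  # no overwrite
            if creds.get(config_name) is not None:
                job_conf[s3a_cfg_name] = creds[config_name]
    return job_conf


def to_runtime_url(url):
    """Rewrite a leading s3:// to s3a:// for the Hadoop s3a driver."""
    if url.startswith("s3://"):
        return "s3a://" + url[len("s3://"):]
    return url  # s3a:// (or anything else) passes through unchanged


conf = merge_credentials({"fs.s3a.endpoint": "https://user-set.example"},
                         {"accesskey": "AK", "endpoint": "https://ds.example"})
print(conf)
print(to_runtime_url("s3://mybucket/myobject"))
print(to_runtime_url("s3a://mybucket/myobject"))
```

The sketch anchors the rewrite at the start of the string. The diff's `url.replace("s3://", "s3a://", 1)` instead rewrites the first occurrence anywhere, so the two differ only for a URL that is not `s3://` but contains `s3://` later, such as `s3a://b/s3://x`.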
@@ -35,6 +35,9 @@ class TestSwiftType(base.SaharaTestCase):
         }
         self.s_type.validate(data)
 
+        data["url"] = "s3://mybucket/myobject"
+        self.s_type.validate(data)
+
         creds = {}
         data["credentials"] = creds
         self.s_type.validate(data)