Commit Graph

21 Commits

Author SHA1 Message Date
Jeremy Freudberg a449558ac0 S3 data source
* Create S3 data source type for EDP
* Support storing S3 secret key in Castellan
* Unit tests for new data source type
* Document new data source type and related ideas
* Add support for S3 configs in Spark and Oozie workflows
* Hide S3 credentials in job execution info, like for Swift
* Release note

Change-Id: I3ae5b9879b54f81d34bc7cd6a6f754347ce82f33
2018-07-02 14:27:46 -04:00
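The data-source type added above has to validate an S3 URL and its credential set before storing the secret key. A minimal sketch of that kind of check, assuming an `s3a://` scheme and illustrative names (`validate_s3_data_source` and the credential keys are not Sahara's actual API):

```python
# Illustrative sketch of validating an S3 data source description.
# The scheme prefix and credential key names are assumptions, not
# Sahara's real constants.
S3_PREFIX = "s3a://"

def validate_s3_data_source(url, credentials):
    """Check the URL scheme and required credential keys for an S3 source."""
    if not url.startswith(S3_PREFIX):
        raise ValueError("S3 data source URL must start with %s" % S3_PREFIX)
    required = {"accesskey", "secretkey", "endpoint"}
    missing = required - set(credentials)
    if missing:
        raise ValueError("missing S3 credentials: %s"
                         % ", ".join(sorted(missing)))
    return True
```

The secret key itself would then be handed to Castellan for storage rather than kept in the data-source record.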
Jenkins 8f6df19921 Merge "Add a common Hive and Pig config in workflow_factory" 2016-01-20 19:05:37 +00:00
luhuichun 809ffb610e Add a common Hive and Pig config in workflow_factory
In sahara/service/edp/oozie/workflow_creator/workflow_factory.py, the
function get_possible_job_config() is shared by all plugins, so it
should not use the vanilla v2.6.0 specific config file. Every plugin
has its own config files for Hive and Pig and handles them in its own
get_possible_job_config() method, so a common Hive and Pig config file
should be used here.

Change-Id: Ieaee3bcc7a21344face14a6f13d4c30fdc631038
Closes-bug: 1526391
2016-01-14 01:04:35 +08:00
Michael McCune 423d80498b add helper functions for key manager
this change adds a utils module to the castellan service package. this
module contains 3 wrapper functions to help reduce the overhead for
working with castellan.

* add sahara.service.castellan.utils module
* fixup previous usages of the castellan key manager

Change-Id: I6ad4e98ab41788022104ad2886e0ab74e4061ec3
Partial-Implements: blueprint improved-secret-storage
2016-01-11 10:12:01 -05:00
Michael McCune d148dd4d55 Initial key manager implementation
This change adds the sahara key manager and converts the proxy passwords
and swift passwords to use the castellan interface.

* adding sahara key manager
* adding castellan to requirements
* removing barbicanclient from requirements
* removing sahara.utils.keymgr and related tests
* adding castellan wrapper configs to sahara list_opts
* creating a castellan validate_config to help setup
* updating documentation for castellan usage
* fixing up tests to work with castellan
* converting all proxy password usages to use castellan
* converting job binaries to use castellan when user credentials are
  applied
* converting data source to use castellan when user credentials are
  applied

Change-Id: I8cb08a365c6175744970b1037501792fe1ddb0c7
Partial-Implements: blueprint improved-secret-storage
Closes-Bug: #1431944
2015-12-22 15:07:12 -05:00
Nikita Konovalov 6b5f0d0b1b Drop Vanilla Hadoop 1
Drop support for the unpopular and unused Hadoop v1 in the Vanilla plugin.

Partially-implements bp: drop-hadoop-1

Change-Id: I0a322fb7b8db50941c4854f45077fe6232e2c766
2015-09-03 16:02:29 +03:00
Jenkins b79e9606f8 Merge "Implemented support of placeholders in datasource URLs" 2015-05-18 11:12:02 +00:00
Andrew Lazarev 7bae4261d0 Implemented support of placeholders in datasource URLs
Added ability to use placeholders in datasource URLs. Currently
supported placeholders:
* %RANDSTR(len)% - will be replaced with a random string of
  lowercase letters of length `len`.
* %JOB_EXEC_ID% - will be replaced with the job execution ID.

Resulting URLs will be stored in a new field in the job_execution
table. Using the 'info' field does not look like a good solution,
since it is reserved for Oozie status.

Next steps:
* write documentation
* update horizon

Implements blueprint: edp-datasource-placeholders

Change-Id: I1d9282b210047982c062b24bd03cf2331ab7599e
2015-05-06 20:50:03 +00:00
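The two placeholders above are simple textual substitutions; a minimal sketch (the function name is illustrative, not Sahara's actual helper):

```python
# Sketch of placeholder substitution in data source URLs:
# %RANDSTR(len)% -> `len` random lowercase letters
# %JOB_EXEC_ID%  -> the job execution id
import random
import re
import string

def resolve_data_source_url(url, job_exec_id):
    def rand_str(match):
        length = int(match.group(1))
        return "".join(random.choice(string.ascii_lowercase)
                       for _ in range(length))
    url = re.sub(r"%RANDSTR\((\d+)\)%", rand_str, url)
    return url.replace("%JOB_EXEC_ID%", job_exec_id)
```

The resolved URL is what gets persisted in the new job_execution field, so re-running a job never reuses a stale random path.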
Nikita Konovalov 156d637113 Improved coverage for workflow_creator
The Shell action is now covered.
Java action coverage improved.

Change-Id: Ie91eac4730a1ca2766f499213312c80f22facc6a
2015-04-30 16:42:37 +03:00
Michael McCune 4ca232ce72 Adding config hints for HDP plugin
This change adds HDP plugin configuration hints for both versions
(1.3.2 and 2.0.6).

Changes
* adding confighints_helper module to hold utility functions for
  creating the config hints
* adding specific config hints functions for both HDP edp_engine
  versions
* fixing inconsistencies in the oozie workflow_factory possible hints
  function
* adding tests for hdp config hints and hints helper

Change-Id: I7f85e47a4f9dfc7ccba0a5678701e5a4fb6742bb
Partial-Implements: bp edp-job-types-endpoint
2015-04-13 14:02:26 -04:00
ChangBo Guo(gcb) 4dc5dfdada Leverage dict comprehension in PEP-0274
PEP-0274 introduced dict comprehensions to replace the dict constructor
applied to a sequence of key/value pairs[1]. There are two benefits:
  First, it makes the code look neater.
  Second, it gains a micro-optimization.

Sahara dropped Python 2.6 support in Kilo, so we can leverage this now.

Note: This commit doesn't handle dict constructor with kwargs.
This commit also adds a hacking rule.

[1]http://legacy.python.org/dev/peps/pep-0274/

Closes-Bug: #1430786
Change-Id: I507f2c520ddab1ae3d8487bf7aea497306eb6eb2
2015-04-02 01:51:41 +00:00
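The two spellings the commit contrasts, side by side:

```python
# Old style: dict() constructor over a sequence of key/value pairs.
names = ["alpha", "beta", "gamma"]
lengths_old = dict([(n, len(n)) for n in names])

# New style (PEP-0274): a dict comprehension -- neater, and it skips
# building the intermediate list of tuples.
lengths_new = {n: len(n) for n in names}

assert lengths_old == lengths_new
```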
Ethan Gafford 36881a9cba [EDP] Add Oozie Shell Job Type
This change adds the Shell job type, currently implemented for the
Oozie engine (per spec).

Oozie shell actions provide a great deal of flexibility and will
empower users to easily customize and extend the features of Sahara
EDP as needed. For example, a shell action could be used to manage
hdfs on the cluster, do pre or post processing for another job
launched from Sahara, or run a data processing job from a
specialized launcher that does extra configuration not otherwise
available from Sahara (e.g., setting a special classpath for a Java job).

Change-Id: I0d8b59cf55cf583f0d24c2c8c2e487813d8ec716
Implements: blueprint add-edp-shell-action
2015-03-04 11:06:39 -05:00
Trevor McKay 7e4693ebe8 Config parameters beginning with "oozie." should be in job properties file
In the Oozie EDP engine, look for configs beginning with 'oozie.' and
pass them to _get_oozie_job_params() as additional values to be written
to the job properties file. Do not allow the workflow application path
to be overwritten (oozie.wf.application.path) since it is generated by
EDP. Prevent configs beginning with 'oozie.' from being written to the
workflow.xml file in workflow_factory.py

Closes-Bug: 1419923
Change-Id: I75b60e5bc3d1afadac7c2b209e3ea68e4ba9e88b
2015-02-09 16:25:00 -05:00
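The routing rule above can be sketched as a single split over the user's configs; the function name is illustrative, but the protected key and the "oozie." prefix are the ones the commit describes:

```python
# Sketch: configs starting with "oozie." go to the job properties file,
# everything else stays in workflow.xml. The workflow application path
# is generated by EDP and must never be user-overridable.
APP_PATH = "oozie.wf.application.path"

def split_edp_configs(configs):
    """Split user configs into (job_properties, workflow_configs)."""
    job_properties = {}
    workflow_configs = {}
    for key, value in configs.items():
        if key == APP_PATH:
            continue  # silently drop attempts to override the app path
        if key.startswith("oozie."):
            job_properties[key] = value
        else:
            workflow_configs[key] = value
    return job_properties, workflow_configs
```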
Andrey Pavlov 5c5491f9de Using oslo_* instead of oslo.*
Changes:
* using oslo_config instead of oslo.config
* using oslo_concurrency instead of oslo.concurrency
* using oslo_db instead of oslo.db
* using oslo_i18n instead of oslo.i18n
* using oslo_messaging instead of oslo.messaging
* using oslo_middleware instead of oslo.middleware
* using oslo_serialization instead of oslo.serialization
* using oslo_utils instead of oslo.utils

Change-Id: Ib0f18603ca5b0885256a39a96a3620d05260a272
Closes-bug: #1414587
2015-02-04 13:19:28 +03:00
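The change is a mechanical rename of the dotted oslo.* namespace packages to plain top-level packages; a tiny illustrative rewrite helper shows the shape of it:

```python
# Purely illustrative: rewrite "from oslo.x import y" / "import oslo.x"
# to the new underscore form ("oslo_x").
import re

def modernize_oslo_import(line):
    """Replace every oslo.<pkg> reference in a line with oslo_<pkg>."""
    return re.sub(r"\boslo\.(\w+)", r"oslo_\1", line)
```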
Trevor McKay bfe01ead79 Add Swift integration with Spark
This change allows Spark jobs to access Swift URLs without
any need to modify the Spark job code itself. There are a
number of things necessary to make this work:

* add a "edp.spark.adapt_for_swift" config value to control the
  feature
* generate a modified spark-submit command when the feature is
  enabled
* add the hadoop-swift.jar to the Spark classpaths for the
  driver and executors (cluster launch)
* include the general Swift configs in the Hadoop core-site-xml
  and make Spark read the Hadoop core-site.xml (cluster launch)
* upload an xml file containing the Swift authentication configs
  for Hadoop
* run a wrapper class that reads the extra Hadoop configuration
  and adds it to the configuration for the job

Changes in other CRs:
* add the hadoop-swift.jar to the Spark images
* add the SparkWrapper code to sahara-extra

Partial-Implements: blueprint edp-spark-swift-integration
Change-Id: I03dca4400c832f3ba8bc508d4fb2aa98dede8d80
2015-02-03 10:34:32 -05:00
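The "modified spark-submit command" step above can be sketched as follows; the wrapper class name SparkWrapper comes from the commit, but the jar and config file names here are assumptions:

```python
# Hedged sketch: when adapt-for-swift is enabled, launch the wrapper
# class (which loads the extra Hadoop/Swift configs) instead of the
# user's main class directly. "spark-wrapper.jar" and "spark.xml" are
# illustrative names, not Sahara's actual artifacts.
def build_spark_command(main_class, app_jar, adapt_for_swift):
    if adapt_for_swift:
        return ["spark-submit", "--class", "SparkWrapper",
                "spark-wrapper.jar", "spark.xml", main_class, app_jar]
    return ["spark-submit", "--class", main_class, app_jar]
```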
Kazuki OIKAWA 981a4906a9 Add edp.java.adapt_for_oozie config for Java Action
This config option improves compatibility for MapReduce applications.
For the Java job type, a MapReduce app normally has to be modified to
load the Oozie configuration. With this option enabled, no
modifications are necessary, because a wrapper class loads the
configuration before invoking the MapReduce app's main method.

Change-Id: I07f2467942eed7d526e991f2ad8e58e4bb644a82
Implements: blueprint edp-improve-compatibility
2015-01-26 17:09:11 +09:00
Trevor McKay 8750ddc121 Add options supporting DataSource identifiers in job_configs
This change adds options that allow DataSource objects to be
referenced by name or uuid in the job_configs dictionary of a
job_execution. If a reference to a DataSource is found, the path
information replaces the reference.

Note, references are partially resolved in early processing to
determine whether or not a proxy user must be created.  References
are fully resolved in run_job().

Implements: blueprint edp-data-sources-in-job-configs
Change-Id: I5be62b798b86a8aaf933c2cc6b6d5a252f0a8627
2015-01-14 18:20:05 +00:00
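The resolution step above replaces a data-source reference with its stored path; a minimal sketch, using a `datasource://` prefix as an illustrative reference convention (not necessarily Sahara's actual syntax):

```python
# Sketch: args in job_configs that reference a data source by name or
# uuid are replaced by the data source's URL; unknown refs pass through.
def resolve_data_source_refs(args, data_sources):
    """Replace datasource:// references in args with the stored URLs.

    data_sources maps name-or-uuid -> URL.
    """
    resolved = []
    for arg in args:
        if isinstance(arg, str) and arg.startswith("datasource://"):
            key = arg[len("datasource://"):]
            resolved.append(data_sources.get(key, arg))
        else:
            resolved.append(arg)
    return resolved
```

As the commit notes, a real implementation resolves references twice: partially, early on, to decide whether a proxy user is needed, and fully in run_job().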
Sergey Reshetnyak 0e965bf586 Fix working EDP jobs with non-string configs
Change-Id: I35435b56f588089b23b0e2f23c4cca9ce701f4af
Closes-bug: #1387723
2014-10-30 18:21:39 +03:00
Michael McCune 02b292a459 Refactoring DataSources to use proxy user
Changes
* adding constants for trust id and domain name to swift_helper
* refactoring oozie edp workflows to use proxy configs when necessary
* adding tests to exercise proxy domain usage in workflows
* removing credentials requirement for DataSource models when using
  proxy domain
* pruning duplicate MapReduce test
* adding tests for DataSource creation without credentials
* adding Keystone v3 token endpoint to Hadoop core-site.xml when domain
  requested

Partial-implements: blueprint edp-swift-trust-authentication
Change-Id: I38fd1c470d608c3de9d8c140228d7c9666523b23
2014-09-09 09:12:16 -04:00
Andrew Lazarev 42526b808b Made EDP engine plugin specific
+ Moved 'get_hdfs_user' method from plugin SPI to EDP engine

Further steps: move other EDP-specific methods to the EDP engine

Change-Id: I0537397894012f496ea4abc2661aa8331fbf6bd3
Partial-Bug: #1357512
2014-08-21 12:45:43 -07:00
Trevor McKay 9198e31187 Refactor the job manager to allow multiple execution engines
This change creates an abstract base class that defines three
simple operations on jobs -- run, check status, and cancel. The
existing Oozie implementation becomes one implementation of this
class, and a stub for Spark clusters has been added.

The EDP job engine will be chosen based on information in the
cluster object.

Implements: blueprint edp-refactor-job-manager

Change-Id: I725688b0071b2c2a133cd167ae934f59e488c734
2014-07-09 10:18:29 -04:00
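The abstract base class described above exposes three operations, with the engine chosen from cluster information; a sketch of that structure (class and method names are illustrative, and the Oozie engine body is stubbed):

```python
# Sketch of the refactored job manager: an abstract engine with three
# operations, concrete engines per backend, and selection by cluster info.
import abc

class JobEngine(abc.ABC):
    @abc.abstractmethod
    def run_job(self, job_execution):
        """Launch the job and return an engine-specific job id."""

    @abc.abstractmethod
    def get_job_status(self, job_execution):
        """Return the current status of the job."""

    @abc.abstractmethod
    def cancel_job(self, job_execution):
        """Stop a running job."""

class OozieJobEngine(JobEngine):
    # Stubbed bodies; the real engine talks to the Oozie server.
    def run_job(self, job_execution):
        return "oozie-job-id"

    def get_job_status(self, job_execution):
        return "RUNNING"

    def cancel_job(self, job_execution):
        return "KILLED"

def choose_engine(cluster_type):
    """Pick the engine based on information in the cluster object."""
    engines = {"oozie": OozieJobEngine}
    return engines[cluster_type]()
```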