* Create S3 data source type for EDP
* Support storing S3 secret key in Castellan
* Unit tests for new data source type
* Document new data source type and related ideas
* Add support for S3 configs in Spark and Oozie workflows
* Hide S3 credentials in job execution info, like for Swift
* Release note
Change-Id: I3ae5b9879b54f81d34bc7cd6a6f754347ce82f33
Server installs of Ubuntu Xenial or later no longer include Python 2
by default. Sahara should be able to dynamically edit the remotely
executed Python script based on which Python interpreter is available.
Change-Id: Ie0fdd829d1b0ff019329957fbdbbfd150320b8ab
Closes-Bug: #1739009
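A minimal sketch of what such interpreter selection might look like; the helper names and candidate list are assumptions for illustration, not Sahara's actual code:

```python
# Hypothetical helpers: pick an available interpreter on the remote
# host (preferring python3) and patch the script's shebang to match.
def choose_python(available_commands):
    """Return the first available python command, preferring python3."""
    for candidate in ("python3", "python2", "python"):
        if candidate in available_commands:
            return candidate
    raise RuntimeError("no python interpreter found on remote host")


def rewrite_shebang(script_text, interpreter):
    """Replace the script's shebang line with the chosen interpreter."""
    lines = script_text.splitlines(True)
    if lines and lines[0].startswith("#!"):
        lines[0] = "#!/usr/bin/env %s\n" % interpreter
    return "".join(lines)
```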
Changes to make the integration of the existing code with the data source
and job binary abstractions possible.
Change-Id: I524f25ac95bb634b0583113792460c17217acc34
Implements: blueprint data-source-plugin
We have faced issues where passing jars via the driver classpath
has no effect in the case of Ambari clusters.
Passing the drivers via the jars option should solve this issue.
Change-Id: I17828fee9d17b6bddbbf6d3e9bdcf7d40c2d28a1
Closes-bug: #1535106
Spark 1.6.0 is now available for deployment.
Also added the current working directory to the driver classpath so
the spark.xml file is read properly.
Change-Id: I9a46a503c7e52d756c7de8c8694dbfc51f80f2be
Co-Authored-By: Vitaly Gridnev <vgridnev@mirantis.com>
bp: support-spark-160
This change adds a utils module to the castellan service package. This
module contains three wrapper functions that help reduce the overhead
of working with castellan.
* add sahara.service.castellan.utils module
* fixup previous usages of the castellan key manager
Change-Id: I6ad4e98ab41788022104ad2886e0ab74e4061ec3
Partial-Implements: blueprint improved-secret-storage
Add run_scheduled_job in base_engine, and implement it in oozie engine.
Implements bp: enable-scheduled-edp-jobs
Change-Id: I2a0b3724396b4bed5cd2a4bc1392f849eb902e3e
This change adds the sahara key manager and converts the proxy passwords
and swift passwords to use the castellan interface.
* adding sahara key manager
* adding castellan to requirements
* removing barbicanclient from requirements
* removing sahara.utils.keymgr and related tests
* adding castellan wrapper configs to sahara list_opts
* creating a castellan validate_config to help setup
* updating documentation for castellan usage
* fixing up tests to work with castellan
* converting all proxy password usages to use castellan
* converting job binaries to use castellan when user credentials are
applied
* converting data source to use castellan when user credentials are
applied
Change-Id: I8cb08a365c6175744970b1037501792fe1ddb0c7
Partial-Implements: blueprint improved-secret-storage
Closes-Bug: #1431944
It might be useful to be able to override the cluster's default value
of the driver-class-path option for a particular job execution. This
change adds a new config option, "edp.spark.driver.classpath", that
can override driver-class-path to suit users' needs.
Implements blueprint: spark-override-classpath
Change-Id: I94055c2ccb70c953620b62ed18c27895c3588327
We added a ":" to the spark driver-classpath to fix spark/swift
integration for spark 1.3.1. However, the ":" is only necessary
for client deploy mode so we should leave it out in other cases.
Note, this is a refinement to the original bug fix. The ":" in
other deployment modes does not break anything. However, it is
better not to include it since it is unnecessary.
Partial-bug: #1486544
Change-Id: Iaacbb090d0065922fab034d9d4a1f765ad7e05e3
The "oozie_job_id" column in table "job_executions" represents
oozie_job_id only when the edp engine is oozie. When it is spark
engin, oozie_job_id = pid@instance_id, when it is storm engine,
oozie_job_id = topology_name@instance_id.
Rename oozie_job_id to engine_job_id to aviod confusing.
Change-Id: I2671b91a315b2c7a2b805ce4d494252860a7fe6c
Closes-bug: 1479575
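For illustration, the non-Oozie id formats above can be sketched as follows (these helper names are hypothetical; the real engines build the strings internally):

```python
# Spark uses pid@instance_id; Storm uses topology_name@instance_id.
def spark_engine_job_id(pid, instance_id):
    return "%s@%s" % (pid, instance_id)


def storm_engine_job_id(topology_name, instance_id):
    return "%s@%s" % (topology_name, instance_id)


def split_engine_job_id(engine_job_id):
    """Split a non-Oozie engine_job_id into (left part, instance id)."""
    left, _, instance_id = engine_job_id.partition("@")
    return left, instance_id
```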
For Spark/Swift integration, we use a wrapper class to set up
the hadoop environment. For this to succeed, the current
working directory must be on the classpath. Newer versions of
Spark have changed how the default classpath is generated, so
Sahara must ensure explicitly that the working dir will be
included.
Change-Id: I6680bf8736cada93e87821ef37de3c3b4202ead4
Closes-Bug: #1486544
This change will allow data sources with urls of the form
"manila://share-id/path", similar to manila urls for job binaries.
The Sahara native url will be logged in the JobExecution, but
the true runtime url (file:///mnt/path) for manila shares will
be used in the cluster.
Partial-implements: blueprint manila-as-a-data-source
Change-Id: I0b43491decbe6cb0ec0b84314cf9b407b9e3fb4a
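A sketch of the native-to-runtime URL translation described above; the mount-point argument is an assumption, since the real mount location is determined when the share is attached to the cluster:

```python
from urllib.parse import urlparse


def to_runtime_url(native_url, mount_point):
    """Translate a manila:// URL to the file:// form used in the cluster.

    Hypothetical helper: the Sahara-native URL is kept for logging in
    the JobExecution, while this runtime form is what the job receives.
    """
    parsed = urlparse(native_url)
    if parsed.scheme != "manila":
        return native_url
    return "file://%s%s" % (mount_point, parsed.path)
```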
When we consider things like Manila NFS shares as data sources,
the possibility arises that the native form of a data source URL
in Sahara may not be the same as the form of the URL needed at
runtime in the cluster. Allow them to differ, so that we can
still record the Sahara native url form in JobExecution objects
for accurate reference while passing the correct runtime url
in job arguments, etc.
This is a base CR that will be further built on later. In this
change, native urls and runtime urls are always identical.
Change-Id: I53f4cf11320e112ffd0c4ae93b7d1f300df86878
Partial-implements: blueprint manila-as-a-data-source
Changes to support manila shares as a binary store.
Oozie, Spark and Storm jobs can now run with job
binaries stored in manila shares.
Change-Id: I2f5fbe3d36ef4b87e5cadd337854e95ed95ebaa0
Implements: bp manila-as-binary-store
We should define a set of CLUSTER_STATUS constants instead of using
raw strings in code.
1. Add cluster.py in utils/
2. Add cluster status constants.
3. Move cluster-operation-related methods from general.py to cluster.py
Change-Id: Id95d982a911ab5d0f789265e03bff2256cf75856
1) Fixed the path to hadoop-swift.jar - in Cloudera
it is named hadoop-openstack.jar
2) Fixed the options for launching the wrapper with yarn-cluster
(more details at http://spark.apache.org/docs/latest/running-on-yarn.html
'Important notes' section).
3) Fixed the issue of swift credentials visibility in the Yarn cluster.
4) Fixed the related unit test that contained the same error.
Change-Id: I5e8c72f0e362792f06245b3744a32342abc42389
Closes-bug: 1474128
Spark jobs in Cloudera 5.3.0 and 5.4.0 plugins are now supported.
Required unit tests have been added. Merged with current
master HEAD.
Change-Id: Ic8fde97e424e45c6f31f7794749793b26c844915
Implements: blueprint spark-jobs-for-cdh-5-3-0
The EDP Spark engine was importing a config helper from the Spark
plugin.
The helper was moved to common plugin utils and now is imported from
there by both the plugin and the engine.
This is part of the sahara and plugins split.
Partially-implements bp: move-plugins-to-separate-repo
Change-Id: Ie84cc163a09bf1e7b58fcdb08e0647a85492593b
Added the ability to use placeholders in data source URLs. Currently
supported placeholders:
* %RANDSTR(len)% - will be replaced with a random string of
lowercase letters of length `len`.
* %JOB_EXEC_ID% - will be replaced with the job execution ID.
Resulting URLs will be stored in a new field in the job_executions
table. Using the 'info' field does not look like a good solution
since it is reserved for Oozie status.
Next steps:
* write documentation
* update horizon
Implements blueprint: edp-datasource-placeholders
Change-Id: I1d9282b210047982c062b24bd03cf2331ab7599e
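The substitution could look roughly like this (a simplified sketch; the real implementation lives in Sahara's EDP code):

```python
import random
import re
import string


def resolve_placeholders(url, job_execution_id):
    """Expand %RANDSTR(len)% and %JOB_EXEC_ID% in a data source URL."""
    def rand_repl(match):
        length = int(match.group(1))
        return "".join(random.choice(string.ascii_lowercase)
                       for _ in range(length))

    url = re.sub(r"%RANDSTR\((\d+)\)%", rand_repl, url)
    return url.replace("%JOB_EXEC_ID%", job_execution_id)
```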
Adding configuration of the hosts file for HDFS access (already added
to Oozie engine) to Spark.
Change-Id: I3d2a372d3f4a4e502e2c0e111a1e29fb4f9b9fcf
Partially-implements: blueprint edp-spark-external-hdfs
Changes:
* using oslo_config instead of oslo.config
* using oslo_concurrency instead of oslo.concurrency
* using oslo_db instead of oslo.db
* using oslo_i18n instead of oslo.i18n
* using oslo_messaging instead of oslo.messaging
* using oslo_middleware instead of oslo.middleware
* using oslo_serialization instead of oslo.serialization
* using oslo_utils instead of oslo.utils
Change-Id: Ib0f18603ca5b0885256a39a96a3620d05260a272
Closes-bug: #1414587
This change allows Spark jobs to access Swift URLs without
any need to modify the Spark job code itself. There are a
number of things necessary to make this work:
* add a "edp.spark.adapt_for_swift" config value to control the
feature
* generate a modified spark-submit command when the feature is
enabled
* add the hadoop-swift.jar to the Spark classpaths for the
driver and executors (cluster launch)
* include the general Swift configs in the Hadoop core-site-xml
and make Spark read the Hadoop core-site.xml (cluster launch)
* upload an xml file containing the Swift authentication configs
for Hadoop
* run a wrapper class that reads the extra Hadoop configuration
and adds it to the configuration for the job
Changes in other CRs:
* add the hadoop-swift.jar to the Spark images
* add the SparkWrapper code to sahara-extra
Partial-Implements: blueprint edp-spark-swift-integration
Change-Id: I03dca4400c832f3ba8bc508d4fb2aa98dede8d80
The command issued by Sahara to run jobs with spark-submit does
not put the application jar in the right place according to the
help text of spark-submit. This does not make jobs fail, but it
is good to be consistent in case something changes.
Change-Id: I50c2a969e4f747820c06d5dba39b6a8442bb5c30
Closes-Bug: #1410247
This change adds options that allow DataSource objects to be
referenced by name or uuid in the job_configs dictionary of a
job_execution. If a reference to a DataSource is found, the path
information replaces the reference.
Note, references are partially resolved in early processing to
determine whether or not a proxy user must be created. References
are fully resolved in run_job().
Implements: blueprint edp-data-sources-in-job-configs
Change-Id: I5be62b798b86a8aaf933c2cc6b6d5a252f0a8627
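A minimal sketch of the reference resolution; the lookup structure and helper name here are assumptions for illustration:

```python
# Hypothetical resolution step: any job argument that names a known
# DataSource is replaced by that data source's path/URL.
def resolve_data_source_refs(args, data_sources):
    """Replace arguments that reference a DataSource with its URL.

    data_sources maps names (or uuids) to objects with a 'url' key.
    """
    resolved = []
    for arg in args:
        ds = data_sources.get(arg)
        resolved.append(ds["url"] if ds else arg)
    return resolved
```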
The plugins dir contains a 'general' module which looks like yet
another plugin alongside vanilla, fake, hdp, spark and cdh. But it is
not a plugin and contains only two files. Moved them one level up to
avoid such confusion.
Closes-Bug: #1378178
Change-Id: Ia600e4c584d48a3227552f0051cc3bf906206bed
Now the EDP engine is fully responsible for validation of data for
job execution.
Other changes:
* Removed API calls from validation to remove a circular dependency
* Removed plugins patching in validation to allow non-vanilla
plugins testing
* Renamed job_executor to job_execution
Change-Id: I14c86f33b355cb4317e96a70109d8d72d52d3c00
Closes-Bug: #1357512
Changes
* refactoring get_raw_binary to accept proxy configs
* refactoring get_raw_data to use proxy Swift connection when necessary
* adding function to get a Swift Connection object from proxy user
* refactoring upload_job_files_to_hdfs and upload_job_files to use proxy
user when necessary
* changing JobBinary JSON schema to allow blank username/password if
proxy domains are being used
* adding function to get the Swift public endpoint for the current
project
* adding test for JobBinary creation without credentials
Partial-implements: blueprint edp-swift-trust-authentication
Change-Id: I02e76016194fbbb62b8ab7b304eecc53d580a79c
+ Moved 'get_hdfs_user' method from plugin SPI to EDP engine
Further steps: move other EDP-specific methods to the EDP engine
Change-Id: I0537397894012f496ea4abc2661aa8331fbf6bd3
Partial-Bug: #1357512
To help standardize using job statuses across modules this patch
introduces a set of constants in sahara.utils.edp for the statuses
currently in use.
Changes
* add job status constants for DONEWITHERROR, FAILED, KILLED,
PENDING, RUNNING, and SUCCEEDED
* add a list constant for the terminated statuses
* update references from string variables to constants
Partial-implements: blueprint edp-swift-trust-authentication
Change-Id: Ib0c47a5c002e135f2e2eed0a9066144c830926b3
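Based on the statuses listed above, the constants could look roughly like this (a sketch; the actual names live in sahara.utils.edp):

```python
JOB_STATUS_DONEWITHERROR = "DONEWITHERROR"
JOB_STATUS_FAILED = "FAILED"
JOB_STATUS_KILLED = "KILLED"
JOB_STATUS_PENDING = "PENDING"
JOB_STATUS_RUNNING = "RUNNING"
JOB_STATUS_SUCCEEDED = "SUCCEEDED"

# Statuses from which a job will make no further progress.
JOB_STATUSES_TERMINATED = [
    JOB_STATUS_DONEWITHERROR,
    JOB_STATUS_FAILED,
    JOB_STATUS_KILLED,
    JOB_STATUS_SUCCEEDED,
]


def is_terminated(status):
    return status in JOB_STATUSES_TERMINATED
```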
* Added support for JOB_TYPE_SPARK
* Modified create_workflow_dir to not use job_workflow_postfix (only for hdfs)
* Rewrote unit tests in test_job.py to be oriented around
validation behavior for classes of job types rather than
features of validation, and included JOB_TYPE_SPARK.
* Moved test_job_executor_java.py into test_job_executor and added JOB_TYPE_SPARK
* Added tests for create_workflow_dir and upload_job_files
Partial-implements: blueprint edp-spark-job-type
Change-Id: Ifd91123afea9e921ac441751a37aa6afae0bbd66
This change adds an EDP engine for a Spark standalone cluster.
The engine uses the spark-submit script and various linux
commands via ssh to run, monitor, and terminate Spark jobs.
Currently, the Spark engine can launch "Java" job types (this is
the same type used to submit an Oozie Java action on Hadoop clusters).
A directory is created for each Spark job on the master node which
contains jar files, the script used to launch the job, the
job's stderr and stdout, and a result file containing the exit
status of spark-submit. The directory is named after the Sahara
job and the job execution id so it is easy to locate. Preserving
these files is a big help in debugging jobs.
A few general improvements are included:
* engine.cancel_job() may return updated job status
* engine.run_job() may return job status and fields for job_execution.extra
in addition to job id
Still to do:
* create a proper Spark job type (new CR)
* make the job dir location on the master node configurable (new CR)
* add something to clean up job directories on the master node (new CR)
* allow users to pass some general options to spark-submit itself (new CR)
Partial implements: blueprint edp-spark-standalone
Change-Id: I2c84e9cdb75e846754896d7c435e94bc6cc397ff
This change creates an abstract base class that defines three
simple operations on jobs -- run, check status, and cancel. The
existing Oozie implementation becomes one implementation of this
class, and a stub for Spark clusters has been added.
The EDP job engine will be chosen based on information in the
cluster object.
Implements: blueprint edp-refactor-job-manager
Change-Id: I725688b0071b2c2a133cd167ae934f59e488c734