Updated hdp_plugin features to align with current capabilities

* Minor re-write of Swift integration feature
* Extracted plugin capabilities into an easy to update/view table

Co-Authored-By: Alexander Ignatov <aignatov@mirantis.com>

Partially Implements: blueprint update-docs-icehouse

Change-Id: I4014e9549467b4ad3d3fe888aad98e1079796013
Erik Bergenholtz 2014-04-02 12:54:32 -04:00 committed by Alexander Ignatov
parent 6c1116fa6c
commit 158e897e1f
2 changed files with 46 additions and 63 deletions


@@ -9,15 +9,15 @@ User may change number of instances in existing Node Groups or add new Node Groups
If a cluster fails to scale properly, all changes will be rolled back.
Currently only Vanilla plugin supports this feature. Visit :doc:`vanilla_plugin` for info about cluster topology limitations.
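As an illustration, a scaling request could look like the sketch below; the group names, counts, and template ID are placeholder assumptions, not values taken from this document:

.. code-block:: python

    # Illustrative cluster-scaling request body. "resize_node_groups"
    # changes the instance count of existing Node Groups, while
    # "add_node_groups" introduces new ones. All names and IDs are
    # placeholders.
    scale_body = {
        "resize_node_groups": [
            {"name": "worker", "count": 5},  # grow the existing group to 5
        ],
        "add_node_groups": [
            {
                "name": "extra-worker",
                "count": 2,
                "node_group_template_id": "<template-uuid>",  # placeholder
            },
        ],
    }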
Swift Integration
-----------------
In order to leverage Swift within Hadoop, including using Swift data sources from within EDP, Hadoop requires the application of a patch.
For additional information about this patch and its configuration, please refer to :doc:`hadoop-swift`. Sahara automatically sets information
about the Swift filesystem implementation, location awareness, URL and tenant name for authorization.
The only required information that still needs to be set is the username and password used to access Swift. These parameters must be
explicitly set prior to launching the job.
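For example, the credentials can be supplied in the job configuration at launch time. The following is a minimal sketch, assuming the Hadoop-Swift configuration keys for a Swift service provider named ``sahara``; the credential values are placeholders:

.. code-block:: python

    # Illustrative job configuration carrying Swift credentials.
    # Both values below are placeholders, not real credentials.
    job_configs = {
        "configs": {
            "fs.swift.service.sahara.username": "admin",
            "fs.swift.service.sahara.password": "swordfish",
        }
    }

    # At launch time, this dictionary is submitted together with the rest
    # of the job-execution request (cluster ID, input/output data sources).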
@@ -39,12 +39,8 @@ with name "sahara".
Currently a user can only enable or disable Swift for a Hadoop cluster. There is, however, a blueprint about making Swift access
more configurable: https://blueprints.launchpad.net/sahara/+spec/swift-configuration-through-rest-and-ui
Cinder support
--------------
Cinder is a block storage service that can be used as an alternative to an ephemeral drive. Using Cinder volumes increases the reliability of data, which is important for the HDFS service.
Users can set how many volumes will be attached to each node in a Node Group, as well as the size of each volume.
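As an illustration, a node group template that requests Cinder volumes could look like the sketch below; the ``volumes_per_node`` and ``volumes_size`` fields carry the settings described above, while the plugin, version, flavor, and process names are placeholder assumptions:

.. code-block:: python

    # Sketch of a node group template that requests Cinder volumes.
    # Two 10 GB volumes would be attached to every node in this group.
    node_group_template = {
        "name": "worker-with-volumes",
        "plugin_name": "vanilla",              # assumed plugin
        "hadoop_version": "1.2.1",             # assumed version
        "flavor_id": "<flavor-uuid>",          # placeholder
        "node_processes": ["datanode", "tasktracker"],
        "volumes_per_node": 2,                 # volumes attached to each node
        "volumes_size": 10,                    # size of each volume, in GB
    }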
@@ -115,8 +111,6 @@ This feature is supported by all plugins out of the box.
Data-locality
-------------
It is extremely important for data processing to do as much work as possible locally (on the same rack,
OpenStack compute node, or even VM). Hadoop supports a data-locality feature and can schedule jobs to
run on the nodes closest to the input data.
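As an illustration of the rack-awareness input this relies on, the sketch below writes a hypothetical host-to-rack mapping in the ``<host> <rack path>`` form used by Hadoop topology scripts; all host names, rack paths, and the file name are assumptions:

.. code-block:: python

    # Hypothetical host-to-rack mapping used for data-locality decisions.
    # Every name and path below is a placeholder.
    topology = {
        "compute-host-1": "/rack1",
        "compute-host-2": "/rack1",
        "compute-host-3": "/rack2",
    }

    # Write the mapping in "<host> <rack path>" form.
    with open("compute.topology", "w") as f:
        for host, rack in sorted(topology.items()):
            f.write("%s %s\n" % (host, rack))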
@@ -212,3 +206,25 @@ The following features are supported in the new Heat engine:
+-----------------------------------------+-------------------------+-----------------------------------------+
| Elastic Data Processing                 | Not affected            |                                         |
+-----------------------------------------+-------------------------+-----------------------------------------+
Plugin Capabilities
-------------------
The following table provides a plugin capability matrix:
+--------------------------+---------+--------------+-----+
|                          | Plugin                      |
|                          +---------+--------------+-----+
| Feature                  | Vanilla | HDP          | IDH |
+==========================+=========+==============+=====+
| Nova and Neutron network | x       | x            | x   |
+--------------------------+---------+--------------+-----+
| Cluster Scaling          | x       | Scale Up     | x   |
+--------------------------+---------+--------------+-----+
| Swift Integration        | x       | x            | x   |
+--------------------------+---------+--------------+-----+
| Cinder Support           | x       | x            | x   |
+--------------------------+---------+--------------+-----+
| Data Locality            | x       | x            | N/A |
+--------------------------+---------+--------------+-----+
| EDP                      | x       | x            | x   |
+--------------------------+---------+--------------+-----+


@@ -1,17 +1,17 @@
Hortonworks Data Platform Plugin
================================
The Hortonworks Data Platform (HDP) Sahara plugin provides a way to provision HDP clusters on OpenStack using templates in a single click and in an easily repeatable fashion. As seen from the architecture diagram below, the Sahara controller serves as the glue between Hadoop and OpenStack. The HDP plugin mediates between the Sahara controller and Apache Ambari in order to deploy and configure Hadoop on OpenStack. Core to the HDP plugin is Apache Ambari, which is used as the orchestrator for deploying HDP on OpenStack.
.. image:: ../images/hdp-plugin-architecture.png
:width: 800 px
:scale: 80 %
:align: center
The HDP plugin can make use of Ambari Blueprints for cluster provisioning.
Apache Ambari Blueprints
------------------------
Apache Ambari Blueprints is a portable document definition, which provides a complete definition for an Apache Hadoop cluster, including cluster topology, components, services and their configurations. Ambari Blueprints can be consumed by the HDP plugin to instantiate a Hadoop cluster on OpenStack. The benefit of this approach is that it allows Hadoop clusters to be configured and deployed using an Ambari-native format that can be used both within and outside of OpenStack, allowing clusters to be re-instantiated in a variety of environments.
For more information about Apache Ambari Blueprints, refer to: https://issues.apache.org/jira/browse/AMBARI-1783. Note that Apache Ambari Blueprints are not yet finalized.
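As a sketch of the general shape of such a blueprint (the format was still evolving at the time; the cluster name, host groups, components, and stack version below are illustrative assumptions, not taken from this document):

.. code-block:: python

    # Minimal illustrative Ambari blueprint, expressed as the JSON-style
    # document the HDP plugin consumes. All names are placeholders.
    blueprint = {
        "Blueprints": {
            "blueprint_name": "sample-hdp-cluster",
            "stack_name": "HDP",
            "stack_version": "1.3",
        },
        "host_groups": [
            {
                "name": "master",
                "cardinality": "1",
                "components": [
                    {"name": "NAMENODE"},
                    {"name": "JOBTRACKER"},
                    {"name": "AMBARI_SERVER"},
                ],
            },
            {
                "name": "worker",
                "cardinality": "1+",
                "components": [
                    {"name": "DATANODE"},
                    {"name": "TASKTRACKER"},
                ],
            },
        ],
    }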
@@ -29,62 +29,29 @@ Images
------
The Sahara HDP plugin can make use of either minimal (operating system only) images or pre-populated HDP images. The base requirement for both is that the image is cloud-init enabled and contains a supported operating system (see http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.4/bk_hdp1-system-admin-guide/content/sysadminguides_ha_chap2_3.html).
As with the provided pre-populated images, a pre-populated image may have any of the following packages pre-installed:
* hadoop-libhdfs
* hadoop-native
* hadoop-pipes
* hadoop-sbin
* hadoop-lzo
* hadoop-lzo-native
* mysql-server
* httpd
* net-snmp
* net-snmp-utils
* perl-Net-SNMP
* nagios
* fping
* nagios-plugins
* hdp_mon_nagios_addons
* ganglia-gmetad
* gweb hdp_mon_ganglia_addons
* ganglia-gmond
* python-rrdtool.x86_64
* glibc glibc.i686
* appropriate JDK satisfying Ambari requirement
* epel-release
Any packages that are not installed in a pre-populated image will automatically be installed during the HDP provisioning process.
The advantage of a pre-populated image is that provisioning time is reduced, as packages do not need to be downloaded and installed; downloads and installation make up the majority of the time spent in the provisioning cycle. In addition, provisioning large clusters puts a burden on the network, as packages for all nodes need to be downloaded from the package repository.
For more information about HDP images, refer to https://github.com/openstack/sahara-image-elements.
There are three VM images provided for use with the HDP Plugin, which can also be built using the tools available in sahara-image-elements:
1. `centos-6_64-hdp-1.3.qcow2 <http://public-repo-1.hortonworks.com/sahara/images/centos-6_4-64-hdp-1.3.qcow2>`_: This image contains most of the requisite packages necessary for HDP deployment. The packages contained herein correspond to the HDP 1.3 release. The operating system is a minimal CentOS 6.5 cloud-init enabled install. This image can only be used to provision HDP 1.3 Hadoop clusters.
2. `centos-6_64-hdp-2.0.6.qcow2 <https://s3.amazonaws.com/public-repo-1.hortonworks.com/sahara/images/centos-6_4-64-hdp-2_0_6.qcow2>`_: This image contains most of the requisite packages necessary for HDP deployment. The packages contained herein correspond to the HDP 2.0.6 release. The operating system is a minimal CentOS 6.5 cloud-init enabled install. This image can only be used to provision HDP 2.0.6 Hadoop clusters.
3. `centos-6-64-hdp-vanilla.qcow2 <http://public-repo-1.hortonworks.com/sahara/images/centos-6_4-64-vanilla.qcow2>`_: This image provides only a minimal install of CentOS 6.5 and is cloud-init enabled. This image can be used to provision any version of HDP supported by Sahara.
HDP plugin requires an image to be tagged in Sahara Image Registry with two tags: 'hdp' and '<hdp version>' (e.g. '1.3.2').
Also, in the Image Registry you will need to specify a username for the image. The username should be 'root'.
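As an illustration, the registration data for an HDP image amounts to something like the sketch below; the image UUID and description are placeholders, and the exact registry calls vary between Sahara releases and clients:

.. code-block:: python

    # Illustrative Image Registry data for an HDP 1.3.2 image.
    # The image UUID and description are placeholders.
    image_id = "<image-uuid>"

    register_body = {
        "username": "root",            # username required for HDP images
        "description": "CentOS with HDP 1.3.2",
    }

    tag_body = {
        "tags": ["hdp", "1.3.2"],      # 'hdp' plus the HDP version tag
    }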
Limitations
-----------
The HDP plugin currently has the following limitations:
* It is not possible to decrement the number of node-groups or hosts per node group in a Sahara generated cluster.
* Only the following services are available to be deployed via Sahara:
* Ambari
* Nagios
* Ganglia
* HDFS
* MAPREDUCE
Note: Other services may be added using Ambari after initial cluster creation.
HDP Version Support
-------------------
The HDP plugin currently supports HDP 1.3.2 and HDP 2.0.6. Support for future versions of HDP will be provided shortly after the software is generally available.
Cluster Validation
------------------
@@ -97,4 +64,4 @@ Prior to Hadoop cluster creation, the HDP plugin will perform the following validations:
The HDP Plugin and Sahara Support
----------------------------------
For more information, please contact Hortonworks.