Merge "import Big Data workload"

This commit is contained in:
Jenkins 2017-01-20 23:52:29 +00:00 committed by Gerrit Code Review
commit 5b48e78bc8
7 changed files with 1420 additions and 0 deletions


@@ -0,0 +1,436 @@
OpenStack Workload Reference Architecture: Big Data
===================================================
Introduction
------------
Big Data analytics has established itself as an important process to support
new or enhanced business models. Big Data is a term for data sets that are so
large or complex that traditional data processing applications are inadequate
to deal with them. Big Data analytics refers to the use of predictive
analytics, user behavior analytics, or certain other advanced data analytics
methods that extract value from data.
Since Big Data analytics can include and analyze all types of data sources,
the results are valuable for most departments in an enterprise. Each might
perform analytics with different business objectives. Considering the short
innovation cycle of most digital business models, Enterprise IT is often
under pressure to fulfill a multitude of demands quickly. A flexible, fast,
efficient and easy-to-manage Big Data deployment is critical.
Cloud is one approach to tackle the dynamic situation caused by high volumes
of analytics requests with rapid deployment time requirements. In an
OpenStack-based cloud environment, a Big Data cluster can be provisioned in
an automated manner. The value of Big Data in the cloud has made it one of
the top use cases for OpenStack. According to the April 2016
`OpenStack User Survey`_, 27 percent of users have deployed or are testing
Big Data analytics solutions.
Apache Hadoop on OpenStack offers a Big Data infrastructure that scales out
both compute and storage resources, and provides the secure and automated
capabilities for the analytics process. The `Apache Hadoop project`_ is the
de facto standard open source framework for Big Data analytics, used in the
vast majority of deployments. Multiple Hadoop clusters are often deployed to
respond to an enterprise's needs.
This reference architecture is intended for enterprise architects who are
looking to deploy Big Data Hadoop clusters on an OpenStack cloud. It describes
a generic Hadoop architecture and uses open source technologies:
* `OpenStack cloud software`_
* `Ubuntu Linux`_ operating system
* `Apache Ambari`_ open source software to provision, manage and monitor
Hadoop clusters.
.. _OpenStack User Survey: https://www.openstack.org/assets/survey/April-2016-User-Survey-Report.pdf
.. _Apache Hadoop project: http://hadoop.apache.org/
.. _OpenStack cloud software: http://www.openstack.org/software/
.. _Ubuntu Linux: https://www.ubuntu.com/
.. _Apache Ambari: http://ambari.apache.org/
This reference architecture describes and includes installation files for a
basic Hadoop cluster. Additional services can be applied for a more complex
configuration and will be covered in future works.
Figure 1: High-level overview of Hadoop architecture
.. figure:: figures/figure01.png
:alt: Figure 1: High-level overview of Hadoop architecture
OpenStack for Hadoop Clusters
-----------------------------
This Hadoop architecture is derived from actual use cases and experience.
Building a Hadoop-based Big Data environment can be a complex task. It is
highly recommended to use common approaches to reduce this complexity, such as
identifying the data processing models. These processing models drive
requirements for resource availability, network bandwidth, and storage, as
well as security constraints in the enterprise context.
* **Batch processing model** Analytics based on historic data
In the batch processing model, the analytic tasks are executed or queried in
a scheduled or recurring manner. Typically the data is already available for
analysis in a static repository such as large files or databases. The batch
processing model is often used to analyze business data of a certain period.
One example is an ETL (extract, transform, load) process to extract business
data from various ERP systems for supply chain planning.
* **Stream processing model** Business real-time analytics
In the stream processing model, data is continuously streamed and directly
analyzed in real time. Actions can be triggered when special or defined
events occur. An example of a stream processing workload is
fraud detection for credit card companies. A credit card transaction is
transmitted online to the credit card company and is evaluated in real time
based on certain parameters; for example, checking the card's validity and
the purchase amount against the limit. It is also possible to check the
location of purchases and compare this to other recent purchases.
For example, if purchases are made in the U.S. and Europe in a timespan of
only a few hours, this indicates a high likelihood of fraud and action can be
taken to decline the transaction.
* **Predictive processing model** Predict outcome based on recent and
historical data
This model is used to predict an outcome, behavior or other actions for the
future. Generally this analytic model consists of various predictive
algorithms. One example is predictive maintenance. Data from machines,
engines or other sensors is collected and analyzed so that predictive actions
can be made to recommend the next maintenance cycle before a failure might
occur.
Hadoop clusters use a master-slave architecture. The data is ingested into the
cluster and stored in blocks in the Hadoop distributed file system (HDFS). The
default block size is 128 MB in Hadoop 2.x (64 MB in earlier versions). The
blocks of data are replicated to different
nodes in the clusters. Part of the core Hadoop project, `YARN`_ provides a
framework for job scheduling and cluster resource management. With YARN,
multiple data processing applications can be implemented in the Hadoop cluster.
.. _YARN: http://hortonworks.com/apache/yarn/
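The configured block size can be verified on any cluster node once Hadoop is
installed; a quick check with the standard client tools follows (the value
shown is the Hadoop 2.x default)::

  $ hdfs getconf -confKey dfs.blocksize
  134217728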
Typically a Hadoop cluster with YARN is composed of different types of cluster
nodes:
* **NameNode** The metadata about the data blocks is stored in the NameNode.
This provides lookup functionality and tracking for all data or files in the
Hadoop cluster. The NameNode does not store the actual data. Generally the
NameNode requires a large memory (RAM) allocation. The NameNode belongs to the
"master" part of the Hadoop architecture.
* **DataNode** This is also referred to as the worker node and belongs to the
"slave" part of the Hadoop architecture. It is responsible for storing and
computing the data and responds to the NameNode for filesystem operations.
Generally a DataNode requires a large amount of storage space.
* **ResourceManager** This is the master that manages the resources in the
Hadoop cluster. It has a scheduler to allocate resources to the various
applications across the cluster.
* **NodeManager** This takes instructions from the ResourceManager and is
responsible for executing the applications. It monitors and reports the
resource usage (CPU, memory, disk) to the ResourceManager.
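In an OpenStack deployment, these node types typically map to different
instance flavors. The sketch below mirrors the flavor parameters defined in
the Heat template accompanying this document; the default flavor names are
placeholders to be replaced with flavors available in your cloud::

  parameters:
    flavor_master:
      type: string
      default: m1.large    # NameNode/ResourceManager: favor RAM
      description: Flavor to use for master server
    flavor_data:
      type: string
      default: m1.medium   # DataNode/NodeManager: favor storage and CPU
      description: Flavor to use for worker server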
An OpenStack cloud is powered by many different services (also known as
projects). Utilizing the core services and the Hadoop Common package, a
Hadoop cluster can be deployed in a virtualized environment with minimal
effort. Optional services such as the OpenStack Orchestration service (Heat)
can be added to automate deployment. This reference architecture does not
cover OpenStack Big Data Service (Sahara). Sahara provides a simple means to
provision as well as scale previously provisioned Hadoop clusters.
Sahara will be covered in future reference architecture documents.
Figure 2 shows the core and optional services in relation to one another,
and indicates which services need to be available in your OpenStack cloud.
Figure 2. Logical representation of OpenStack services in support of Hadoop
clusters
.. figure:: figures/figure02.png
:alt: Figure 2. Logical representation of OpenStack services in support of Hadoop clusters
Brief descriptions of the core and optional services are as follows.
The `OpenStack Project Navigator`_ provides additional information.
.. _OpenStack Project Navigator: http://www.openstack.org/software/project-navigator/
.. list-table:: **Core services**
:widths: 20 50
* - Compute (Nova)
- Manages the life cycle of compute instances, including spawning,
scheduling, and decommissioning of virtual machines (VMs) on demand.
* - Image Service (Glance)
- Stores and retrieves VM disk images. Used by OpenStack Compute during
instance provisioning.
* - Block Storage (Cinder)
- Virtualizes the management of block storage devices and provides a
self-service API to request and use those resources regardless of the
physical storage location or device type. Supports popular storage
devices.
* - Networking (Neutron)
- Enables network connectivity as a service for other OpenStack services,
such as OpenStack Compute. Provides an API to define networks and their
attachments. Supports popular networking vendors and technologies. Also
provides Load-Balancer-as-a-Service (LBaaS) and Firewall-as-a-Service
(FWaaS).
* - Identity Service (Keystone)
- Provides authentication and authorization for the other OpenStack
services.
* - Object Storage (Swift)
- Stores and retrieves arbitrary unstructured data objects via a RESTful
HTTP-based API. Highly fault-tolerant with data replication and
scale-out architecture.
.. list-table:: **Optional services**
:widths: 20 50
* - Orchestration (Heat)
- Orchestrates multiple composite cloud applications by using either the
native HOT template format or the AWS CloudFormation template format,
through both an OpenStack-native REST API and a
CloudFormation-compatible Query API.
* - Telemetry (Ceilometer)
- Monitors and meters the OpenStack cloud for billing, benchmarking,
scalability, and statistical purposes.
* - Dashboard (Horizon)
- Provides an extensible web-based self-service portal to interact with
underlying OpenStack services, such as launching an instance, assigning
IP addresses, or configuring access controls.
Figure 3 illustrates the basic functional interaction between these services.
For further details:
`OpenStack Conceptual Architecture Diagram <http://docs.openstack.org/admin-guide/common/get-started-conceptual-architecture.html>`_.
Figure 3. Functional interaction between OpenStack components
.. figure:: figures/figure03.png
:alt: Figure 3. Functional interaction between OpenStack components
Structuring a Hadoop Cluster with OpenStack
-------------------------------------------
OpenStack provides the necessary compute, network and data storage services
for building a cloud-based Hadoop cluster to meet the needs of the various
processing models.
Networking
**********
Multiple networks can be created for the Hadoop cluster connectivity. Neutron
routers are created to route the traffic between networks.
* **Edge Network** Provides connectivity to the client-facing and enterprise
IT network. End users access the Hadoop cluster through this network.
* **Cluster Network** Provides inter-node communication for the Hadoop
cluster.
* **Management Network** Optionally provides a dedicated network for
accessing the Hadoop nodes' operating system for maintenance and monitoring
purposes.
* **Data Network** Provides a dedicated network for accessing the object
storage within an OpenStack Swift environment or to an external object
storage such as Amazon S3. This is optional if object storage is not used.
Neutron security groups are used to filter traffic. Hadoop uses different
ports and protocols depending on the services deployed and communications
requirements. Different security groups can be created for different types of
nodes, depending on the Hadoop services running on them. With OpenStack
security groups, multiple rules can be specified that allow or deny traffic
for certain protocols, ports, or IP address ranges. One or more security
groups can be applied to each virtual machine (VM). In OpenStack, each tenant
has a
default security group, which is applied to instances that have no other
security group defined. Unless changed, this security group denies all
incoming traffic.
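As a sketch, a HOT security group for a master node that admits only SSH and
the Ambari web UI from a management range might look as follows; the CIDR is
a placeholder, and the template bundled with this document deliberately uses
broader rules for simplicity::

  master_secgroup:
    type: OS::Neutron::SecurityGroup
    properties:
      name: master
      rules:
        - direction: ingress
          protocol: tcp
          port_range_min: 22
          port_range_max: 22
          remote_ip_prefix: 192.168.100.0/24
        - direction: ingress
          protocol: tcp
          port_range_min: 8080
          port_range_max: 8080
          remote_ip_prefix: 192.168.100.0/24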
Image Management
****************
There are multiple options to provide operating system configuration for the
Hadoop nodes. On-the-fly configuration allows greater flexibility but can
increase spawning time. The operating system images can also be pre-configured
to contain all of the Hadoop-related packages required for the different types
of nodes. Pre-configuration can reduce instance build time, but includes its
own set of problems, such as patching and image lifecycle management. In this
example, the Heat orchestration features are used to configure the Hadoop
nodes on-the-fly. Additional Hadoop and operating system packages are installed
on-the-fly depending on the node type (e.g. NameNode, DataNode). These packages
can be downloaded from Internet-based or local repositories. For a more secure
enterprise environment, a local package repository is recommended.
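For the pre-configured approach, a prepared qcow2 image can be uploaded to
the Image Service with the OpenStack client; the file name below is
illustrative::

  $ openstack image create --disk-format qcow2 --container-format bare \
      --file ubuntu-14.04-hadoop-node.qcow2 ubuntu-14.04-hadoop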
Data Management
***************
Similar to an external hard drive, Cinder volumes are persistent block-storage
virtual devices that may be mounted and dismounted from the VM. Cinder volumes
can be attached to only one instance at a time. A Cinder volume is attached to
each Hadoop DataNode to provide the HDFS.
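In HOT this pairing is expressed as a volume resource plus a block-device
mapping on the server, which is how the bundled template attaches HDFS
storage; a trimmed sketch with an illustrative size::

  datanode_volume:
    type: OS::Cinder::Volume
    properties:
      size: 10

  data_node:
    type: OS::Nova::Server
    properties:
      # image, flavor, and networks omitted for brevity
      block_device_mapping_v2:
        - device_name: /dev/vdb
          volume_id: {get_resource: datanode_volume}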
If the data to be processed by a Hadoop cluster needs to be accessed by other
applications, the OpenStack Swift object storage can be used to store it.
Swift offers a cost-effective way of storing unstructured data. Hadoop provides
a built-in interface to access Swift or AWS S3 object storage; either can be
configured to serve data over HTTP to the Hadoop cluster.
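For example, once the Hadoop Swift filesystem support is configured (the
provider name ``provider`` below is an assumption defined in core-site.xml),
data can be copied between Swift and HDFS::

  $ hadoop distcp swift://container.provider/dataset hdfs:///user/hadoop/dataset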
Orchestration
*************
Heat uses template files to automate the deployment of complex cloud
environments. Orchestration is more than just standing up virtual servers;
it can also be used to install software, apply patches, configure networking
and security, and more. Heat templates are provided with this reference
architecture that allow the user to quickly and automatically set up and
configure a Hadoop cluster for different data processing models
(types of analytics).
Figure 4: A Hadoop cluster on OpenStack
.. figure:: figures/figure04.png
:alt: Figure 4: A Hadoop cluster on OpenStack
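A reduced skeleton of such a template, showing the overall shape used by the
sample described in the next section (a boot-time script applied to a
server), is::

  heat_template_version: 2014-10-16
  parameters:
    image_ubuntu:
      type: string
  resources:
    node_config:
      type: OS::Heat::SoftwareConfig
      properties:
        group: script
        config: |
          #!/bin/bash
          # role-specific Hadoop setup runs here
    hadoop_node:
      type: OS::Nova::Server
      properties:
        image: {get_param: image_ubuntu}
        flavor: m1.medium  # flavor name is environment-specific
        user_data_format: SOFTWARE_CONFIG
        user_data: {get_resource: node_config}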
Demonstration and Sample Code
-----------------------------
This section describes the Heat template provided for this workload. The
template is used to configure all of the Hadoop nodes. It has been created
for reference and training and is not intended to be used unmodified in a
production environment.
An Ambari Hadoop environment is created on a standard Ubuntu 14.04 server
cloud image in QEMU copy-on-write (qcow2) format. The qcow2 image is stored in
the Glance repository. The Apache Ambari open source project makes Hadoop
management simpler by providing an easy-to-use Hadoop management web UI backed
by its RESTful APIs. Ambari is the central management service
for open source Hadoop. In this architecture, an Ambari service is installed
on the Master Node (NameNode). The Heat template also installs additional
required services such as the name server, Network Time Protocol (NTP) server,
database, and the operating system configuration customization required for
Ambari. A floating IP can be allocated to the Master Node to provide user access
to the Ambari service. In addition, an Ambari agent service is deployed on
each node of the cluster. This provides communication and authentication
functionality between the cluster nodes.
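For reference, the bundled bootstrap script points each agent at the master
node by rewriting the agent configuration, along these lines (excerpted and
simplified from the template)::

  apt-get -y install ambari-agent
  sed -i "s/hostname=localhost/hostname=master-node.$DOMAIN/g" \
      /etc/ambari-agent/conf/ambari-agent.ini
  service ambari-agent start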
The following nodes are installed by the Heat template:
* **Master Node (NameNode)** This node houses the cluster-wide management
services that provide the internal functionality to manage the Hadoop cluster
and its resources.
* **Data Nodes** Services used for managing and analyzing the data, stored in
HDFS, are located on these nodes. Analytics jobs access and compute the data
on the Data Nodes.
* **Edge Node** Services used to access the cluster environment or the data
outside the cluster are on this node. For security, direct user access to the
Hadoop cluster should be minimized. Users can access the cluster via the
command line interface (CLI) from the Edge Node. All data-import and
data-export processes can be channeled on one or more Edge Nodes.
* **Admin Node** Used for system-wide administration.
Multiple networks (edge, cluster, management, data) described in previous
sections are created by the Heat orchestration. A Neutron security group
is attached to each cluster node instance. The template also provisions
Cinder volumes and attaches one Cinder volume to each node. Swift is not
configured in this template and will be covered in future work.
The Heat template, BigData.yaml, can be downloaded from
http://www.openstack.org/software/sample-configs/#big-data.
Please review the README file for further details.
Scope and Assumptions
---------------------
The Heat template provided for this reference architecture assumes that the
Hadoop cluster workload is deployed in a single-region, single-zone OpenStack
environment. The deployment in a multi-zone/multiregion environment is outside
the scope of this document.
The Heat template is configured to address the minimum infrastructure
resources for deploying a Hadoop cluster. Architecting a Hadoop cluster is
highly dependent on the data volume and other performance indicators defined by
the business use cases, such as response times for analytic processes and how
and which services will be used.
The sample environment uses Oracle Java. As such, the Heat template
installation accepts the Java license agreement on the user's behalf.
As mentioned, Sahara is not used in this implementation. Sahara is the
OpenStack Big Data Service that provisions a data-intensive application cluster
such as Hadoop or Spark. The Sahara project enables users to easily provision
and manage clusters with Hadoop and other data processing frameworks on
OpenStack. An update to this reference architecture to include Sahara is under
consideration.
Summary
-------
There are many possible choices or strategies for deploying a Hadoop cluster
and there are many possible variations in OpenStack deployment. This document
and the accompanying Heat templates serve as a general reference architecture
for a basic deployment and installation process via OpenStack orchestration.
They are intended to demonstrate how easily and quickly a Hadoop cluster can be
deployed, using the core OpenStack services. Complementary services will be
included in future updates.
The following additional resources are recommended for more depth on overall
OpenStack cloud architecture, the OpenStack services covered in this reference
architecture, and Hadoop and Ambari. The vibrant, global OpenStack community
and ecosystem can be invaluable for their experience and advice, especially the
users that have deployed Big Data solutions. Visit openstack.org to get started
or click on these resources to begin designing your OpenStack-based Big Data
analytics system.
.. list-table::
:widths: 25 50
:header-rows: 1
* - Resource
- Overview
* - `OpenStack Marketplace`_
- One-stop resource to the skilled global ecosystem for distributions,
drivers, training, services and more.
* - `OpenStack Architecture Design Guide`_
- Guidelines for designing an OpenStack cloud architecture for common use
cases. With examples.
* - `OpenStack Networking Guide`_
- How to deploy and manage OpenStack Networking (Neutron).
* - `OpenStack Virtual Machine Image Guide`_
- This guide describes how to obtain, create, and modify virtual machine
images that are compatible with OpenStack.
* - `Complete OpenStack documentation`_
- Index to all documentation, for every role and step in planning and
operating an OpenStack cloud.
* - `Community Application Catalog`_
- Download free OpenStack sample applications here.
* - `Apache Hadoop project`_
- The de facto standard open source framework for Big Data analytics,
used in this reference architecture.
* - `Apache Ambari project`_
- This reference architecture and files deploy Big Data using Ambari, an
open source package for installing, configuring and managing a Hadoop
cluster.
* - `Welcome to the community!`_
- Join mailing lists and IRC chat channels, find jobs and events, access
the source code and more.
* - `User groups`_
- Find a user group near you, attend meetups and hackathons—or organize
one!
* - `OpenStack events`_
- Global schedule of events including the popular OpenStack Summits and
regional OpenStack Days.
.. _OpenStack Marketplace: http://www.openstack.org/marketplace/
.. _OpenStack Architecture Design Guide: http://docs.openstack.org/arch-design/
.. _OpenStack Networking Guide: http://docs.openstack.org/mitaka/networking-guide/
.. _OpenStack Virtual Machine Image Guide: http://docs.openstack.org/image-guide/
.. _Complete OpenStack Documentation: http://docs.openstack.org/
.. _Community Application Catalog: http://apps.openstack.org/
.. _Apache Ambari project: http://ambari.apache.org/
.. _Welcome to the community!: http://www.openstack.org/community/
.. _User groups: https://groups.openstack.org/
.. _OpenStack events: http://www.openstack.org/community/events/

(Four binary image files added for the figures above: 25 KiB, 31 KiB, 34 KiB, and 39 KiB; not shown in this diff.)


@@ -0,0 +1,881 @@
### Heat Template ###
heat_template_version: 2014-10-16
description: >
Hadoop cluster with Apache Ambari (Big Data workload)
parameters:
network_external_for_floating_ip:
default: 38a4e580-e368-4404-a2e0-cbef9343740e
description: Network to allocate floating IP from
type: string
network_router_0_external:
default: 38a4e580-e368-4404-a2e0-cbef9343740e
description: Router external network
type: string
network_router_1_external:
default: 38a4e580-e368-4404-a2e0-cbef9343740e
description: Router external network
type: string
network_router_2_external:
default: 38a4e580-e368-4404-a2e0-cbef9343740e
description: Router external network
type: string
image_ubuntu:
default: a808eacb-ab6f-4929-873d-be3ae8535f0d
description: An Ubuntu cloud image (glance image id) to use for all servers
type: string
flavor_edge:
default: l1.medium
description: Flavor to use for edge server
type: string
flavor_master:
default: l1.medium
description: Flavor to use for master server
type: string
flavor_data:
default: l1.medium
description: Flavor to use for worker server
type: string
flavor_repo:
default: l1.medium
description: Flavor to use for repository server
type: string
config_dns_nameserver:
default: 8.8.8.8
description: DNS Server for external Access (Temporary)
type: string
resources:
deploymentscript:
type: OS::Heat::SoftwareConfig
properties:
inputs:
- name: previous
default: 'NONE'
group: script
config:
str_replace:
params:
$variable1: "Test"
template: |
#!/bin/bash
case $(hostname) in
*edge*)
SYSTEMTYPE="edge";
;;
*master*)
SYSTEMTYPE="master";
;;
*data*)
SYSTEMTYPE="data";
;;
*repo*)
SYSTEMTYPE="repo";
;;
*)
SYSTEMTYPE="nothing";
;;
esac
FULLHOSTNAME=$(curl http://169.254.169.254/latest/meta-data/hostname)
SHORTHOSTNAME=$(echo $FULLHOSTNAME | awk -F'.' {'print $1'})
DOMAIN=$(echo $FULLHOSTNAME | awk -F'.' {'print $NF'})
MASTERNODE=master-node
function issue_start {
echo ${@}: started >> /etc/issue
}
function issue_end {
if [ "$1" -eq "0" ]; then
echo ${@:2}: success >> /etc/issue
else
echo ${@:2}: failed >> /etc/issue
fi
}
function set_local_hosts {
# Set hostname
ip -o a | grep "inet " | grep -v "^1: lo" | awk -F"/" {'print $1'} | awk {'print $4 " HOSTNAME-"$2".DOMAIN HOSTNAME-"$2'} | sed s/HOSTNAME/$HOSTNAME/g | sed s/DOMAIN/$DOMAIN/g > /mnt/shared/host-$HOSTNAME.txt
# Change eth to networkname
COUNT=0;
for i in ${@}; do
sed -i s/eth${COUNT}/$i/g /mnt/shared/host-$HOSTNAME.txt
COUNT=$(($COUNT + 1));
done
sed -i s/-Cluster-Network//g /mnt/shared/host-$HOSTNAME.txt
}
if [ "$SYSTEMTYPE" == "repo" ]; then
issue_start nfsserver
apt-get -y install nfs-server
mkdir /shared
chmod 777 /shared
echo "/shared *(rw)" >> /etc/exports
service nfs-kernel-server start
issue_end $? nfsserver
# Set SSH Key
ssh-keygen -b 4096 -t rsa -f /root/.ssh/id_rsa -N ''
cp -rp /root/.ssh/id_rsa.pub /shared
fi
cp -rp /etc/issue /etc/issue.orig
issue_start GroupCheck
echo "SYSTEMTYPE: $SYSTEMTYPE" >> /root/output.txt
echo "params: $variable1" >> /root/output.txt
issue_end $? GroupCheck
# Format Partition
issue_start Prepare /dev/vdb
mkfs.ext4 /dev/vdb
# /hadoop
mkdir /hadoop
echo "/dev/vdb /hadoop ext4 defaults 0 0" >> /etc/fstab
mount /hadoop
issue_end $? Prepare /dev/vdb
# Set multiple network adapters
issue_start dhclient
ip a | grep mtu | grep -v lo: | awk {'print "dhclient "$2'} | sed s/:$//g | bash
issue_end $? dhclient
issue_start set ulimits
cat << EOF >> /etc/security/limits.conf
* - nofile 32768
* - nproc 65536
EOF
issue_end $? set ulimits
issue_start deactivate transparent huge pages
cat << EOF > /etc/rc.local
#!/bin/bash
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
echo "never" > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
echo "never" > /sys/kernel/mm/transparent_hugepage/defrag
fi
EOF
/bin/bash /etc/rc.local
issue_end $? deactivate transparent huge pages
# Mount NFS Share
issue_start mount nfs share
apt-get -y install nfs-common
mkdir /mnt/shared
# Check if mount is available
while [ ! "$(showmount -e 10.20.7.5)" ]; do
issue_end 1 mount nfs share: not available at present
done
mount 10.20.7.5:/shared /mnt/shared
issue_end $? mount nfs share
# Set Admin SSH Key for easy access
issue_start set admin ssh key
cat /mnt/shared/id_rsa.pub >> /root/.ssh/authorized_keys
issue_end $? set admin ssh key
# Save Hostnames to /mnt/shared
issue_start gathering hostnames
case $SYSTEMTYPE in
edge)
set_local_hosts admin Cluster-Network edge
;;
master)
set_local_hosts admin Cluster-Network Object-Storage-Connect-Network Management
;;
data)
set_local_hosts admin Cluster-Network Object-Storage-Connect-Network Management
;;
repo)
set_local_hosts admin Cluster-Network edge
;;
*)
set_local_hosts normal
;;
esac
issue_end $? gathering hostnames
# Set local /etc/hosts
issue_start hosts_localhost
echo "127.0.0.1 $FULLHOSTNAME $SHORTHOSTNAME" >> /etc/hosts
issue_end $? hosts_localhost
# Configure Name Server
#issue_start nameserver
#echo "nameserver 8.8.8.8" > /etc/resolv.conf
#issue_end $? nameserver
# Configure Time-Server
issue_start Install ntp
apt-get -y install ntp
issue_end $? Install ntp
# Deactivate Swappiness
issue_start Deactivate swappiness
echo "vm.swappiness=1" >> /etc/sysctl.conf
sysctl -w vm.swappiness=1
issue_end $? Deactivate swappiness
# Activate Hortonworks Repository
issue_start Installation ambari-agent
wget -nv http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/2.4.0.1/ambari.list -O /etc/apt/sources.list.d/ambari.list
apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
apt-get update
apt-get -y install ambari-agent
sed -i s/hostname=localhost/hostname=${MASTERNODE}.$DOMAIN/g /etc/ambari-agent/conf/ambari-agent.ini
issue_end $? Installation ambari-agent
# Install Java 1.8
issue_start java
echo "\n" | add-apt-repository ppa:webupd8team/java
apt-get update
# Accept Licence
echo debconf shared/accepted-oracle-license-v1-1 select true | debconf-set-selections
echo debconf shared/accepted-oracle-license-v1-1 seen true | debconf-set-selections
apt-get -y install oracle-java8-installer
issue_end $? java
# Set all /etc/hosts
issue_start hosts
cp -rp /etc/hosts /tmp/hosts-original
cat /tmp/hosts-original | grep -v "127.0.0.1 $FULLHOSTNAME" > /etc/hosts
cat /mnt/shared/host*.txt >> /etc/hosts
issue_end $? hosts
###################### Individual parts ######################
if [ "$SYSTEMTYPE" == "master" ]; then
issue_start ambari-server
apt-get -y install ambari-server expect
JAVA_HOME="/usr/lib/jvm/java-8-oracle/jre/"
SETUP_AMBARI=$(expect -c "
set timeout 60
spawn ambari-server setup -j $JAVA_HOME
expect \"Customize user account for ambari-server daemon\" {send \"n\r\"}
expect \"Enter advanced database configuration\" {send \"n\r\"}
expect eof
")
echo "${SETUP_AMBARI}"
touch /mnt/shared/ambari-server-installed.txt
service ambari-server start
issue_end $? ambari-server
fi
if [ "$SYSTEMTYPE" == "repo" ]; then
issue_start puppetmaster
apt-get -y install puppetmaster
issue_end $? puppetmaster
fi
issue_start Start Ambari Agent
# Start ambari Agent
# Checks if /mnt/shared/ambari-server-installed.txt exists
while [ ! "$(ls /mnt/shared/ambari-server-installed.txt)" ]; do
issue_end 1 Check if Ambari Server is installed $(date)
sleep 60
done
service ambari-agent start
issue_end $? Start Ambari Agent
issue_end 0 Finished
volume_0:
properties:
metadata:
attached_mode: rw
readonly: 'False'
bootable: 'False'
size: 10
type: OS::Cinder::Volume
volume_1:
properties:
metadata:
attached_mode: rw
readonly: 'False'
bootable: 'False'
size: 10
type: OS::Cinder::Volume
volume_2:
properties:
metadata:
attached_mode: rw
readonly: 'False'
bootable: 'False'
size: 10
type: OS::Cinder::Volume
volume_3:
properties:
metadata:
attached_mode: rw
readonly: 'False'
bootable: 'False'
size: 10
type: OS::Cinder::Volume
volume_4:
properties:
metadata:
attached_mode: rw
readonly: 'False'
bootable: 'False'
size: 10
type: OS::Cinder::Volume
volume_5:
properties:
metadata:
attached_mode: rw
readonly: 'False'
bootable: 'False'
size: 10
type: OS::Cinder::Volume
floatingip_0:
properties:
floating_network_id:
get_param: network_external_for_floating_ip
type: OS::Neutron::FloatingIP
key_0:
properties:
name: demo1
public_key: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDayVuy2lZ11GuFVQmA402tZvDl7CopLCSPNZn/IqVvdA5A4XtocQnkZVUegQYJ8XMz9RMPAi/0LreUQbaS4/mSDtjAs0GupAbFeMumjzlwdmZEmgCO+iEwkawmXiARV/7A1qZT+5WP7hVJk9svQv2BAiHiXugGQPx4TlRCnMOJZf3T5LmIeNh1XgzWpcmj7NX97hs12iiIBu7HWALgyrp5qshZo0y1vxnedSIQgwnOQiFx0/fUAL7k1pioE7fe88rwQegMDibSeTvDgABLhJUOtC6Gv8kp02XuoOoAecrlqIRfBASQQf7aaNs9oIBiJ4U6Jt6ladHlB/fKpqMbPllf
type: OS::Nova::KeyPair
network_1:
properties:
admin_state_up: true
name: Cluster-Network
shared: false
type: OS::Neutron::Net
subnet_1:
properties:
allocation_pools:
- end: 10.20.1.100
start: 10.20.1.10
cidr: 10.20.1.0/24
dns_nameservers: [ {get_param: config_dns_nameserver} ]
enable_dhcp: true
host_routes: []
ip_version: 4
name: subCluster-Network
network_id:
get_resource: network_1
type: OS::Neutron::Subnet
network_2:
properties:
admin_state_up: true
name: Object-Storage-Connect-Network
shared: false
type: OS::Neutron::Net
subnet_2:
properties:
allocation_pools:
- end: 10.20.2.100
start: 10.20.2.10
cidr: 10.20.2.0/24
dns_nameservers: [ {get_param: config_dns_nameserver} ]
enable_dhcp: true
host_routes: []
ip_version: 4
name: subObject-Storage-Connect-Network
network_id:
get_resource: network_2
type: OS::Neutron::Subnet
network_3:
properties:
admin_state_up: true
name: Object-Storage-Cluster-Network
shared: false
type: OS::Neutron::Net
subnet_3:
properties:
allocation_pools:
- end: 10.20.3.100
start: 10.20.3.10
cidr: 10.20.3.0/24
dns_nameservers: [ {get_param: config_dns_nameserver} ]
enable_dhcp: true
host_routes: []
ip_version: 4
name: subObject-Storage-Cluster-Network
network_id:
get_resource: network_3
type: OS::Neutron::Subnet
network_4:
properties:
admin_state_up: true
name: Management
shared: false
type: OS::Neutron::Net
subnet_4:
properties:
allocation_pools:
- end: 10.20.4.100
start: 10.20.4.10
cidr: 10.20.4.0/24
dns_nameservers: [ {get_param: config_dns_nameserver} ]
enable_dhcp: true
host_routes: []
ip_version: 4
name: subManagement
network_id:
get_resource: network_4
type: OS::Neutron::Subnet
network_5:
properties:
admin_state_up: true
name: Storage-Access-Network
shared: false
type: OS::Neutron::Net
subnet_5:
properties:
allocation_pools:
- end: 10.20.5.100
start: 10.20.5.10
cidr: 10.20.5.0/24
dns_nameservers: [ {get_param: config_dns_nameserver} ]
enable_dhcp: true
host_routes: []
ip_version: 4
name: subStorage-Access-Network
network_id:
get_resource: network_5
type: OS::Neutron::Subnet
network_6:
properties:
admin_state_up: true
name: Edge
shared: false
type: OS::Neutron::Net
subnet_6:
properties:
allocation_pools:
- end: 10.20.6.100
start: 10.20.6.10
cidr: 10.20.6.0/24
dns_nameservers: [ {get_param: config_dns_nameserver} ]
enable_dhcp: true
host_routes: []
ip_version: 4
name: subEdge
network_id:
get_resource: network_6
type: OS::Neutron::Subnet
network_7:
properties:
admin_state_up: true
name: Admin
shared: false
type: OS::Neutron::Net
subnet_7:
properties:
allocation_pools:
- end: 10.20.7.100
start: 10.20.7.10
cidr: 10.20.7.0/24
dns_nameservers: [ {get_param: config_dns_nameserver} ]
enable_dhcp: true
host_routes: []
ip_version: 4
name: subAdmin
network_id:
get_resource: network_7
type: OS::Neutron::Subnet
router_0:
properties:
admin_state_up: true
name: Router_Storage
type: OS::Neutron::Router
router_0_gateway:
properties:
network_id:
get_param: network_router_0_external
router_id:
get_resource: router_0
type: OS::Neutron::RouterGateway
router_0_interface_0:
properties:
router_id:
get_resource: router_0
subnet_id:
get_resource: subnet_3
type: OS::Neutron::RouterInterface
router_1:
properties:
admin_state_up: true
name: Router_Ext
type: OS::Neutron::Router
router_1_gateway:
properties:
network_id:
get_param: network_router_1_external
router_id:
get_resource: router_1
type: OS::Neutron::RouterGateway
router_1_interface_0:
properties:
router_id:
get_resource: router_1
subnet_id:
get_resource: subnet_5
type: OS::Neutron::RouterInterface
router_2:
properties:
admin_state_up: true
name: Router_Admin
type: OS::Neutron::Router
router_2_gateway:
properties:
network_id:
get_param: network_router_2_external
router_id:
get_resource: router_2
type: OS::Neutron::RouterGateway
router_2_interface_0:
properties:
router_id:
get_resource: router_2
subnet_id:
get_resource: subnet_7
type: OS::Neutron::RouterInterface
security_group_0:
properties:
description: ''
name: master
rules:
- direction: ingress
ethertype: IPv4
protocol: icmp
remote_ip_prefix: 0.0.0.0/0
- direction: egress
ethertype: IPv6
- direction: ingress
ethertype: IPv4
port_range_max: 65535
port_range_min: 1
protocol: udp
remote_ip_prefix: 0.0.0.0/0
- direction: egress
ethertype: IPv4
- direction: ingress
ethertype: IPv4
port_range_max: 65535
port_range_min: 1
protocol: tcp
remote_ip_prefix: 0.0.0.0/0
type: OS::Neutron::SecurityGroup
security_group_1:
properties:
description: ''
name: data
rules:
- direction: ingress
ethertype: IPv4
protocol: icmp
remote_ip_prefix: 0.0.0.0/0
- direction: egress
ethertype: IPv6
- direction: ingress
ethertype: IPv4
port_range_max: 65535
port_range_min: 1
protocol: udp
remote_ip_prefix: 0.0.0.0/0
- direction: egress
ethertype: IPv4
- direction: ingress
ethertype: IPv4
port_range_max: 65535
port_range_min: 1
protocol: tcp
remote_ip_prefix: 0.0.0.0/0
type: OS::Neutron::SecurityGroup
security_group_3:
properties:
description: ''
name: edge
rules:
- direction: ingress
ethertype: IPv4
protocol: icmp
remote_ip_prefix: 0.0.0.0/0
- direction: ingress
ethertype: IPv4
port_range_max: 65535
port_range_min: 1
protocol: tcp
remote_ip_prefix: 0.0.0.0/0
- direction: egress
ethertype: IPv4
- direction: ingress
ethertype: IPv4
port_range_max: 65535
port_range_min: 1
protocol: udp
remote_ip_prefix: 0.0.0.0/0
- direction: egress
ethertype: IPv6
type: OS::Neutron::SecurityGroup
security_group_6:
properties:
description: ''
name: Admin
rules:
- direction: ingress
ethertype: IPv4
protocol: icmp
remote_ip_prefix: 0.0.0.0/0
- direction: egress
ethertype: IPv6
- direction: ingress
ethertype: IPv4
port_range_max: 65535
port_range_min: 1
protocol: udp
remote_ip_prefix: 0.0.0.0/0
- direction: egress
ethertype: IPv4
- direction: ingress
ethertype: IPv4
port_range_max: 65535
port_range_min: 1
protocol: tcp
remote_ip_prefix: 0.0.0.0/0
type: OS::Neutron::SecurityGroup
server_0:
type: OS::Nova::Server
depends_on: [ volume_0, subnet_1, subnet_2, subnet_3, subnet_4, subnet_5, subnet_6, subnet_7, server_5 ]
properties:
name: data-node-3
diskConfig: AUTO
flavor:
get_param: flavor_data
image:
get_param: image_ubuntu
key_name:
get_resource: key_0
networks:
- network:
get_resource: network_7
- network:
get_resource: network_1
- network:
get_resource: network_2
- network:
get_resource: network_3
security_groups:
- get_resource: security_group_1
block_device_mapping_v2:
- device_name: /dev/vdb
boot_index: 1
volume_id:
get_resource: volume_0
user_data_format: SOFTWARE_CONFIG
user_data: {get_resource: deploymentscript}
server_1:
type: OS::Nova::Server
depends_on: [ volume_1, subnet_1, subnet_2, subnet_3, subnet_4, subnet_5, subnet_6, subnet_7, server_5 ]
properties:
name: data-node-2
diskConfig: AUTO
flavor:
get_param: flavor_data
image:
get_param: image_ubuntu
key_name:
get_resource: key_0
networks:
- network:
get_resource: network_7
- network:
get_resource: network_1
- network:
get_resource: network_2
- network:
get_resource: network_4
security_groups:
- get_resource: security_group_1
block_device_mapping_v2:
- device_name: /dev/vdb
boot_index: 1
volume_id:
get_resource: volume_1
user_data_format: SOFTWARE_CONFIG
user_data: {get_resource: deploymentscript}
server_2:
type: OS::Nova::Server
depends_on: [ volume_2, subnet_1, subnet_2, subnet_3, subnet_4, subnet_5, subnet_6, subnet_7, server_5 ]
properties:
name: data-node-1
diskConfig: AUTO
flavor:
get_param: flavor_data
image:
get_param: image_ubuntu
key_name:
get_resource: key_0
networks:
- network:
get_resource: network_7
- network:
get_resource: network_1
- network:
get_resource: network_2
- network:
get_resource: network_4
security_groups:
- get_resource: security_group_1
block_device_mapping_v2:
- device_name: /dev/vdb
boot_index: 1
volume_id:
get_resource: volume_2
user_data_format: SOFTWARE_CONFIG
user_data: {get_resource: deploymentscript}
server_3:
type: OS::Nova::Server
depends_on: [ volume_3, subnet_1, subnet_2, subnet_3, subnet_4, subnet_5, subnet_6, subnet_7, server_5 ]
properties:
name: master-node
diskConfig: AUTO
flavor:
get_param: flavor_master
image:
get_param: image_ubuntu
key_name:
get_resource: key_0
networks:
- network:
get_resource: network_7
- network:
get_resource: network_1
- network:
get_resource: network_2
- network:
get_resource: network_4
security_groups:
- get_resource: security_group_0
block_device_mapping_v2:
- device_name: /dev/vdb
boot_index: 1
volume_id:
get_resource: volume_3
user_data_format: SOFTWARE_CONFIG
user_data: {get_resource: deploymentscript}
server_4:
type: OS::Nova::Server
depends_on: [ volume_4, subnet_1, subnet_2, subnet_3, subnet_4, subnet_5, subnet_6, subnet_7, server_5 ]
properties:
name: edge-server
diskConfig: AUTO
flavor:
get_param: flavor_edge
image:
get_param: image_ubuntu
key_name:
get_resource: key_0
networks:
- network:
get_resource: network_7
- network:
get_resource: network_1
- network:
get_resource: network_6
security_groups:
- get_resource: security_group_3
block_device_mapping_v2:
- device_name: /dev/vdb
boot_index: 1
volume_id:
get_resource: volume_4
user_data_format: SOFTWARE_CONFIG
user_data: {get_resource: deploymentscript}
server_5:
type: OS::Nova::Server
depends_on: [ volume_5, subnet_1, subnet_2, subnet_3, subnet_4, subnet_5, subnet_6, subnet_7 ]
properties:
name: repo-server
diskConfig: AUTO
flavor:
get_param: flavor_repo
image:
get_param: image_ubuntu
key_name:
get_resource: key_0
networks:
- port:
get_resource: server_5_port_admin
- network:
get_resource: network_1
- network:
get_resource: network_6
block_device_mapping_v2:
- device_name: /dev/vdb
boot_index: 1
volume_id:
get_resource: volume_5
user_data_format: SOFTWARE_CONFIG
user_data: {get_resource: deploymentscript}
server_5_port_admin:
type: OS::Neutron::Port
properties:
network_id: { get_resource: network_7 }
security_groups:
- get_resource: security_group_6
fixed_ips:
- subnet_id: { get_resource: subnet_7 }
ip_address: 10.20.7.5


@@ -0,0 +1,103 @@
Big Data Sample Heat Template
==============================
This Heat template deploys a Hadoop cluster with Apache Ambari.
Ambari is the central management service for open source Hadoop. It provides
central administration and management functionality via a web UI. In this
example, the Ambari service is installed on the MasterNode and an Ambari agent
is deployed on each node in the cluster. This provides communication and
authentication functionality between the Hadoop cluster nodes.
**Types of roles in this Hadoop cluster**
====== ==================================================================
Role Details
====== ==================================================================
Master Master Node (also known as NameNode) - houses the cluster-wide
management services that provide the internal functionality to manage
the Hadoop cluster and its resources.
Data Data Nodes - services used for managing and analyzing the data,
stored in HDFS, are located on these nodes. Analytics jobs access and
compute the data on the Data Nodes.
Edge Services used to access the cluster environment or the data outside
the cluster are on this node. For security, direct user access to the
Hadoop cluster should be minimized. Users can access the cluster via
the command line interface (CLI) from the Edge Node. All data-import
and data-export processes can be channeled on one or more Edge Nodes.
Admin Administrative Server - Used for system-wide administration.
====== ==================================================================
This template provisions a small testing environment that demonstrates the
deployment of a Hadoop cluster in an OpenStack cloud environment. The
default settings used in this template should not be used without changes
in a production environment. Users are advised to change the settings to
match their own environment.
This template was tested with the Mitaka and Liberty releases of OpenStack.
-----------------
Heat File Details
-----------------
This template requires a few standard components such as an Ubuntu cloud image
and an external network for internet access.
The template prepares a few resources that are required by the Hadoop
deployment.
Multiple Cinder volumes are created for the Hadoop filesystem.
For simplicity, every node in this example is attached to a Cinder volume of
a default size.
Multiple Neutron subnets are created. This includes:
================== ======================
Subnet Details
================== ======================
Cluster Network Provides inter-node communication for the Hadoop cluster.
Data Network Provides a dedicated network for accessing the object
storage within an OpenStack Swift environment or to an
external object storage such as Amazon S3. This is
optional if object storage is not used.
Management Network Provides a dedicated network for accessing the Hadoop
nodes' operating system for maintenance and monitoring
purposes.
Edge Network Provides connectivity to the client-facing and enterprise
IT network. End users access the Hadoop cluster
through this network.
================== ======================
Multiple routers are created to route the traffic between subnets.
Other networks can also be created depending on your specific needs.
Security Groups are defined and attached to every Node in the cluster.
Custom rules can be created for different types of nodes to allow/deny
traffic from certain protocols, ports or IP address ranges.
Next, the template creates a few servers of different roles (Master, Data,
Edge, Admin). An Ubuntu 14.04 cloud image is assumed to be the default
operating system of each server.
When the server is booted, additional packages (depending on roles) are
installed and configured on each server. In this example, Apache Ambari
is installed and all systems are configured with a name server, NTP,
package repositories and other settings required by the Apache Ambari
service.
The Ambari web UI can be accessed by pointing to the MasterNode's
IP address at port 8080. A Floating IP can be associated with the MasterNode.
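For example, with a recent python-openstackclient (the external network name
is environment-specific)::

  openstack floating ip create public
  openstack server add floating ip master-node <allocated-floating-ip>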
-------------------------------
Running the heat template files
-------------------------------
You need to source the OpenStack credential file. You may download a copy of
the credential file from Horizon under Project > Compute > Access & Security >
API Access.
Prior to running the template, please edit and change the default value of
each parameter to match your own environment.
**Example to set up the Hadoop cluster environment**::
openstack stack create --template BigData.yaml HadoopCluster
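Default parameter values can also be overridden on the command line instead
of editing the template, for example::

  openstack stack create --template BigData.yaml \
    --parameter image_ubuntu=<glance-image-id> \
    --parameter network_external_for_floating_ip=<external-net-id> \
    HadoopCluster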