Building Cloudera hadoop images with Cloudera Manager

* Add element hadoop-cloudera to install Cloudera hadoop packages
* Edit diskimage-create script to build Cloudera images

Partially implements blueprint cdh-plugin

Change-Id: I0368eb12a6c06c8fc900c0f9cca94758437de1df
iberezovskiy 2014-07-11 15:09:14 +04:00
parent 69d2b6da36
commit 76ae53ad42
7 changed files with 122 additions and 6 deletions

@ -1,13 +1,13 @@
Diskimage-builder script for creating cloud images
==================================================
This script builds Ubuntu, Fedora and CentOS cloud images for use in Sahara. By default all plugins are targeted and all images are built. The '-p' option can be used to select the plugin (vanilla, spark or hdp). The '-i' option can be used to select the image type (ubuntu, fedora or centos). The '-v' option can be used to select the hadoop version (1, 2 or plain).
This script builds Ubuntu, Fedora and CentOS cloud images for use in Sahara. By default all plugins are targeted and all images are built. The '-p' option can be used to select the plugin (vanilla, spark, hdp or cloudera). The '-i' option can be used to select the image type (ubuntu, fedora or centos). The '-v' option can be used to select the hadoop version (1, 2 or plain).
NOTE: You should use an Ubuntu or Fedora host OS for building images; CentOS as a host OS has not been tested well.
For users:
1. Use your environment (export / setenv) to alter the script's behavior. Environment variables the script accepts are 'DIB_HADOOP_VERSION_1' and 'DIB_HADOOP_VERSION_2', 'JAVA_DOWNLOAD_URL', 'JAVA_TARGET_LOCATION', 'OOZIE_DOWNLOAD_URL', 'HIVE_VERSION', 'ubuntu_[vanilla|spark]_hadoop_[1|2]_image_name', 'fedora_vanilla_hadoop_[1|2]_image_name', 'centos_[vanilla|hdp]_[hadoop_1|hadoop_2|plain]_image_name'.
1. Use your environment (export / setenv) to alter the script's behavior. Environment variables the script accepts are 'DIB_HADOOP_VERSION_1' and 'DIB_HADOOP_VERSION_2', 'JAVA_DOWNLOAD_URL', 'JAVA_TARGET_LOCATION', 'OOZIE_DOWNLOAD_URL', 'HIVE_VERSION', 'ubuntu_[vanilla|spark|cloudera]_[hadoop_1|hadoop_2]_image_name', 'fedora_vanilla_hadoop_[1|2]_image_name', 'centos_[vanilla|hdp|cloudera]_[hadoop_1|hadoop_2|plain]_image_name'.
2. To create all images, just clone this repository and run the script.
@ -25,7 +25,7 @@ For users:
.. sourcecode:: bash
sudo bash sahara-image-elements/diskimage-create/diskimage-create.sh -p [vanilla|spark|hdp]
sudo bash sahara-image-elements/diskimage-create/diskimage-create.sh -p [vanilla|spark|hdp|cloudera]
5. To select which hadoop version to target use the '-v' commandline option like this:
@ -44,8 +44,9 @@ NOTE for 4, 5, 6:
For Vanilla you can create ubuntu, fedora and centos cloud image with hadoop 1.x.x and 2.x.x versions. Use environment variables 'DIB_HADOOP_VERSION_1' and 'DIB_HADOOP_VERSION_2' to change defaults.
For Spark you can create only an ubuntu image with one hadoop version. You shouldn't specify the image type or hadoop version.
For HDP you can create only a centos image with hadoop 1.3.0 or 2.0, or without hadoop (a 'plain' image). You shouldn't specify the image type.
For Cloudera you can create ubuntu and centos images with preinstalled cloudera hadoop. You shouldn't specify the hadoop version.
NOTE for CentOS images (for vanilla and hdp plugins):
NOTE for CentOS images (for vanilla, hdp and cloudera plugins):
Resizing disk space during first boot on these images fails with errors (https://bugs.launchpad.net/sahara/+bug/1304100), so you will get an instance with little available disk space. To work around this problem, images are built with 10G of available disk space by default. If you need more available disk space, export the DIB_IMAGE_SIZE parameter:

@ -37,7 +37,7 @@ while getopts "p:i:v:d:m" opt; do
*)
echo
echo "Usage: $(basename $0)"
echo " [-p vanilla|spark|hdp]"
echo " [-p vanilla|spark|hdp|cloudera]"
echo " [-i ubuntu|fedora|centos]"
echo " [-v 1|2|plain]"
echo " [-d]"
@ -81,7 +81,7 @@ if [ "$DEBUG_MODE" = "true" -a "$platform" != 'NAME="Ubuntu"' ]; then
fi
fi
if [ -n "$PLUGIN" -a "$PLUGIN" != "vanilla" -a "$PLUGIN" != "spark" -a "$PLUGIN" != "hdp" ]; then
if [ -n "$PLUGIN" -a "$PLUGIN" != "vanilla" -a "$PLUGIN" != "spark" -a "$PLUGIN" != "hdp" -a "$PLUGIN" != "cloudera" ]; then
echo -e "Unknown plugin selected.\nAborting"
exit 1
fi
@ -101,6 +101,11 @@ if [ "$PLUGIN" = "vanilla" -a "$HADOOP_VERSION" = "plain" ]; then
exit 1
fi
if [ "$PLUGIN" = "cloudera" -a "$IMAGE_TYPE" = "fedora" ]; then
echo -e "Impossible combination.\nAborting"
exit 1
fi
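
The plugin and image-type checks above can also be written with `case`, which avoids long chains of `-a` tests. The following is only a minimal sketch: the function name `check_plugin_combination` is hypothetical and not part of diskimage-create.sh, and it covers just the two checks shown in this diff.

```shell
#!/bin/bash
# Hypothetical helper, not part of diskimage-create.sh.
# Returns 0 when the plugin/image-type pair can be built, 1 otherwise.
check_plugin_combination() {
    local plugin="$1"
    local image_type="$2"

    # An empty plugin means "build for every plugin", as in the script.
    case "$plugin" in
        ""|vanilla|spark|hdp|cloudera) ;;
        *)
            echo "Unknown plugin selected." >&2
            return 1
            ;;
    esac

    # Cloudera images are built for ubuntu and centos only.
    if [ "$plugin" = "cloudera" ] && [ "$image_type" = "fedora" ]; then
        echo "Impossible combination." >&2
        return 1
    fi
    return 0
}
```

Called as `check_plugin_combination "$PLUGIN" "$IMAGE_TYPE" || exit 1`, it would give the same abort behavior as the individual `if` blocks.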
#################
if [ "$platform" = 'NAME="Ubuntu"' ]; then
@ -363,5 +368,35 @@ if [ -z "$PLUGIN" -o "$PLUGIN" = "hdp" ]; then
unset BASE_IMAGE_FILE DIB_IMAGE_SIZE DIB_CLOUD_IMAGES
fi
if [ -z "$PLUGIN" -o "$PLUGIN" = "cloudera" ]; then
echo "For cloudera plugin option -v is ignored"
if [ -z "$IMAGE_TYPE" -o "$IMAGE_TYPE" = "ubuntu" ]; then
cloudera_ubuntu_image_name=${cloudera_ubuntu_image_name:-ubuntu_sahara_cloudera_latest}
cloudera_elements_sequence="base vm ubuntu hadoop-cloudera"
disk-image-create $cloudera_elements_sequence -o $cloudera_ubuntu_image_name
mv $cloudera_ubuntu_image_name.qcow2 ../
fi
if [ -z "$IMAGE_TYPE" -o "$IMAGE_TYPE" = "centos" ]; then
export DIB_IMAGE_SIZE=${IMAGE_SIZE:-"20"}
# CentOS cloud image:
# - Disable including 'base' element for CentOS
# - Export link and filename for CentOS cloud image to download
export BASE_IMAGE_FILE="CentOS-6.5-cloud-init.qcow2"
export DIB_CLOUD_IMAGES="http://sahara-files.mirantis.com"
cloudera_centos_image_name=${cloudera_centos_image_name:-centos_sahara_cloudera_latest}
cloudera_elements_sequence="base vm rhel hadoop-cloudera redhat-lsb selinux-permissive"
disk-image-create $cloudera_elements_sequence -n -o $cloudera_centos_image_name
mv $cloudera_centos_image_name.qcow2 ../
unset BASE_IMAGE_FILE DIB_IMAGE_SIZE DIB_CLOUD_IMAGES
fi
fi
popd # out of $TEMP
rm -rf $TEMP
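
The Cloudera block above takes each image name from an environment variable and falls back to a default through `${name:-default}`. A minimal dry-run sketch of that selection logic follows; the function `cloudera_build_args` is hypothetical and only prints the arguments that would be passed to `disk-image-create`, it does not build anything.

```shell
#!/bin/bash
# Hypothetical helper: prints the disk-image-create arguments used for a
# Cloudera image of the given type, without running the build.
cloudera_build_args() {
    local image_type="$1"
    local name
    case "$image_type" in
        ubuntu)
            # Default name is used unless cloudera_ubuntu_image_name is set.
            name="${cloudera_ubuntu_image_name:-ubuntu_sahara_cloudera_latest}"
            echo "base vm ubuntu hadoop-cloudera -o $name"
            ;;
        centos)
            # Default name is used unless cloudera_centos_image_name is set.
            name="${cloudera_centos_image_name:-centos_sahara_cloudera_latest}"
            echo "base vm rhel hadoop-cloudera redhat-lsb selinux-permissive -n -o $name"
            ;;
        *)
            echo "No Cloudera image for '$image_type'" >&2
            return 1
            ;;
    esac
}
```

Setting `cloudera_centos_image_name=my_centos` before the call changes only the `-o` argument; the element sequence stays the same.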

@ -0,0 +1,7 @@
Installs cloudera (cloudera-manager-agent cloudera-manager-daemons cloudera-manager-server cloudera-manager-server-db-2 hadoop-hdfs-namenode hadoop-hdfs-secondarynamenode hadoop-hdfs-datanode hadoop-yarn-resourcemanager hadoop-yarn-nodemanager hadoop-mapreduce hadoop-mapreduce-historyserver oozie) and java (oracle-j2sdk1.7) packages from cloudera repositories `cdh5 <http://archive-primary.cloudera.com/cdh5/>`_ and `cm5 <http://archive-primary.cloudera.com/cm5>`_.
In order to create the Cloudera images with the diskimage-create.sh script, use the following syntax to select the "cloudera" plugin:
.. sourcecode:: bash
sudo bash diskimage-create.sh -p cloudera

@ -0,0 +1 @@
ssh

@ -0,0 +1,21 @@
#!/bin/bash
set -eux
if [ "$(lsb_release -is)" = 'Ubuntu' ]; then
export DEBIAN_FRONTEND=noninteractive
fi
install-packages cloudera-manager-agent \
cloudera-manager-daemons \
oracle-j2sdk1.7 \
cloudera-manager-server \
cloudera-manager-server-db-2 \
hadoop-hdfs-namenode \
hadoop-hdfs-secondarynamenode \
hadoop-hdfs-datanode \
hadoop-yarn-resourcemanager \
hadoop-yarn-nodemanager \
hadoop-mapreduce \
hadoop-mapreduce-historyserver \
oozie

@ -0,0 +1,27 @@
#!/bin/bash
set -eux
for i in cloudera-scm-agent \
cloudera-scm-server \
cloudera-scm-server-db \
hadoop-hdfs-datanode \
hadoop-hdfs-namenode \
hadoop-hdfs-secondarynamenode \
hadoop-mapreduce-historyserver \
hadoop-yarn-nodemanager \
hadoop-yarn-resourcemanager \
oozie \
postgresql
do
if [ "$(lsb_release -is)" = 'Ubuntu' ]; then
update-rc.d -f $i remove
else
chkconfig $i off
fi
done
if [ "$(lsb_release -is)" = 'CentOS' ]; then
chkconfig iptables off
chkconfig ip6tables off
fi
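
The loop above picks the autostart-disabling command by distribution. The sketch below isolates that choice in a dry-run helper so it can be inspected without touching any services; `disable_autostart_cmd` is a hypothetical name, and the distribution string is passed in rather than read from `lsb_release`.

```shell
#!/bin/bash
# Hypothetical helper: prints the command that would disable autostart for a
# service on the given distribution instead of executing it.
disable_autostart_cmd() {
    local distro="$1"
    local service="$2"
    if [ "$distro" = "Ubuntu" ]; then
        echo "update-rc.d -f $service remove"
    else
        echo "chkconfig $service off"
    fi
}

# Dry run for two of the services handled by the script.
for svc in cloudera-scm-agent oozie; do
    disable_autostart_cmd "CentOS" "$svc"
done
```

Printing the commands first makes it easy to review the full service list before running it inside an image build.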

@ -0,0 +1,24 @@
#!/bin/bash
distro=$(lsb_release -is)
case $distro in
Ubuntu)
# Add a repository with the postgresql package (a dependency of the cloudera packages).
# The base image doesn't contain this repo.
echo -e 'deb http://nova.clouds.archive.ubuntu.com/ubuntu/ precise universe multiverse main' >> /etc/apt/sources.list
# Cloudera repositories
wget -O /etc/apt/sources.list.d/cdh5.list http://archive-primary.cloudera.com/cdh5/ubuntu/precise/amd64/cdh/cloudera.list
wget -qO - http://archive-primary.cloudera.com/cm5/ubuntu/precise/amd64/cm/archive.key | apt-key add -
wget -O /etc/apt/sources.list.d/cm5.list http://archive-primary.cloudera.com/cm5/ubuntu/precise/amd64/cm/cloudera.list
wget -qO - http://archive-primary.cloudera.com/cdh5/ubuntu/precise/amd64/cdh/archive.key | apt-key add -
apt-get update
;;
CentOS)
wget -P /etc/yum.repos.d/ http://archive-primary.cloudera.com/cdh5/redhat/6/x86_64/cdh/cloudera-cdh5.repo
wget -P /etc/yum.repos.d/ http://archive-primary.cloudera.com/cm5/redhat/6/x86_64/cm/cloudera-manager.repo
;;
esac
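
The repository URLs used above follow one pattern per distribution. The following sketch builds them from the product name and distribution; `cloudera_repo_url` is a hypothetical helper, it assumes the product is either `cdh5` or `cm5`, and it only prints the URL without downloading anything.

```shell
#!/bin/bash
# Hypothetical helper: prints the repository definition URL for a Cloudera
# product ("cdh5" or "cm5", assumed) on the given distribution.
cloudera_repo_url() {
    local product="$1"
    local distro="$2"
    local base="http://archive-primary.cloudera.com/$product"
    local component="${product%5}"   # cdh5 -> cdh, cm5 -> cm
    case "$distro" in
        Ubuntu)
            echo "$base/ubuntu/precise/amd64/$component/cloudera.list"
            ;;
        CentOS)
            if [ "$product" = "cdh5" ]; then
                echo "$base/redhat/6/x86_64/$component/cloudera-cdh5.repo"
            else
                echo "$base/redhat/6/x86_64/$component/cloudera-manager.repo"
            fi
            ;;
        *)
            echo "Unsupported distribution: $distro" >&2
            return 1
            ;;
    esac
}
```

For example, `wget -P /etc/yum.repos.d/ "$(cloudera_repo_url cm5 CentOS)"` would fetch the same file as the corresponding CentOS line in the script.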