Hive via Puppet
So we know that Hive (http://hadoop.apache.org/hive/) moves pretty fast. We would all like to be running solid releases, but the reality is that releases come out maybe twice a year while new features and fixes are being added almost daily.
We have some special requirements: some of our nodes need to be able to launch jobs on production or research (dev). One hive binary can actually serve both production and research.
Hive actually 'sucks up' hadoop from the environment (HADOOP_HOME) and can optionally take its hive configuration directory through the --config switch.
One way to see that is like so:
export HADOOP_HOME=/opt/hadoop-0.20-shell-dev
exec /opt/hive-0.6.0/bin/hive --config /opt/hive-conf/conf-dev
/opt/hadoop-0.20-shell-dev is my "client" install: jars, confs, and binary files only.
Two folders /opt/hive-conf/conf-dev and /opt/hive-conf/conf-prod are where the hive configuration files (hive-site.xml) live.
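To make the pairing explicit, here is a minimal shell sketch of how an environment name maps to a HADOOP_HOME and a hive conf directory. The function names hive_env_paths and run_hive are made up for illustration; the paths are the ones used in this post, and the prod client path /opt/hadoop-0.20-shell is the one pushed out by the manifest below.

```shell
#!/bin/sh
# Hypothetical helper: print "<HADOOP_HOME> <hive conf dir>" for an environment name.
hive_env_paths() {
    case "$1" in
        prod) echo "/opt/hadoop-0.20-shell /opt/hive-conf/conf-prod" ;;
        dev)  echo "/opt/hadoop-0.20-shell-dev /opt/hive-conf/conf-dev" ;;
        *)    echo "unknown environment: $1" >&2; return 1 ;;
    esac
}

# Hypothetical launcher: run_hive <prod|dev> [hive arguments...]
run_hive() {
    env_name="$1"; shift
    paths=$(hive_env_paths "$env_name") || return 1
    HADOOP_HOME=${paths%% *}    # text before the first space
    export HADOOP_HOME
    # exec replaces the current shell process with hive
    exec /opt/hive-0.6.0/bin/hive --config "${paths#* }" "$@"
}
```

Something like run_hive dev would then behave the same as the two lines shown earlier. The one binary stays the same; only the two paths change.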
site.pp
class hive_configuration {
  file { "/opt/hive-conf":
    owner   => root,
    group   => root,
    path    => "/opt/hive-conf",
    source  => "puppet:///mainfiles/hadoop/hive-conf",
    recurse => true,
    purge   => true
  }
}
class hadoop_client_prod {
  file { "/opt/hadoop-0.20-shell":
    owner   => root,
    group   => root,
    path    => "/opt/hadoop-0.20-shell",
    source  => "puppet:///mainfiles/hadoop/hadoop-0.20-shell",
    recurse => true,
    purge   => true
  }
}
class hadoop_client_dev {
  file { "/opt/hadoop-0.20-shell-dev":
    owner   => root,
    group   => root,
    path    => "/opt/hadoop-0.20-shell-dev",
    source  => "puppet:///mainfiles/hadoop/hadoop-0.20-shell-dev",
    recurse => true,
    purge   => true
  }
}
Now, I build hive from trunk. No, I am not going to talk about doing that, but I ran ant package and moved build/dist to a folder named hive-0.6.0-957988.
class hive_0_6_0 {
  file { "/opt/hive-0.6.0":
    owner   => root,
    group   => root,
    path    => "/opt/hive-0.6.0",
    source  => "puppet:///mainfiles/hadoop/hive-0.6.0-957988",
    recurse => true,
    purge   => true
  }
  # Make everything under bin/ executable.
  file { "/opt/hive-0.6.0/bin":
    mode    => '0755',
    source  => "puppet:///mainfiles/hadoop/hive-0.6.0-957988/bin",
    require => File["/opt/hive-0.6.0"],
    recurse => true
  }
}
Now notice I pushed this out to /opt/hive-0.6.0, without the revision suffix. I am not going to maintain multiple pre-trunk branches, so this is fine.
As I mentioned above you can launch hive like:
export HADOOP_HOME=/opt/hadoop-0.20-shell-dev
exec /opt/hive-0.6.0/bin/hive --config /opt/hive-conf/conf-dev
But come on, be nice to your users. This can be made easier.
file { "/usr/bin/pig-prod":
  mode    => '0755',
  content =>
"#!/bin/sh
exec /opt/pig-0.6.0/bin/pig-prod \"\$@\"
"
}
# Wrapper: hive against the production cluster.
file { "/usr/bin/hive-prod":
  mode    => '0755',
  content =>
"#!/bin/sh
export HADOOP_HOME=/opt/hadoop-0.20-shell
exec /opt/hive-0.6.0/bin/hive --config /opt/hive-conf/conf-prod \"\$@\"
"
}
# Wrapper: hive against the research (dev) cluster.
file { "/usr/bin/hive-dev":
  mode    => '0755',
  content =>
"#!/bin/sh
export HADOOP_HOME=/opt/hadoop-0.20-shell-dev
exec /opt/hive-0.6.0/bin/hive --config /opt/hive-conf/conf-dev \"\$@\"
"
}
So this is cool. We can have puppet write tiny files for us on the fly rather than serving them from the puppet file server. I am just full of tricks, aren't I?
So for the sake of completeness:
node 'research.hadoop.pvt' {
  include hadoop_client_dev
  include hadoop_client_prod
  include hive_configuration
  include hive_0_6_0
  file { "/usr/bin/hive-prod":
    mode    => '0755',
    content =>
"#!/bin/sh
export HADOOP_HOME=/opt/hadoop-0.20-shell
exec /opt/hive-0.6.0/bin/hive --config /opt/hive-conf/conf-prod \"\$@\"
"
  }
  file { "/usr/bin/hive-dev":
    mode    => '0755',
    content =>
"#!/bin/sh
export HADOOP_HOME=/opt/hadoop-0.20-shell-dev
exec /opt/hive-0.6.0/bin/hive --config /opt/hive-conf/conf-dev \"\$@\"
"
  }
  # On this research node, plain "hive" defaults to the dev cluster.
  file { "/usr/bin/hive":
    mode    => '0755',
    content =>
"#!/bin/sh
export HADOOP_HOME=/opt/hadoop-0.20-shell-dev
exec /opt/hive-0.6.0/bin/hive --config /opt/hive-conf/conf-dev \"\$@\"
"
  }
}
/usr/bin/hive can also be a system-dependent symlink that points at either hive-dev or hive-prod.
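As a rough sketch of the symlink variant, done by hand in shell rather than through puppet: the helper name point_hive_at is made up for illustration, and this assumes /usr/bin/hive is not also declared as a regular file in the node definition, since puppet would otherwise put the file back on its next run.

```shell
#!/bin/sh
# Hypothetical helper: make LINK ($2) a symlink to TARGET ($1),
# replacing whatever link or file is currently at LINK.
point_hive_at() {
    ln -sfn "$1" "$2"
}

# Example (needs root), on a node where plain "hive" should mean dev:
#   point_hive_at /usr/bin/hive-dev /usr/bin/hive
```

Switching a node from dev to prod is then a matter of re-pointing the one link; the wrappers themselves stay untouched.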
So why did I do this? I have completely separated my configuration from my binary. I can pretty much update the binary in place or change /usr/bin/hive, and clients get the update. I only have to do it once, and everything gets pushed out across the board by puppet.
Posted at 03:58PM Jul 13, 2010 by edwardcapriolo in General