Hadoop SecondaryNameNode puppet
It is time for another instalment of 'Do this with puppet'. This time we tackle the legendary and often neglected SecondaryNameNode. Come on guys, the SNN needs love too!
For some deployments the DataNode/TaskTracker configuration could probably be the same as the SNN configuration, but in most cases it is not. Thus I store the SNN configuration files on the puppet fileserver in their own separate directory.
class hadoop_prod_conf {
  # Copy the whole conf.m6 directory from the puppet fileserver.
  file {
    "/etc/hadoop-0.20/conf.m6":
      owner   => root,
      group   => root,
      ensure  => directory,
      source  => "puppet:///mainfiles/hadoop/hadoop_name_conf/conf.m6",
      recurse => true,
      require => File["/etc/alternatives/hadoop-0.20-conf"]
  }
  # Keep the alternatives chain intact: conf -> alternatives -> conf.m6.
  file { "/etc/alternatives/hadoop-0.20-conf":
    ensure => link,
    target => "/etc/hadoop-0.20/conf.m6",
  }
  file { "/etc/hadoop-0.20/conf":
    ensure => link,
    target => "/etc/alternatives/hadoop-0.20-conf",
  }
  # The JMX files are owned by hadoop rather than root.
  file {
    "/etc/hadoop-0.20/conf.m6/jmxremote.access":
      owner   => hadoop,
      group   => hadoop,
      require => File["/etc/hadoop-0.20/conf.m6"]
  }
  file {
    "/etc/hadoop-0.20/conf.m6/jmxremote.password":
      owner   => hadoop,
      group   => hadoop,
      require => File["/etc/hadoop-0.20/conf.m6"]
  }
}
Also, as you can see, I am doing some mumbo jumbo to make the alternatives system happy. This is just a consequence of the way the upstream package was made. I could use puppet to rip it out, but I just thought it would be interesting to keep it in and show that puppet can work with alternatives (somewhat). A buddy of mine suggested grapht?, which is an alternative to alternatives :)
Now onto the fun part.
class hadoop_ssn {
  # Every level of the checkpoint path is listed so each directory gets created.
  file { [ "/usr/local/hadoop_root",
           "/usr/local/hadoop_root/hdfs_master",
           "/usr/local/hadoop_root/hdfs_master/dfs",
           "/usr/local/hadoop_root/hdfs_master/dfs/namesecondary"
         ]:
    owner  => hadoop,
    group  => hadoop,
    ensure => directory
  }
  # The package comes from our own yum repository, so that must exist first.
  package { [ "hadoop-0.20-secondarynamenode" ]:
    ensure  => installed,
    require => Yumrepo["m6-trusted"]
  }
  # enable => true starts the service at boot; ensure => true keeps it running.
  service { "hadoop-0.20-secondarynamenode":
    enable  => true,
    ensure  => true,
    require => Package["hadoop-0.20-secondarynamenode"]
  }
}
Here we use the puppet file resource to make sure the path the SecondaryNameNode is going to write to is created and owned properly.
Then we tell puppet to install the SecondaryNameNode package (and all its dependencies). Notice we require our Yumrepo to be in place first.
Finally we use the service resource so puppet 'chkconfig's' the init script and the daemon runs on startup.
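The package resource above requires Yumrepo["m6-trusted"], which is not shown in this post. A minimal sketch of what such a resource could look like follows. The description and baseurl are placeholders I made up for illustration, not the real repository settings, and the resource would live wherever your repos are declared.

```puppet
# Hypothetical sketch of the repository that Package[...] requires.
# The descr and baseurl values are placeholders, not the real settings.
yumrepo { "m6-trusted":
  descr    => "m6 trusted packages",
  baseurl  => "http://repo.example.com/m6-trusted/",
  enabled  => 1,
  # Placeholder: turn this on and add a gpgkey if your packages are signed.
  gpgcheck => 0,
}
```

With the repo declared as a resource, the require on the package makes puppet write the repo definition before yum is asked to install anything from it.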
Now we only need to put a server in this class.
node 'ssn.jointhegrid.com' {
  include pig
  include hadoop_client_dev
  include hadoop_client_prod
  include hive_configuration
  include hive_0_6_0
  include standardrepos
  include hadoop_prod_conf
  include hadoop_ssn
  file { "/usr/bin/pig":
    mode    => 755,
    content => "exec /opt/pig-0.6.0/bin/pig-prod \"\$@\""
  }
  file { "/usr/bin/hive":
    mode    => 755,
    content =>
"export HADOOP_HOME=/opt/hadoop-0.20-shell
exec /opt/hive-0.6.0/bin/hive --config /opt/hive-conf/conf-prod \"\$@\""
  }
}
On the target server I can run:
puppetd --no-daemonize --onetime -v
Or take a lunch and let puppet handle this at next check-in :)
[hadoop-0.20]# /etc/init.d/hadoop-0.20-secondarynamenode start
Starting Hadoop secondarynamenode daemon (hadoop-secondarynamenode): starting secondarynamenode, logging to /usr/lib/hadoop-0.20/bin/../logs/hadoop-hadoop-secondarynamenode-jtg.out
[ OK ]
[ hadoop-0.20]# tail -f /var/log/hadoop-0.20/hadoop-hadoop-secondarynamenode-jtg.log
2010-07-15 15:07:58,995 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=SecondaryNameNode, sessionId=null
2010-07-15 15:07:59,315 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2010-07-15 15:07:59,382 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50090
Great success!
Posted at 03:10PM Jul 15, 2010 by edwardcapriolo in General