Edward Capriolo

Sunday Apr 01, 2012

Redundant, failover-capable Hive Thrift

A while back Nathan wrote a blog post on production-izing Hive Thrift. As you know, the #1 and #2 considerations in my life are scalability and redundancy.

We were doing too much (as in any at all) bash + "hive -e", and I was happy to get to the future with hive-service. There is one new consideration, however: running a single instance of hive-service is not enough. First, all the jobs will launch from a single node, which could cause utilization on that node to spike. Second, if that node hangs or goes down, there is going to be a world of hurt.

Step 1: Run two Hive Thrift servers. This is easy: just follow Nathan's instructions two or more times. The result:

[Image; credit: http://www.askmen.com]

Next we can call our networking guys to set up a TCP-based load balancer, or set up a proxy ourselves. In this case, because Hive Thrift is not very network intensive, I went with a simple HAProxy configuration.

/etc/haproxy/rhiveserver.cfg
listen hive-primary 10.71.74.218:10000
balance leastconn
mode tcp
server hivethrift1 rs06.hadoop.pvt:10000 check
server hivethrift2 rs07.hadoop.pvt:10000 check
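
To make the two key directives in that configuration concrete, here is a minimal Python sketch of what "check" and "balance leastconn" amount to: probe each backend with a plain TCP connect, then hand the next client to the healthy backend with the fewest open connections. This is only an illustration, not how HAProxy is implemented; the function names and the active_conns bookkeeping are made up for the example, and it ignores details such as server weights.

```python
import socket


def tcp_check(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds (a basic health check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_leastconn(backends, active_conns, is_up=tcp_check):
    """Pick the healthy backend with the fewest active connections.

    backends:     list of (name, host, port) tuples
    active_conns: dict mapping backend name -> current connection count
                  (hypothetical bookkeeping, for illustration only)
    Returns the chosen (name, host, port) tuple, or None if nothing is healthy.
    """
    healthy = [b for b in backends if is_up(b[1], b[2])]
    if not healthy:
        return None
    return min(healthy, key=lambda b: active_conns.get(b[0], 0))


# The two servers from the configuration above.
BACKENDS = [
    ("hivethrift1", "rs06.hadoop.pvt", 10000),
    ("hivethrift2", "rs07.hadoop.pvt", 10000),
]
```

Calling pick_leastconn(BACKENDS, {"hivethrift1": 3, "hivethrift2": 1}) would choose hivethrift2 as long as both servers accept connections, and would fall back to hivethrift1 if rs07 stopped answering.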

Now here is the kicker. What if the node running HAProxy fails? Isn't that a single point of failure? Yes, it is, but since this service holds no real state we can fail it over to a new node fairly quickly with linux-ha. Linux-ha lets us manage resources across clusters of servers.

Many people who deploy linux-ha still deploy it in the deprecated V1 mode, where all the resources must live on one side or the other of a two-node cluster. If you are one of those people, it is time to man up and switch to V2. V2 allows resources to run independently across N-node clusters.

The IPaddr2 resource agent is built into linux-ha. The HAProxy OCF resource agent can be downloaded separately from https://github.com/russki/cluster-agents.

In linux-ha, the resources in a group move together and start in the order listed. In this case, that means the VIP and the haproxy instance always run on the same node, with the VIP started first. (They stop in the reverse order.)

$ crm
crm(live)# configure
crm(live)configure# show
primitive haproxy_10000 ocf:heartbeat:HAProxy \
        params config="/etc/haproxy/rhiveserver.cfg" pid="/var/run/haproxy_rhive.pid" proxy="/usr/sbin/haproxy"
primitive ip_ha_hiveserver ocf:heartbeat:IPaddr2 \
        params ip="10.71.74.218" cidr_netmask="16" nic="eth1"
group ha_hiveserver_grp ip_ha_hiveserver haproxy_10000 \
        meta target-role="Started"
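
The group semantics are easy to get backwards, so here is a toy Python model of them: members start in the listed order, stop in reverse, and a move is a full stop on the old node followed by a full start on the new one. This is not Pacemaker code; the Resource and Group classes are hypothetical and only record the order of operations.

```python
class Resource:
    """Stand-in for a cluster resource; it only records what happens to it."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def start(self, node):
        self.log.append(("start", self.name, node))

    def stop(self, node):
        self.log.append(("stop", self.name, node))


class Group:
    """Members start in listed order, stop in reverse, and always share a node."""

    def __init__(self, members):
        self.members = list(members)
        self.node = None

    def start(self, node):
        for resource in self.members:
            resource.start(node)
        self.node = node

    def stop(self):
        for resource in reversed(self.members):
            resource.stop(self.node)
        self.node = None

    def move(self, node):
        # A move is a full stop on the old node, then a full start on the new one.
        if self.node is not None:
            self.stop()
        self.start(node)


log = []
grp = Group([Resource("ip_ha_hiveserver", log), Resource("haproxy_10000", log)])
grp.start("rs03.hadoop.pvt")
grp.move("rs02.hadoop.pvt")
for entry in log:
    print(entry)
```

The printed log shows the VIP coming up before haproxy on each node and haproxy going down before the VIP, which is the ordering you want: haproxy binds to the VIP's address, so the address has to exist first.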

Then we can use the crm_mon tool to see where the resources come up.

[root@rs02 ~]# crm_mon -1
============
Last updated: Sun Apr  1 11:36:05 2012
Stack: Heartbeat
Current DC: rs02.hadoop.pvt (3fa9313f-0e32-4277-be23-b71fdd8af14d) - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, unknown expected votes
7 Resources configured.
============

Online: [ rs03.hadoop.pvt rs02.hadoop.pvt ]

 Resource Group: ha_hiveserver_grp
     ip_ha_hiveserver    (ocf::heartbeat:IPaddr2):    Started rs03.hadoop.pvt
     haproxy_10000    (ocf::heartbeat:HAProxy):    Started rs03.hadoop.pvt

 

Users can now reach hive-service on the redundant, failover-capable VIP at 10.71.74.218 (port 10000).
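
One thing to keep in mind on the client side: while the VIP is moving between nodes, new connections may be refused for a short time. Below is a minimal sketch of a client-side connect-with-retry, using a plain TCP socket rather than any particular Hive client library. The function name, retry count, and delay are assumptions for illustration. It only covers establishing the connection; a query that is already in flight when a backend dies would still fail and would have to be resubmitted.

```python
import socket
import time


def connect_with_retry(host, port, attempts=5, delay=1.0, timeout=5.0):
    """Try to open a TCP connection, retrying a few times before giving up.

    Returns a connected socket, or raises ConnectionError after the last attempt.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:
            last_error = exc
            # Pause before the next attempt, but not after the final one.
            if attempt < attempts - 1:
                time.sleep(delay)
    raise ConnectionError(
        f"could not reach {host}:{port} after {attempts} attempts"
    ) from last_error
```

A client would call connect_with_retry("10.71.74.218", 10000) and hand the resulting socket to whatever transport it uses to talk to hive-service.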

Mission Accomplished!


Comments:

Hi. In my case, I have two Hive servers (A and B) and one HAProxy server, and it works fine. But if A (the primary) is down and B is active, I get an SQLException when I execute a query from Java code that connects to the Hive database. HAProxy should instead redirect the request to the secondary, active Hive server. Please let me know how this is handled.

Posted by Sathya Narayanan on December 13, 2012 at 02:30 AM EST #
