Edward Capriolo

Friday Aug 10, 2012

Trying to find a fit for Yarn and Mesos

I have been following the development of Yarn and Mesos and have done some tinkering over the past few months. If you have never heard of these projects, get some information here:



I have a good conceptual understanding of what can be done with these projects, but I have some trouble fitting them into my current infrastructure. There is no one specific reason; it is more a question of 'What can this thing do that Puppet cannot?'.

I look at Yarn and I see a tool with one use: standing up 3 revs of Hadoop on the same hardware, mainly because the migration path off a release is ugly. This is something I can already do with configuration management.

I look at the Mesos examples and I see a container that 'starts a jail, installs http and installs ha-proxy'. Again, something I can already do with configuration management.

Maybe I have just been standing up clusters for too long so everything looks the same to me, but in my own head I have trouble sorting these things out. The big questions are:

  1. Can a technology like Yarn or Mesos be used together with Puppet or Chef?
    1. What are the best practices when using these two things together?
  2. In YARN's case, how many current software packages can YARN manage outside Hadoop?
    1. MPI?
    2. Then what?
  3. Aren't Yarn/Mesos just sneaky forms of DevOps/NoOps?
  4. With clusters spinning up and down on command, how do we monitor this environment and guarantee quality of service?
  5. Couldn't AWS/OpenStack do this on a more general scale?
  6. Shouldn't we just all be using Solaris zones?

Thinking deeper on #6: one of the things about Solaris is that they spent a lot of time making a virtualized environment. They spent time making resource controls: controlling RAM, sockets, and open files per process. Currently, AFAIK, there is no support in the mainline Linux kernel for sharing/limiting disk IO like Solaris has. When I look at Yarn, the only resource constraint I see is units of memory.

How are these platforms supposed to be successful when Quality of Service is an afterthought? Let's say you use YARN to spin up HBase or Cassandra and want low latency. Then randomly a map task lands on the machine and crushes the node. Just putting a cap on memory is not going to help, as the map task is crushing your IO subsystem and degrading your service. This is like bringing the noisy-neighbors problem home to your private cloud.
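To make the complaint concrete, here is a toy sketch of an admission check. It is not how YARN's scheduler is actually written; the `Task` and `Node` types, the `disk_iops` field, and all the numbers are made up for illustration. It only shows that a check which looks at memory alone will happily co-locate a map task with a latency-sensitive service, while a check that also budgets IO would refuse.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    mem_mb: int
    disk_iops: int  # hypothetical IO demand, not a resource the scheduler in the post tracks


@dataclass
class Node:
    mem_mb: int
    disk_iops: int  # hypothetical IO capacity of the node


def fits_memory_only(node, running, task):
    """Admit the task if memory is the only resource being accounted for."""
    used_mem = sum(t.mem_mb for t in running)
    return used_mem + task.mem_mb <= node.mem_mb


def fits_memory_and_io(node, running, task):
    """Admit the task only if both memory and disk IO budgets still hold."""
    used_io = sum(t.disk_iops for t in running)
    io_ok = used_io + task.disk_iops <= node.disk_iops
    return fits_memory_only(node, running, task) and io_ok


# Made-up numbers: a node already running a low-latency store.
node = Node(mem_mb=16384, disk_iops=1000)
hbase = Task("hbase-regionserver", mem_mb=8192, disk_iops=700)
map_task = Task("map-task", mem_mb=2048, disk_iops=600)

memory_only_admits = fits_memory_only(node, [hbase], map_task)
memory_and_io_admits = fits_memory_and_io(node, [hbase], map_task)

print("memory-only check admits map task:", memory_only_admits)
print("memory+IO check admits map task:", memory_and_io_admits)
```

With these numbers the map task fits comfortably in memory, so the memory-only check lets it land next to the region server even though the two together want more IO than the node has.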


Hey Edward, happy to talk more in person, but some answers (which hopefully make sense! *smile*):

  1. The way you run multiple MR versions with YARN is not via having your admin deploy it. The idea is to run the same YARN 'system' while you run multiple MR versions as 'applications', i.e. run hadoop-2.0-MR-AppMaster, hadoop-2.1-MR-AppMaster etc. *concurrently*. Thereby this is all user-land, as opposed to puppet/chef deploying hadoop-2.0 and hadoop-2.1 via admin intervention.
  2. Similarly, with an MPI or Hama or Graph or Spark ApplicationMaster you can 'share' the same cluster with MapReduce. All are in various stages of development as we speak. Hama, I believe, is nearly done. Thereby you can have puppet/chef deploy the *system*, i.e. YARN, and then everything else is now *user-land*, hence the admin is out of the loop - a nice thing!
  3. YARN is days away from supporting CPU as the 2nd resource (YARN-2) to prevent the 'noisy neighbour' problem you described. In future we'll add more 'resource types'. We are doing the equivalent of 'zones' via Linux cgroups, which offer similar support for constraining resources for Unix processes such as memory, cpu, disk/network I/O etc.

Posted by Arun C Murthy on August 10, 2012 at 01:38 PM EDT #

Well, we probably should all be using zones and pay for our sinful Linux ways. Despite the terrifying lack of documentation, I thought cgroups did something useful with IO now. Or by "mainline" did you mean "needs a kernel newer than RHEL6's ancient 2.6.32"?

Posted by cburroughs on August 10, 2012 at 02:05 PM EDT #

Arun, thanks for your comments. I do not see it as a good thing that admins are out of the loop. Puppet is really not an admin-only tool. Puppet is controlled via text files, and it is very easy to make the text files owned by different Unix groups. I just see that you have listed 4 applications in various stages of getting yarnified. But this is where the mismatch is. Puppet has modules to deploy hundreds of things like DNS, Kafka, Apache, Hadoop, HBase, Cassandra... and on and on. Many things need customization of the OS to be deployed. Hadoop ain't so good without gzip, snappy, and a changed open file limit. How will Yarn handle this in a platform-independent way? That is why admins exist: we do all the uncool stuff devs take for granted. I just feel like, with already established tools like OpenStack or Puppet, the case to build a custom userspace framework to deploy limited packages is not very compelling.

Posted by edward capriolo on August 10, 2012 at 07:03 PM EDT #
