Trying to find a fit for YARN and Mesos
I have been following the development of YARN and Mesos and have done some tinkering over the past few months. If you have never heard of these projects, get some information here:
I have a good conceptual understanding of what can be done with these projects, but I have some trouble fitting them into my current infrastructure. There is no one specific reason; it is more a question of 'What can this thing do that Puppet cannot?'.
I look at YARN and I see a tool with one use: standing up three revisions of Hadoop on the same hardware, mainly because the migration path off a release is ugly. This is something I can already do with configuration management.
I look at the Mesos examples and I see a container that 'starts a jail, installs http and installs ha-proxy'. Again, something I can already do with configuration management.
Maybe I have just been standing up clusters for too long, so everything looks the same to me, but in my own head I have trouble sorting these things out. The big questions are:
- Can a technology like YARN or Mesos be used together with Puppet or Chef?
- What are the best practices when using these two things together?
- In YARN's case, how many current software packages can it manage outside of Hadoop?
- Then what?
- Aren't YARN/Mesos just sneaky forms of DevOps/NoOps?
- With clusters spinning up and tearing down on command, how do we monitor this environment and guarantee quality of service?
- Couldn't AWS/OpenStack do this on a more general scale?
- Shouldn't we just all be using Solaris Zones?
Thinking deeper on the Solaris Zones question: one of the things about Solaris is that they spent a lot of time building a virtualized environment, and they spent time on resource controls — capping RAM, sockets, and open files per process. Currently, AFAIK, there is no support in the mainline Linux kernel for sharing/limiting disk IO the way Solaris does. When I look at YARN, the only resource constraint I see is units of memory.
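For comparison, some of the per-process controls mentioned above (RAM, open files) do exist on Linux as rlimits; here is a minimal sketch using Python's stdlib `resource` module — the cap of 256 is an arbitrary illustrative value, not a recommendation:

```python
import resource

# Cap the number of open file descriptors for this process.
# Lowering the soft limit toward the hard limit needs no privileges.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
new_soft = min(soft, 256)  # 256 is an arbitrary illustrative cap
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))

# RLIMIT_AS similarly caps a process's address space (memory), but
# there is no rlimit for disk IO bandwidth -- exactly the gap in play.
capped_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
print(capped_soft)
```

Note that these are per-process knobs only; they say nothing about sharing disk IO fairly across tenants on a node, which is the part Solaris resource management handles and memory-only scheduling does not.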
How are these platforms supposed to be successful when quality of service is an afterthought? Let's say you use YARN to spin up HBase or Cassandra and want low latency. Then a map task randomly lands on the machine and crushes the node. Just putting a cap on memory is not going to help, because the map task is crushing your IO subsystem and degrading your service. This is like bringing the noisy-neighbor problem home to your private cloud.