Edward Capriolo

Monday Nov 11, 2013

YARN... Either it is really complicated or I have brain damage

I noticed YARN is getting all this incredible press lately. I see articles with subjects I cannot parse, like "YARN is Hadoop's data center OS". A while back I took a slam at YARN, but periodically I like to re-investigate my rants and determine how right I was.

Because I am bored, I decided to build a YARN app. I figured I could do something like spin up 5 instances of Tomcat in YARN. Like a hello world application, it might take me 2 hours or so.

5 hours later I feel I must have fallen and hit my head, because I am unable to figure out what the fuck YARN is and why I would ever want to fucking use it. I know, I'm sorry for cursing... but soon you will know my pain.

Starting applications... You might have done this before. Maybe you click an icon, maybe you run a command, maybe if you have a half hour you write a nice init script with start/stop/status...

Or maybe, just maybe, you create a massive fricken Java project with 20,000 fucking lines of code to start HBase...

[edward@jackintosh java]$ git clone https://github.com/hortonworks/hoya.git
Cloning into 'hoya'...
remote: Counting objects: 11963, done.
remote: Compressing objects: 100% (4658/4658), done.
remote: Total 11963 (delta 4357), reused 11933 (delta 4329)
Receiving objects: 100% (11963/11963), 3.12 MiB | 566 KiB/s, done.
Resolving deltas: 100% (4357/4357), done.

[edward@jackintosh hoya]$ find . -name "*.java" |xargs cat | wc -l
21461

That is right, 20,000+ fricken lines of code to... start some other code!!! Are alarms going off? Does this seem maintainable to anyone?

After wading through some of the 20k lines of code, I think I found the secret sauce:

https://github.com/hortonworks/hoya/blob/master/hoya-core/src/main/java/org/apache/hadoop/hoya/exec/RunLongLivedApp.java

@Override // Runnable
  public void run() {
    LOG.debug("Application callback thread running");
    //notify the callback that the process has started
    if (applicationEventHandler != null) {
      applicationEventHandler.onApplicationStarted(this);
    }
    try {
      exitCode = process.waitFor();
    } catch (InterruptedException e) {
      LOG.debug("Process wait interrupted -exiting thread");
    } finally {

YARN... forks a process... amazing... absolutely amazing... pause... not.
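For what it's worth, the fork-and-wait part needs nothing from YARN. Here is a minimal sketch using only the JDK's ProcessBuilder. The class name ForkAndWait and the method runAndWait are made up for illustration and are not from Hoya, and the sample command assumes a Unix-like box with sh on the PATH.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Hoya or YARN.
public class ForkAndWait {

    /** Starts the command as a child process and blocks until it exits. */
    static int runAndWait(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.inheritIO();               // child writes to our stdout/stderr
        Process process = builder.start(); // this is the "fork"
        return process.waitFor();          // same call the Hoya snippet blocks on
    }

    public static void main(String[] args) throws Exception {
        // Assumes a Unix-like system where "sh" is available.
        int exitCode = runAndWait(Arrays.asList("sh", "-c", "echo hello from the child"));
        System.out.println("child exited with code " + exitCode);
    }
}
```

This only covers starting one local process and collecting its exit code. It does none of the cluster-side work (picking a machine, reserving memory, restarting on failure), which is presumably where the other 20k lines go.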

OK, Ed, maybe this example is not representative of YARN... Maybe you should keep looking on GitHub. I find https://github.com/cloudera/kitten

"Kitten is a set of tools for writing and running applications on YARN, the general-purpose resource scheduling framework that ships with Hadoop 2.0.0. Kitten handles the boilerplate around configuring and launching YARN containers, allowing developers to easily deploy distributed applications that run under YARN."

Win! I love easy deployment of things that should be easy, aka things I can do in Puppet with about 10 lines of code.

So let's read on...

"A configuration language, based on Lua 5.1, that is used to specify the resources the application needs from the cluster in order to run."

Wait, what? Lua? You said you were going to make this simple for me... How is explaining it with something I have never heard of simple?

So read on...

distshell = yarn {
  name = "Distributed Shell",
  timeout = 10000,
  memory = 512,

  master = {
    env = base_env, -- Defined elsewhere in the file
    command = {
      base = "java -Xmx128m com.cloudera.kitten.appmaster.ApplicationMaster",
      args = { "-conf job.xml" }, -- job.xml contains the client configuration info.
    }
  },

  container = {
    instances = 3,
    env = base_env,  -- Defined elsewhere in the file
    command = "echo 'Hello World!' >> /tmp/hello_world"
  }
}

Wow that is astonishing... Almost nearly as simple as:

for i in server1 server2 server3; do
  ssh "$i" 'echo hello world'
done

OK, so at least it was a nice try at making something easier. But having to learn Lua, to learn Kitten, to learn YARN, was not my battle plan. Let's keep looking:

https://github.com/continuuity/weave

[edward@jackintosh weave]$ find . -name "*.java" |xargs cat | wc -l
26488

OK, now... To be fair, this project actually looks like it has a pulse, and it actually sounds fairly reasonable...

"This EchoServer model above is familiar, but what if you want to run your EchoServer on a YARN cluster?

All you need to do is implement the WeaveRunnable interface, similar to how you would normally implement Runnable. In this model, the EchoServer implements WeaveRunnable, which in turn implements Runnable. This allows you to run a WeaveRunnable implementation within a Thread and also in a container on a YARN cluster:"

While there is much more talk about other abstractions as the page goes on, which scares me somewhat, I actually have high hopes for this. I mean, I won't get my 5 instances of Tomcat started tonight, but maybe in a couple of days or so.

Eventually I did find a "simple" example...

https://github.com/hortonworks/simple-yarn-app

[edward@jackintosh simple-yarn-app]$ find . -name "*.java" |xargs cat | wc -l
232

Its description alone inspires incredible confidence:

"Simple YARN application to run n copies of a unix command - deliberately kept simple (with minimal error handling etc.)"

In the end, I now understand the statement I thought was a typo, "YARN is Hadoop's datacenter OS". I think it means "write an operating system's worth of code to wrap around Hadoop".

Stuff like this is why my Unix friends make fun of Java, and sometimes I can't blame them. When I read things like "Frameworks to simplify the configurations of containers", I wonder why I don't just switch to C.
