Edward Capriolo

Friday May 04, 2012

Hadoop is the best thing since sliced bread

Stonebraker is at it again:

I seem to remember the last "big news" Stonebraker had was another Facebook criticism.


There is some irony here: Facebook is about to IPO for, what, 10 billion? And all those "five parallel DBMSs" are probably not recognizable names to the average tech person.

Yeah! I said it. I really did not want to get nasty, but whatever: you're picking on Hive, and "that's my brother, man".

First off, I do not even understand what this article is trying to get at. We know that Hadoop is not MPI, we know that Hadoop and Hive are not a true parallel database, and we know that you cannot fit every problem into Hadoop or Hive. Just like you cannot fit every problem into a relational database, even a parallel one.

Last I checked, no one goes around saying "Hadoop is the best thing since sliced bread". In fact, the only one who goes around acting like his shit don't stink is Stonebraker himself.

Let me tell you a story of how I got into Hadoop and Hive. I was following advice like Stonebraker's that said parallel DBs are the way to go. But I quickly found out parallel databases are too rich for my blood. Now, I am not telling you or anyone else that you should not spend money on parallel DBs, because maybe you have the money, or maybe you need some of those things a parallel database provides. But for the things I need to do:

  • store tons of data
  • process it reasonably fast
  • be LOW on the cost scale

Hadoop and Hive work fine for me.

Stonebraker: "Most Hadoop sites are somewhere between steps 2 and 3, and “hitting the wall” is still in their future"

This reminds me of an exchange from the infamous rap battle between LL Cool J and Canibus.

Canibus: "99% of your fans wear high heels."
LL Cool J: "99% of your fans don't exist."

VoltDB. Need I say more?

Most Hadoop sites NEVER WANTED A PARALLEL DATABASE IN THE FIRST PLACE. So there is no wall to hit. We're not as smart as you, but we are not fucking idiots. We did not use NoSQL because we wanted a parallel database, and we did not use Hadoop and Hive because we wanted one either.

Hive is only a declarative language that makes working with Hadoop look like querying a database. It does a good job parallelizing some relational problems on top of Hadoop.
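
To make that concrete, here is a rough sketch of what a query like SELECT url, COUNT(*) FROM page_views GROUP BY url boils down to. Everything in it is made up for illustration: the page_views table, the rows, and the function names. It is plain single-process Python, not what Hive actually generates, but it shows the map, shuffle, and reduce steps that a GROUP BY decomposes into.

```python
from collections import defaultdict

# Hypothetical rows standing in for a table page_views(user, url).
ROWS = [
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/about"),
    ("carol", "/home"),
]


def map_phase(rows):
    """Emit a (url, 1) pair for each row, the way a mapper would."""
    for _user, url in rows:
        yield url, 1


def shuffle_phase(pairs):
    """Group values by key; the framework does this between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key, the way a reducer would."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_url(rows):
    """Rough equivalent of: SELECT url, COUNT(*) FROM page_views GROUP BY url."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_url(ROWS))
```

On a real cluster the map and reduce steps run on many machines at once, which is the whole point. The one-line query is what you type; the three phases are the work you are spared from writing by hand.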

Another irony of the article is its mention of the hype cycle. It would be a fair criticism of Hadoop and Big Data that we are in a hype cycle. However, when your "go-to move" for promoting yourself and your products is to fling FUD at others, you do not have a leg to stand on.

Also, what is with having a picture every time you have an article? Who does that?