Edward Capriolo

Sunday Aug 31, 2014

Stream Processing, Counting the things that are counting things

One of the qualities of good software is visibility. For me to provide users with statistics on how the site is performing, data moves through a series of systems. Browser events fire from Javascript and are received by web servers, which place them on a message queue. A process reads data from the message queue and writes it to a Hadoop file system. Another process consumes this data and applies some streaming transformation before writing to a NoSQL database.

Imagine a system without enough visibility. A user might send you an email stating, "It looks like the count of hits to this page for today is off. There should be more hits." If you have no visibility into the system, a request like this can turn into a nightmare. The data could be getting lost anywhere: in your custom software, in open source software, even in commercial software! The other possibility is that the person reporting the issue is just wrong. That happens too! Without visibility it is hard to say what is wrong. Maybe the NoSQL database is dropping messages, or maybe it is your code? You just have to take a shot in the dark and start somewhere; maybe you pick the right component and find a bug, or maybe you spend two weeks and find nothing wrong.

Now, imagine a system with enough visibility. You would look at some graphs your software is maintaining and determine that "the number of messages sent to our NoSQL system is close (hopefully exact :) to the number of raw messages we received into our message queue". You could even go a step further and create pre-emptive alerts based on what normal message flow looks like for that time of day and day of week, so if there is an issue you can hopefully notice it and fix it before a user becomes aware of the problem.

Some rules:

  1. Count things in and out of each system. Even if the correlation is not 1 to 1, some relationship should exist that will become apparent over time.
  2. Record things that are dropped or cause exceptions, and actively monitor so this number stays close to 0 (see the sketch just after this list).
  3. Go for low hanging fruit; do not try to build an overarching system in round one. If a sprint builds or adds a feature, find a way to monitor this new feature.
  4. Time things that could vary by orders of magnitude. Use histograms to time DB requests that involve reading disk, and other things that can have high variance as load increases.
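
To make rules 1 and 2 concrete, here is a minimal sketch using the coda-hale counters discussed in the next section. The class name and metric names are invented for illustration; treat it as a sketch, not teknek code.

import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;

public class QueueConsumer {
  private final MetricRegistry metrics = new MetricRegistry();
  // rule 1: count what goes in and what comes out of this stage
  private final Counter messagesIn = metrics.counter("queue.consumer.in");
  private final Counter messagesOut = metrics.counter("queue.consumer.out");
  // rule 2: count drops/exceptions and alert when this strays from zero
  private final Counter dropped = metrics.counter("queue.consumer.dropped");

  public void consume(String rawMessage) {
    messagesIn.inc();
    try {
      process(rawMessage); // some hypothetical transformation step
      messagesOut.inc();
    } catch (RuntimeException e) {
      dropped.inc();
    }
  }

  private void process(String rawMessage) { /* transform and hand off downstream */ }
}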

Getting it done

With our stream processing platform, teknek, I had been doing counters and timers on a case by case basis in user code. I decided to extend this into the framework itself so that users get a set of metrics for free. Users also have the ability to add their own metrics easily. (We will show the code to add your own counters later in this article.)

The de-facto standard metrics package for Java is the coda-hale library. Originally called "yammer-metrics", it provides counters, meters, histograms and other types. It has a clever way to do sampling, similar to streaming quantiles, so that it can efficiently keep a 95th percentile measurement without having to save a large number of samples in memory. Really really cool.
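
Here is a minimal sketch of timing something with a coda-hale Timer and reading back the 95th percentile. The class name and metric name are mine, and exact package layout varies between metrics versions, so treat it as a sketch rather than a recipe:

import java.util.concurrent.TimeUnit;

import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Snapshot;
import com.codahale.metrics.Timer;

public class DbTiming {
  public static void main(String[] args) throws InterruptedException {
    MetricRegistry registry = new MetricRegistry();
    // timers keep a sampled reservoir, so percentiles stay cheap to read
    Timer dbReads = registry.timer("db.read");

    for (int i = 0; i < 10000; i++) {
      Timer.Context ctx = dbReads.time();
      // pretend this is a read that occasionally hits disk
      Thread.sleep(i % 100 == 0 ? 5 : 0);
      ctx.stop();
    }

    Snapshot snap = dbReads.getSnapshot();
    // values are in nanoseconds; convert to milliseconds for printing
    System.out.printf("95th percentile: %.2f ms%n",
        snap.get95thPercentile() / TimeUnit.MILLISECONDS.toNanos(1));
  }
}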

For each "plan" in teknek we have a series of counters that record events inbounds, operator retries, time to process the event, and more. In the image below "shutup" :) is the name of the plan. Metrics are captured both globally and on a per thread basis.

 Sample of metrics provided

Every teknek "operator" in the plan has it's own set of metrics. For example, if the plan has three steps such as "read from kafka", "lowercase", "write to cassandra", "write to hbase" metrics are kept on these for free with no extra effort for the user.

The same metrics library is available to each of the operators you are implementing, so custom metrics can be added with ease.

package io.teknek.metric;

import io.teknek.model.ITuple;
import io.teknek.model.Operator;

public class OperatorWithMetrics extends Operator {
  @Override
  public void handleTuple(ITuple tuple) {
    // a completely custom meter name
    getMetricRegistry().meter("a.b").mark();
    // a meter scoped to this operator's path in the plan
    getMetricRegistry().meter(getPath() + ".processed").mark();
    // and one scoped further by the partition this instance is handling
    getMetricRegistry().meter(getPath() + "." + this.getPartitionId() + ".processed").mark();
  }
}

The theme of this post is visibility, and having counters in JMX is one form of visibility.
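
Exposing a registry over JMX is nearly a one-liner with the same library. A rough sketch (in metrics 3.x the JmxReporter lives in metrics-core; in later versions it moved to its own module):

import com.codahale.metrics.JmxReporter;
import com.codahale.metrics.MetricRegistry;

public class JmxExposure {
  public static void main(String[] args) throws InterruptedException {
    MetricRegistry registry = new MetricRegistry();
    registry.meter("example.requests").mark();

    // publishes every metric in the registry as an MBean,
    // browsable from jconsole or VisualVM
    JmxReporter reporter = JmxReporter.forRegistry(registry).build();
    reporter.start();

    Thread.sleep(60000); // keep the JVM alive long enough to poke around JMX
  }
}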

"But come on playa! You gotta up your big data game!"

No problem! It turns out there is already great support in coda-hale metrics for sending those metrics directly to graphite. Thus all the counters you have in teknek are available in graphite with no extra effort. Graphite offers a number of ways to search, group, and make custom dashboards with this information.
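
Wiring a registry to graphite with the stock coda-hale reporter looks roughly like this. The host name and prefix are placeholders, and this is the vanilla reporter, not the trimmed-down package mentioned in the note below:

import java.net.InetSocketAddress;
import java.util.concurrent.TimeUnit;

import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.graphite.Graphite;
import com.codahale.metrics.graphite.GraphiteReporter;

public class GraphiteWiring {
  public static void main(String[] args) {
    MetricRegistry registry = new MetricRegistry();
    // "graphite.example.com" stands in for your carbon host
    Graphite graphite = new Graphite(new InetSocketAddress("graphite.example.com", 2003));
    GraphiteReporter reporter = GraphiteReporter.forRegistry(registry)
        .prefixedWith("teknek.myhost")       // prefix keys so hosts/clusters stay distinguishable
        .convertRatesTo(TimeUnit.SECONDS)
        .convertDurationsTo(TimeUnit.MILLISECONDS)
        .build(graphite);
    reporter.start(1, TimeUnit.MINUTES);     // push the whole registry every minute
  }
}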

Quick note: the coda-hale graphite reporter tends to send too many counters to graphite. For example, it sends the 50th, 95th, 99th, 999th percentiles etc. to graphite, which is generally more information than you need. Take a look at my graphite package, which does a lot to trim down the metrics sent, adds host name and cluster name, and overall streamlines the process and configuration.

Conclusion

Build monitoring up front; make it a part of your definition of done. Good monitoring makes it easier to troubleshoot. It also makes it easier to be confident in beta testing or after releasing a new version of your software. With a new release, old metrics should stay near their pre-release values, and you can use the new metrics to reason that new features are working correctly in production.

The new features to teknek discussed in this post were incorporated in this pull request, and should appear in the 0.0.7 release.


Thursday Aug 21, 2014

CQL, did you just tell me to fck myself?

Last night I decided to give CQL another chance. After about 20 minutes of hacking at a 1 row table I pretty much hit every caveat and error message possible in my quest to get some result that was not SELECT *. The query language is a minefield of things you CAN'T do!

cqlsh:test>  select * from stats where year='54'  and tags='bla' order by ycount ALLOW filtering;
Bad Request: ORDER BY with 2ndary indexes is not supported.

cqlsh:test> select * from stats where year='54' and ycount >  10  order by ycount ALLOW filtering;
Bad Request: No indexed columns present in by-columns clause with Equal operator

cqlsh:test> select * from stats where year='54' and ycount >  10 and tags='bla';
Bad Request: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING

cqlsh:test> select * from stats where year='54' and ycount >  0  allow filtering;
Bad Request: No indexed columns present in by-columns clause with Equal operator
 

http://b.z19r.com/post/did-you-just-tell-me-to-go-fuck-myself

Sunday Aug 10, 2014

Why I am reverting from Java to C++

For a long time java has been good to me. I know these days everyone is looking to move on, to maybe scala or clojure or whatever, but I am actually looking the other way. Java has quite a large number of problems that are bubbling below the surface. Where to start...

Ok, go back to 1995 when java was an upstart. The going logic was that c++ pointers were "TOO COMPLICATED FOR PROGRAMMERS"... Fast forward to 2014 and even cell phone languages have pointers. So how is it that a lowly cell phone programmer can understand pointers but a server side coder thinks they are "too complicated"?

Ok, but let's talk about the big ugly problem...GC and memory. I have been supporting a number of Java projects for a long time and the major performance bottleneck is always garbage collection. Now listen, regardless of what anyone tells you, there is NO way to tune away all the garbage collection issues. At some point, if your application moves enough data, the JVM will pause.

JVMs don't get much bigger than 10 GB of memory before the best performing GC algorithm, CMS, falls apart. I have seen it in batch processing systems like Hadoop and Hive, I have seen it in HBase, I have seen it in cassandra. Want to run a database with a really fast Fusion-io SSD under high load? The CPU bottlenecks on GC before the disk runs out of IO. G1, the garbage collector that was supposed to be the answer for these large heaps, seems to be a large failure.

Many projects that require decent performance are doing some combination of java and off-heap memory. This I do not get. At the point where you start doing things off-heap you basically start giving up everything Java provides for you (thread safety, debugging). Besides the fact that it makes debugging harder, it still has more overhead than native code.

In many cases this causes random corruption, because library developers do not actually write these systems correctly. Followed by embarrassing statements like "Sorry, our really smart really efficient off-heap thing x was fucking up for three versions and we just figured it out."

Let's talk more about memory. Java is just plain pig-ish with memory. Java objects carry a number of bytes of overhead, and an object is just way bigger in java than in c++. "Enterprise" libraries are so big and bloated that I find myself having to tweak my eclipse JVM settings just so that I CAN DEVELOP java apps.

You can not even allocate a damn object on the stack; Java forces you to put it on the heap. WTF?

Every enterprise java application seems to need about 512 MB of ram just to start up and about 2GB of overhead to run under light load... Hello cloud... No micro instances for me! Every hello world needs a 4GB heap.

Back in 2007 a server machine had maybe 4GB memory... so it was no big deal that the Java VM gets pause-ish with a large heap... But now in 2014 I can get a server from amazon with 222GB ram. Machines with 1TB are around the corner. When I have a big-data application, am I going to have to run 100-1000 sharded copies of a program on a single machine just to address the memory?

So I have started going backwards: writing programs in c++. Falling in love with template functions and finding them more powerful than java's generics. Using lambdas in c++11 and saying, "what is the big deal with scala?". Using smart pointers in boost when I need to, freeing memory by hand when I do not.

Feels good, feels great. Feels great to run a program that only uses 4K of memory and starts up in .0000001 seconds. "Did that run? Yes it did run and it's already finished!"

Saturday Jul 19, 2014

Travis CI is awesome!

https://travis-ci.org/edwardcapriolo/teknek-core

Travis CI is awesome... That is all.

Thursday Jul 03, 2014

MapReduce on Cassandra's sstable2json backups

I was talking to a buddy about having nothing to do today. He said to me, "You know what would be awesome? We have all these sstable2json files in s3 and it would be cool if we could map reduce them."

For those not familiar, sstable2json makes files like this:

[
{"key": "62736d697468","columns": [["6c6173746e616d65","736d697468",1404396845806000]]},
{"key": "6563617072696f6c6f","columns": [["66697273746e616d65","656477617264",1404396708566000], ["6c6173746e616d65","63617072696f6c6f",1404396801537000]]}
]

Now, there already exists a json hive serde, https://github.com/rcongiu/Hive-JSON-Serde, however there is a small problem.

That serde expects data to look like this:

{}
{}

Not like this:

[
{},
{}
]

What is a player to do? Make a custom input format, that is what:

The magic is in a little custom record reader that skips everything except what the json serde wants and trims trailing commas.

  @Override
  public synchronized boolean next(LongWritable arg0, Text line) throws IOException {
    boolean res = super.next(arg0, line);
    // skip the opening '[' line of the sstable2json array
    if (res && line.charAt(0) == '['){
      res = super.next(arg0, line);
    }
    // skip the closing ']' line
    if (res && line.charAt(0) == ']'){
      res = super.next(arg0, line);
    }
    // trim the trailing comma so each line is a standalone json object
    if (line.getLength() > 0 && line.getBytes()[line.getLength() - 1] == ','){
      line.set(line.getBytes(), 0, line.getLength() - 1);
    }
    return res;
  }
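
The input format that ties this reader into Hive is presumably little more than a TextInputFormat that hands each split to the reader above. Something like the following sketch; the record reader class name and constructor are my guesses, wrapping the next() shown above in a LineRecordReader subclass:

package io.teknek.arizona.ssjsoninputformat;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

public class SSTable2JsonInputFormat extends TextInputFormat {
  @Override
  public RecordReader<LongWritable, Text> getRecordReader(InputSplit split, JobConf job,
      Reporter reporter) throws IOException {
    reporter.setStatus(split.toString());
    // hand each split to a reader that skips '[' / ']' lines and trims trailing commas
    return new SSTable2JsonRecordReader(job, (FileSplit) split);
  }
}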

Next, we create a table using the JSON serde and the input format from above.

hive> show create table json_test1;                                                         
OK
CREATE  TABLE json_test1(
  key string COMMENT 'from deserializer',
  columns array<array<string>> COMMENT 'from deserializer')

ROW FORMAT SERDE
  'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT
  'io.teknek.arizona.ssjsoninputformat.SSTable2JsonInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'file:/user/hive/warehouse/json_test1'
TBLPROPERTIES (
  'numPartitions'='0',
  'numFiles'='1',
  'transient_lastDdlTime'='1404408280',
  'numRows'='0',
  'totalSize'='249',
  'rawDataSize'='0')

When we use these together we get:

hive> SELECT key , col FROM json_test1 LATERAL VIEW explode (columns) colTable as col;
62736d697468    ["6c6173746e616d65","736d697468","1404396845806000"]
6563617072696f6c6f    ["66697273746e616d65","656477617264","1404396708566000"]
6563617072696f6c6f    ["6c6173746e616d65","63617072696f6c6f","1404396801537000"]
Time taken: 4.704 seconds, Fetched: 3 row(s)

Winning! Now there are some things to point out here:

  1. sstable2json with replication factor N is going to get you N duplicates that you will have to filter yourself. (Maybe it would be nice to build a feature into sstable2json that only dumps the primary range of each node?)
  2. You're probably going to need a group and a window function to remove all but the last entry (to deal with overwrites and tombstones)

But whatever, I just started playing with this this morning. I do not have time to sort out all the details. (maybe you don't have updates and this is not a big deal for you).

Tuesday Jul 01, 2014

Next hadoop enterprise pissing match beginning

http://hortonworks.com/blog/cloudera-just-shoot-impala/

"I hold out hope that their interests in enabling Hive on Spark are genuine and not part of some broader aspirational marketing campaign laced with bombastic FUD."

I really think horton is being FUDLY here. Cloudera has had 1-2 people involved with the hive project for a while now, maybe 6+ years. Carl is the hive lead; previously he worked for Cloudera. Cloudera has 2 people now adding features, one of whom is Brock Noland, who is doing an awesome job.

Hortonworks is relatively new to the hive project. 2-3 years tops? (not counting people who did work on hive before joining horton)

So, even though cloudera did build impala (and made some noise about it being better than hive), they have kept steady support on the hive project for a very long time.

Spark is just very buzzy now. Everyone wants to have it, or be involved with it, like "cloud", but spark is actually 3-4 years old, right?

But it is really great to see spark. Everyone wants to have it, and the enterprise pissing matches are starting! Sit back and watch the fun! Low blows coming soon!

Previous pissing matches: 

  1. Who has the best hadoop distro?
  2. Who "leads" the community?
  3. Parquet vs ORC?
  4. Who got the "credit" for hadoop security and who did "all the work"



Monday Jun 23, 2014

Server Side Cassandra

http://www.xaprb.com/blog/2014/06/08/time-series-database-requirements/

"Ideally, the language and database should support server-side processing of at least the following, and probably much more"

A co-worker found this. I love it. Sounds JUST like what I am trying to implement in:

https://issues.apache.org/jira/browse/CASSANDRA-6704

and what we did implement in https://github.com/zznate/intravert-ug .

How does that saying go? First they ignore you...

 


Tuesday Jun 17, 2014

Cloudera desktop manager forces you to disable SELINUX

This is a very curious thing. When trying to install cdh I found that it forces me to disable SELINUX completely. I can understand why an installer would have problems, but why won't it allow me to do the install in 'permissive' mode? Then I would be able to see the warnings.

This is kinda ##SHADEY##. Normally I do not give a shit about selinux, but being forced to have it completely disabled?

Monday Jun 09, 2014

Wait...Say what.

http://www.datastax.com/dev/blog/4-simple-rules-when-using-the-datastax-drivers-for-cassandra

Cassandra’s storage engine is optimized to avoid storing unnecessary empty columns, but when using prepared statements those parameters that are not provided result in null values being passed to Cassandra (and thus tombstones being stored). Currently the only workaround for this scenario is to have a predefined set of prepared statement for the most common insert combinations and using normal statements for the more rare cases.

So what you're saying is... if I don't specify a column when I insert, I delete it?
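
For what it's worth, the workaround they describe looks roughly like this with the DataStax Java driver: one prepared statement per combination of columns you actually have, so an absent column is simply never mentioned. Keyspace, table and values here are invented for illustration; a sketch, not their code:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class PreparedInsertExample {
  public static void main(String[] args) {
    Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
    Session session = cluster.connect("test");

    // one statement per column combination: no null bind values, no surprise tombstones
    PreparedStatement insertBoth =
        session.prepare("INSERT INTO users (id, firstname, lastname) VALUES (?, ?, ?)");
    PreparedStatement insertLastOnly =
        session.prepare("INSERT INTO users (id, lastname) VALUES (?, ?)");

    session.execute(insertBoth.bind("ecapriolo", "edward", "capriolo"));
    session.execute(insertLastOnly.bind("bsmith", "smith"));

    cluster.close();
  }
}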

Saturday May 17, 2014

The important lesson of functional programming

I wanted to point something out: many times I hear people going on and on about functional programming, how java can't be good without function passing (functors), how lambda features are massively important, or ivory tower talk about how terrible the "kingdom of nouns" is.

Let us look at Wikipedia's definition of functional programming.
----

In computer science, functional programming is a programming paradigm, a style of building the structure and elements of computer programs, that treats computation as the evaluation of mathematical functions and avoids state and mutable data. 

---

Though hipsters and "kingdom of verbs" fan boys will go on and on about lambdas, anonymous inner functions, and programs that have so many callbacks you need an api to unroll the callbacks into something readable, the important part of functional programming (to me) is avoiding state and mutable data, and you can leverage that concept from any language that has a method (function)!

Removing state has big benefits. One is repeatability, which brings testability. I enjoy writing code that is easily testable without mocking or writing a large test harness.

Here is an example. I am currently working on a teknek feature to coordinate how many instances of a process run on a cluster of nodes. At first you may think this problem is not a functional problem, because it depends on the state of local threads as well as cluster state that is stored in zookeeper. Let's look at an implementation:

---

  private boolean alreadyAtMaxWorkersPerNode(Plan plan){
    List<String> workerUuids = null;
    try {
      workerUuids = WorkerDao.findWorkersWorkingOnPlan(zk, plan);
    } catch (WorkerDaoException ex) {
      return true;
    }
    if (plan.getMaxWorkersPerNode() == 0){
      return false;
    }
    int numberOfWorkersRunningInDaemon = 0;
    List<Worker> workingOnPlan = workerThreads.get(plan);
    if (workingOnPlan == null){
      return false;
    }
    for (Worker worker: workingOnPlan){
      if (worker.getMyId().toString().equals(workerUuids)){
        numberOfWorkersRunningInDaemon++;
      }
    }
    if (numberOfWorkersRunningInDaemon >= plan.getMaxWorkersPerNode()){
      return true;
    } else {
      return false;
    }
  }

---

workerThreads is a member variable, another method uses a data access object, and the method is called from 'deep' inside a stateful application.

There is a simple way to develop this feature and still have great test coverage. Eliminate state! Functional programming! Write methods that are functional: methods that always return the same output for the same inputs.

Let's pull everything not functional out of the method and see what marvellous things this does for us!

---

  @VisibleForTesting
  boolean alreadyAtMaxWorkersPerNode(Plan plan, List<String> workerUuids, List<Worker> workingOnPlan){
    if (plan.getMaxWorkersPerNode() == 0){
      return false;
    }
    int numberOfWorkersRunningInDaemon = 0;
    if (workingOnPlan == null){
      return false;
    }
    for (Worker worker: workingOnPlan){
      if (worker.getMyId().toString().equals(workerUuids)){
        numberOfWorkersRunningInDaemon++;
      }
    }
    if (numberOfWorkersRunningInDaemon >= plan.getMaxWorkersPerNode()){
      return true;
    } else {
      return false;
    }
  }

---

Look Ma! No state! All the state is in the caller!

---

  private void considerStarting(String child){
    Plan plan = null;
    List<String> workerUuidsWorkingOnPlan = null;
    try {
      plan = WorkerDao.findPlanByName(zk, child);
      workerUuidsWorkingOnPlan = WorkerDao.findWorkersWorkingOnPlan(zk, plan);
    } catch (WorkerDaoException e) {
      logger.error(e);
      return;
    }
    if (alreadyAtMaxWorkersPerNode(plan, workerUuidsWorkingOnPlan, workerThreads.get(plan))){
      return;
    }

---

Why is removing state awesome? For one, it makes Test Driven Development easy. Hitting this condition with an integration test is possible, but it involves a lot of effort and hard-to-coordinate timing. Since we removed the state, look how straightforward the test is.

---

  @Test
  public void maxWorkerTest(){
    Plan aPlan = new Plan().withMaxWorkersPerNode(0).withMaxWorkers(2);
    Worker workingOn1 = new Worker(aPlan, null, null);
    Worker workingOn2 = new Worker(aPlan, null, null);
    List<String> workerIds = Arrays.asList(workingOn1.getMyId().toString(), workingOn2.getMyId().toString());
    List<Worker> localWorkers = Arrays.asList(workingOn1,workingOn2);
    Assert.assertFalse(td.alreadyAtMaxWorkersPerNode(aPlan, workerIds, localWorkers));
    aPlan.setMaxWorkersPerNode(2);
    Assert.assertTrue(td.alreadyAtMaxWorkersPerNode(aPlan, workerIds, localWorkers));
  }

---

Holy crap! The last assert (the assertTrue) failed! Remember, "testing reveals the presence of bugs, not the absence". Bugs should be easy to find and fix now that the logic is not buried deep. In fact, I easily stepped through this code and found the problem.

---

 if (worker.getMyId().toString().equals(workerUuids)){

---

In Java it is not a compile error to call String.equals(List). It always returns false! DOH. Without good testing we may not have even found this bug. Win. Let's fix that.

---

    for (Worker worker : workingOnPlan){
      for (String uuid : workerUuids) {
        if (worker.getMyId().toString().equals(uuid)){
          numberOfWorkersRunningInDaemon++;
        }
      }
    }

---

Now, let's use our friend cobertura to see what our coverage looks like. (If you're not familiar with coverage tools, get acquainted fast. Cobertura is awesome sauce! It runs your tests and counts how many times each branch and line of code was hit! This way you see where you need to do more testing!)

[edward@jackintosh teknek-core]$ mvn cobertura:cobertura


Pretty good! We can see many of our cases are covered, and we could write a few more tests to reach 100%, but that is just academic at this point. Anyway, tests are great. I think of tests like a tripwire against future changes, and assurance that the project does what it advertises.

Anyway, the big takeaway is that functional programming is possible from a "non functional" language. Functional programming makes it easy to build and test applications. And as always, anyone that does not write tests should be taken out back and beaten with a hose.

For those interested you can see the entire commit here.

Saturday Apr 26, 2014

Implementing AlmostSQL, dream of the 2010 era

The 2010s will be the era defined by a bunch of tech geeks trying as hard as possible to re-invent SQL and falling short over and over again. NoSQL was novel because it took the approach of not attempting to implement all the things that are hard to implement in a distributed way. Even without full SQL (or any SQL) these systems are still very useful for a variety of tasks....

But just being useful for some tasks is not good enough. The twitter tech universe requires you to be good at EVERYTHING to get VC. So even if you are not good at it or can't do it, just pretend like you can. Mislead or just lie!

The latest example of this phenomenon is http://databricks.com/blog/2014/03/26/Spark-SQL-manipulating-structured-data-using-Spark.html .

You read it and you think, "Ow shit! This is awesome. SQL!" Finally the dream is realized: fast distributed SQL. SALVATION! Retweet that!

But before you tweet your grandma about which nosql she should now use to count the eggs in her fridge, read the docs about this "SQL" support.
http://spark.apache.org/docs/0.9.1/scala-programming-guide.html

Note that Spark SQL currently uses a very basic SQL parser. Users that want a more complete dialect of SQL should look at the HiveQL support provided by HiveContext.

OMG. WHAT THE FUCK IS THIS POINTLESS OBSESSION WITH CALLING HALF ASSED SQL LANGUAGES SQL?

This has come up on the Hive mailing list before: the conversation "Why don't we rename it from HiveQL to Hive SQL?"

I will tell you why...

Cause SQL IS A STANDARD! I'm tired of every piece of tech that wants a press release to go viral just throwing the word SQL in there when they DON'T ACTUALLY DO SQL. You can't put lipstick on a pig (or the pig language). Supporting 20% of the SQL features does NOT make you an SQL system! Neither does building a scala DSL!

To add insult to injury, why not go on about how much better your "SQL" implementation is:

http://databricks.com/blog/2014/03/26/Spark-SQL-manipulating-structured-data-using-Spark.html


"The Catalyst framework allows the developers behind Spark SQL to rapidly add new optimizations, enabling us to build a faster system more quickly. In one recent example, we found an inefficiency in Hive group-bys that took an experienced developer an entire weekend and over 250 lines of code to fix; we were then able to make the same fix in Catalyst in only a few lines of code."

By the way, I am not saying spark sql is not interesting, cool, or novel, but don't go out bragging about how "hard it is for hive to fix group bys" and how "easy it is with catalyst" when you're basically not even implementing a full query language, or even half of one.

It is not surprising that a less complete query language has fewer lines of code than a more complete one!

Not CQL, SQL

As you may know, I have a quixotic quest to have non-fork support for server side operations in cassandra.

See https://issues.apache.org/jira/browse/CASSANDRA-5184 for details on how this is going nowhere.

My latest take is pretty cool (to me anyway). Cassandra is made for slicing: http://wiki.apache.org/cassandra/API10. Call it the 'Legacy API' if you want, but the ugly truth is all the CQL stuff is still built on top of slice. To be clear, EVEN when you select 'some columns' it still calls a massive getSlice() that returns all the columns.

Anyway, here is my latest brilliant idea. What if we make Cassandra and H2 have a baby? We let Cassandra do what it is good at: slicing. Then with the results of the slice we allow a 'sub select' of that slice using FULL SQL. In other words:
if your first slice/query fits into memory, load it into h2, then do any sql query on that data only!

Something like this:

     Connection conn = o.getMutationTool().runSliceAndLoad("stats", Arrays.asList("dateAsRowKey", "1970-12-31"), 
              Arrays.asList("vertical", "page", "toys"),
              new int [] { CompositeTool.NORMAL, CompositeTool.NORMAL, CompositeTool.NORMAL},
              Arrays.asList("vertical", "page", "toys"),
              new int [] { CompositeTool.NORMAL, CompositeTool.NORMAL, CompositeTool.INCLUSIVE_END}
             );
      ResultSet rs = conn.createStatement().executeQuery("SELECT sum(value) from data");
      if (rs.next()){
        System.out.println(rs.getLong(1));
      }
      rs.close();
      conn.close();

 

Did I just blow your mind? Wouldn't that be amazeballs? Well don't worry, Arizona will have it, but for now I give you the secret sauce!

Step 1: h2 in memory, do an Astyanax slice

    conn = DriverManager.getConnection("jdbc:h2:mem:;LOG=0;LOCK_MODE=0;UNDO_LOG=0;CACHE_SIZE=4096");
    result = keyspace
        .prepareQuery(stats)
        .getKey(CompositeTool.makeComposite(CompositeTool.stringArrayToByteArray(rowKey)))
        .withColumnRange(
            CompositeTool.makeComposite(
                CompositeTool.stringArrayToByteArray(startColumn), startRange),
            CompositeTool.makeComposite(CompositeTool.stringArrayToByteArray(endColumn),
                finishRange), false, 10000).execute().getResult();

Step 2: Create in memory h2 table for data

   sb.append("CREATE TEMPORARY TABLE data (");
   for (int j = 0; j < unwrap.size() / 2; j++) {
   sb.append(new String(unwrap.get(j))).append(" VARCHAR(255) ,");
   }
   sb.append("value bigint ");
   sb.append(")");

Step 3: load data from cassandra into h2

    ps = conn.prepareStatement("insert into data VALUES (?,?,?)");
    for (int j = unwrap.size() / 2, k = 0; j < unwrap.size(); j++, k++) {
      ps.setString(k + 1, new String(unwrap.get(j)));
    }
    ps.setLong(3, l);
    ps.execute();


Step 4: return Connection to user so they can do WHATEVER QUERY THEY WANT!

      ResultSet rs = conn.createStatement().executeQuery("SELECT sum(value) from data");
      if (rs.next()){
        System.out.println(rs.getLong(1));
      }
      rs.close();
      conn.close();

Step 5. Winning

Use slicing to create the "primary dimension", then query the heck out of it, any way your little heart desires.

Sunday Apr 13, 2014

21 hour work weeks

http://www.policymic.com/articles/87465/why-we-all-shared-the-story-about-france-s-alleged-ban-on-after-work-e-mails

Sorry, I cannot agree; the world is getting soft. I do not believe you should work yourself to death or into bad health or anything like that. But if you like what you do, it is not really work... I do not count how much "work" I do a week as a way of showing off or anything, but hey, when I'm not "working" I might still be writing code, or reading about something that will help me work better.

I don't remember anyone in my family ever even liking the idea of not working hard... Maybe it's an Italian worker thing :)


 

Wednesday Apr 09, 2014

Oracle refuses to let me download old JVM

I have a java project that will not build with open JDK. My machine has jdk 1.7 and the target platform is java 1.6. I say to myself, "Hey, no problem, I'll just download an older jdk."

So I go to oracle.com, which forced me to sign up for an oracle account. After I sign up I click the link to download the jdk...

 

God fucking kill yourself...All you "open source" f*ckers...Kill yourselves.

Tuesday Apr 08, 2014

Todays moment of CQL zen

[edward@jackintosh Downloads]$ netstat -nl | grep 9160
tcp        0      0 127.0.0.1:9160          0.0.0.0:*               LISTEN     
[edward@jackintosh Downloads]$ /home/edward/.farsandra/apache-cassandra-2.0.4/bin/cassandra-cli
Connected to: "Test Cluster" on 127.0.0.1/9160
Welcome to Cassandra CLI version 2.0.4

The CLI is deprecated and will be removed in Cassandra 3.0.  Consider migrating to cqlsh.
CQL is fully backwards compatible with Thrift data; see http://www.datastax.com/dev/blog/thrift-to-cql3

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown] exit;
[edward@jackintosh Downloads]$ /home/edward/.farsandra/apache-cassandra-2.0.4/bin/cqlsh
Connection error: Could not connect to localhost:9160

[edward@jackintosh Downloads]$ /home/edward/.farsandra/apache-cassandra-2.0.4/bin/cqlsh 127.0.0.1 9160
Connection error: Could not connect to 127.0.0.1:9160
[edward@jackintosh Downloads]$ /home/edward/.farsandra/apache-cassandra-2.0.4/bin/cqlsh localhost 9160
Connection error: Could not connect to localhost:9160

