Edward Capriolo

Wednesday Dec 10, 2014

Key Cache - Building a column family store - Part 3

In my last entry we discussed how to build a data structure similar to that of Cassandra's Index File. If you remember the Index Interval setting builds a structure that indexes every N keys. In a best case scenario a look up may hit this index, but in a worse case scenario our linear search may have had to page through (Index Interval - 1) rows to locate data.

In one of my unit tests I captured the effect that this has:

      long x = System.currentTimeMillis();
      for (int i = 0 ; i < 50000 ; i++) {
        Assert.assertEquals("c", s.get("00001", "column2").getValue());
      System.out.println("index match " + (System.currentTimeMillis() - x));
      long x = System.currentTimeMillis();
      for (int i = 0 ; i < 50000 ; i++) {
        Assert.assertEquals("c", s.get("08999", "column2").getValue());
      System.out.println("far from index " +(System.currentTimeMillis() - x));
We can see the performance difference between a perfect index hit to a key far from the index:
Test folder: /tmp/junit4018208685678427527
index match 1088
far from index 11166

Note: Our SStable format and decision to use String (and not ByteBuffer) means there are several micro-optimizations that can be done.

If the requests have a normal distribution the difference in access speed may not be noticeable. Still, it seems unfair that some key requests will always be faster then others.

Key Cache to the rescue!

The Key Cache is like the Index Interval in that it helps avoid seeking for keys. It works by storing the offset of frequently used keys in memory. The less random the key access is the more benefit this type of caching delivers. Also like the Index format we can store a large cache in a small amount of storage.

Let's build our own Key Cache and add it to our increasingly more sophisticated Column Family store. There is nothing spectacular being done here. We are wrapping a guava cache that will handle eviction with a small facade.

package io.teknek.nibiru.engine;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class KeyCache {
private Cache<String,Long> lc;
public KeyCache(int cacheSize){
lc = CacheBuilder.newBuilder().maximumSize(cacheSize).recordStats().build();

public void put(String key, long value){
lc.put(key, value);

public long get(String key) {
long res = -1;
Long l = lc.getIfPresent(key);
if (l != null) {
res = l.longValue();
return res;

Note: We would be better off using something like this library that does not have to box and un-box primitives, but again our goal is to show relative performance speed up from the Key Cache.

Now we only need to make a few small changes to our sstable reader.

-    bg.setStartOffset((int)index.findStartOffset(row));
+ long startOffset = keyCache.get(row);
+ if (startOffset == -1){
+ startOffset = index.findStartOffset(row);
+ }
+ bg.setStartOffset((int)startOffset);
String searchToken = row;//this is not correct
do {
if (bg.dst[bg.currentIndex] == END_ROW){
+ long startOfRow = bg.mbb.position() - bg.getBlockSize() + bg.currentIndex;
StringBuilder token = readToken(bg);
if (token.toString().equals(searchToken)){
StringBuilder rowkey = readRowkey(bg);
if (rowkey.toString().equals(row)){
+ keyCache.put(row, startOfRow);

We can see the difference if we re run our tests:

Test folder: /tmp/junit6910931010171014079
index match 840
far from index 171
index match 219
far from index 137

Notice here that I ran the tests multiple times. This is because the Java virtual machine is much like an old car. You do not drive it fast right away; you let it sit in your driveway and warm up. :) Just kidding, the reality is with Just In Time compilation and multiple garbage collection threads timing code exactly can be difficult.

Even though the numbers fluctuate, we see the Key Cache has given us much better performance by helping us get data without having to seek on disk (memory mapped disk) as much. We went from 11 seconds for 50,000 look ups to around 200ms!

Awesome! What is next? Bloom filters, or Row Cache? Maybe start building a system to compact multiple SSTables together? Stay tuned.


Post a Comment:
Comments are closed for this entry.