Edward Capriolo

Monday Aug 01, 2011

YCSB! Cassandra 0.7.6-2 and HBase

Off to the races! Note: We are inserting the dataset in parallel from 3 nodes at once.

head step1a.out
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.CassandraClient7 -P workloads/workloadb -load -threads 100 -p measurementtype=timeseries -p recordcount=75000000 -p hosts=10.70.82.102,10.70.82.103,10.70.82.104,10.70.82.105,10.70.82.106,10.70.82.107,10.70.82.108 -p operationcount=1000 -p readproportion=0.0 -p updateproportion=0.0 -p scanproportion=0 -p insertproportion=1.0 -p insertstart=0 -p insertcount=25000000 -s
 0 sec: 0 operations;
 10 sec: 190662 operations; 19016.76 current ops/sec; [INSERT AverageLatency(ms)=2.73]
 20 sec: 399033 operations; 20824.61 current ops/sec; [INSERT AverageLatency(ms)=0.83]
 30 sec: 608326 operations; 20920.93 current ops/sec; 
 40 sec: 794484 operations; 18610.22 current ops/sec; [INSERT AverageLatency(ms)=3]
 50 sec: 986640 operations; 19209.84 current ops/sec; [INSERT AverageLatency(ms)=3]
 60 sec: 1191175 operations; 20447.37 current ops/sec; [INSERT AverageLatency(ms)=2.11]
 70 sec: 1372818 operations; 18157.04 current ops/sec; 
 
...

 1280 sec: 24705376 operations; 16756.87 current ops/sec; 
 1290 sec: 24882313 operations; 17688.39 current ops/sec; [INSERT AverageLatency(ms)=1]
 1300 sec: 24993463 operations; 11110.56 current ops/sec; 
 1303 sec: 25000000 operations; 2971.36 current ops/sec; 
[OVERALL], RunTime(ms), 1303125.0
[OVERALL], Throughput(ops/sec), 19184.65227817746
[INSERT], Operations, 25000000
[INSERT], AverageLatency(ms), 4.66685756
[INSERT], MinLatency(ms), 0
[INSERT], MaxLatency(ms), 14711
[INSERT], Return=0, 24993441

So this was really fast. In fact the systems were running at 2% IO, so I am pretty sure I could have pushed it harder. I was also happy that latency and ops/sec stayed constant throughout.
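Only the first client's command line is shown above: with recordcount=75000000 it loads insertcount=25000000 records starting at insertstart=0. A minimal Python sketch of how the key space might be divided across the 3 loading nodes follows. The helper name insert_ranges and the offsets for the second and third clients are assumptions, not taken from the actual runs.

```python
def insert_ranges(record_count, clients):
    """Split record_count keys into contiguous (insertstart, insertcount) pairs."""
    base, extra = divmod(record_count, clients)
    ranges = []
    start = 0
    for i in range(clients):
        # Spread any remainder over the first few clients.
        count = base + (1 if i < extra else 0)
        ranges.append((start, count))
        start += count
    return ranges


# Assumed layout: 3 clients sharing the 75,000,000-record load.
for start, count in insert_ranges(75_000_000, 3):
    print(f"-p insertstart={start} -p insertcount={count}")
```

The first pair matches the command line shown above; the other two are what the remaining clients would need if the split were even.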

Command line: -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadb -load -threads 100 -p columnfamily=family -p measurementtype=timeseries -p recordcount=75000000 -p hosts=10.70.82.102,10.70.82.103,10.70.82.104,10.70.82.105,10.70.82.106,10.70.82.107,10.70.82.108 -p operationcount=1000 -p readproportion=0.0 -p updateproportion=0.0 -p scanproportion=0 -p insertproportion=1.0 -p insertstart=0 -p insertcount=25000000 -s
 0 sec: 0 operations;
 11 sec: 349542 operations; 30688.5 current ops/sec; [INSERT AverageLatency(ms)=0]
 21 sec: 504384 operations; 15052.2 current ops/sec; 
 31 sec: 504384 operations; 0 current ops/sec; 
 41 sec: 553602 operations; 4920.32 current ops/sec; 
 51 sec: 553602 operations; 0 current ops/sec; 
 61 sec: 635717 operations; 8209.04 current ops/sec; [INSERT AverageLatency(ms)=0]
 71 sec: 688946 operations; 5321.3 current ops/sec; 
 81 sec: 688946 operations; 0 current ops/sec; 
 91 sec: 688946 operations; 0 current ops/sec; 
 101 sec: 775081 operations; 8610.92 current ops/sec; 
 111 sec: 775081 operations; 0 current ops/sec;  

...

 3300 sec: 24650357 operations; 14121.55 current ops/sec; 
 3310 sec: 24795935 operations; 14551.98 current ops/sec; 
 3320 sec: 24900493 operations; 10452.66 current ops/sec; [INSERT AverageLatency(ms)=0.25]
 3330 sec: 24975950 operations; 7543.44 current ops/sec; 
 3340 sec: 24996084 operations; 2012.8 current ops/sec; 
 3345 sec: 25000000 operations; 906.69 current ops/sec; 
[OVERALL], RunTime(ms), 3345274.0
[OVERALL], Throughput(ops/sec), 7473.229397651732

[INSERT], Operations, 25000000
[INSERT], AverageLatency(ms), 12.86780104
[INSERT], MinLatency(ms), 0
[INSERT], MaxLatency(ms), 287816
[INSERT], Return=0, 24999378
[INSERT], 0, 1.437813354603464
[INSERT], 1000, 0.039011300892718925
[INSERT], 2000, 0.029626653945617514
[INSERT], 3000, 0.0769757098871023
 

HBase took a lot longer. I was also quite surprised to see lots of IO for a write-based workload. You would think the structured log format would prevent that. Toward the end there was less variance. I guess the slow intervals were region splitting.
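One way to put a number on those slow intervals is to pull the status lines out of the YCSB output and list the seconds where throughput dropped to zero. The following is a rough sketch, not part of the original test harness. The regular expression and function names are assumptions based on the status-line format shown above, and the sample is just the first lines of the HBase load output.

```python
import re

# Matches lines such as " 31 sec: 504384 operations; 0 current ops/sec;"
STATUS = re.compile(r"^\s*(\d+) sec: (\d+) operations; ([\d.]+) current ops/sec;")


def parse_status(lines):
    """Return (seconds, total_ops, ops_per_sec) for every status line."""
    samples = []
    for line in lines:
        m = STATUS.match(line)
        if m:
            samples.append((int(m.group(1)), int(m.group(2)), float(m.group(3))))
    return samples


def stalled_seconds(samples):
    """Timestamps of intervals in which no operation completed."""
    return [sec for sec, _ops, rate in samples if rate == 0.0]


hbase_load_head = """\
 0 sec: 0 operations;
 11 sec: 349542 operations; 30688.5 current ops/sec; [INSERT AverageLatency(ms)=0]
 21 sec: 504384 operations; 15052.2 current ops/sec;
 31 sec: 504384 operations; 0 current ops/sec;
 41 sec: 553602 operations; 4920.32 current ops/sec;
 51 sec: 553602 operations; 0 current ops/sec;
 61 sec: 635717 operations; 8209.04 current ops/sec; [INSERT AverageLatency(ms)=0]
 71 sec: 688946 operations; 5321.3 current ops/sec;
 81 sec: 688946 operations; 0 current ops/sec;
 91 sec: 688946 operations; 0 current ops/sec;
 101 sec: 775081 operations; 8610.92 current ops/sec;
 111 sec: 775081 operations; 0 current ops/sec;
""".splitlines()

samples = parse_status(hbase_load_head)
stalls = stalled_seconds(samples)
print(f"{len(stalls)} of {len(samples)} intervals stalled: {stalls}")
```

Run against the full output file, the same two functions would show whether the stalls cluster early in the load, which is what one would expect if region splitting is the cause.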

Onto the read phase.

[edward@aeq202 YCSB-1]$ more step2.out
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.CassandraClient7 -P workloads/workloadb -t -threads 10 -p measurementtype=timeseries -p recordcount=75000000 -p hosts=10.70.82.102,10.70.82.103,10.70.82.104,10.70.82.105,10.70.82.106,10.70.82.107,10.70.82.108 -p operationcount=1000000 -p readproportion=1.0 -p updateproportion=0.0 -p scanproportion=0.0 -p insertproportion=0.0 -s
 0 sec: 0 operations;
 10 sec: 2232 operations; 222.51 current ops/sec; 
 20 sec: 5270 operations; 303.62 current ops/sec; 
 30 sec: 8581 operations; 330.97 current ops/sec; 
 40 sec: 12018 operations; 343.6 current ops/sec; 
 50 sec: 15616 operations; 359.69 current ops/sec;

...

1360 sec: 916923 operations; 741.48 current ops/sec; [READ AverageLatency(ms)=39]
 1370 sec: 923903 operations; 697.72 current ops/sec; 
 1380 sec: 930687 operations; 678.2 current ops/sec; 
 1390 sec: 937087 operations; 639.81 current ops/sec; 
 1400 sec: 943192 operations; 610.32 current ops/sec; 
 1410 sec: 949557 operations; 636.31 current ops/sec; 
 1420 sec: 955440 operations; 588.12 current ops/sec; 
 1430 sec: 961131 operations; 568.82 current ops/sec; [READ AverageLatency(ms)=13]
 1440 sec: 966886 operations; 575.33 current ops/sec; 
 1450 sec: 973026 operations; 613.75 current ops/sec; 
 1460 sec: 977823 operations; 479.56 current ops/sec; 
 1470 sec: 983100 operations; 527.54 current ops/sec; 
 1480 sec: 988426 operations; 532.39 current ops/sec; [READ AverageLatency(ms)=6]
 1490 sec: 993480 operations; 505.2 current ops/sec; 
 1500 sec: 997615 operations; 413.38 current ops/sec; 
 1510 sec: 999871 operations; 225.53 current ops/sec; 
 1511 sec: 1000000 operations; 149.31 current ops/sec; 
[OVERALL], RunTime(ms), 1511417.0
[OVERALL], Throughput(ops/sec), 661.6307742998789
[READ], Operations, 1000000
[READ], AverageLatency(ms), 14.618959
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 5956
[READ], Return=0, 999999

Cassandra started out slow here, but then ramped up well.

[edward@aeq202 YCSB-H]$ more step2h.out
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadb -t -threads 10 -p columnfamily=family -p measurementtype=timeseries -p recordcount=75000000 -p hosts=10.70.82.102,10.70.82.103,10.70.82.104,10.70.82.105,10.70.82.106,10.70.82.107,10.70.82.108 -p operationcount=1000000 -p readproportion=1.0 -p updateproportion=0.0 -p scanproportion=0.0 -p insertproportion=0.0 -s
 0 sec: 0 operations;
 10 sec: 239 operations; 23.89 current ops/sec; 
 20 sec: 622 operations; 38.28 current ops/sec; 
 30 sec: 1047 operations; 42.48 current ops/sec; 
 40 sec: 1529 operations; 48.18 current ops/sec; 
 50 sec: 2109 operations; 57.98 current ops/sec; 
 60 sec: 2558 operations; 44.89 current ops/sec; 
 70 sec: 3174 operations; 61.58 current ops/sec; 
 80 sec: 3705 operations; 53.08 current ops/sec; 
 90 sec: 4240 operations; 53.48 current ops/sec; 
...

1470 sec: 897564 operations; 1789.96 current ops/sec; 
 1480 sec: 915252 operations; 1768.27 current ops/sec; 
 1490 sec: 933153 operations; 1789.56 current ops/sec; 
 1500 sec: 950931 operations; 1777.09 current ops/sec; 
 1510 sec: 968548 operations; 1761.17 current ops/sec; 
 1520 sec: 986841 operations; 1828.75 current ops/sec; 
 1529 sec: 1000000 operations; 1536.55 current ops/sec; 
[OVERALL], RunTime(ms), 1529054.0
[OVERALL], Throughput(ops/sec), 653.9991393371326
[READ], Operations, 1000000
[READ], AverageLatency(ms), 15.266457
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 4191
[READ], Return=0, 999996

HBase started out really badly; however, it really ramped up at the end. This made me think about the fact that HBase automatically uses part of the Xmx heap as block cache. It also makes me wonder how YCSB is distributing requests.
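On the request distribution question: as far as can be told from the stock YCSB workload files, workloadb sets requestdistribution=zipfian, and none of the -p overrides above change it, so a small set of hot keys should receive most of the reads. The sketch below estimates how much of the traffic the hottest keys get under a plain Zipf distribution. The exponent 0.99 and the 100,000-key space are assumptions chosen for illustration, and the function name is hypothetical.

```python
def zipf_mass_share(num_keys, hot_keys, exponent=0.99):
    """Fraction of requests that land on the hot_keys most popular keys."""
    # Zipf weight of the key with popularity rank r is 1 / r**exponent.
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_keys + 1)]
    return sum(weights[:hot_keys]) / sum(weights)


# Share of requests hitting the hottest 1% of a 100,000-key space.
share = zipf_mass_share(100_000, 1_000)
print(f"hottest 1% of keys receive about {share:.0%} of requests")
```

If the distribution really is this skewed, a cache only has to hold a small slice of the keys before most reads are served from memory, which would fit the slow start and fast finish seen above.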

Anyway onto step 3.

[edward@aeq202 YCSB-1]$ more step3_1.out
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.CassandraClient7 -P workloads/workloadb -load -threads 10 -p measurementtype=timeseries -p recordcount=100000 -p hosts=10.70.82.102,10.70.82.103,10.70.82.104,10.70.82.105,10.70.82.106,10.70.82.107,10.70.82.108 -p operationcount=1000000 -p readproportion=0.5 -p updateproportion=0.5 -p scanproportion=0 -p insertproportion=0.0 -s
 0 sec: 0 operations;
 4 sec: 100000 operations; 20868.11 current ops/sec; 
[OVERALL], RunTime(ms), 4805.0
[OVERALL], Throughput(ops/sec), 20811.654526534858

[INSERT], Operations, 100000
[INSERT], AverageLatency(ms), 0.37902
[INSERT], MinLatency(ms), 0
[INSERT], MaxLatency(ms), 334
[INSERT], Return=0, 99995

And part b.

  more step3_2.out
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.CassandraClient7 -P workloads/workloadb -t -threads 10 -p measurementtype=timeseries -p recordcount=100000 -p hosts=10.70.82.102,10.70.82.103,10.70.82.104,10.70.82.105,10.70.82.106,10.70.82.107,10.70.82.108 -p operationcount=1000000 -p readproportion=0.5 -p updateproportion=0.5 -p scanproportion=0 -p insertproportion=0.0 -s
 0 sec: 0 operations;
 10 sec: 10053 operations; 1003.69 current ops/sec;  [READ AverageLatency(ms)=13]
 20 sec: 26360 operations; 1629.89 current ops/sec;  
 30 sec: 45980 operations; 1961.22 current ops/sec;  
 40 sec: 68778 operations; 2278.89 current ops/sec;  
 50 sec: 94562 operations; 2577.63 current ops/sec;  [READ AverageLatency(ms)=0]
 60 sec: 121001 operations; 2643.11 current ops/sec;  
 70 sec: 152400 operations; 3138.96 current ops/sec; [UPDATE AverageLatency(ms)=0] 
 80 sec: 187185 operations; 3477.46 current ops/sec;  
 90 sec: 223981 operations; 3678.5 current ops/sec;  
 100 sec: 268378 operations; 4438.37 current ops/sec;  
 110 sec: 324743 operations; 5634.81 current ops/sec;  
 120 sec: 390906 operations; 6614.32 current ops/sec; [UPDATE AverageLatency(ms)=1] 
 130 sec: 471395 operations; 8046.49 current ops/sec; [UPDATE AverageLatency(ms)=1] 
 140 sec: 574907 operations; 10348.1 current ops/sec; [UPDATE AverageLatency(ms)=0] 
 150 sec: 695643 operations; 12068.77 current ops/sec; [UPDATE AverageLatency(ms)=1] [READ AverageLatency(ms)=1.5]
 160 sec: 862716 operations; 16702.29 current ops/sec; [UPDATE AverageLatency(ms)=0] [READ AverageLatency(ms)=1]
 170 sec: 982760 operations; 12000.8 current ops/sec;  
 173 sec: 1000000 operations; 4556.03 current ops/sec;  
[OVERALL], RunTime(ms), 173858.0
[OVERALL], Throughput(ops/sec), 5751.820451172796

[UPDATE], Operations, 500093
[UPDATE], AverageLatency(ms), 0.22737170886215163
[UPDATE], MinLatency(ms), 0
[UPDATE], MaxLatency(ms), 487
[UPDATE], Return=0, 500091

Now onto HBase:

[edward@aeq202 YCSB-H]$ more step3_1h.out
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadb -load -threads 10 -p columnfamily=family -p measurementtype=timeseries -p recordcount=100000 -p hosts=10.70.82.102,10.70.82.103,10.70.82.104,10.70.82.105,10.70.82.106,10.70.82.107,10.70.82.108 -p operationcount=1000000 -p readproportion=0.5 -p updateproportion=0.5 -p scanproportion=0 -p insertproportion=0.0 -s
 0 sec: 0 operations;
[OVERALL], RunTime(ms), 4812.0
[OVERALL], Throughput(ops/sec), 20781.37988362427

[INSERT], Operations, 100000
[INSERT], AverageLatency(ms), 0.2766
[INSERT], MinLatency(ms), 0
[INSERT], MaxLatency(ms), 1413
[INSERT], Return=0, 99956
[INSERT], 0, 0.13765942400078035
[INSERT], 2000, 0.34422091715606645
[INSERT], 3000, 0.4342396459057856

[edward@aeq202 YCSB-H]$ more step3_2h.out
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadb -t -threads 10 -p columnfamily=family -p measurementtype=timeseries -p recordcount=100000 -p hosts=10.70.82.102,10.70.82.103,10.70.82.104,10.70.82.105,10.70.82.106,10.70.82.107,10.70.82.108 -p operationcount=1000000 -p readproportion=0.5 -p updateproportion=0.5 -p scanproportion=0 -p insertproportion=0.0 -s
 0 sec: 0 operations;
 10 sec: 26219 operations; 2621.11 current ops/sec; [UPDATE AverageLatency(ms)=0] 
 20 sec: 59081 operations; 3284.89 current ops/sec;  
 30 sec: 98302 operations; 3920.53 current ops/sec; [UPDATE AverageLatency(ms)=0] 
 40 sec: 143634 operations; 4531.84 current ops/sec;  
 50 sec: 196948 operations; 5329.8 current ops/sec;  [READ AverageLatency(ms)=7]
 60 sec: 258859 operations; 6189.24 current ops/sec;  [READ AverageLatency(ms)=1]
 70 sec: 332519 operations; 7363.79 current ops/sec;  
 80 sec: 404282 operations; 7171.28 current ops/sec;  
 90 sec: 488022 operations; 8371.49 current ops/sec; [UPDATE AverageLatency(ms)=0.25] [READ AverageLatency(ms)=2]
 100 sec: 600742 operations; 11267.49 current ops/sec; [UPDATE AverageLatency(ms)=0] [READ AverageLatency(ms)=1.67]
 110 sec: 740608 operations; 13982.41 current ops/sec;  
 120 sec: 857629 operations; 11698.59 current ops/sec; [UPDATE AverageLatency(ms)=0] [READ AverageLatency(ms)=1.5]
 
 130 sec: 999905 operations; 14221.91 current ops/sec;  
 130 sec: 1000000 operations; 215.91 current ops/sec;  
[OVERALL], RunTime(ms), 130489.0
[OVERALL], Throughput(ops/sec), 7663.481212975807

[UPDATE], Operations, 500710
[UPDATE], AverageLatency(ms), 0.08582812406382936
[UPDATE], MinLatency(ms), 0
[UPDATE], MaxLatency(ms), 3350
[UPDATE], Return=0, 500709

Hey, just to show I did not cook the test, HBase won something :) This probably has to do with the caches being nicely warmed (although read my notes about threads below).

[edward@aeq202 YCSB-1]$ more step4_1.out
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.CassandraClient7 -P workloads/workloadd -t -threads 10 -p measurementtype=timeseries -p recordcount=100000 -p hosts=10.70.82.102,10.70.82.103,10.70.82.104,10.70.82.105,10.70.82.106,10.70.82.107,10.70.82.108 -p operationcount=1000000 -s
 0 sec: 0 operations;
 10 sec: 29798 operations; 2957.32 current ops/sec;  
 20 sec: 82760 operations; 5292.5 current ops/sec;  [READ AverageLatency(ms)=1]
 30 sec: 147511 operations; 6473.16 current ops/sec;  
 40 sec: 215983 operations; 6845.15 current ops/sec;  [READ AverageLatency(ms)=4.25]
 50 sec: 291032 operations; 7501.9 current ops/sec;  [READ AverageLatency(ms)=0]
 60 sec: 360551 operations; 6949.12 current ops/sec;  
 70 sec: 427172 operations; 6660.1 current ops/sec;  [READ AverageLatency(ms)=0]
 80 sec: 495079 operations; 6788.66 current ops/sec;  
 90 sec: 561615 operations; 6651.6 current ops/sec;  
 100 sec: 629529 operations; 6789.36 current ops/sec;  
 110 sec: 704223 operations; 7467.16 current ops/sec;  
 120 sec: 778477 operations; 7422.43 current ops/sec;  
 130 sec: 856599 operations; 7809.86 current ops/sec;  [READ AverageLatency(ms)=1]
 140 sec: 930612 operations; 7399.08 current ops/sec;  [READ AverageLatency(ms)=1]
 150 sec: 976090 operations; 4546.44 current ops/sec;  
 160 sec: 999361 operations; 2326.4 current ops/sec;  
 160 sec: 1000000 operations; 1293.52 current ops/sec;  
[OVERALL], RunTime(ms), 160642.0
[OVERALL], Throughput(ops/sec), 6225.022098828451

[INSERT], Operations, 49909
[INSERT], AverageLatency(ms), 0.33410807669959325
[INSERT], MinLatency(ms), 0
[INSERT], MaxLatency(ms), 118
[INSERT], Return=0, 49909

Let's see if hbase can keep up on a streak.

[edward@aeq202 YCSB-H]$ more step4_1h.out
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadd -t -threads 10 -p columnfamily=family -p measurementtype=timeseries -p recordcount=100000 -p hosts=10.70.82.102,10.70.82.103,10.70.82.104,10.70.82.105,10.70.82.106,10.70.82.107,10.70.82.108 -p operationcount=1000000 -s
 0 sec: 0 operations;
 10 sec: 64667 operations; 6465.41 current ops/sec;  
 20 sec: 139638 operations; 7494.1 current ops/sec;  
 30 sec: 213618 operations; 7395.78 current ops/sec;  [READ AverageLatency(ms)=0.67]
 40 sec: 291113 operations; 7747.18 current ops/sec;  
 50 sec: 359126 operations; 6798.58 current ops/sec;  
 60 sec: 436327 operations; 7717.01 current ops/sec;  
 70 sec: 513015 operations; 7666.5 current ops/sec;  
 80 sec: 590209 operations; 7717.08 current ops/sec;  
 90 sec: 664568 operations; 7433.67 current ops/sec;  [READ AverageLatency(ms)=0]
 100 sec: 738939 operations; 7434.87 current ops/sec;  
 110 sec: 811432 operations; 7247.13 current ops/sec;  [READ AverageLatency(ms)=1]
 120 sec: 871502 operations; 6004.6 current ops/sec;  
 130 sec: 944788 operations; 7326.4 current ops/sec;  
 138 sec: 1000000 operations; 6878.29 current ops/sec;  
[OVERALL], RunTime(ms), 138070.0
[OVERALL], Throughput(ops/sec), 7242.702976750924

[INSERT], Operations, 49674
[INSERT], AverageLatency(ms), 0.057514997785561864
[INSERT], MinLatency(ms), 0
[INSERT], MaxLatency(ms), 384
[INSERT], Return=0, 49674

Yes! Two in a row for hbase.

[edward@aeq202 YCSB-1]$ more step4_2.out
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.CassandraClient7 -P workloads/workloadd -t -threads 10 -p measurementtype=timeseries -p recordcount=100000 -p hosts=10.70.82.102,10.70.82.103,10.70.82.104,10.70.82.105,10.70.82.106,10.70.82.107,10.70.82.108 -p operationcount=1000000 -s
 0 sec: 0 operations;
 10 sec: 194837 operations; 19296.52 current ops/sec; [INSERT AverageLatency(ms)=1] [READ AverageLatency(ms)=0.25]
 20 sec: 413260 operations; 21824.84 current ops/sec;  [READ AverageLatency(ms)=0.86]
 30 sec: 628329 operations; 21500.45 current ops/sec;  [READ AverageLatency(ms)=1]
 40 sec: 865482 operations; 23708.19 current ops/sec;  [READ AverageLatency(ms)=1]
 50 sec: 998765 operations; 13322.97 current ops/sec;  
 50 sec: 1000000 operations; 3224.54 current ops/sec;  
[OVERALL], RunTime(ms), 50519.0
[OVERALL], Throughput(ops/sec), 19794.532750054434

[INSERT], Operations, 49874
[INSERT], AverageLatency(ms), 0.3497012471427999
[INSERT], MinLatency(ms), 0
[INSERT], MaxLatency(ms), 173
[INSERT], Return=0, 49874

[edward@aeq202 YCSB-1]$ more ../YCSB-H/step4_2h.out
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadd -t -threads 10 -p columnfamily=family -p measurementtype=timeseries -p recordcount=100000 -p hosts=10.70.82.102,10.70.82.103,10.70.82.104,10.70.82.105,10.70.82.106,10.70.82.107,10.70.82.108 -p operationcount=1000000 -s
 0 sec: 0 operations;
 10 sec: 141807 operations; 14177.86 current ops/sec;  [READ AverageLatency(ms)=0]
 20 sec: 313782 operations; 17190.62 current ops/sec;  [READ AverageLatency(ms)=0]
 30 sec: 492157 operations; 17832.15 current ops/sec; [INSERT AverageLatency(ms)=0] [READ AverageLatency(ms)=0]
 40 sec: 671415 operations; 17920.42 current ops/sec;  [READ AverageLatency(ms)=1.5]
 50 sec: 836333 operations; 16486.85 current ops/sec;  
 60 sec: 996942 operations; 16054.48 current ops/sec;  [READ AverageLatency(ms)=0.5]
 60 sec: 1000000 operations; 3935.65 current ops/sec;  
[OVERALL], RunTime(ms), 60798.0
[OVERALL], Throughput(ops/sec), 16447.909470706272
[INSERT], Operations, 49689
[INSERT], AverageLatency(ms), 0.06967336835114411
[INSERT], MinLatency(ms), 0
[INSERT], MaxLatency(ms), 444
[INSERT], Return=0, 49689

C* gets the nod here, peaking at over 23,000 ops/sec. Sexy!

[edward@aeq202 YCSB-1]$ more step5.out
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.CassandraClient7 -P workloads/workloadb -t -threads 10 -p measurementtype=timeseries -p recordcount=75000000 -p hosts=10.70.82.102,10.70.82.103,10.70.82.104,10.70.82.105,10.70.82.106,10.70.82.107,10.70.82.108 -p operationcount=1000000 -p readproportion=0.33 -p updateproportion=0.33 -p scanproportion=0 -p insertproportion=0.33 -s
 0 sec: 0 operations;
 10 sec: 11856 operations; 1183.71 current ops/sec; [UPDATE AverageLatency(ms)=0]  
 20 sec: 27317 operations; 1545.33 current ops/sec;  [INSERT AverageLatency(ms)=1] [READ AverageLatency(ms)=14]
 30 sec: 44148 operations; 1682.6 current ops/sec;   
 40 sec: 62368 operations; 1821.45 current ops/sec;   
 50 sec: 81708 operations; 1933.42 current ops/sec;   
...

 400 sec: 984224 operations; 1595.42 current ops/sec;   
 410 sec: 999942 operations; 1571.33 current ops/sec;   
 410 sec: 1000000 operations; 610.53 current ops/sec;   
[OVERALL], RunTime(ms), 410248.0
[OVERALL], Throughput(ops/sec), 2437.5499697743803

[UPDATE], Operations, 333772
[UPDATE], AverageLatency(ms), 0.2027072372757451
[UPDATE], MinLatency(ms), 0
[UPDATE], MaxLatency(ms), 206
[UPDATE], Return=0, 333772

And finally...

[edward@aeq202 YCSB-H]$ more step5h.out
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadb -t -threads 10 -p columnfamily=family -p measurementtype=timeseries -p recordcount=75000000 -p hosts=10.70.82.102,10.70.82.103,10.70.82.104,10.70.82.105,10.70.82.106,10.70.82.107,10.70.82.108 -p operationcount=1000000 -p readproportion=0.33 -p updateproportion=0.33 -p scanproportion=0 -p insertproportion=0.33 -s
 0 sec: 0 operations;
 10 sec: 39052 operations; 3904.03 current ops/sec;   
 20 sec: 83966 operations; 4489.6 current ops/sec;   
 30 sec: 122010 operations; 3803.26 current ops/sec;  [INSERT AverageLatency(ms)=0] 
 40 sec: 167584 operations; 4556.03 current ops/sec;   
 50 sec: 211101 operations; 4350.39 current ops/sec;   
...

200 sec: 872837 operations; 4796.66 current ops/sec; [UPDATE AverageLatency(ms)=0]  [READ AverageLatency(ms)=0]
 210 sec: 918882 operations; 4603.12 current ops/sec;   
 220 sec: 962394 operations; 4349.9 current ops/sec;   
 229 sec: 1000000 operations; 4074.32 current ops/sec;   
[OVERALL], RunTime(ms), 229305.0
[OVERALL], Throughput(ops/sec), 4361.0039030984935
[UPDATE], Operations, 333632
[UPDATE], AverageLatency(ms), 0.015984677728755035
[UPDATE], MinLatency(ms), 0
[UPDATE], MaxLatency(ms), 385
[UPDATE], Return=0, 333632

HBase really took this one. Or did it? Read on.

Stop point:

So, I am not done yet, but I wanted to stop here and say that HBase definitely surprised me in several ways. In the past it has not performed well in out-of-the-box benchmarks, but other than setting Xmx to 4096 and upping dfs handles I did not do much. The variance in insertion and read performance due to (cold caches?) and (region splitting?) was really extreme.

Cassandra performed consistently and well in most workloads. This too was an out of the box config (cassandra chooses Xmx) and I only set initial tokens.

But wait there's more!

As I mentioned above, I only did one tune to hbase (upping Xmx to 4096), but you know the thing about a butterfly's wings and deadly hurricanes.

http://hbase.apache.org/book/config.files.html

hfile.block.cache.size

Percentage of maximum heap (-Xmx setting) to allocate to block cache used by HFile/StoreFile. Default of 0.2 means allocate 20%. Set to 0 to disable.

Default: 0.2
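To make that setting concrete, here is a quick sketch of how much heap the default hands to the block cache with the Xmx used in this test. It assumes the 4096 is in megabytes and that all 7 hosts listed in the command lines run a region server; neither is stated outright in the post, and the function name is hypothetical.

```python
def block_cache_mb(xmx_mb, block_cache_fraction=0.2):
    """Heap reserved for the HFile block cache, in MB."""
    return xmx_mb * block_cache_fraction


XMX_MB = 4096          # assumed to be megabytes
REGION_SERVERS = 7     # assumed: one region server per listed host

per_server = block_cache_mb(XMX_MB)
print(f"per region server: {per_server:.1f} MB")
print(f"cluster-wide:      {per_server * REGION_SERVERS:.1f} MB")
```

So raising Xmx alone quietly buys HBase a larger read cache, even though the cache setting itself was never touched.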

I hate the phrase apples to apples. But I figured comparing HBase with its block cache against Cassandra using only the VFS cache was not quite fair.

create keyspace usertable with replication_factor=3;
create column family data with rows_cached=200000;


Thus I turned on the cassandra row cache, which you might compare to the hbase block cache.
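A rough way to gauge what rows_cached=200000 can cover is to compare it with the number of rows each node is responsible for. The sketch below assumes rows_cached applies per node, that rows are spread evenly over the 7 hosts, and uses the replication_factor=3 from the keyspace definition above; the function name is hypothetical.

```python
ROWS_CACHED = 200_000


def cached_fraction(keys_touched, nodes=7, replication_factor=3,
                    rows_cached=ROWS_CACHED):
    """Upper bound on the share of a node's rows that fit in its row cache."""
    # Each node stores roughly keys * RF / nodes rows (assumes even tokens).
    rows_per_node = keys_touched * replication_factor / nodes
    return min(1.0, rows_cached / rows_per_node)


small = cached_fraction(100_000)      # steps 3 and 4: recordcount=100000
large = cached_fraction(75_000_000)   # steps 2 and 5: recordcount=75000000
print(f"recordcount=100000:   up to {small:.0%} of rows cacheable per node")
print(f"recordcount=75000000: up to {large:.2%} of rows cacheable per node")
```

Under these assumptions the whole working set of the small runs fits in the row cache, while the large runs can only keep a sliver of the rows hot.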

[edward@aeq202 YCSB]$ grep -a5 'RunTime(' step1a.out
[OVERALL], RunTime(ms), 1313753.0
[OVERALL], Throughput(ops/sec), 19029.452263857816
[INSERT], Operations, 25000000
[INSERT], AverageLatency(ms), 4.42668952
[INSERT], MinLatency(ms), 0
[INSERT], MaxLatency(ms), 15425

[edward@aeq202 YCSB]$ grep -a5 'RunTime(' step2.out
[OVERALL], RunTime(ms), 1403466.0
[OVERALL], Throughput(ops/sec), 712.5217140992372
[READ], Operations, 1000000
[READ], AverageLatency(ms), 12.55368
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 5606

[edward@aeq202 YCSB]$ grep -a5 'RunTime(' step3_1.out
[OVERALL], RunTime(ms), 4474.0
[OVERALL], Throughput(ops/sec), 22351.363433169423
[INSERT], Operations, 100000
[INSERT], AverageLatency(ms), 0.3598
[INSERT], MinLatency(ms), 0
[INSERT], MaxLatency(ms), 207

[edward@aeq202 YCSB]$ grep -a5 'RunTime(' step3_2.out
[OVERALL], RunTime(ms), 147656.0
[OVERALL], Throughput(ops/sec), 6772.498239150458
[UPDATE], Operations, 499506
[UPDATE], AverageLatency(ms), 0.23532850456250776
[UPDATE], MinLatency(ms), 0
[UPDATE], MaxLatency(ms), 526

[edward@aeq202 YCSB]$ grep -a5 'RunTime(' step4_1.out
[OVERALL], RunTime(ms), 114302.0
[OVERALL], Throughput(ops/sec), 8748.753302654372
[INSERT], Operations, 49638
[INSERT], AverageLatency(ms), 0.5584229823925219
[INSERT], MinLatency(ms), 0
[INSERT], MaxLatency(ms), 10918

[edward@aeq202 YCSB]$ grep -a5 'RunTime(' step4_2.out
[OVERALL], RunTime(ms), 39329.0
[OVERALL], Throughput(ops/sec), 25426.530041445243
[INSERT], Operations, 49941
[INSERT], AverageLatency(ms), 0.3666926973829118
[INSERT], MinLatency(ms), 0
[INSERT], MaxLatency(ms), 141

So with that change Cassandra rocked out every test EXCEPT the 5th one.
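The before and after figures are easier to compare side by side. This sketch simply tabulates the overall throughput numbers already listed in this post (rounded to two decimals) and computes the change from enabling the row cache. The dictionary keys and helper name are made up for the example.

```python
# Overall throughput in ops/sec, rounded from the [OVERALL] lines in this post.
cassandra_before = {
    "step1": 19184.65, "step2": 661.63, "step3_1": 20811.65,
    "step3_2": 5751.82, "step4_1": 6225.02, "step4_2": 19794.53,
}
cassandra_row_cache = {
    "step1": 19029.45, "step2": 712.52, "step3_1": 22351.36,
    "step3_2": 6772.50, "step4_1": 8748.75, "step4_2": 25426.53,
}
hbase = {
    "step1": 7473.23, "step2": 654.00, "step3_1": 20781.38,
    "step3_2": 7663.48, "step4_1": 7242.70, "step4_2": 16447.91,
}


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


gains = {
    step: percent_change(cassandra_before[step], cassandra_row_cache[step])
    for step in cassandra_before
}
hbase_still_ahead = [
    step for step in hbase if hbase[step] > cassandra_row_cache[step]
]

for step, gain in gains.items():
    print(f"{step}: {gain:+.1f}% with row cache")
print("HBase figure still higher on:", hbase_still_ahead)
```

On these figures the row cache helps most on the small-working-set runs (steps 3 and 4), while the bulk load is essentially unchanged. The mixed read/update run (step3_2) is the one listed case where the HBase number is still higher, so that result is worth a second look.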

I figured out why:
java -cp $CP com.yahoo.ycsb.Client  -db com.yahoo.ycsb.db.CassandraClient7 -P workloads/workloadb \
-t \
-threads 30 \

With only 10 threads, readers were blocking writers, and since C* writes fast, I bumped the thread count to 30 for less contention.


[OVERALL], RunTime(ms), 196521.0
[OVERALL], Throughput(ops/sec), 5088.514713440294
[UPDATE], Operations, 333423
[UPDATE], AverageLatency(ms), 0.21831427346043913
[UPDATE], MinLatency(ms), 0
[UPDATE], MaxLatency(ms), 604
[UPDATE], Return=0, 333423
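A back-of-the-envelope way to see why the thread count matters: if each YCSB thread issues one request at a time and waits for the reply (an assumption about the client), throughput cannot exceed the thread count divided by the average latency. The sketch below applies that to the step 2 read numbers reported earlier; the function name is hypothetical.

```python
def closed_loop_ceiling(threads, avg_latency_ms):
    """Max ops/sec when every thread waits for each reply before sending more."""
    return threads / (avg_latency_ms / 1000.0)


# Step 2 (read-only, 10 threads), average read latencies from the output above.
cassandra_read = closed_loop_ceiling(10, 14.618959)
hbase_read = closed_loop_ceiling(10, 15.266457)

print(f"Cassandra ceiling: {cassandra_read:.0f} ops/sec (measured 661.63)")
print(f"HBase ceiling:     {hbase_read:.0f} ops/sec (measured 654.00)")
```

For the step 2 runs the ceiling comes out close to the measured 661 and 654 ops/sec, which suggests those runs were limited by per-request latency at 10 threads rather than by what the servers could sustain. That is consistent with more threads giving more throughput.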

In a nutshell, this is why I hate benchmarking. There are just way too many permutations of configurations and options.

So step5 is kinda out the window for now because I need to load the test to try hbase with higher thread settings.

So this is the fun part. This is where I tell you that if you have something you want me to try to make this benchmark even better, email me at edlinuxguru@gmail.com. I will post all the configurations soon. I actually have these servers on loan from another project, so if you have something you want to try, let me know, fast.
