Cassandra and Yahoo Cloud Serving Benchmark Remix
When we get new servers in, I make a mental note to work twice as hard during the week. You may ask why? Because if I can get the machines prepped before they are needed in production, it gives me the chance to run Cassandra and the Yahoo Cloud Serving Benchmark on them.
What do you get when you cross Apache Cassandra, solid state disks, and a blade center loaded with 14 blades? .....
In this previous post we ran a benchmark against 8 server-class machines with SCSI disks. This time we have 14 blades with 32 GB RAM and SSDs! Oh, it's on. It's on.
Last time, I did a lot of manual editing of the YCSB load-testing scripts. This time I introduced some variables into the scripts that can be set from the command line, which makes it much easier to run the scripts on N machines without having to tweak them individually.
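As a rough sketch of what that parameterization looks like (the argument order mirrors the step2.sh invocation later in this post, but the variable names and the key-slicing logic are my assumptions, not the actual script):

```shell
#!/bin/sh
# Hypothetical parameterized YCSB wrapper. Arguments follow the
# "step2.sh <recordcount> <instances> <instance_id> <threads>" shape
# used later in this post; defaults below are assumptions.
RECORDCOUNT=${1:-75000000}
INSTANCES=${2:-7}
INSTANCE=${3:-0}
THREADS=${4:-60}

# Give each YCSB instance its own slice of the keyspace so that N
# client machines can run in parallel without overlapping inserts.
PER_INSTANCE=$((RECORDCOUNT / INSTANCES))
START=$((INSTANCE * PER_INSTANCE))

echo "instance $INSTANCE: keys $START..$((START + PER_INSTANCE - 1)), $THREADS threads"
```

With something like this in place, each client only needs a different instance id on the command line instead of a hand-edited copy of the script.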
Some early results:
First, let's get this out of the way: this is BigData, not tiny little baby data. We are not inserting 1,000,000 rows, because if the data fits in one machine's main memory, it's not BigData!
We are inserting 75,000,000 entries, which turns out to be 322 GB of data (~23 GB per node with replication factor 3). Now, our machines do have a good amount of RAM (32 GB), and the JVM is sized to 8 GB for this test. (I need to run another test with about double the data so I can stress the SSD drives more.)
[OVERALL], RunTime(ms), 656811.0
[OVERALL], Throughput(ops/sec), 16312.58459435058
I actually need to revisit this test. I fouled it up a bit and launched 2 or 3 extra instances. In any case, I saw peaks of nearly 20,000 ops/sec per instance. I was impressed by this anyway; it took Netflix 300 machines from Amazon's cloud to hit 1,000,000 inserts a second. My 14-node cluster was running 7 YCSB instances at once, and I achieved 140,000 inserts per second without really trying too hard. I could say more about cloud vs. bare metal, but I would be digressing.
Being able to write data fast is great, but reading it is more interesting. After all, I can surely write data fast using dd; that is not saying much. The results are here; one node did:
[OVERALL], Throughput(ops/sec), 14835.694681403456
Since we have 7 instances of YCSB, the total was 98,000 reads/sec, or, to throw a big number at you, 8,467,200,000 reads a day.
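The aggregation above is simple back-of-the-envelope shell arithmetic (rounding each instance down to 14,000 ops/sec):

```shell
#!/bin/sh
# Aggregate per-instance read throughput into cluster-wide numbers.
INSTANCES=7
OPS_PER_INSTANCE=14000                 # each node did ~14,835 ops/sec; round down

TOTAL=$((INSTANCES * OPS_PER_INSTANCE))   # reads per second across all instances
PER_DAY=$((TOTAL * 60 * 60 * 24))         # scale to a full day

echo "$TOTAL reads/sec, $PER_DAY reads/day"
```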
I'm not even done playing with the number of threads and other tuning.
If you have a clever idea for a benchmark email me before I need to get these servers into production!
Just bumped the threads to 60 and got:
nohup ssh -l edward cdbeq101 "cd YCSB && sh step2.sh 75000000 7 0 60" > foo.out 2> foo.err < /dev/null &
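To launch all 7 instances at once, a loop over the client hosts works nicely. A minimal sketch, assuming the other hosts follow the cdbeq101 naming pattern (only cdbeq101 appears above; the rest, and the SSH override for dry runs, are my assumptions):

```shell
#!/bin/sh
# Hypothetical fan-out of the single-host command above to 7 clients.
SSH=${SSH:-ssh}        # set SSH=echo for a dry run
RECORDCOUNT=75000000
THREADS=60
# Host names beyond cdbeq101 are assumed, not from the original post.
HOSTS="cdbeq101 cdbeq102 cdbeq103 cdbeq104 cdbeq105 cdbeq106 cdbeq107"
INSTANCES=$(echo $HOSTS | wc -w)

i=0
for host in $HOSTS; do
  # Each host gets a unique instance id so the key ranges don't overlap.
  nohup $SSH -l edward "$host" \
    "cd YCSB && sh step2.sh $RECORDCOUNT $INSTANCES $i $THREADS" \
    > "$host.out" 2> "$host.err" < /dev/null &
  i=$((i + 1))
done
wait
```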
[OVERALL], RunTime(ms), 35712.0
[OVERALL], Throughput(ops/sec), 28001.79211469534
Went to 100 threads and got:
0 sec: 0 operations;
10 sec: 233789 operations; 23320.6 current ops/sec; [READ AverageLatency(ms)=5.07]
20 sec: 537465 operations; 30141.54 current ops/sec; [READ AverageLatency(ms)=1.6]
30 sec: 864622 operations; 32715.7 current ops/sec; [READ AverageLatency(ms)=1.36]
37 sec: 1000000 operations; 17411.96 current ops/sec;
[OVERALL], RunTime(ms), 37877.0
[OVERALL], Throughput(ops/sec), 26401.24613881775
Crazy! It peaked at 32K ops/sec!