One of the big use cases for Oracle Cloud Infrastructure (OCI), formerly known as Bare Metal Cloud, is NoSQL and NewSQL. Oracle states that a big advantage at the storage layer is PCIe-based NVMe SSDs with very low latency, and that eliminating the virtualization layer lets them shine at full potential.

While the MemSQL row store is in-memory, its columnar store persists on disk. Viewed from the filesystem level, columnar store data consists of a lot of small files. Per the MemSQL documentation, the column store is very sensitive to latency and concurrent writes are a common pattern, so fast SSD storage or RAM-based disks are recommended.

So let's benchmark and compare OCI bare metal NVMe SSDs with AWS ephemeral local SSD storage.

Pick shapes

There are no exact 1:1 shape matches between AWS and OCI, but we will pick shapes as close as possible and compensate for the differences at the test level.

Oracle: BM.HighIO1.36 (36 OCPU, 500GB of RAM, 4x3TB NVMe SSD drives) $4.46/hr.
AWS: x1.16xlarge (64 vCPU, 1TB of RAM, 1x2TB ephemeral SSD drive) $6.67/hr.

On multithreaded systems, 36 Oracle OCPUs are roughly equivalent to 72 Amazon vCPUs, as one OCPU is a physical core while one vCPU is a hardware thread.

Choose OS/kernel

On AWS we will use the latest RHEL release:

  • Red Hat Enterprise Linux Server release 7.4 (Maipo)
  • hostname: aws-rhel7test

On OCI we will use the latest CentOS (RHEL is not available there yet):

  • CentOS Linux release 7.3.1611 (Core)
  • hostname: oci-centos7test

And since Oracle has done NVMe optimizations in its UEK kernel, let's test it on OCI too:

  • Oracle-Linux-7.4
  • hostname: oci-oel7test

[ec2-user@aws-rhel7test ~]$ uname -r 
3.10.0-693.el7.x86_64  
[opc@oci-centos7test ~]$ uname -r 
3.10.0-514.26.2.el7.x86_64
[opc@oci-oel7test ~]$ uname -r 
4.1.12-103.6.1.el7uek.x86_64  

It is interesting to note that Oracle packages OEL 7.4 with a much newer 4.1 kernel.

Install software

It is the same process on all 3 instances, so I will show it only on the Oracle Enterprise Linux host.

Format/Mount SSD

[opc@oci-oel7test ~]$ lsblk 
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT 
sda 8:0 0 46.6G 0 disk  
├─sda1 8:1 0 544M 0 part /boot/efi 
├─sda2 8:2 0 8G 0 part [SWAP] 
└─sda3 8:3 0 38G 0 part / 
nvme0n1 259:2 0 2.9T 0 disk <<---SSD we will use for test 
nvme1n1 259:3 0 2.9T 0 disk  
nvme2n1 259:1 0 2.9T 0 disk  
nvme3n1 259:0 0 2.9T 0 disk   
[opc@oci-oel7test ~]$ sudo parted /dev/nvme0n1 mklabel gpt 
[opc@oci-oel7test ~]$ sudo parted -a opt /dev/nvme0n1 mkpart primary ext4 0% 100% 
[opc@oci-oel7test ~]$ lsblk 
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT 
sda 8:0 0 46.6G 0 disk  
├─sda1 8:1 0 544M 0 part /boot/efi 
├─sda2 8:2 0 8G 0 part [SWAP] 
└─sda3 8:3 0 38G 0 part / 
nvme0n1 259:2 0 2.9T 0 disk  
└─nvme0n1p1 259:4 0 2.9T 0 part 
nvme1n1 259:3 0 2.9T 0 disk  
nvme2n1 259:1 0 2.9T 0 disk  
nvme3n1 259:0 0 2.9T 0 disk   
[opc@oci-oel7test ~]$ sudo mkfs.ext4 -L datapartition /dev/nvme0n1p1 
[opc@oci-oel7test ~]$ sudo mkdir -p /mnt/data 
[opc@oci-oel7test ~]$ sudo mount -o defaults /dev/nvme0n1p1 /mnt/data  

The only difference on the AWS instance is that it has just one SSD drive. We will use only one drive on the Oracle bare metal instance as well, to keep the comparison apples-to-apples. Once the SSD is formatted and mounted on AWS, it looks like this:

[ec2-user@aws-rhel7test ~]$ lsblk 
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT 
xvda 202:0 0 10G 0 disk  
├─xvda1 202:1 0 1M 0 part  
└─xvda2 202:2 0 10G 0 part / 
xvdb 202:16 0 1.8T 0 disk  
└─xvdb1 202:17 0 1.8T 0 part /mnt/data <<---SSD we will use for test  

Install MemSQL

[ec2-user@aws-rhel7test ~]$ wget http://download.memsql.com/memsql-ops-5.8.1/memsql-ops-5.8.1.tar.gz  
[ec2-user@aws-rhel7test ~]$ tar xvf memsql-ops-5.8.1.tar.gz 
[ec2-user@aws-rhel7test ~]$ cd memsql-ops-5.8.1
[ec2-user@aws-rhel7test memsql-ops-5.8.1]$ sudo ./install.sh   

Pick the single-node install for simplicity.
Note: This is not a scalable way of installing MemSQL, but it does not matter in this example, as all we need is the file structure of the MemSQL columnar store. More details below.
MemSQL installs to /var/lib/memsql by default. We need to move it to point to the SSD drive:

[ec2-user@aws-rhel7test ~]$ sudo su -   
[ec2-user@aws-rhel7test ~]# /etc/init.d/memsql-ops stop  
[ec2-user@aws-rhel7test ~]# cd /var/lib/  
[ec2-user@aws-rhel7test ~]# mkdir _local  
[ec2-user@aws-rhel7test ~]# mv memsql* _local/  
[ec2-user@aws-rhel7test ~]# cd _local/  
[ec2-user@aws-rhel7test ~]# find * -depth -print | cpio -pvdm /mnt/data  
[ec2-user@aws-rhel7test ~]# cd ..  
[ec2-user@aws-rhel7test ~]# ln -s /mnt/data/memsql* .  
[ec2-user@aws-rhel7test ~]# cd  
[ec2-user@aws-rhel7test ~]# ls -la /mnt/data/  
[ec2-user@aws-rhel7test ~]# /etc/init.d/memsql-ops start  
[ec2-user@aws-rhel7test ~]# ls -l /var/lib/  
...  
drwx------. 2 root root 6 Aug 30 00:24 machines  
lrwxrwxrwx. 1 root root 16 Oct 15 05:22 memsql -> /mnt/data/memsql  
lrwxrwxrwx. 1 root root 20 Oct 15 05:22 memsql-ops -> /mnt/data/memsql-ops  
drwxr-xr-x. 2 root root 37 Oct 15 05:03 misc  
...  

Compile the latest Ruby from source; it will be needed for the benchmark script.

[ec2-user@aws-rhel7test ~]# yum install gcc  
[ec2-user@aws-rhel7test ~]# wget https://cache.ruby-lang.org/pub/ruby/2.3/ruby-2.3.5.tar.gz  
[ec2-user@aws-rhel7test ~]# tar xvf ruby-2.3.5.tar.gz  
[ec2-user@aws-rhel7test ~]# cd ruby-2.3.5/  
[ec2-user@aws-rhel7test ~]# ./configure  
[ec2-user@aws-rhel7test ~]# make  
[ec2-user@aws-rhel7test ~]# sudo make install  

Create test dataset

Create columnar store test table

The point is to generate enough distinct values to make the column store structurally close to a real-life example.

memsql> create database test; 
memsql> use test  
memsql> CREATE TABLE Persons1 
( 
ID INT NOT NULL AUTO_INCREMENT,
LastName varchar(255) NOT NULL, 
time TIMESTAMP, 
KEY (id) USING CLUSTERED COLUMNSTORE 
)  
ENGINE=MemSQL AUTO_INCREMENT=100;  
memsql> Insert into Persons1(LastName,time) values('Jordan', NOW()); 
memsql> Insert into Persons1(LastName,time) values('Kumar', NOW()); 
memsql> Insert into Persons1(LastName,time) values('Duglas', NOW()); 
memsql> Insert into Persons1(LastName,time) values('Johnson', NOW()); 
memsql> Insert into Persons1(LastName,time) values('Petrov', NOW()); 
memsql> Insert into Persons1(LastName,time) values('Hiraga', NOW()); 
memsql> commit;  
memsql> insert into Persons1 select id+row_number() over(), concat(LastName,row_number() over()), NOW() from Persons1 a; 
memsql> insert into Persons1 select id+row_number() over(), concat(LastName,row_number() over()), NOW() from Persons1 a;  
 ...  
memsql> insert into Persons1 select id+row_number() over(), concat(LastName,row_number() over()), NOW() from Persons1 a;  
memsql> select count(1) from Persons1;  
+----------+  
| count(1) |  
+----------+  
| 50331648 |  
+----------+  
1 row in set (0.26 sec)  
memsql> trigger 3 gc flush;  
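As a sanity check on that row count: each of the INSERT ... SELECT statements doubles the table, so the 6 seed rows after 23 doublings give exactly the count shown above. A quick check in Ruby (used here only because we already built it for the benchmark script):

```ruby
# Each self-INSERT doubles the row count: start from 6 seed rows,
# then apply 23 doublings (one per INSERT ... SELECT).
rows = 6
23.times { rows *= 2 }
puts rows  # => 50331648, i.e. 6 * 2**23
```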

Benchmark

A CPU benchmark first, since we are already at it.

The last insert can be used as a write benchmark for the column store. It is NOT a storage/SSD benchmark, as the insert is CPU-heavy rather than I/O-heavy. Let's see.

aws-rhel7test:

memsql> insert into Persons1 select id+row_number() over(), concat(LastName,row_number() over()), NOW() from Persons1 a;  
Query OK, 25165824 rows affected (3 min 51.91 sec)  
Records: 25165824 Duplicates: 0 Warnings: 0  

oci-centos7test:

memsql> insert into Persons1 select id+row_number() over(), concat(LastName,row_number() over()), NOW() from Persons1 a;  
Query OK, 25165824 rows affected (3 min 48.15 sec)  
Records: 25165824 Duplicates: 0 Warnings: 0  

oci-oel7test:

memsql> insert into Persons1 select id+row_number() over(), concat(LastName,row_number() over()), NOW() from Persons1 a;  
Query OK, 25165824 rows affected (3 min 8.76 sec)  
Records: 25165824 Duplicates: 0 Warnings: 0  

This is CPU used:

[ec2-user@aws-rhel7test ~]$ lscpu | grep "Model name"  
Model name: Intel(R) Xeon(R) CPU E7-8880 v3 @ 2.30GHz  
[opc@oci-centos7test ~]$ lscpu | grep "Model name"  
Model name: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz  
[opc@oci-oel7test ~]$ lscpu | grep "Model name"  
Model name: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz  

Same generation of chips, but the AWS instance has the higher-end E7-series Haswell-EX model [read: expensive]. Yet it still underperforms the OCI instance on CentOS (a bare metal difference?).

And the OCI UEK kernel performs way better. A newer Linux kernel?
Kernel profiling and flame graphs would be needed to narrow that down. I will address it in a different blog; this one is about I/O benchmarking, not CPU, so let's keep focused.

IO benchmark

Now it is time for the I/O benchmark.
It is hard to do an accurate I/O benchmark by running a specific MemSQL query, as columnar store processing is tied to other components such as CPU and IPC. Instead we use a simple Ruby script for the columnar store benchmark, which basically just reads every file in the columnar store. It is the easiest way to simulate the columnar store access pattern in a predictable and linear way. The script needs to be run after cleaning the filesystem cache.
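The script itself is not reproduced in this post. Below is a minimal sketch of what such a multi-threaded read benchmark could look like, matching the output format seen in the runs further down; the argument handling and internals are my assumptions, not the original measure_reads.rb:

```ruby
#!/usr/bin/env ruby
# Sketch of a threaded "read every file" benchmark (assumed structure,
# not the original measure_reads.rb): list all files under the columnstore
# directory, then read each one fully across N worker threads.
require 'benchmark'

dir     = ARGV[0] || '/var/lib/memsql/leaf-3307/data/columns'
threads = (ARGV[1] || 50).to_i
puts "running with #{threads} threads..."

files = []
t = Benchmark.measure do
  files = Dir.glob(File.join(dir, '**', '*')).select { |f| File.file?(f) }
end
printf(" list files: user %.3f system %.3f total %.3f real %.3f\n",
       t.utime, t.stime, t.total, t.real)

queue = Queue.new
files.each { |f| queue << f }

t = Benchmark.measure do
  threads.times.map do
    Thread.new do
      loop do
        path = begin
          queue.pop(true)      # non-blocking pop
        rescue ThreadError
          break                # queue drained, worker exits
        end
        File.binread(path)     # read the whole file, discard contents
      end
    end
  end.each(&:join)
end
printf(" read files: user %.3f system %.3f total %.3f real %.3f\n",
       t.utime, t.stime, t.total, t.real)
puts "directory entries: #{files.size}"
```

The "real" column of the "read files" line is the wall-clock time the results below are compared on.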

Confirm that the column store looks similar on all 3 instances:

[ec2-user@aws-rhel7test columns]$ du --max-depth=1 -h /var/lib/memsql/leaf-3307/data/  
276K ./snapshots  
4.4G ./columns  
4.6G ./logs  
8.9G .  
[ec2-user@aws-rhel7test ~]$ find /var/lib/memsql/leaf-3307/data/columns/ -type f | wc -l  
3240  

[opc@oci-centos7test columns]$ du --max-depth=1 -h /var/lib/memsql/leaf-3307/data/  
308K ../snapshots  
4.5G ../logs  
4.3G ../columns  
8.8G ..  
[opc@oci-centos7test ~]$ find /var/lib/memsql/leaf-3307/data/columns/ -type f | wc -l  
3240  

[opc@oci-oel7test columns]$ du --max-depth=1 -h /var/lib/memsql/leaf-3307/data/  
308K ../snapshots  
4.3G ../columns  
4.5G ../logs  
8.8G ..  
[opc@oci-oel7test ~]$ find /var/lib/memsql/leaf-3307/data/columns/ -type f | wc -l  
3240  

Run the test with 50 threads (so as not to exceed the core count on either instance):


[ec2-user@aws-rhel7test columns]$ for i in `seq 2`; do sudo echo 3 | sudo tee /proc/sys/vm/drop_caches; sudo -u memsql /usr/local/bin/ruby /tmp/measure_reads.rb 50; done
3
running with 50 threads…
 list files: user 0.010 system 0.030 total 0.040 real 0.335
 read files: user 4.070 system 74.070 total 78.140 real 469.600
directory entries: 5100
3
running with 50 threads…
 list files: user 0.010 system 0.040 total 0.050 real 0.303
 read files: user 4.670 system 73.220 total 77.890 real 472.863
directory entries: 5100

[opc@oci-centos7test columns]$ for i in `seq 2`; do sudo echo 3 | sudo tee /proc/sys/vm/drop_caches; sudo -u memsql /usr/local/bin/ruby /tmp/measure_reads.rb 50; done
3
running with 50 threads…
 list files: user 0.000 system 0.030 total 0.030 real 0.164
 read files: user 5.320 system 74.320 total 79.640 real 63.207
directory entries: 5156
3
running with 50 threads…
 list files: user 0.010 system 0.040 total 0.050 real 0.180
 read files: user 5.220 system 74.820 total 80.040 real 63.143
directory entries: 5156

[opc@oci-oel7test columns]$ for i in `seq 2`; do sudo echo 3 | sudo tee /proc/sys/vm/drop_caches; sudo -u memsql /usr/local/bin/ruby /tmp/measure_reads.rb 50; done
running with 50 threads…
 list files: user 0.000 system 0.020 total 0.020 real 0.144
 read files: user 3.120 system 49.790 total 52.910 real 63.207
directory entries: 5156
3
running with 50 threads…
 list files: user 0.000 system 0.020 total 0.020 real 0.144
 read files: user 3.700 system 48.960 total 52.660 real 63.234
directory entries: 5156

“real” for “read files” is where the I/O time is counted. There is a HUGE difference between AWS and Oracle Bare Metal Cloud: ~470 seconds vs ~63 seconds. The Oracle NVMe SSD is about 7.4 times faster.

This is how iostat looked during the test (average numbers shown):


[ec2-user@aws-rhel7test columns]$ iostat -x -k 1
avg-cpu: %user %nice %system %iowait %steal %idle
 0.39 0.00 0.27 4.97 0.00 94.37
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdb 0.00 0.00 11577.00 0.00 502092.00 0.00 86.74 143.53 12.42 12.42 0.00 0.09 100.00

[opc@oci-centos7test columns]$ iostat -x -k 1
avg-cpu: %user %nice %system %iowait %steal %idle
 0.67 0.00 1.48 28.34 0.00 69.51
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme2n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme3n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme0n1 0.00 0.00 28264.00 0.00 3478888.00 0.00 246.17 88.90 3.14 3.14 0.00 0.04 100.20
nvme1n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

[opc@oci-oel7test columns]$ iostat -x -k 1
avg-cpu: %user %nice %system %iowait %steal %idle
 0.33 0.00 0.91 50.95 0.00 47.81
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme0n1 0.00 0.00 29515.00 0.00 3553152.00 0.00 237.84 83.78 3.16 3.16 0.00 0.04 97.80
nvme3n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme2n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme1n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

So AWS was pushing ~11.5k IOPS (r/s column) with ~500 MB/s of throughput (rkB/s column).
The Oracle Bare Metal NVMe SSD pushed ~28-30k IOPS with ~3.5 GB/s.
That is a 7x difference in throughput.
From the iostat output we can see the load consists of large reads (avgrq-sz works out to roughly 43 kB per read on AWS and 123 kB on OCI), with both systems pegged at 100% utilization, i.e. hitting their throughput limit.
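The average read size follows directly from the iostat averages above, as throughput (rkB/s) divided by read IOPS (r/s); a quick arithmetic check:

```ruby
# Average read size in kB = read throughput (rkB/s) / read IOPS (r/s),
# using the iostat averages captured during the test runs above.
aws_kb = 502092.0 / 11577     # AWS ephemeral SSD (xvdb)
oci_kb = 3478888.0 / 28264    # OCI NVMe SSD (nvme0n1, CentOS run)
printf("AWS: %.0f kB/read, OCI: %.0f kB/read\n", aws_kb, oci_kb)
# => AWS: 43 kB/read, OCI: 123 kB/read
```

These match iostat's avgrq-sz column (reported in 512-byte sectors: 86.74 and 246.17 sectors respectively).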

Conclusion

For the MemSQL use case, Oracle bare metal NVMe SSDs are 7-8 times faster than AWS's most performant counterpart - ephemeral SSD storage.

Worth noting: Oracle SSDs are persistent across reboots, while AWS ephemeral SSDs are wiped on instance shutdown, so an AWS cluster on ephemeral storage is not easy to maintain. I picked ephemeral storage simply because it is the fastest option on AWS to compare against.

A real apples-to-apples comparison would be Oracle bare metal NVMe SSDs versus AWS EBS Provisioned IOPS SSD (io1) volumes. Most probably the gap would be even larger, as io1 is not local storage. But that is a subject for another blog.