One of the big use cases for Oracle Cloud Infrastructure (OCI), formerly known as Bare Metal Cloud, is NoSQL and NewSQL workloads. Oracle claims that its big advantage at the storage layer is PCIe-based NVMe SSDs with very low latency, and that eliminating the virtualization layer lets them shine at full speed.

While the MemSQL row store is in-memory, its columnar store persists on disk. Looking from the filesystem level, the columnstore data consists of a lot of small files. Per the MemSQL documentation, the columnstore is very sensitive to latency, and concurrent writes are a common pattern, so fast SSD storage or RAM-based disks are recommended.

So let's benchmark and compare OCI bare metal NVMe SSDs against the AWS instance NVMe SSD store.

Pick shapes

There is no exact 1:1 shape match between AWS and OCI, but we can get pretty close.

Oracle: BM.HighIO1.36 (36 OCPU, 500GB of RAM, 4x3TB NVMe SSD drives) $4.46/hr.   
AWS: i3.16xlarge (64 vCPU, 488GB of RAM, 8x1.9TB instance NVMe SSD drive) $4.99/hr.   

On multithreaded systems, 36 Oracle OCPUs are roughly equivalent to 72 Amazon vCPUs, since an OCPU is one physical core (two hardware threads) while a vCPU is one thread: 36 × 2 = 72 vs. 64 on AWS.
The Oracle instance has more memory and CPU, yet is less expensive. It has a bit less raw SSD capacity (4 × 3 TB = 12 TB vs. 8 × 1.9 TB = 15.2 TB).

Choose OS/kernel

On AWS we will use the latest RHEL release:

  • Red Hat Enterprise Linux Server release 7.4 (Maipo)
  • hostname: aws-rhel7test

On OCI we will use the latest CentOS (RHEL is not available there yet):

  • CentOS Linux release 7.3.1611 (Core)
  • hostname: oci-centos7test

And since Oracle has done NVMe optimizations in its UEK kernel, let's test that on OCI too:

  • Oracle-Linux-7.4
  • hostname: oci-oel7test

[ec2-user@aws-rhel7test ~]$ uname -r
3.10.0-693.el7.x86_64  
[opc@oci-centos7test ~]$ uname -r 
3.10.0-514.26.2.el7.x86_64  
[opc@oci-oel7test ~]$ uname -r 
4.1.12-103.6.1.el7uek.x86_64  

Interesting to note that Oracle packages OEL 7.4 with a much newer 4.1 kernel.

Install software

It's the same process on all 3 instances, so I will show it only on the Oracle Enterprise Linux host.

Format/Mount SSD

[opc@oci-oel7test ~]$ lsblk 
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda           8:0    0 46.6G  0 disk 
├─sda1        8:1    0  544M  0 part /boot/efi
├─sda2        8:2    0    8G  0 part [SWAP]
└─sda3        8:3    0   38G  0 part /
nvme0n1     259:2    0  2.9T  0 disk          <<---SSD we will use for test
nvme1n1     259:3    0  2.9T  0 disk 
nvme2n1     259:1    0  2.9T  0 disk 
nvme3n1     259:0    0  2.9T  0 disk 

[opc@oci-oel7test ~]$ sudo parted /dev/nvme0n1 mklabel gpt
[opc@oci-oel7test ~]$ sudo parted -a opt /dev/nvme0n1 mkpart primary ext4 0% 100%
[opc@oci-oel7test ~]$ lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda           8:0    0 46.6G  0 disk 
├─sda1        8:1    0  544M  0 part /boot/efi
├─sda2        8:2    0    8G  0 part [SWAP]
└─sda3        8:3    0   38G  0 part /
nvme0n1     259:2    0  2.9T  0 disk 
└─nvme0n1p1 259:4    0  2.9T  0 part
nvme1n1     259:3    0  2.9T  0 disk 
nvme2n1     259:1    0  2.9T  0 disk 
nvme3n1     259:0    0  2.9T  0 disk 
[opc@oci-oel7test ~]$ sudo mkfs.ext4 -L datapartition /dev/nvme0n1p1 
[opc@oci-oel7test ~]$ sudo mkdir -p /mnt/data 
[opc@oci-oel7test ~]$ sudo mount -o defaults /dev/nvme0n1p1 /mnt/data  
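
One thing the transcript above does not cover: as mounted, the filesystem will not come back after a reboot. A minimal sketch of making it persistent, using the datapartition label we set with mkfs.ext4 (the nofail option is my addition, so a missing drive does not block boot):

# Hypothetical follow-up, not part of the original run: persist the mount
echo "LABEL=datapartition /mnt/data ext4 defaults,nofail 0 2" | sudo tee -a /etc/fstab
sudo mount -a   # returns silently if the fstab entry is valid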

The only difference on the AWS instance is the number of SSD drives. We will use just one drive on the Oracle bare metal instance and one on AWS, to keep things apples-to-apples.
Once the SSD drive is formatted and mounted on AWS, it looks like this:

[ec2-user@aws-rhel7test ~]$ lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1     259:2    0  1.9T  0 disk
└─nvme0n1p1 259:4    0  1.9T  0 part /mnt/data   <<---SSD we will use for test
nvme1n1     259:0    0  1.9T  0 disk
nvme2n1     259:1    0  1.9T  0 disk
nvme3n1     259:3    0  1.9T  0 disk
nvme4n1     259:4    0  1.9T  0 disk
nvme5n1     259:5    0  1.9T  0 disk
nvme6n1     259:6    0  1.9T  0 disk
nvme7n1     259:7    0  1.9T  0 disk

Install MemSQL

[ec2-user@aws-rhel7test ~]$ wget http://download.memsql.com/memsql-ops-5.8.1/memsql-ops-5.8.1.tar.gz  
[ec2-user@aws-rhel7test ~]$ tar xvf memsql-ops-5.8.1.tar.gz  
[ec2-user@aws-rhel7test ~]$ cd memsql-ops-5.8.1
[ec2-user@aws-rhel7test memsql-ops-5.8.1]$ sudo ./install.sh

Pick the single-node install for simplicity.
Note: this is not a scalable way of installing MemSQL, but it does not matter for this example, since all we need is the file structure of the MemSQL columnar store. More details below.

MemSQL installs to /var/lib/memsql by default. We need to move it and point it at the SSD drive:

[ec2-user@aws-rhel7test ~]$ sudo su -   
[ec2-user@aws-rhel7test ~]# /etc/init.d/memsql-ops stop  
[ec2-user@aws-rhel7test ~]# cd /var/lib/  
[ec2-user@aws-rhel7test ~]# mkdir _local  
[ec2-user@aws-rhel7test ~]# mv memsql* _local/  
[ec2-user@aws-rhel7test ~]# cd _local/  
[ec2-user@aws-rhel7test ~]# find * -depth -print | cpio -pvdm /mnt/data  
[ec2-user@aws-rhel7test ~]# cd ..  
[ec2-user@aws-rhel7test ~]# ln -s /mnt/data/memsql* .  
[ec2-user@aws-rhel7test ~]# cd  
[ec2-user@aws-rhel7test ~]# ls -la /mnt/data/  
[ec2-user@aws-rhel7test ~]# /etc/init.d/memsql-ops start  
[ec2-user@aws-rhel7test ~]# ls -l /var/lib/  
...  
drwx------. 2 root root 6 Aug 30 00:24 machines  
lrwxrwxrwx. 1 root root 16 Oct 15 05:22 memsql -> /mnt/data/memsql  
lrwxrwxrwx. 1 root root 20 Oct 15 05:22 memsql-ops -> /mnt/data/memsql-ops  
drwxr-xr-x. 2 root root 37 Oct 15 05:03 misc  
...  

Compile the latest Ruby from source; it will be needed for the benchmark script.

[ec2-user@aws-rhel7test ~]# yum install gcc  
[ec2-user@aws-rhel7test ~]# wget https://cache.ruby-lang.org/pub/ruby/2.3/ruby-2.3.5.tar.gz  
[ec2-user@aws-rhel7test ~]# tar xvf ruby-2.3.5.tar.gz  
[ec2-user@aws-rhel7test ~]# cd ruby-2.3.5/  
[ec2-user@aws-rhel7test ~]# ./configure  
[ec2-user@aws-rhel7test ~]# make  
[ec2-user@aws-rhel7test ~]# sudo make install  
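
Assuming the build completed cleanly, a quick sanity check that the new interpreter landed in /usr/local/bin:

/usr/local/bin/ruby -v   # should report ruby 2.3.5 if the build above succeeded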

Create test dataset

Create columnar store test table

The point is to generate enough distinct values to make the columnstore structurally close to a real-life example.

memsql> create database test;
memsql> use test
memsql> CREATE TABLE Persons1 (
          ID INT NOT NULL AUTO_INCREMENT,
          LastName varchar(255) NOT NULL,
          time TIMESTAMP,
          KEY (id) USING CLUSTERED COLUMNSTORE
        ) ENGINE=MemSQL AUTO_INCREMENT=100;
memsql> Insert into Persons1(LastName,time) values('Jordan', NOW()); 
memsql> Insert into Persons1(LastName,time) values('Kumar', NOW()); 
memsql> Insert into Persons1(LastName,time) values('Duglas', NOW()); 
memsql> Insert into Persons1(LastName,time) values('Johnson', NOW()); 
memsql> Insert into Persons1(LastName,time) values('Petrov', NOW()); 
memsql> Insert into Persons1(LastName,time) values('Hiraga', NOW()); 
memsql> commit;  
memsql> insert into Persons1 select id+row_number() over(), concat(LastName,row_number() over()), NOW() 
from Persons1 a; 
memsql> insert into Persons1 select id+row_number() over(), concat(LastName,row_number() over()), NOW() 
from Persons1 a; 
...  
memsql> insert into Persons1 select id+row_number() over(), concat(LastName,row_number() over()), NOW() 
from Persons1 a;  

Each self-insert doubles the table, so starting from 6 seed rows, 23 doublings yield 6 × 2^23 = 50,331,648 rows:

memsql> select count(1) from Persons1;
+----------+  
| count(1) |  
+----------+  
| 50331648 |  
+----------+  
1 row in set (0.26 sec)  
memsql> trigger 3 gc flush;  

Benchmark

CPU benchmark, since we are already at it...

The last insert can be used as a write benchmark for the columnstore. Note this is NOT a storage/SSD benchmark, since the insert is CPU-heavy rather than I/O-heavy. Let's see:

aws-rhel7test:

memsql> insert into Persons1 select id+row_number() over(), concat(LastName,row_number() over()), NOW() 
from Persons1 a; 
Query OK, 25165824 rows affected (3 min 55.24 sec) 
Records: 25165824 Duplicates: 0 Warnings: 0  

oci-centos7test:

memsql> insert into Persons1 select id+row_number() over(), concat(LastName,row_number() over()), NOW() 
from Persons1 a;  
Query OK, 25165824 rows affected (3 min 48.15 sec)  
Records: 25165824 Duplicates: 0 Warnings: 0  

oci-oel7test:

memsql> insert into Persons1 select id+row_number() over(), concat(LastName,row_number() over()), NOW() from Persons1 a;  
Query OK, 25165824 rows affected (3 min 8.76 sec)  
Records: 25165824 Duplicates: 0 Warnings: 0  

These are the CPUs used:

[ec2-user@aws-rhel7test ~]$ lscpu | grep "Model name"  
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz  
[opc@oci-centos7test ~]$ lscpu | grep "Model name"  
Model name: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz  
[opc@oci-oel7test ~]$ lscpu | grep "Model name"  
Model name: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz  

At the time of the test, the AWS instance had the newer-generation E5 series Broadwell 14 nm model, yet it still underperformed its OCI CentOS counterpart: 25,165,824 rows in ~235 s (≈107k rows/s) vs. ~228 s (≈110k rows/s). The bare metal difference?

And the OCI UEK kernel performs better still, at ~189 s (≈133k rows/s). The newer 4.1 Linux kernel? It would take kernel profiling and flame graphs to narrow that down; I will address it in a different blog. This blog is about I/O benchmarking, not CPU, so let's keep focused.

I/O benchmark

Now it is time for the I/O benchmark.

It is hard to benchmark I/O accurately by running a specific MemSQL query, since columnstore processing is tied to other components such as the CPU and IPC. Instead, we use a simple Ruby script for columnstore benchmarking provided by MemSQL, which basically just reads every file in the columnstore. It is the easiest way to simulate the columnstore access pattern in a predictable and linear way. The script needs to be run after flushing the filesystem cache.
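
The measure_reads.rb script itself is not reproduced here. As an illustration only (my assumption of what the access pattern looks like, not the actual MemSQL script), the same kind of cold-cache parallel read can be approximated in bash:

# Illustration only: approximate the benchmark's access pattern by dropping
# the page cache, then reading every columnstore file with 50 parallel readers
# and timing the whole pass.
COLUMNS_DIR=/var/lib/memsql/leaf-3307/data/columns   # path from this setup
echo 3 | sudo tee /proc/sys/vm/drop_caches           # flush filesystem cache
time sudo find "$COLUMNS_DIR" -type f -print0 |
  sudo xargs -0 -P 50 -I{} dd if={} of=/dev/null bs=1M status=none

The actual runs below use the MemSQL script.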

Confirm the columnstore looks similar on all 3 instances:

[ec2-user@aws-rhel7test columns]$ du --max-depth=1 -h /var/lib/memsql/leaf-3307/data/
276K ./snapshots
4.4G ./columns
4.6G ./logs
8.9G .
[ec2-user@aws-rhel7test columns]$ find /var/lib/memsql/leaf-3307/data/columns/ -type f | wc -l
3240
[opc@oci-centos7test columns]$ du --max-depth=1 -h /var/lib/memsql/leaf-3307/data/
308K ./snapshots
4.5G ./logs
4.3G ./columns
8.8G .
[opc@oci-centos7test columns]$ find /var/lib/memsql/leaf-3307/data/columns/ -type f | wc -l
3240
[opc@oci-oel7test columns]$ du --max-depth=1 -h /var/lib/memsql/leaf-3307/data/
308K ./snapshots
4.3G ./columns
4.5G ./logs
8.8G .
[opc@oci-oel7test columns]$ find /var/lib/memsql/leaf-3307/data/columns/ -type f | wc -l
3240

Run the test with 50 threads (so as not to exceed the core count on either instance). The stray “3” lines in the output are just tee echoing the value written to drop_caches:


[ec2-user@aws-rhel7test columns]$ for i in `seq 2`; do sudo echo 3 | sudo tee /proc/sys/vm/drop_caches; sudo -u memsql /usr/local/bin/ruby /tmp/measure_reads.rb 50; done
3
running with 50 threads…
    list files: user 0.010 system 0.050 total 0.060 real 0.288
    read files: user 12.970 system 74.340 total 88.310 real 121.874
directory entries: 11365
3
running with 50 threads…
    list files: user 0.000 system 0.050 total 0.050 real 0.278
    read files: user 12.570 system 75.460 total 86.030 real 119.873
directory entries: 11365

[opc@oci-centos7test columns]$ for i in `seq 2`; do sudo echo 3 | sudo tee /proc/sys/vm/drop_caches; sudo -u memsql /usr/local/bin/ruby /tmp/measure_reads.rb 50; done
3
running with 50 threads…
 list files: user 0.000 system 0.030 total 0.030 real 0.164
 read files: user 5.320 system 74.320 total 79.640 real 63.207
directory entries: 5156
3
running with 50 threads…
 list files: user 0.010 system 0.040 total 0.050 real 0.180
 read files: user 5.220 system 74.820 total 80.040 real 63.143
directory entries: 5156

[opc@oci-oel7test columns]$ for i in `seq 2`; do sudo echo 3 | sudo tee /proc/sys/vm/drop_caches; sudo -u memsql /usr/local/bin/ruby /tmp/measure_reads.rb 50; done
running with 50 threads…
 list files: user 0.000 system 0.020 total 0.020 real 0.144
 read files: user 3.120 system 49.790 total 52.910 real 63.201
directory entries: 5156
3
running with 50 threads…
 list files: user 0.000 system 0.020 total 0.020 real 0.144
 read files: user 3.700 system 48.960 total 52.660 real 63.234
directory entries: 5156

The “real” time for “read files” is where the I/O time is counted. There is a HUGE difference between AWS and Oracle Bare Metal Cloud: ~120 vs ~63 seconds. The Oracle NVMe SSD is about 2 times faster.

This is how iostat looked during the test (taking average numbers):


[ec2-user@aws-rhel7test columns]$ iostat -x -k 1
avg-cpu:  %user %nice %system %iowait %steal  %idle
           0.29  0.00    0.73   16.59   0.00  82.38
Device:  rrqm/s wrqm/s      r/s  w/s      rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvda       0.00   0.00    89.33 0.00    3940.00  0.00    88.21     0.12  1.29    1.29    0.00  0.18  1.63
nvme0n1    0.00   0.00 13079.00 0.00 1301985.33  0.00   199.10   108.67  8.26    8.26    0.00  0.06 75.33
nvme1n1    0.00   0.00     0.00 0.00       0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
(nvme2n1 through nvme7n1 idle, all zeros)

[opc@oci-centos7test columns]$ iostat -x -k 1
avg-cpu:  %user %nice %system %iowait %steal  %idle
           0.67  0.00    1.48   28.34   0.00  69.51
Device:  rrqm/s wrqm/s      r/s  w/s      rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm  %util
nvme0n1    0.00   0.00 28264.00 0.00 3478888.00  0.00   246.17    88.90  3.14    3.14    0.00  0.04 100.20
(nvme1n1, nvme2n1, nvme3n1, and sda idle, all zeros)

[opc@oci-oel7test columns]$ iostat -x -k 1
avg-cpu:  %user %nice %system %iowait %steal  %idle
           0.33  0.00    0.91   50.95   0.00  47.81
Device:  rrqm/s wrqm/s      r/s  w/s      rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme0n1    0.00   0.00 29515.00 0.00 3553152.00  0.00   237.84    83.78  3.16    3.16    0.00  0.04 97.80
(nvme1n1, nvme2n1, nvme3n1, and sda idle, all zeros)

So AWS was pushing ~13k IOPS (the r/s column) at ~1.3 GB/s of throughput (the rkB/s column).
The Oracle bare metal NVMe SSD pushed ~30k IOPS at ~3.5 GB/s.
That is a 2+ times difference.
The iostat output also shows the load consists of large I/Os: avgrq-sz of ~200–246 sectors × 512 bytes ≈ 100–123 kB per request, so both systems are hitting their throughput limit rather than an IOPS limit.

Conclusion

For the MemSQL use case, Oracle bare metal NVMe SSDs are about 2 times faster than AWS's most performant counterpart, the instance NVMe SSD store. We saw better CPU performance on Oracle Cloud Infrastructure too.

Worth noting: Oracle SSDs are persistent across reboots, while AWS ephemeral instance-store SSDs are wiped when the instance is stopped, so a cluster built on AWS ephemeral storage is very hard to maintain. The AWS instance store is designed as temporary storage; I picked it only because it is the fastest storage on AWS to compare against.
Oracle NVMe SSD, on the other hand, is designed as permanent storage.

A real apples-to-apples comparison of persistent storage would pit Oracle bare metal NVMe SSDs against AWS EBS Provisioned IOPS SSD (io1) volumes. Most probably the gap would be even bigger, as io1 is not local storage. But that is a subject for another blog.