In the last blog we considered different Wide Column Store databases and examined some facets of performance and associated costs. In this blog we look at another type of NoSQL database, the much simpler Key Value Store. Offerings from different vendors are examined in an attempt to highlight some of the key differences between otherwise similar technologies. We also consider the differences in performance and the costs associated with running each database on premises or in the cloud, so that it becomes easier to recognize the database that best suits your needs.
4. NoSQL - Key Value Stores
|Name||Hazelcast||Memcached||Redis|
|Description||Widely adopted Java in-memory data grid||In-memory key-value store originally intended for caching||In-memory data store used as database, cache, and message broker|
|Primary DB Model||Key-Value Store||Key-Value Store||Key-Value Store|
|Additional DB Models||None||None||Document Store, Time Series DBMS|
|Popularity Ranking (DBs Overall)||#41||#23||#9|
|Popularity Ranking (in Key-Value Stores)||#5||#3||#1|
|Developer||Hazelcast||Danga Interactive||Salvatore Sanfilippo|
|Current Release||3.9.2, January 2018||1.5.6, February 2018||4.0.9, March 2018|
|License||Open Source||Open Source||Open Source|
|Server Operating Systems||All OS with Java VM||FreeBSD
|SQL||SQL-like Query Language||No||No|
|APIs / Access Methods||JCache, RESTful HTTP API||Proprietary Protocol||Proprietary Protocol|
|Supported Programming Languages||.Net
|Replication Methods||Yes, Replicated Map||None||Master-Slave Replication
|Consistency Concepts||Immediate Consistency||Eventual Consistency||Eventual Consistency|
|Transaction Concepts||ACID||No||Optimistic locking, atomic execution of command blocks and scripts|
|User Concepts||Access rights per client and object definable||Using SASL (Simple Authentication Security Layer)||Simple Password-based access control|
Redis is often described as one of the fastest databases available, so it is no wonder that it ranks #1 among the key-value databases on the market today. In contrast to Memcached and Hazelcast, Redis also offers additional data models, including document store, graph database and time series DBMS, giving the user added flexibility in the kinds of data that can be ingested.
Hazelcast supports all operating systems with a Java VM, while Memcached and Redis support BSD, Linux, OS X and Windows setups. All three databases are schema free, but only Hazelcast offers full data typing. Redis offers only partial typing and Memcached has no typing at all. Hazelcast is also the only one of the three that supports SQL-like queries, and it supports various access methods including JCache, JPA and a RESTful HTTP API, whereas Redis and Memcached only support a proprietary protocol.
All three of the databases store data in key-value format, which reduces data complexity, and use in-memory storage as standard, making them incredibly fast. Only Redis has server-side scripts (in Lua), whereas Hazelcast is the only database that offers triggers. Hazelcast is also the only database that replicates through what it calls a Replicated Map. A Replicated Map does not partition data, nor does it spread data across different cluster members. Instead, it replicates the data to all members. This is different from the standard Master-Slave or Master-Master replication offered by Redis, and a lot more than Memcached, which offers no replication methods at all.
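To make the difference between partitioning and a Replicated Map more concrete, here is a minimal Python sketch. It does not use Hazelcast itself; the member names and the hashing scheme are assumptions chosen purely for illustration.

```python
import hashlib

MEMBERS = ["member-a", "member-b", "member-c"]  # hypothetical cluster members


def owner_of(key, members):
    """Pick one owning member for a key using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return members[int(digest, 16) % len(members)]


def partitioned_placement(keys, members):
    """Each key lives on exactly one member (partitioned style)."""
    placement = {m: [] for m in members}
    for key in keys:
        placement[owner_of(key, members)].append(key)
    return placement


def replicated_placement(keys, members):
    """Every key is copied to every member (replicated style)."""
    return {m: list(keys) for m in members}


keys = ["user:1", "user:2", "user:3", "user:4"]
partitioned = partitioned_placement(keys, MEMBERS)
replicated = replicated_placement(keys, MEMBERS)

for member in MEMBERS:
    print(member, "partitioned:", partitioned[member], "replicated:", replicated[member])
```

In the partitioned case each member holds only a slice of the keys, while in the replicated case every member holds all of them, which trades memory for local reads.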
Finally, Hazelcast and Redis both offer sharding as a way to partition the data, but only Hazelcast has immediate consistency versus the eventual consistency model offered by Redis and Memcached. And while Redis supports the greatest number of programming languages, making integration with development easier, only Hazelcast offers ACID transactions for data operations requiring guaranteed validity.
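The optimistic locking listed for Redis under Transaction Concepts can be illustrated with a small, self-contained Python sketch. In Redis itself this idea is exposed through the WATCH, MULTI and EXEC commands; the code below does not talk to Redis, and the `VersionedStore` class is a hypothetical stand-in that only shows the check-before-write pattern.

```python
class VersionedStore:
    """Toy in-memory store that tracks a version number per key."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); missing keys read as (0, None)."""
        return self._data.get(key, (0, None))

    def write_if_unchanged(self, key, expected_version, value):
        """Write only if nobody changed the key since it was read."""
        current_version, _ = self.read(key)
        if current_version != expected_version:
            return False  # conflict: caller should re-read and retry
        self._data[key] = (current_version + 1, value)
        return True


store = VersionedStore()
version, _ = store.read("stock")

# Two writers both read the same version; only the first write is accepted.
first = store.write_if_unchanged("stock", version, 10)
second = store.write_if_unchanged("stock", version, 99)
```

No lock is held while the client works; a conflicting write is simply rejected and retried, which is cheaper than full ACID transactions but pushes the retry logic to the application.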
In a Hazelcast grid, data is distributed among the nodes, or “members” as we call them, of a computer cluster, allowing for horizontal scaling in terms of both available storage space and processing power. Backups are distributed to other members in a similar fashion, based on configuration, thereby protecting against the failure of a single member. Memcached clusters consist of 1 to 20 nodes. Scaling a Memcached cluster is as easy as adding or removing nodes from the cache cluster. Each Memcached node is independent of the others and shares nothing.
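Because Memcached nodes share nothing, it is typically the client that decides which node holds a given key. The Python sketch below shows a simple modulo-hash scheme and counts how many keys land on a different node after the cluster grows. The node names and the hashing choice are assumptions for illustration, not the behaviour of any particular Memcached client.

```python
import hashlib


def node_for(key, nodes):
    """Map a key to one node with a stable hash (simple modulo scheme)."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def moved_keys(keys, old_nodes, new_nodes):
    """Return the keys whose node changes after the cluster is resized."""
    return [k for k in keys if node_for(k, old_nodes) != node_for(k, new_nodes)]


keys = [f"session:{i}" for i in range(1000)]
two_nodes = ["cache-1", "cache-2"]                # hypothetical node names
three_nodes = ["cache-1", "cache-2", "cache-3"]

moved = moved_keys(keys, two_nodes, three_nodes)
print(f"{len(moved)} of {len(keys)} keys map to a different node after adding one")
```

Adding a node is operationally easy, but with a naive scheme like this many cached entries end up being looked for on a different node, so a drop in hit rate right after resizing is to be expected. Consistent hashing is a common way to reduce that effect.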
Redis, Hazelcast and Memcached keep all data in RAM, which of course makes them supremely useful as a caching layer. However, Redis does not offer multi-threaded processing the way Memcached and Hazelcast do. This may explain why, for read-heavy and balanced workloads in single node mode, Memcached tends to outperform Redis significantly, while Hazelcast sits in the middle. However, when there is only a single concurrent client or a low number of threads, Redis demonstrates significantly higher throughput than the other two databases. Interestingly, for write-heavy workloads in single node mode, Hazelcast tends to outperform Memcached, with Redis a close second.
In terms of read latency, single node Memcached also has significantly lower latency for read-heavy and balanced workloads. Memcached also has lower read latency for write-heavy workloads, but the difference between the databases is less significant here, and at the tail, for upwards of 24 clients, Hazelcast and Redis are even faster than Memcached. Memcached also outperforms the other databases in terms of write latency, except for write-heavy workloads, where Hazelcast clearly takes the lead.
In cluster mode much the same picture emerges: Memcached has the highest throughput for read-heavy and balanced workloads, whilst Redis triumphs in write-heavy loads. Redis also leads at low concurrency, until the number of concurrent requestors increases, at which point Hazelcast shines. Memcached read and write latencies are significantly lower than those of the other databases for read-heavy and balanced loads, while Redis has the lowest latency for write-heavy loads.
All three of the database solutions are open source and thus free to use, apart from the cost of cloud resources such as EC2 instances or other VMs. However, across the board, regardless of whether the workload was read heavy, write heavy, or balanced, Redis consumed a lot more memory than the other databases, making it the most expensive to run in any cloud hosted environment.
Although in essence open source software should be free, the hosted offerings of these database solutions generally only support a very small amount of data for free, and charges vary after that. Redis, for example, charges $338 per month for a 5GB standard database hosted on any major cloud (a 5GB cache is only $105 per month), but also offers a pay as you go option which has a base price of $338 per month plus usage. Only the pay as you go option offers multicore Redis.
Hazelcast does it slightly differently, with three different editions: Professional Support, Enterprise and Enterprise HD, ascending in price and offering more functionality and more features with each tier. Hazelcast also charges a flat fee per Hazelcast node or JVM (regardless of size), so it doesn’t penalize the user for running larger instances, and users can run multiple instances within one JVM that count as only one node.
Whilst Memcached is released under a fully free BSD license and does not incur costs when deployed on premises, users still have to consider the cost of hosting the database software if they are looking to implement a cloud solution. A common solution we’ve come across is ElastiCache (from AWS), which, although reliable and easy to implement, is quite costly.
Prices are calculated by the size of the EC2 instance, so the more memory and the larger the instances within the cluster the more expensive the setup becomes. For example, a current generation cache.r3.large memory optimized cache node goes for $0.228 per hour which works out to approximately $166 per month. Assuming cluster mode with automated failover is desired over single node architecture, running Memcached on AWS comes down to approximately $332 per month for a two node cluster.
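The arithmetic behind these figures can be reproduced with a few lines of Python. The hourly rate is the one quoted above; the 730 hours per month is our own assumption (365 days times 24 hours, divided by 12 months), and the function name is purely illustrative.

```python
HOURLY_RATE_USD = 0.228   # cache.r3.large price quoted in the text
HOURS_PER_MONTH = 730     # assumption: 365 days * 24 hours / 12 months


def monthly_cost(hourly_rate, nodes, hours=HOURS_PER_MONTH):
    """Estimated monthly cost in USD for a cluster of identical nodes."""
    return hourly_rate * hours * nodes


single = monthly_cost(HOURLY_RATE_USD, 1)
pair = monthly_cost(HOURLY_RATE_USD, 2)
print(f"1 node: ${single:.2f}/month, 2 nodes: ${pair:.2f}/month")
```

Under that assumption a single node comes to roughly $166.44 per month and a two node cluster to roughly $332.88, in line with the approximate figures above. Changing the node count or the hourly rate gives a quick estimate for other setups.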
In conclusion, we find that each database offering has its own unique strengths and weaknesses. In terms of performance, single node Memcached has the best latency for read-heavy and balanced workloads, but for more than 24 clients, Hazelcast and Redis are significantly faster. For write-heavy workloads Hazelcast clearly takes the lead, with Redis a close second. In terms of throughput, Memcached and Hazelcast outperform Redis, which may be partially explained by Redis’s lack of multi-threaded processing.
In terms of cost all three databases have comparable offerings, yet there are differences depending on the setup you want to run. Redis is the most expensive of the three at $338 per month for 5GB, and only the pay as you go option (with additional fees for usage) offers multicore. Hazelcast has three different price tiers, ascending in price and in the number of features included, but it has the great advantage of charging a flat fee per node regardless of size, so if you want to run multiple instances counted as a single node this may be your most cost-effective option. Finally, Memcached is a fully free BSD licensed technology that only incurs the usage costs of the cloud you run it on. For larger, high-performance setups we recommend Oracle Cloud (OCI) as the most cost-effective and reliable infrastructure.
In any case, selecting the best database for you will depend on your use case. Therefore, we recommend you always test different technologies side by side and find the one that suits you best before committing to any one technology.
Whilst we are avid technology geeks ourselves and love the nitty-gritty nuts and bolts, kernel profiling and digging through stack traces, we also recognize the need for a higher-level, more digestible approach to understanding the cloud computing landscape. For that reason the AVM Consulting Business Blog series has a slightly different tone, aimed at business or management professionals and decision makers. We hope that this series of cloud business blogs will provide valuable information and new insights into the otherwise highly technical and rapidly changing cloud environment. Lastly, it is important to note that the views expressed in these blogs merely represent the opinions, perspectives, and point of view of AVM Consulting, and although some of the findings are based on facts, the meat of the content is purely subjective and open to interpretation. This is what we think; do what you will with this information.