PART 7

In the last few posts we examined a variety of NoSQL databases and considered some of their appropriate domains and use cases. In this edition we conclude our multi-part database blog series by considering graph database offerings from different vendors, in an attempt to highlight some of the key differences between otherwise similar technologies. We also consider the differences in performance and the costs associated with running each database on premises or in the cloud, so that it becomes easier to recognize the database that best suits your needs.

5. NoSQL - Graph Databases

| Name | ArangoDB | Neo4j | OrientDB |
|---|---|---|---|
| Description | A multi-model DBMS | Open source graph database | Multi-model DBMS |
| Primary DB Model | Graph DBMS | Graph DBMS | Graph DBMS |
| Additional DB Models | Document Store, Key-Value Store | None | Document Store, Key-Value Store |
| Popularity Ranking (DBs Overall) | #63 | #22 | #49 |
| Popularity Ranking (in Graph Stores) | #5 | #1 | #4 |
| Developer | Arango GmbH | Neo4j, Inc. | OrientDB Ltd. |
| Initial Release | 2012 | 2007 | 2010 |
| Current Release | 3.2.4, September 2017 | 3.3.3, February 2018 | 2.2.22, June 2017 |
| License | Open Source | Open Source | Open Source |
| Cloud-Based | No | No | No |
| Implementation Language | C, C++, JavaScript | Java, Scala | Java |
| Server Operating Systems | Linux, OS X, Raspbian, Solaris, Windows | Linux, OS X, Solaris, Windows | All OS with Java JDK |
| Data Scheme | Schema-free | Schema-free and schema-optional | Schema-free |
| Typing | Yes | Yes | Yes |
| XML Support | Not standard, needs JSON translator | Not standard, needs JSON translator | No |
| Secondary Indexes | Yes | Yes | Yes |
| SQL | No | No | SQL-like query language, no joins |
| APIs / Access Methods | HTTP API; JSON-style queries | Cypher Query Language; Java API; Neo4j-OGM; RESTful HTTP API; Spring Data Neo4j; TinkerPop 3 | Java API; RESTful HTTP/JSON API; TinkerPop Stack (with Blueprints, Gremlin, Pipes) |
| Supported Programming Languages | C#, Clojure, Java, JavaScript (Node.js), PHP, Python, Ruby | .Net, Clojure, Elixir, Go, Groovy, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala | .Net, C, C#, C++, Clojure, Java, JavaScript, JavaScript (Node.js), PHP, Python, Ruby, Scala |
| Server-Side Scripts | JavaScript | Yes, user-defined procedures and functions | JavaScript |
| Triggers | No | Yes | Hooks |
| Partitioning Methods | Sharding | None | Sharding |
| Replication Methods | Master-Slave Replication | Causal Clustering using Raft Protocol | Master-Master Replication |
| MapReduce | No (but can be done with procedures stored in JavaScript) | No | No (but could be achieved with distributed queries) |
| Consistency Concepts | Eventual Consistency; Immediate Consistency | Causal and Eventual Consistency (configurable in causal cluster setup); Immediate Consistency (in stand-alone mode) | Eventual Consistency |
| Foreign Keys | No | Yes (relationships in graphs) | Yes (relationships in graphs) |
| Transaction Concepts | ACID | ACID | ACID |
| Concurrency | Yes | Yes | Yes |
| Durability | Yes | Yes | Yes |
| In-Memory Capabilities | Yes | No | Yes |
| User Concepts | Access Control Lists (ACL) per each Arango Server | Users, roles and permissions; pluggable authentication with supported standards (LDAP, Active Directory, Kerberos) | Access rights for users and roles; record-level security configurable |

Distinguishing Features

Graph databases have gained increasing popularity in recent years due to their ability to model complex relationships between objects. At first glance the three open source databases compared here look quite similar, but a closer look makes it clear why Neo4j ranks first as the most popular graph database. All three databases support a variety of operating systems including Linux, OS X and Windows, but only Neo4j offers both schema-free and schema-optional data structuring.

For ArangoDB and Neo4j, XML support can be enabled by using a JSON translator, whilst OrientDB does not support XML at all, making it probably not the best option if you have a lot of XML files. However, OrientDB is the only database compared here which supports SQL-like queries and, together with Neo4j, it also supports more APIs and a wider variety of programming languages than ArangoDB. Neo4j is the only database which allows the user to define procedures and functions for server-side scripts, whereas the other two databases are limited to standard JavaScript. Neo4j also supports triggers, where OrientDB only offers hooks and ArangoDB offers neither.
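To give a feel for how the three query styles differ, here is a rough sketch of one and the same question ("who does this person know?") written for each database. The labels, classes and collections (`Person`, `KNOWS`, `Knows`, `knows`) are hypothetical names of our own, the strings are only printed and never sent to a server, and the OrientDB variant returns whole records rather than just names.

```python
# Illustrative query strings only; nothing here connects to a database.
# Person / KNOWS / Knows / knows are hypothetical schema names.
FRIENDS_OF_QUERIES = {
    # Cypher: pattern matching over nodes and relationships.
    "Neo4j (Cypher)": (
        "MATCH (p:Person {name: $name})-[:KNOWS]->(friend) "
        "RETURN friend.name"
    ),
    # OrientDB's SQL-like dialect: no joins, edges are followed with out().
    "OrientDB (SQL-like)": (
        "SELECT expand(out('Knows')) FROM Person WHERE name = :name"
    ),
    # AQL: a one-hop traversal over an edge collection.
    "ArangoDB (AQL)": (
        "FOR friend IN 1..1 OUTBOUND @start knows RETURN friend.name"
    ),
}

for engine, query in FRIENDS_OF_QUERIES.items():
    print(f"{engine}:\n  {query}")
```

Teams with a strong SQL background will probably find the OrientDB form the most familiar, which is the practical upside of its SQL-like language.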

However, unlike the other two databases, Neo4j does not offer sharding or any other partitioning method, and it uses the Raft protocol to replicate data across causal clusters rather than master-master or master-slave replication. As a result, the database exhibits causal and eventual consistency across clusters when configured to replicate, and immediate consistency only in stand-alone mode.

All three databases have ACID transactions, but only Neo4j and OrientDB allow foreign keys, in the form of relationships in the graph. Finally, only ArangoDB and OrientDB have in-memory capabilities for fast reads, whereas Neo4j offers pluggable authentication with supported standards such as LDAP, Active Directory or Kerberos, making it easier for corporations to manage access controls with configurations they already have.

Performance

OrientDB and ArangoDB are both native multi-model databases, whereas Neo4j is strictly a graph database. Given this, one would expect Neo4j to be optimized for graph-specific operations such as shortest path and second-degree neighbors (neighbors of neighbors), and thus to outperform the other two databases. What we found, however, painted a completely different picture.

For single reads ArangoDB clearly had higher throughput than Neo4j or OrientDB, and the gaps between them were fairly regular: Neo4j came in second with 50% less throughput than ArangoDB, and OrientDB had 50% less than Neo4j. In terms of single writes the difference was minimal, with ArangoDB performing ever so slightly better than OrientDB. For single synchronized writes ArangoDB also wins over Neo4j, with 50% higher throughput.
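To make the single-read comparison easier to picture, the quoted ratios can be normalized against ArangoDB. This is a minimal sketch that only restates the percentages above; the function name and the baseline of 1.0 are our own illustration, not benchmark output.

```python
# Relative single-read throughput, normalized so that ArangoDB = 1.0.
# The ratios restate the text above; they are not raw benchmark numbers.

def relative_single_read_throughput(baseline=1.0):
    arangodb = baseline
    neo4j = arangodb * 0.5     # 50% less throughput than ArangoDB
    orientdb = neo4j * 0.5     # 50% less throughput than Neo4j
    return {"ArangoDB": arangodb, "Neo4j": neo4j, "OrientDB": orientdb}

for name, value in relative_single_read_throughput().items():
    print(f"{name}: {value:.2f}x")
```

Put differently, on these figures OrientDB managed roughly a quarter of ArangoDB's single-read throughput.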

In terms of aggregation ArangoDB clearly takes the lead, outperforming Neo4j with nearly twice the throughput. Unfortunately the figures for OrientDB were disappointing: the database took 25x as long as ArangoDB to aggregate over a single collection (for example, computing the age distribution for everyone in the sample network from SNAP’s Pokec).
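For readers less familiar with the term, an aggregation over a single collection simply means grouping and counting records. The sketch below shows the idea in plain Python on a handful of made-up profiles; the records and the `age_distribution` helper are hypothetical and stand in for the much larger Pokec data set.

```python
from collections import Counter

# Hypothetical sample profiles; the benchmark itself uses SNAP's Pokec data.
profiles = [
    {"name": "a", "age": 21},
    {"name": "b", "age": 34},
    {"name": "c", "age": 21},
    {"name": "d", "age": 34},
    {"name": "e", "age": 21},
]

def age_distribution(records):
    """Count how many profiles share each age (a single-collection aggregation)."""
    return Counter(record["age"] for record in records)

for age, count in sorted(age_distribution(profiles).items()):
    print(f"age {age}: {count} profile(s)")
```

A database has to do the same grouping over every record in the collection, which is why this test stresses scan and compute speed rather than graph traversal.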

For specific graph operations such as shortest path analysis, Neo4j scored a win over ArangoDB and OrientDB when ArangoDB used its standard storage engine configuration “MMFiles”, with ArangoDB taking 3x and OrientDB 12x as long to complete the operation. However, when ArangoDB was used with the newer “RocksDB” storage engine, it outperformed all other databases by a wide margin. When searching for distinct and direct neighbors (highly related data profiles), ArangoDB comes in first, with Neo4j twice as slow and OrientDB approximately 6x as slow. An attempt to locate neighbors of neighbors with their contained data resulted in a very similar picture, with Neo4j having half as much throughput as ArangoDB and OrientDB half that again.
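The three graph operations mentioned here can be sketched in a few lines of plain Python. This is only an illustration on a tiny, made-up friendship graph: the graph, the function names, and the choice to treat "neighbors of neighbors" as everything within two hops are our own assumptions, and the benchmark's exact definitions may differ.

```python
from collections import deque

# Hypothetical toy graph as an adjacency list (directed edges).
GRAPH = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": ["dan", "eve"],
    "dan": ["fay"],
    "eve": [],
    "fay": [],
}

def neighbors(graph, start):
    """Distinct direct neighbors of a vertex."""
    return set(graph.get(start, []))

def neighbors_of_neighbors(graph, start):
    """Distinct vertices reachable in one or two hops, excluding the start."""
    first = neighbors(graph, start)
    second = {n2 for n1 in first for n2 in graph.get(n1, [])}
    return (first | second) - {start}

def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

print(sorted(neighbors(GRAPH, "ann")))
print(sorted(neighbors_of_neighbors(GRAPH, "ann")))
print(shortest_path(GRAPH, "ann", "fay"))
```

On a graph with millions of vertices the cost of these traversals depends heavily on how edges are stored and indexed, which is where storage engine choices start to show.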

Finally, in terms of memory usage, Neo4j used 2.5x as much memory as either ArangoDB or OrientDB. The difference between the latter two was minimal, with ArangoDB using 7% less memory on average than OrientDB. Overall, the performance results clearly favor ArangoDB, with Neo4j in second and OrientDB in a distant third place.

Cost

The cost aspect of this comparison is rather simple. ArangoDB Community is free, and the Basic and Enterprise versions are paid. ArangoDB Basic offers a little more functionality than Community and a basic SLA with 9x5 support, and comes in at around $15,000 per year. The full-fledged multi-model functionality with all the interesting features like S2S replication, satellite collections, SmartGraphs and additional security comes with the Enterprise edition, starting at $36,000 per year in licensing fees alone. There are different subscriptions for support and varying tiers of SLA, some of which are included and some of which come at a surcharge. It's a big price tag, but from what we have seen it delivers what it promises.

Neo4j also offers a free community version under a GPL v3 license, and has 4 different versions under its paid license tier: Commercial, Developer, Evaluation and Educational. The commercial licenses range from $299 per month to $599 per month depending on the level of SLA chosen (standard, premium, or enterprise). The pricing above reflects hosting a 4GB DB on 2 cores with 32GB SSD on AWS, and naturally prices increase as more memory and cores are added. Some minor discounts can be achieved by hosting on Azure (approx. -10%) or GCP (approx. -20%).
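As a rough guide to what those discounts mean in practice, the sketch below applies the approximate percentages to the two ends of the AWS price range. It assumes the discounts apply uniformly to the whole monthly price, which is our simplification; the helper name is ours and actual quotes will differ.

```python
# Monthly prices quoted above for a 4GB DB on 2 cores with 32GB SSD on AWS.
AWS_MONTHLY_USD = {"low": 299, "high": 599}

# Approximate discounts quoted above, relative to AWS (assumed to apply uniformly).
DISCOUNT = {"AWS": 0.00, "Azure": 0.10, "GCP": 0.20}

def monthly_range(provider):
    """Return the (low, high) monthly price in USD after the approximate discount."""
    factor = 1 - DISCOUNT[provider]
    return tuple(
        round(price * factor, 2)
        for price in (AWS_MONTHLY_USD["low"], AWS_MONTHLY_USD["high"])
    )

for provider in DISCOUNT:
    low, high = monthly_range(provider)
    print(
        f"{provider}: ${low:,.2f} to ${high:,.2f} per month "
        f"(${low * 12:,.2f} to ${high * 12:,.2f} per year)"
    )
```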

OrientDB has a free community edition and a paid Enterprise edition, which comes in at $3,125 per core per year for production/live environments and $1,600 per core per year for test/dev environments. There is an initial minimum purchase of 6 cores, and support is charged at a flat rate of $15,000 per year (for an unlimited number of cores). Assuming 4 production cores and 2 test/dev cores, plus support, the cost of starting with OrientDB at enterprise level comes to a grand total of $30,700 per year, which is approximately $2,558 per month, and quite a lot for the performance it offers.
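The OrientDB total can be reproduced with a few lines of arithmetic. The sketch below uses only the prices quoted above; the function name is ours, and it assumes, as we did in our example, that production and test/dev cores together count towards the 6-core minimum.

```python
# Prices quoted above, in USD per year.
PRODUCTION_PER_CORE = 3125
TEST_DEV_PER_CORE = 1600
SUPPORT_FLAT_RATE = 15000
MINIMUM_CORES = 6  # assumed to cover production and test/dev cores combined

def orientdb_annual_cost(production_cores, test_dev_cores):
    """Annual licensing plus flat-rate support for the given core counts."""
    if production_cores + test_dev_cores < MINIMUM_CORES:
        raise ValueError(f"minimum purchase is {MINIMUM_CORES} cores")
    licences = (
        production_cores * PRODUCTION_PER_CORE
        + test_dev_cores * TEST_DEV_PER_CORE
    )
    return licences + SUPPORT_FLAT_RATE

annual = orientdb_annual_cost(production_cores=4, test_dev_cores=2)
print(f"Annual: ${annual:,}")               # 4 x 3,125 + 2 x 1,600 + 15,000
print(f"Monthly: ${annual / 12:,.2f}")
```

Because support is a flat fee, the per-core cost drops as the deployment grows, so the figure looks less steep for larger installations.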

Conclusion

There are many different reasons to choose a certain type of database solution. What we have done here is look at some of the key features of the most common offerings on the market and examine the differences between them, in an attempt to give a better picture of what is available and what is appropriate for which use case. While this blog series highlights some of the aspects we consider when examining the needs of our clients before making any recommendations on a Big Data solution, we acknowledge that there are many other factors that may be important for each use case.

One high-level approach we have found useful when deciding between the tradeoffs of different solutions is the application of the CAP theorem. Based on the availability and consistency needs of the client, it can become clear whether a big data or a relational database solution is more appropriate. Furthermore, the volume of writes and the type of queries should be considered, to determine whether range-based queries or fast writes are needed. Answering these questions can help navigate the many different options that are out there and arrive at a solution that is right for your specific needs.

To conclude this multi-part blog series, we would like to note that the views expressed in these blogs represent solely our opinions and experiences over the years, and we recommend that developers and architects always test their own use cases before committing to any technology.

**DISCLAIMER**
Whilst we are avid technology geeks ourselves and love the nitty-gritty nuts and bolts, kernel profiling and digging through stack traces, we also recognize the need for a higher-level, more digestible approach to understanding the cloud computing landscape. From this origin and perceived need, the AVM Consulting Business Blog series has a slightly different tone, aimed at business or management professionals and decision makers. We hope that this series of cloud business blogs will provide valuable information and new insights into the otherwise highly technical and rapidly changing cloud environment. Lastly, it is important to note that the views expressed in these blogs merely represent the opinions, perspectives, and point of view of AVM Consulting, and although some of the findings are based on facts, the meat of the content is purely subjective and open to interpretation. This is what we think; do what you will with this information.

REFERENCES

  1. https://docs.arangodb.com/2.8/cookbook/JavaDriverXmlData.html
  2. https://neo4j.com/blog/bulk-data-import-neo4j-3-0/
  3. https://neo4j.com/developer/graph-database/
  4. https://orientdb.com/docs/last/
  5. https://aws.amazon.com/documentation/elasticache/
  6. https://www.arangodb.com/2018/02/nosql-performance-benchmark-2018-mongodb-postgresql-orientdb-neo4j-arangodb/
  7. https://www.arangodb.com/why-arangodb/arangodb-vs-neo4j/
  8. https://orientdb.com/support/
  9. https://www.graphstory.com/pricing
  10. https://en.wikipedia.org/wiki/CAP_theorem
  11. https://orientdb.com/orientdb-vs-neo4j/
  12. https://www.quora.com/How-does-ArangoDB-compare-to-OrientDB