A rewrite of Cassandra in C++, Scylla runs much faster and cheaper than the Java-based original Credit: Toa55 / Getty Images Imagine rewriting Cassandra from Java to C++. Cassandra is already one of the most highly available NoSQL databases, although its maximum latency under load can run on the high side, because the Java VM needs to garbage collect global memory (GC) and Cassandra needs to compact its SSTables, both at what are often inopportune times. InfoWorld People try to get around the inconsistent latency problem by combining Cassandra with Memcached or Redis. So while you’re doing the rewrite, give the new database its own cache, and allow full-scan operations to bypass the cache to avoid flushing it. Now imagine making every significant I/O operation in the new database asynchronous, to eliminate waits and spin locks. While you’re worrying about I/O, give the database its own I/O scheduler and load balancer. Finally, introduce a shard-per-core architecture and auto-tuning. Now you’ve got Scylla. Scylla has additional capabilities beyond Cassandra: materialized views, global and local secondary indexes, workload prioritization, and a DynamoDB-compatible API. The DynamoDB API is in addition to CQL (Cassandra Query Language) and a Cassandra-compatible API. Scylla also lacks a few capabilities available in DataStax Enterprise, the commercial version of Cassandra, such as the integrated graph database DSE Graph. (Janus Graph, which forked from TitanDB when DataStax took over the latter, can use Scylla as its data store, so the lack of a Scylla graph component isn’t as crucial as it might be.) Scylla boasts single-digit millisecond p99 latencies and millions of operations per second per node. Those two characteristics translate to needing fewer nodes (by a significant factor) than Cassandra. The shard-per-core architecture means that Scylla can take full advantage of multi-core CPUs and multi-CPU servers, allowing Scylla to run well on Amazon i3 and i3en high-I/O bare metal (36 core) instances, while Cassandra does better on smaller 4xlarge (eight core) instances. Scylla architecture Scylla adopted most of its scale-out architecture from Cassandra. The design of Cassandra combines the partitioning and replication of the Amazon Dynamo key-value store with the log-structured column family data model of Google Bigtable. Cassandra and Scylla scale linearly as you add nodes. Scylla/Cassandra clusters are collections of nodes, organized into a ring. Clusters may have nodes in multiple data centers (DCs). In terms of the CAP theorem, Scylla (like Cassandra) favors availability over consistency during a network partition. Keyspaces are collections of tables; replication factors are set at the keyspace level. Tables are collections of columns and rows. A partition is a subset of data that is stored on a node and replicated across nodes; it is represented by a partition key. Scylla uses a virtual node (Vnode) architecture. Physical nodes may be assigned multiple Vnodes, which need not be contiguous. Scylla automatically replicates data according to a user-selected replication strategy. The replication factor should be at least three to guarantee that a quorum will exist and the data will still be accessible to a read with quorum consistency if the node containing one copy goes down. The consistency level determines how many replicas in a cluster must acknowledge read or write operations before they are considered successful. Some of the most common consistency levels used are ANY, QUORUM, ONE, LOCAL_ONE, LOCAL_QUORUM, EACH_QUORUM, and ALL. If you had geographically distributed data centers, you might read using the LOCAL_QUORUM consistency level for performance reasons, at the possible cost of missing the latest updates from the remote DCs. Like Cassandra, Scylla uses the Sorted Strings Table (SSTable) as its persistent file format. SSTables need periodic compaction to maintain performance, and Scylla has four strategies for doing so: size-tiered, leveled, time-window, and date-tiered (now deprecated in favor of time-window). Exactly which compaction strategy will give you the best performance depends on your workload. In Cassandra, SSTable compaction will often cause a large bump in latency when it occurs. In Scylla, compaction occurs in the background and has a much smaller effect on latency. A Scylla deployment optionally includes a monitoring stack (Prometheus to collect and store metrics, Alertmanager to handle alerts, and Grafana to display the dashboard) and Scylla Manager (cluster administration) in addition to the Scylla cluster. Scylla deployment options You can run Scylla on top of Docker, CentOS, RHEL, Ubuntu, or Debian. If you choose to run Scylla Enterprise on AWS, you can use a pre-built AMI for your chosen region. These AMIs are tuned for i3 and i3en instances, but you can run scylla_io_setup if you wish to use a different kind of instance. You can install Scylla open source or Scylla Enterprise either on-premises or in a cloud of your choice. You can also create a cluster in the Scylla Cloud, a fully managed database as a service, as shown in the screenshots below. Currently Scylla Cloud only runs on AWS. IDG The first step in creating a cluster in Scylla Cloud is to name it and decide on whether you want the Cassandra API or the DynamoDB API. VPC Peering allows another AWS virtual private cloud, for example one running your application, to use your Scylla VPC efficiently. IDG The second step in creating a Scylla Cloud cluster is to choose your instance size, your data replication factor, and the number of nodes you want. You can always add more nodes later if you need them. IDG The final step in creating a Scylla Cloud cluster is to launch it. Note that the estimated cost is shown right on the launch button. Scylla case studies and benchmarks Scylla has done a number of benchmarks against competing databases. That typically is not an easy thing to get right, but Scylla has explained the issues they encountered fairly well. In addition, Scylla has several customer case studies to tout, the most impressive of which is from Comcast. Comcast At the 2019 Scylla Summit, Philip Zimich gave a 20-minute talk on Comcast’s transition from Cassandra to Scylla for the X1 DVR platform. Comcast was able to replace 962 m4.2xlarge EC2 nodes of Cassandra with 78 i3.4xlarge and i3.8xlarge nodes of Scylla, for a total savings of 53 percent. Note that Cassandra is unable to make full use of all the cores in i3 instances because of the thread scalability limit in the Java VM, while Scylla can use as many cores as you give it. IDG Comcast was able to dramatically reduce the number of nodes needed to implement its X1 recording platform back-end by switching from Cassandra to Scylla, and from AWS m4 instances to i3 instances. The overall savings was 53 percent. Scylla 2.2 vs Cassandra 3.11 A benchmark that Scylla ran comparing a four-node Scylla cluster running on i3.metal instances with a 40-node Cassandra cluster running on i3.4xlarge instances helps to clarify why the Comcast migration achieved such a large reduction in nodes and costs. Also note that the four-node cluster has a tenth of the probability of a concurrent double failure of the 40-node cluster in a two-year period, while the 40-node cluster with smaller instances costs 2.5x the four-node cluster with larger instances. Scylla Scylla benchmarked a four-node i3.metal Scylla cluster against a 40-node i3.4xlarge Cassandra cluster using cassandra-stress loads. The chart above shows the configurations. Scylla Benchmark test results for Scylla Open Source 2.2 (four i3 nodes) vs Apache Cassandra 3.11 (40 m4 nodes). The SLA specification was 10 ms write latency; Cassandra exceeded that significantly at the 99.9 percent level even at the lowest of the three loads tested. Scylla Scylla latency test (300K OPS): Mixed 50 percent write/read workload (consistency level=Quorum). Each node is i3.metal. The spikes in CPU load (top) and latency (bottom) correspond to SSTable compactions (middle). Scylla Cloud vs. Amazon DynamoDB Scylla benchmarked Scylla Cloud against Amazon DynamoDB. For a specific case involving a “hot partition,” Scylla outperformed DynamoDB by a factor of 20. More generally, Scylla Cloud was significantly cheaper than DynamoDB as shown in the figure below. Scylla According to Scylla’s tests, Scylla Cloud costs much less than Amazon DynamoDB and also delivers superior performance. Scylla Cloud vs. Google Cloud Bigtable Similarly, Scylla benchmarked Scylla Cloud against Google Cloud Bigtable. Again, Scylla exhibited better latency at much lower cost. Scylla In this benchmark, Scylla Cloud was able to meet the SLA of 90 kOPS with latency under 10 ms for 95 percent of the requests with one node per zone. Google Cloud Bigtable required 12 nodes per zone and was much more expensive. Learning Scylla I used Docker on my iMac to follow the free tutorials in Scylla University. I didn’t encounter any issues, and the performance of the Scylla database was noticeably better than Cassandra or DataStax Enterprise run in the same environment. Martins-iMac:~ mheller$ docker run --name scyllaU -d scylladb/scylla:3.0.10 Unable to find image 'scylladb/scylla:3.0.10' locally 3.0.10: Pulling from scylladb/scylla 8ba884070f61: Pull complete cd4f8f8c60fc: Pull complete 2747a5fb8f41: Pull complete 07583ab71a18: Pull complete 5fcac9cdadf6: Pull complete c690c84c7597: Pull complete 63ea31381ef0: Pull complete 551655fd09ec: Pull complete a7efd0f525b1: Pull complete ba3549fdb516: Pull complete a6c1be1d6b52: Pull complete 76fef7b03810: Pull complete 26114236ac85: Pull complete 402cb8658fe9: Pull complete Digest: sha256:e7f861e62f363f9080af9369ef2831039d8aeb1d6a8c3d463824831762d37f26 Status: Downloaded newer image for scylladb/scylla:3.0.10 b08c289fe6e5d55b178bb342391540a942c9bc1aa27206f3e23c718fdb69c23f Martins-iMac:~ mheller$ docker exec -it scyllaU nodetool status Datacenter: datacenter1 ======================= Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID Rack UN 172.17.0.2 458.45 KB 256 ? d1c04d54-4da1-46be-9f2f-e167cd1d6e95 rack1 Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless Martins-iMac:~ mheller$ docker exec -it scyllaU cqlsh Connected to at 172.17.0.2:9042. [cqlsh 5.0.1 | Cassandra 3.0.8 | CQL spec 3.3.1 | Native protocol v4] Use HELP for help. cqlsh> CREATE KEYSPACE mykeyspace WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1}; cqlsh> use mykeyspace; cqlsh:mykeyspace> CREATE TABLE users ( user_id int, fname text, lname text, PRIMARY KEY((user_id))); cqlsh:mykeyspace> cqlsh:mykeyspace> insert into users(user_id, fname, lname) values (1, 'rick', 'sanchez'); cqlsh:mykeyspace> insert into users(user_id, fname, lname) values (4, 'rust', 'cohle'); cqlsh:mykeyspace> select * from users; user_id | fname | lname ---------+-------+--------- 1 | rick | sanchez 4 | rust | cohle (2 rows) cqlsh:mykeyspace> The session above represents the first lesson. I went on to follow more lessons and take some quizzes, but I found no deviances from the tutorials. Scylla stands apart Overall, Scylla is a very impressive NoSQL database. While rewriting a database (Cassandra) from Java to C++ seems like an obvious thing to do to achieve better scalability and more consistent latency, Scylla has additional optimizations, such as self-tuning. It’s the rare product that exceeds my expectations. Whether Scylla will serve your application’s needs is a complicated question. I’d recommend following the rubric I laid out in “How to choose a database for your application”: Start with your requirements and use those as a sieve to eliminate the databases that won’t work for you. If Scylla makes your short list, then spend the time to perform a proof of concept. — Cost: Open source: Free for unlimited nodes, but Scylla Manager is limited to five nodes. Enterprise: Contact Scylla. Cloud: $191 to $17,520 per server per month, depending on server size. Minimum three servers. Platform: Docker, AWS, RHEL 7, CentOS 7, Debian, Ubuntu, VirtualBox.