CrimsonDB
A Self-Designing Key-Value Store

CrimsonDB: Zero Knobs, High Performance

Self-design

CrimsonDB decides autonomously how to change its core design to adjust to workload, hardware and other parameters. It can assume arbitrary shapes in between the core design classes of log, LSM-tree, B-tree, and their hybrids. It offers optimal reads and writes given a memory budget and an application workload.

Performance

We build CrimsonDB by mapping the whole possible design space of key-value stores. This allows us to discover new designs and optimizations. Through them we push read and write performance towards the optimal behavior, while at the same time discover the rules that govern automation.

What if?

CrimsonDB offers advanced system design exploration features such as allowing to reason about hardware properties and the potential benefit when adding new hardware or what kind of low level system design choices would bring a required performance property for a given workload.

Some of the technology is published in research papers. The Data Calculator paper in SIGMOD 2018 shows how we can synthesize more data structures than stars on the sky to pick the right one for a given problem. The Design continuums paper in CIDR 2019 shows how we can perceive all core NoSQL data structures as a single data structure! The Monkey SIGMOD 2017 and TODS 2018 papers show how to pick the optimal number of bits for each bloom filter in an LSM-tree and how to pick the optimal size ratio and merge policy. The Dostoevsky paper in SIGMOD 2018 offers better trade-offs between lookup costs, merge overheads, and storage space by identifying and removing superfluous merge operations in NoSQL systems.

Steps to be followed
1
Environment: Set the parameters for data, workload, and hardware.
2
Design: For the first three designs, set the input knobs of each. CrimsonDB does not require any input by default. CrimsonDB automatically suggests the optimal design configuration for a given environment.
3
Results: Check the graphical representation and the values of the metrics for each design.
1

data

workload

hardware


2
LSH-Table
Garbage Collection Threshold
Key Signature Size (bits)
Hash Bucket entries fraction
LSM-Tree
Leveling   Tiering  
Levels
Mbf (bits per entry)
Buffer size (MB)
Bε-tree
Buffer size (MB)
Levels
CrimsonDB
Memory footprint (GB)
Mbf (bits per entry)
Mfp (GB)
Buffer size (MB)
Hot merge threshold (K)
Cold merge threshold (Z)
Levels (L)
Growth Factor(T)

3

Write

Range Lookup

Point Lookup

Storage Space

Memory Footprint

Throughput



Publications



Niv Dayan, Stratos Idreos
The Log-Structured Merge-Bush & the Wacky Continuum.
In ACM SIGMOD International Conference on Management of Data , 2019

Stratos Idreos, Niv Dayan, Wilson Qin, Mali Akmanalp, Sophie Hilgard, Andrew Ross, James Lennon, Varun Jain, Harshita Gupta, David Li, Zichen Zhu
Design Continuums and the Path Toward Self-Designing Key-Value Stores that Know and Learn
In Biennial Conference on Innovative Data Systems Research (CIDR), 2019

Niv Dayan, Manos Athanassoulis, Stratos Idreos
Optimal Bloom Filters and Adaptive Merging for LSM-Trees
In ACM Transactions on Database Systems , 2018

Niv Dayan, Stratos Idreos
Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging
[paper video]
In ACM SIGMOD International Conference on Management of Data , 2018

Stratos Idreos, Kostas Zoumpatianos, Brian Hentschel, Michael S. Kester, Demi Guo
The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models
[paper video]
In ACM SIGMOD International Conference on Management of Data , 2018

Niv Dayan, Manos Athanassoulis, Stratos Idreos
Monkey: Optimal Navigable Key-Value Store
[paper website] [paper video]
In ACM SIGMOD International Conference on Management of Data , 2017