CrimsonDB
A Self-Designing Key-Value Store

CrimsonDB: Zero Knobs, High Performance

Self-design

CrimsonDB decides autonomously how to change its core design to adjust to workload, hardware and other parameters. It can assume arbitrary shapes in between the core design classes of log, LSM-tree, B-tree, and their hybrids. It offers optimal reads and writes given a memory budget and an application workload.

Performance

We build CrimsonDB by mapping the whole possible design space of key-value stores. This allows us to discover new designs and optimizations. Through them we push read and write performance towards the optimal behavior, while at the same time discover the rules that govern automation.

What if?

CrimsonDB offers advanced system design exploration features such as allowing to reason about hardware properties and the potential benefit when adding new hardware or what kind of low level system design choices would bring a required performance property for a given workload.

The first major technical innovation is published in our SIGMOD 2017 Monkey paper which offers optimal Bloom filter allocation and a first set of what-if design and auto-tuning properties. The second major technical innovation of CrimsonDB is published in our SIGMOD 2018 Dostoevsky paper which offers better trade-offs between lookup costs, merge overheads, and storage space by identifying and removing superfluous merge operations. Finally, in our SIGMOD 2018 Data Calculator paper we present the first (rather) complete design space for key-value stores and learned cost models which enable what-if design.

Interactive key-value store design with CrimsonDB

We provide an interactive design demo which highlights the design exploration and auto-tuning capabilities of CrimsonDB. Users can explore arbitrary key-value store designs within the supported design continuum by manually choosing different design parameters for different workloads and environments. The system will 1) fine tune the final design (e.g., such as choosing the right bloom filter bits), 2) visualize the resulting structure, and 3) compute the achieved performance tradeoffs compared to a state-of-the-art key-value store. Users may also instruct CrimsonDB to auto-design from scratch to achieve the optimal performance for the given workload.


Environment Parameters

N: Entries
E: Entry size (bytes)
M: Total Main Memory Budget (MB)
B: Page size (entries)
F: Key size (bytes)

Workload

w: Proportion of Update Operation
c: Proportion of Long Range Lookup Operation
s: Entries of Long Range Query
r: Proportion of Non-result Point Lookup Operation
v: Proportion of Existig Point Lookup Operation
q: Proportion of Short Range Lookup Operation


T: Growth Factor
K: Bound on # FP-Runs
Z: Bound on # CFP-Runs
D: Max. Node Size
MF: Filtering Memory



-


-

Publications



Niv Dayan, Manos Athanassoulis, Stratos Idreos
Optimal Bloom Filters and Adaptive Merging for LSM-Trees
In ACM Transactions on Database Systems , 2018

Niv Dayan, Stratos Idreos
Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging
[paper video]
In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2018

Stratos Idreos, Kostas Zoumpatianos, Brian Hentschel, Michael S. Kester, Demi Guo
The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models
[paper video]
In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2018

Niv Dayan, Manos Athanassoulis, Stratos Idreos
Monkey: Optimal Navigable Key-Value Store
[paper website] [paper video]
In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2017