CrimsonDB: Zero Knobs, High Performance

Self-design

CrimsonDB decides autonomously how to change its core design to adjust to workload, hardware and other parameters. It can assume arbitrary shapes in between the core design classes of log, LSM-tree, B-tree, and their hybrids. It offers optimal reads and writes given a memory budget and an application workload.

Performance

We build CrimsonDB by mapping the entire design space of key-value stores. This allows us to discover new designs and optimizations. Through them we push read and write performance toward the optimal behavior while also discovering the rules that govern automation.

What if?

CrimsonDB offers advanced system design exploration features. It lets users reason about hardware properties, estimate the potential benefit of adding new hardware, and determine which low-level system design choices would deliver a required performance property for a given workload.

Some of the technology is published in research papers. The Data Calculator paper in SIGMOD 2018 shows how we can synthesize more data structures than there are stars in the sky and pick the right one for a given problem. The Design Continuums paper in CIDR 2019 shows how we can perceive all core NoSQL data structures as a single data structure! The Monkey SIGMOD 2017 and TODS 2018 papers show how to pick the optimal number of bits for each Bloom filter in an LSM-tree and how to pick the optimal size ratio and merge policy. The Dostoevsky paper in SIGMOD 2018 offers better trade-offs between lookup costs, merge overheads, and storage space by identifying and removing superfluous merge operations in NoSQL systems. Finally, the Cosine paper in VLDB 2022 takes a first step toward showing how, given a workload and a budget, we can self-design the perfect storage engine on the cloud that optimizes cloud cost and performance.
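To make the Bloom filter idea mentioned above more concrete, here is a minimal sketch of the intuition, not the algorithm from the Monkey papers. It assumes a leveled LSM-tree with one filter per level, the standard approximation that a Bloom filter with b bits per entry has a false positive rate of about exp(-b * ln(2)^2), and a memory budget large enough that no false positive rate reaches 1. All function names and parameters are hypothetical. It compares giving every level the same bits per entry against setting each level's false positive rate proportional to its number of entries, using the same total memory.

```python
import math

# Standard approximation: bits per entry = ln(1/p) / ln(2)^2 for false positive rate p.
LN2_SQ = math.log(2) ** 2


def level_sizes(num_levels, size_ratio, first_level_entries):
    """Entries per level, growing by size_ratio at each deeper level."""
    return [first_level_entries * size_ratio ** i for i in range(num_levels)]


def uniform_fprs(sizes, total_bits):
    """Same bits per entry (hence the same false positive rate) at every level."""
    bits_per_entry = total_bits / sum(sizes)
    return [math.exp(-bits_per_entry * LN2_SQ) for _ in sizes]


def proportional_fprs(sizes, total_bits):
    """False positive rate proportional to level size, same total memory.

    Minimizing sum(p_i) subject to sum(n_i * ln(1/p_i)) / ln(2)^2 == total_bits
    with a Lagrange multiplier gives p_i = c * n_i. This simplified sketch
    assumes every resulting p_i stays below 1.
    """
    total = sum(sizes)
    weighted_logs = sum(n * math.log(n) for n in sizes)
    log_c = -(total_bits * LN2_SQ + weighted_logs) / total
    fprs = [math.exp(log_c) * n for n in sizes]
    if max(fprs) >= 1.0:
        raise ValueError("memory budget too small for this simplified sketch")
    return fprs


def bits_used(sizes, fprs):
    """Total filter memory implied by a list of per-level false positive rates."""
    return sum(n * math.log(1 / p) for n, p in zip(sizes, fprs)) / LN2_SQ


# Hypothetical configuration: 5 levels, size ratio 10, 10 bits per entry overall.
sizes = level_sizes(num_levels=5, size_ratio=10, first_level_entries=1000)
budget = 10 * sum(sizes)
uniform = uniform_fprs(sizes, budget)
proportional = proportional_fprs(sizes, budget)
print("sum of FPRs, uniform:     ", sum(uniform))
print("sum of FPRs, proportional:", sum(proportional))
```

Under these assumptions the sum of false positive rates approximates the expected number of wasted level probes for a lookup of a key that does not exist, so a smaller sum means cheaper lookups for the same memory.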

The online demonstration of the vision of CrimsonDB can be found here.

Publications

Subarna Chatterjee, Meena Jagadeesan, Wilson Qin, Stratos Idreos
Cosine: A Cloud-Cost Optimized Self-Designing Key-Value Storage Engine.
In Proceedings of the Very Large Databases Endowment, 2022

Stratos Idreos, Mark Callaghan
Key-Value Storage Engines.
In ACM SIGMOD International Conference on Management of Data, 2020

Stratos Idreos et al.
Learning Data Structure Alchemy.
In Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2019

Stratos Idreos, Tim Kraska
From Auto-tuning One Size Fits All to Self-designed and Learned Data-intensive Systems.
In ACM SIGMOD International Conference on Management of Data, 2019

Niv Dayan, Stratos Idreos
The Log-Structured Merge-Bush & the Wacky Continuum.
In ACM SIGMOD International Conference on Management of Data, 2019

Stratos Idreos, Niv Dayan, Wilson Qin, Mali Akmanalp, Sophie Hilgard, Andrew Ross, James Lennon, Varun Jain, Harshita Gupta, David Li, Zichen Zhu
Design Continuums and the Path Toward Self-Designing Key-Value Stores that Know and Learn.
In Biennial Conference on Innovative Data Systems Research (CIDR), 2019

Niv Dayan, Manos Athanassoulis, Stratos Idreos
Optimal Bloom Filters and Adaptive Merging for LSM-Trees.
In ACM Transactions on Database Systems, 2018

Niv Dayan, Stratos Idreos
Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging.
In ACM SIGMOD International Conference on Management of Data, 2018

Stratos Idreos, Kostas Zoumpatianos, Brian Hentschel, Michael S. Kester, Demi Guo
The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models.
In ACM SIGMOD International Conference on Management of Data, 2018

Niv Dayan, Manos Athanassoulis, Stratos Idreos
Monkey: Optimal Navigable Key-Value Store.
In ACM SIGMOD International Conference on Management of Data, 2017