Self-designing Data Systems

Vast Design Space

A half-century of database systems research has yielded numerous data system architectures, each optimized for a specific set of applications. Because these systems are designed along multiple dimensions, such as data layouts, storage architectures, or recovery strategies, application architects and software developers face a plethora of feature sets and design options to choose from. This vast design space is still growing as changes in hardware and applications introduce new concerns that warrant new techniques.

Custom-tailored Designs vs. Limited Resources

Today, matching a scientific or commercial application with its ideal data system is a time-consuming task that requires not only database expertise but also a willingness to compromise. Off-the-shelf solutions often provide only suboptimal performance, yet building a custom-tailored system for the task at hand is expensive. Modifying an existing system is extremely complex given today’s monolithic implementations, while designing and building a new data system from scratch requires expertise and tens of person-years of effort.

Rather than chasing changes in workload and hardware by continually designing and implementing new systems from scratch, or forcing end-users to settle for suboptimal solutions, we envision self-designing data systems that smoothly and autonomously navigate the design space to quickly generate the optimal solution for a given application. Self-designing data systems would relieve both system designers and end-users of data management headaches, leading to greater productivity. Moreover, by synthesizing new solutions out of existing ones, a self-designing system may discover architectures that researchers would never have considered, mimicking the process that data system architects currently perform manually.

We are building an infrastructure for design exploration and visualization of core system components. Designers can quickly and interactively design these components: they can combine design options, try out alternative designs at a fine granularity, get instant feedback on the impact of their design decisions, ask what-if design questions, get suggestions about good and bad designs, and even semi-automate the process of discovering entirely new and previously unexplored designs, that is, the process of doing research.

Data Management and Interactive Demo

Categorization of Data Structure Design Decisions

We have summarized the design space of data structures in the periodic table of data structures, which categorizes design decisions: not only how they manifest in existing designs but also how they may be combined to create new, so far unknown designs.
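As a rough illustration of how combining categorized design decisions spans a design space, the sketch below enumerates every combination of a few options and filters out those that match an already known design. The dimension names, their options, and the `KNOWN` list are hypothetical placeholders chosen for this example; they are not the primitives of the periodic table of data structures.

```python
from itertools import product

# Hypothetical design dimensions, for illustration only. The actual
# periodic table of data structures defines its own primitives.
DESIGN_DIMENSIONS = {
    "key_order": ["sorted", "unsorted"],
    "node_layout": ["row-wise", "columnar"],
    "filter": ["none", "bloom"],
}


def enumerate_designs(dimensions):
    """Return every combination of one option per dimension as a dict."""
    names = list(dimensions)
    option_lists = (dimensions[name] for name in names)
    return [dict(zip(names, combo)) for combo in product(*option_lists)]


def unexplored(designs, known):
    """Return the designs that do not appear among the known designs."""
    return [design for design in designs if design not in known]


# One assumed "existing" design; everything else counts as unexplored here.
KNOWN = [{"key_order": "sorted", "node_layout": "row-wise", "filter": "none"}]

designs = enumerate_designs(DESIGN_DIMENSIONS)
candidates = unexplored(designs, KNOWN)
print(len(designs), len(candidates))  # prints "8 7"
```

Even this toy space doubles with each added binary decision, which hints at why exhaustive manual exploration quickly becomes impractical.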

Interactive Demo for Exploring Design Space of Key-Value Data Structures

We provide an interactive demo for users to both explore the possible design space of key-value data structures and interactively generate the data (expected performance properties of a design given a workload and hardware) in a matter of seconds.

Source code: git@bitbucket.org:HarvardDASlab/data-calculator.git
An initial set of examples to use the Data Calculator can be found here.
Detailed examples of the data model, along with input/output examples, can be found in the Data Calculator Technical Report.
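To make the idea of "expected performance properties of a design given a workload and hardware" concrete, here is a minimal what-if sketch. It is not the Data Calculator's cost model or API: the `Hardware` and `Workload` classes, the `lookup_cost_ns` function, and the latency figures are all assumptions made up for this example, and the model only counts memory accesses for point lookups.

```python
import math
from dataclasses import dataclass


@dataclass
class Hardware:
    random_access_ns: float      # assumed cost of one random memory access
    sequential_access_ns: float  # assumed cost of reading one entry in a scan


@dataclass
class Workload:
    num_entries: int    # number of key-value entries stored
    point_lookups: int  # number of point lookups issued


def lookup_cost_ns(key_order, hw, wl):
    """Estimate total point-lookup time for a flat array of entries."""
    if key_order == "sorted":
        # Binary search: about log2(N) random accesses per lookup.
        per_lookup = math.ceil(math.log2(wl.num_entries)) * hw.random_access_ns
    elif key_order == "unsorted":
        # Linear scan: on average half of the entries are read per lookup.
        per_lookup = (wl.num_entries / 2) * hw.sequential_access_ns
    else:
        raise ValueError(f"unknown key_order: {key_order}")
    return per_lookup * wl.point_lookups


hw = Hardware(random_access_ns=100.0, sequential_access_ns=1.0)
small = Workload(num_entries=1024, point_lookups=1000)
large = Workload(num_entries=1_048_576, point_lookups=1000)

for wl in (small, large):
    for design in ("sorted", "unsorted"):
        print(wl.num_entries, design, lookup_cost_ns(design, hw, wl))
```

Under these assumed latencies the unsorted scan is cheaper for the small data set, while the sorted layout wins by a wide margin for the large one, which is the kind of workload- and hardware-dependent answer a what-if question is meant to surface.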

Publications

2019

M. Athanassoulis, K. S. Bøgh, and S. Idreos, Optimal Column Layout for Hybrid Workloads
In Proceedings of the VLDB Endowment, 2019.
N. Dayan and S. Idreos, The Log-Structured Merge-Bush & the Wacky Continuum
In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2019.
S. Idreos and T. Kraska, From Auto-tuning One Size Fits All to Self-designed and Learned Data-intensive Systems
In ACM SIGMOD International Conference on Management of Data, 2019.
S. Idreos, et al., Design Continuums and the Path Toward Self-Designing Key-Value Stores that Know and Learn
In Biennial Conference on Innovative Data Systems Research (CIDR), 2019.

2018

N. Dayan, M. Athanassoulis, and S. Idreos
Optimal Bloom Filters and Adaptive Merging for LSM-Trees
In ACM Transactions on Database Systems, 2018.
S. Idreos, K. Zoumpatianos, B. Hentschel, M. S. Kester, and D. Guo
The Data Calculator: Data Structure Design and Cost Synthesis From First Principles, and Learned Cost Models
In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2018.
N. Dayan and S. Idreos
Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging
In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2018.
B. Hentschel, M. S. Kester, and S. Idreos
Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation
In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2018.

2017

Michael S. Kester, Manos Athanassoulis, Stratos Idreos
Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe?
In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2017.
Niv Dayan, Manos Athanassoulis, Stratos Idreos
Monkey: Optimal Navigable Key-Value Store
In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2017.

2015

Stratos Idreos
Data Systems That Are Easy to Design
In ACM SIGMOD Blog, 2015.
Sam Xi, Oreoluwa Babarinsa, Manos Athanassoulis, Stratos Idreos
Beyond the Wall: Near-Data Processing for Databases
In Proceedings of the International Workshop on Data Management on New Hardware (DaMoN), 2015.
