Vast Design Space

A half-century of database systems research has yielded numerous data system architectures, each optimized for a specific set of applications. Because these architectures differ along multiple dimensions, such as data layouts, storage architectures, and recovery strategies, application architects and software developers face a plethora of feature sets and design options to choose from. This vast design space is still growing, as changes in hardware and applications introduce new concerns that warrant new techniques.

Custom-tailored Designs vs. Limited Resources

Today, matching a scientific or commercial application with its perfect data system is a time-consuming task that requires not only expertise in databases but also a willingness to compromise. Off-the-shelf solutions often provide only suboptimal performance, yet building a custom-tailored system for the task at hand is an expensive endeavor. Modifying an existing system is extremely complex given today’s monolithic implementations, while designing and building a new data system from scratch requires deep expertise and tens of person-years of effort.

Self-designing Data Systems

Rather than chasing changes in workloads and hardware by continually designing and implementing new systems from scratch, or forcing end-users to settle for suboptimal solutions, we envision self-designing data systems that smoothly and autonomously navigate the design space to quickly generate the optimal solution for a given application. Self-designing data systems would relieve both system designers and end-users of data management headaches, resulting in greater productivity. Moreover, by synthesizing new solutions out of existing ones, mimicking the process that data system architects currently perform manually, a self-designing system may discover new architectures that researchers would never have considered.

We are building an infrastructure for design exploration and visualization of core system components. With it, designers can quickly and interactively design core system components: they can combine design options, try out alternative designs at a fine granularity, get instant feedback on the impact of their design decisions, ask what-if design questions, get suggestions about good and bad designs, and even semi-automate the discovery of entirely new, previously unexplored designs, which amounts to doing research.

Data Management and Interactive Demo

Categorization of Data Structure Design Decisions

We have summarized the design space of data structures in the periodic table of data structures, which categorizes design decisions. It captures not only how these decisions manifest in existing designs but also how they may be combined to create new, so far unknown designs.
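To illustrate why combining design decisions produces such a large space, the sketch treats each decision as a dimension with a few options and enumerates their cross product. The decision names and options (`key_order`, `partitioning`, `node_fanout`, `membership_filter`) are hypothetical and chosen only for illustration; they are not the actual entries of the periodic table of data structures.

```python
from itertools import product

# Hypothetical design decisions and their options (illustrative only;
# not the actual rows or columns of the periodic table).
DESIGN_DECISIONS = {
    "key_order": ["sorted", "unsorted"],
    "partitioning": ["range", "hash", "none"],
    "node_fanout": ["fixed", "unbounded"],
    "membership_filter": ["bloom", "none"],
}


def enumerate_designs(decisions):
    """Yield one candidate design per combination, one option per decision."""
    names = list(decisions)
    for options in product(*(decisions[name] for name in names)):
        yield dict(zip(names, options))


designs = list(enumerate_designs(DESIGN_DECISIONS))
print(len(designs))  # 2 * 3 * 2 * 2 = 24 candidate designs
```

Even four small decisions yield 24 candidates; the count grows multiplicatively with every decision added, which is why exhaustive manual exploration quickly becomes impractical.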

Interactive Demo for Exploring Design Space of Key-Value Data Structures

We provide an interactive demo that lets users both explore the design space of key-value data structures and interactively generate performance data (the expected performance properties of a design given a workload and hardware) in a matter of seconds.
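As a rough illustration of what "expected performance properties of a design given a workload and hardware" can mean, the sketch builds a toy analytical cost model for point lookups in a tree-like key-value structure. All class names, parameters, formulas, and latencies here are assumptions made for illustration; this is not the Data Calculator's cost synthesis or its learned cost models.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    node_access_ns: float  # assumed cost of fetching one node
    compare_ns: float      # assumed cost of one key comparison


@dataclass
class Workload:
    num_keys: int
    num_lookups: int


@dataclass
class Design:
    name: str
    fanout: int          # keys (and children) per internal node
    leaf_capacity: int   # keys per leaf node
    sorted_nodes: bool   # binary search if sorted, full scan otherwise


def ceil_div(a: int, b: int) -> int:
    return (a + b - 1) // b


def comparisons(design: Design, keys_in_node: int) -> int:
    """Approximate worst-case comparisons to search one node."""
    if design.sorted_nodes:
        return max(1, (keys_in_node - 1).bit_length())  # about ceil(log2 n)
    return keys_in_node  # unsorted node: scan every key


def internal_levels(num_leaves: int, fanout: int) -> int:
    """Number of internal levels needed above num_leaves leaf nodes."""
    assert fanout >= 2, "fanout must be at least 2 for the tree to shrink"
    levels, nodes = 0, num_leaves
    while nodes > 1:
        nodes = ceil_div(nodes, fanout)
        levels += 1
    return levels


def lookup_cost_ns(design: Design, wl: Workload, hw: Hardware) -> float:
    """Cost of one point lookup: node fetches plus in-node comparisons."""
    levels = internal_levels(ceil_div(wl.num_keys, design.leaf_capacity), design.fanout)
    fetches = levels + 1  # one node per internal level, plus the leaf
    comps = levels * comparisons(design, design.fanout)
    comps += comparisons(design, design.leaf_capacity)
    return fetches * hw.node_access_ns + comps * hw.compare_ns


def workload_cost_ns(design: Design, wl: Workload, hw: Hardware) -> float:
    return wl.num_lookups * lookup_cost_ns(design, wl, hw)


# Usage: compare two hypothetical designs under made-up hardware latencies.
hw = Hardware(node_access_ns=100.0, compare_ns=1.0)
wl = Workload(num_keys=1_000_000, num_lookups=1_000)
for d in (Design("sorted nodes", 64, 256, True),
          Design("unsorted nodes", 64, 256, False)):
    print(d.name, workload_cost_ns(d, wl, hw))
# sorted nodes 320000.0
# unsorted nodes 684000.0
```

Both designs fetch the same three nodes per lookup; they differ only in in-node search cost, so a what-if question such as "what if comparisons became more expensive?" amounts to changing `compare_ns` and re-running the model.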

Source code: git@bitbucket.org:HarvardDASlab/data-calculator.git
An initial set of examples for using the Data Calculator can be found here.
Detailed examples of the data model, along with input/output examples, can be found in the Data Calculator Technical Report.

Publications

2019
  1. M. Athanassoulis, K. S. Bøgh, and S. Idreos
    Optimal Column Layout for Hybrid Workloads
    In Proceedings of the VLDB Endowment, 2019.
  2. N. Dayan and S. Idreos
    The Log-Structured Merge-Bush & the Wacky Continuum
    In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2019.
  3. S. Idreos and T. Kraska
    From Auto-tuning One Size Fits All to Self-designed and Learned Data-intensive Systems
    In ACM SIGMOD International Conference on Management of Data, 2019.
  4. S. Idreos et al.
    Design Continuums and the Path Toward Self-Designing Key-Value Stores that Know and Learn
    In Biennial Conference on Innovative Data Systems Research (CIDR), 2019.
2018
  1. N. Dayan, M. Athanassoulis, and S. Idreos
    Optimal Bloom Filters and Adaptive Merging for LSM-Trees
    In ACM Transactions on Database Systems, 2018.
  2. S. Idreos, K. Zoumpatianos, B. Hentschel, M. S. Kester, and D. Guo
    The Data Calculator: Data Structure Design and Cost Synthesis From First Principles, and Learned Cost Models
    In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2018.
  3. N. Dayan and S. Idreos
    Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging
    In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2018.
  4. B. Hentschel, M. S. Kester, and S. Idreos
    Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation
    In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2018.
2017
  1. M. S. Kester, M. Athanassoulis, and S. Idreos
    Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe?
    In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2017.
  2. N. Dayan, M. Athanassoulis, and S. Idreos
    Monkey: Optimal Navigable Key-Value Store
    In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2017.
2015
  1. S. Idreos
    Data Systems That Are Easy to Design
    ACM SIGMOD Blog, 2015.
  2. S. Xi, O. Babarinsa, M. Athanassoulis, and S. Idreos
    Beyond the Wall: Near-Data Processing for Databases
    In Proceedings of the International Workshop on Data Management on New Hardware (DaMoN), 2015.
