Deep Learning in Data Systems

Deep neural networks are deployed at several stages of the query optimization pipeline. They are used to improve selectivity estimates for queries that target multiple attributes (Hasan2020, Hayek2020_EDBT), and there are proposals to use them to generate query plans directly (Vu2019). Deep reinforcement learning techniques are also used to tune design knobs within a database system, such as the data layout and the memory allocated to individual components (Kara2019, Li2019, Zhang2020).

Enhancing Access Methods

Neural networks are also used to replace or enhance access methods and index structures such as B-trees and Bloom filters. Learned indexes, for instance, can take the form of deep neural networks that learn the mapping between data items and their locations (Kraska2017). Beyond individual structures, SageDB proposes a holistic database system designed around learned components (Kraska2019), and MLWeaving is an in-memory data structure that enables faster learning of low-precision data within databases (Wang2019). Our own work on self-designing data systems (Idreos2019) and a calculus of data structures (Idreos2018, Idreos2018J, Idreos2019) shows how to use neural networks to navigate massive, complex design spaces of fine-grained system designs and to learn cost models of how these primitives behave, without having to code the target system designs.
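To make the learned-index idea concrete, the following is a minimal sketch rather than the method of any cited system. It assumes distinct numeric keys stored in a sorted in-memory array, and it substitutes a single least-squares line for the deep neural network discussed above. The class name `LinearLearnedIndex` and its methods are hypothetical. The model predicts a position for a key, and the largest prediction error seen at build time bounds a short binary search around that prediction.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index (hypothetical): a fitted line maps key -> array position."""

    def __init__(self, keys):
        # Assumption: keys are numeric; duplicates are dropped.
        self.keys = sorted(set(keys))
        n = len(self.keys)
        if n == 0:
            raise ValueError("need at least one key")
        # Least-squares fit of position (0..n-1) as a function of key.
        mean_key = sum(self.keys) / n
        mean_pos = (n - 1) / 2
        var = sum((k - mean_key) ** 2 for k in self.keys)
        cov = sum((k - mean_key) * (i - mean_pos) for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_pos - self.slope * mean_key
        # Worst-case prediction error over the stored keys bounds the search window.
        self.max_err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key):
        # Round the model output and clamp it to a valid array position.
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        # Binary search only inside the error-bounded window around the guess.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


index = LinearLearnedIndex([3, 8, 15, 16, 23, 42, 108, 4000])
print(index.lookup(42))  # position of 42 in the sorted key array
print(index.lookup(7))   # None: 7 is not stored
```

The better the model fits the key distribution, the smaller `max_err` becomes and the less work remains for the final search. A skewed key set like the one above leaves a single line with a large error bound, which is one motivation for richer models.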

Enabling Data Exploration

Data exploration is an active area of research within the data systems community that aims to design tools and techniques that help data scientists understand the properties of new data sets (Wasay2015, Wasay2017). Deep reinforcement learning techniques are applied to learn from user interactions and automatically guide users to insights in their data sets (Li2019, Tran2020, Thirumuruganathan2020). Recurrent neural networks are also used to enable natural language querying of databases and to generate exploratory queries (Sen2019, Bar2020). Lastly, techniques inspired by deep word embeddings are used to enhance similarity search within relational databases (Echihabi2020).

Compressing and Integrating Data

Neural networks are also used to compress relational data sets and to enhance data integration through more accurate entity matching (Mudgal2018, Ilkhechi2020). For instance, Bit-Swap, a deep learning-based lossless data compression technique, uses hierarchical latent variable models to outperform benchmark compressors.

Research Opportunities

We will present opportunities to rethink several decision-making components within database systems and extend them using deep learning models, spanning low-level and high-level design decisions from data structure design to query scheduling. Then, we will discuss opportunities to exploit the representational capability of deep data embeddings to learn semantic information about the data set that can both inform query processing and guide database users. Finally, there are open questions on extending, scaling, and managing deep learning-based access methods and data models.