The database research community has been building methods to store, access, and update data for more than four decades. Throughout the evolution of the structures and techniques used to access data, access methods adapt to the ever changing hardware and workload requirements. Today, even small changes in the workload or the hardware lead to redesigning access methods. This phenomenon has reached its peak as data generation and workload diversification grow exponentially, and hardware advances introduce increased complexity. New workload requirements are introduced by the emergence of new applications, and data is managed by large systems composed of more and more complex and heterogeneous hardware. As a result, it is increasingly important to develop application-aware and hardware-aware access methods.
The fundamental challenges that every researcher, systems architect, or developer faces when designing a new access method are how to minimize, i) read times (R), ii) update cost (U), and iii) memory (or storage) overhead (M). In this project we first conjecture that when optimizing the read-update-memory overheads, optimizing in any two areas negatively impacts the third. Based on the RUM Conjecture, at DASlab, we study the manifestation of the balance of the RUM overheads in state-of-the-art access methods, and we pursue a path toward RUM-aware access methods for future data systems.
When building access methods of modern systems, one is confronted with the same fundamental challenges, and design decisions. In particular, there are three quantities that researchers always try to minimize: (1) the read overhead (R), (2) the update overhead (U), and (3) the memory (or storage) overhead (M), which we term RUM overheads. Deciding which overhead(s) to optimize for and to what extent, remains a prominent part of the process of designing a new access method, especially as hardware and workloads change over time. The design space of the three RUM overheads can be seen as a three dimensional space or, if projected on a two-dimensional plane, as the triangle shown in the left hand-side, where each access method is mapped to a point or -- if it can be tuned to have varying RUM behavior -- to an area.
An ideal solution is an access method that always provides the lowest read cost, the lowest update cost, and requires no extra memory or storage space over the base data. In practice, data structures are designed to compromise between the three RUM overheads, while the optimal design depends on a multitude of factors such as hardware, workload, and user expectations.
We propose the RUM Conjecture:
When designing access methods we set an upper bound for two of the RUM overheads, this implies a hard lower bound for the third overhead which cannot be further reduced
Today, when building access methods, there is a wealth of approaches tailored for specific use cases. For example, in order to minimize the cost of updating data, one can use a design based on differential structures, allowing many queries to consolidate updates and avoid the cost of reorganizing data. Such an approach, however, increases the space overhead and hinders read cost as now queries need to merge any relevant pending updates during processing. Another example is that the read cost can be minimized by storing data in multiple different physical layouts, each layout being appropriate for minimizing the read cost for a particular workload. Update and space costs, however, increase because now there are multiple data copies. We further study how existing access methods balance the RUM overheads.
The RUM Conjecture opens the path for exciting research challenges towards the goal of creating RUM-adaptive access methods. Future data systems should include versatile tools to interact with the data the way the workload, the application, and the hardware need and not vice versa. In other words, the application, the workload, and the hardware should dictate how we access our data, and not the constraints of our systems.
Tuning access methods becomes increasingly important if, on top of big data and hardware, we consider the development of specialized systems and tools to cater data, aiming at servicing a narrow set of applications each. As more systems are built, the complexity of finding the right access method increases as well. As part of this project we design access methods with dynamic and tunable RUM behavior.
M. Athanassoulis, M. S. Kester, L. M. Maas, R. I. Stoica, S. Idreos, A. Ailamaki, and M. Callaghan
Designing Access Methods: The RUM Conjecture [slides]
In Proceedings of the International Conference on Extending Database Technology (EDBT) Conference, 2016
This project is fostered by collaboration with EPFL, Switzerland and Facebook, Inc.
This project has been supported by the Swiss National Science Foundation through a Postdoc Mobility Fellowship.