Many application workloads today contain a mix of reads and writes, blurring the traditional boundaries between transactional and analytical processing and, hence, the boundaries between the tools we use. As a result, we need new system design paradigms that support general updates while offering competitive read performance.
In this work we focus on the challenge of building an update-aware bitmap index: how can we take a traditionally read-optimized and space-optimized access method and, at a small cost, make updates more efficient?
Bitmap indexes are well suited for equality and low-selectivity queries, since they can quickly return all qualifying rowIDs. In addition, they efficiently support analytical queries with logical operations and multiple selections in the where clause. A basic bitmap index consists of multiple bitvectors, one for each distinct value in the domain of the indexed attribute A. Without compression the storage requirements of a bitmap index are very high; however, recent research on bitvector encoding/compression greatly reduces bitmap index size through variations of run-length encoding.
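To make the structure concrete, here is a minimal sketch of a basic (uncompressed) bitmap index; the class and method names are illustrative, not from the paper, and Python integers stand in for bitvectors with bit i representing rowID i:

```python
from collections import defaultdict

class BitmapIndex:
    """Toy bitmap index: one bitvector per distinct value of the
    indexed attribute (Python ints as uncompressed bitvectors)."""

    def __init__(self, column):
        self.n = len(column)
        self.bitvectors = defaultdict(int)
        for row, value in enumerate(column):
            self.bitvectors[value] |= 1 << row  # set bit for this rowID

    def equals(self, value):
        """RowIDs satisfying `attr = value` (an equality query)."""
        bv = self.bitvectors[value]
        return [r for r in range(self.n) if (bv >> r) & 1]

    def either(self, v1, v2):
        """RowIDs satisfying `attr = v1 OR attr = v2` via a bitwise OR,
        illustrating why logical operations are cheap on bitvectors."""
        bv = self.bitvectors[v1] | self.bitvectors[v2]
        return [r for r in range(self.n) if (bv >> r) & 1]
```

For example, on the column `['red', 'blue', 'red', 'green']`, `equals('red')` returns rowIDs `[0, 2]`, and a disjunctive selection is a single OR over two bitvectors.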
This comes at a price: every read operation requires a bitvector decode while every update operation requires a decode followed by an encode. The latter makes in-place updates prohibitively expensive.
To address this problem we study the design principles of bitmap indexing, taking into account all prior approaches, and we identify and quantify three bottlenecks when updating: (i) decode and re-encode when updating in-place, (ii) update pressure on a single auxiliary bitvector when enabling out-of-place invalidation, and (iii) repetitive bitwise operations with the auxiliary bitvector to re-evaluate the value positions of the bitmap index. We introduce three new design principles to mitigate these problems.
UpBit buffers updates in auxiliary bitvectors, which distributes the update pressure to multiple bitvectors to avoid the problem of directing all updates to a single bitvector.
UpBit uses localized decoding, merging the buffered updates with a value bitvector without decoding the full bitvector.
Finally, UpBit follows the workload to adaptively merge the updates with value bitvectors, avoiding repetitive bitwise operations and keeping the buffer size small.
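The core idea of the first principle can be sketched in a few lines. This is a toy model, not the paper's implementation (real UpBit operates on compressed bitvectors and merges adaptively); the class and method names are illustrative. Each value keeps its own uncompressed update bitvector, so an update flips two bits instead of re-encoding two value bitvectors, and a read XORs the pair:

```python
class UpdatableBitmapSketch:
    """Toy sketch of out-of-place bitmap updates: each value keeps a
    value bitvector (vb) and an auxiliary update bitvector (ub).
    Python ints serve as bitvectors; bit i = rowID i."""

    def __init__(self, column):
        self.vb = {}  # value -> bitvector of rows holding it at build time
        self.ub = {}  # value -> bitvector of rows flipped since build
        for row, value in enumerate(column):
            self.vb[value] = self.vb.get(value, 0) | (1 << row)

    def update(self, row, old, new):
        # Out-of-place: record the change by flipping one bit in each
        # per-value update bitvector instead of re-encoding the vbs.
        self.ub[old] = self.ub.get(old, 0) ^ (1 << row)
        self.ub[new] = self.ub.get(new, 0) ^ (1 << row)

    def read(self, value):
        # Current bitvector = value bitvector XOR buffered updates.
        return self.vb.get(value, 0) ^ self.ub.get(value, 0)

    def merge(self, value):
        # Periodically fold the buffer into the value bitvector so
        # subsequent reads need no XOR and the buffer stays small.
        self.vb[value] = self.read(value)
        self.ub[value] = 0
```

Because updates touch only the (small) per-value buffers, the pressure is spread across many bitvectors rather than concentrated on one global invalidation bitvector, and `merge` captures, in miniature, the workload-driven consolidation of the third principle.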
Putting together the three design elements, we build UpBit, a scalable in-memory updatable bitmap index that offers very fast updates (several orders of magnitude faster than prior approaches) with competitive read performance.
Manos Athanassoulis, Zheng Yan, Stratos Idreos
UpBit: Scalable In-Memory Updatable Bitmap Indexing
[most reproducible paper award]
In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2016.