DASlab  ·  Harvard University

Overview

Column Sketches are a secondary indexing technique designed to provide consistent and efficient predicate evaluation over column, row, and mixed data layouts.

Each Column Sketch has two components: a compression map and a sketched column. The compression map is a lossy function from values in the base domain to numerical codes, specifically designed to help predicate evaluation. Applying that map value-by-value yields a compact sketched column with fixed-width codes.

When queries arrive, the system first evaluates them affirmatively or negatively for the vast majority of tuples using only the sketched column. Only a small remaining fraction needs to consult the base data, which keeps predicate evaluation fast while remaining robust.

Compression Map

A carefully designed lossy mapping turns original values into compact codes that remain useful for predicate evaluation.

Sketched Column

The compressed codes form a small auxiliary column that filters most tuples before touching the base layout.

Column Sketch construction overview

Highlights

Column Sketches reduce data access during predicate evaluation while preserving robust performance guarantees.

2
Core Components
Compression map plus sketched column
1 / 2b
Base Data Access
Average access rate when using b-bit sketch codes
Robust
Predicate Evaluation
Efficient over column, row, and mixed data layouts

Research

The key design goal is to compress values in a way that preserves predicate usefulness instead of optimizing only generic compression ratio.

Heavy Hitters

Frequent values receive unique codes so predicate evaluation can often decide directly in the sketch.

Uniform Codes

The code distribution is kept close to uniform so no single code dominates the sketch and weakens filtering.

Mostly Sketch-Only

Most tuples can be answered positively or negatively using the sketch before consulting base data.

Less Base Access

Larger code budgets reduce how often the base layout must be read during predicate evaluation.

Column Sketch predicate evaluation example

The compression map is the mechanism that guarantees Column Sketch performance. It first identifies the heavy hitters of the input domain and assigns each one a unique code. It then shapes the remaining mapping so the resulting code distribution is approximately uniform.

Because of these properties, a Column Sketch rarely needs to touch many base tuples. If a sketch uses b bits per code, then on average the base data is accessed once out of every 2b values. As the code budget grows, the sketch skips more base data and the resulting predicate evaluation reads less than a scan over the original layout.

Paper

Brian Hentschel, Michael S. Kester, Stratos Idreos
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2018

People