DASlab  ·  Harvard University

Cosine

A cloud-cost optimized self-designing key-value storage engine
that searches the storage-engine continuum for the best design.

Overview

Cosine is a self-designing key-value storage engine that automatically searches for the close-to-perfect engine architecture for a given input workload, cloud budget, target performance, and service-level requirements.

By identifying and formalizing the first principles of storage engine layouts and core key-value algorithms, Cosine constructs a massive design space comprising 10³⁶ possible storage engine designs across diverse hardware and cloud pricing policies for three major providers: AWS, GCP, and Azure.
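The size of such a space comes from multiplying the number of choices along each independent design dimension. A minimal sketch of this combinatorics, where the knob names and option counts are illustrative placeholders rather than Cosine's actual dimensions:

```python
from math import prod

# Hypothetical design knobs and option counts; Cosine's real dimensions
# and cardinalities differ, but the space grows the same way: as a product.
knobs = {
    "merge policy": 4,
    "size ratio": 20,
    "filter bits per key": 16,
    "buffer size": 32,
    "index granularity": 8,
    "vm type": 50,
}

space_size = prod(knobs.values())
print(space_size)  # 4 * 20 * 16 * 32 * 8 * 50 = 16,384,000
```

Even six toy knobs yield tens of millions of designs; with many more dimensions the product reaches magnitudes like 10³⁶, which is why exhaustive benchmarking is infeasible and fast analytical models are needed.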

Cosine spans diverse points in the key-value design continuum, including log-structured merge-trees, B-trees, log-structured hash tables, in-memory accelerators for filters and indexes, and many hybrid combinations that do not appear as fixed systems in the literature or in industry.

Limousine builds on this foundation. It asks what happens when the navigational structures themselves become too large and expensive to keep in memory at cloud scale. The motivation for Limousine is straightforward: once memory cost becomes the bottleneck, a self-designing engine must also decide when to use learned, classical, or hybrid indexing structures so that performance improves without overpaying for RAM.

Why "Cosine"?

Cosine focuses on the geometry of cloud tradeoffs: cost, performance, hardware choice, and engine architecture. The system navigates those tradeoffs jointly instead of assuming one fixed storage-engine shape is good for every workload.

Why "Limousine"?

Limousine extends the journey that Cosine begins. It is designed for the point where memory-heavy navigational structures become the limiting factor, so the engine must travel farther across learned, classical, and hybrid designs to stay efficient.

Highlights

Cosine turns cloud budget and SLA targets into a concrete storage-engine design.

10³⁶ Candidate Designs: across layouts, hardware, and cloud pricing
3 Cloud Providers: AWS · GCP · Azure
Search Time in Seconds: finds the best design quickly
Unified I/O + CPU Modeling: distribution-aware I/O and learned CPU models
Cloud-Cost Aware Budgeting: designs for dollars, not just latency
SLA Constraint Driven: targets performance and operational requirements

The Cosine Design Space

Cosine models the storage-engine continuum instead of committing to one fixed engine.

Engine Families

LSM-trees, B-trees, log-structured hash tables, and broad hybrid variants.

Hardware Choices

Cosine searches jointly over VM types, hardware capabilities, and cloud pricing.

Workload Tuning

Designs adapt to query mix, target latency, budget, and operational constraints.

Cloud Economics

Cosine reasons directly about dollar cost instead of optimizing only raw performance.

Existing storage engines are typically fixed points in the design space. A production system might commit to one data structure family, one set of memory tradeoffs, and one hardware configuration, even when workloads evolve or cloud economics shift.

Cosine reframes the problem: rather than selecting among a few named engines, it searches across a much larger continuum of valid storage-engine organizations. Its unified, distribution-aware I/O model and learned, concurrency-aware CPU model estimate the performance and cost of candidate designs with high accuracy, enabling rapid search.

The result is a system that can recommend not just an engine shape, but also which cloud provider and virtual machine choices best match the target workload and budget.
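The overall selection step can be pictured as a simple loop: score each candidate with the cost models, keep the SLA-feasible ones, and return the cheapest. The structure below is an assumed illustration, not Cosine's actual API; the design names, instance types, and numbers are made up:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    est_latency_ms: float      # estimate from I/O + CPU cost models
    est_dollars_per_hr: float  # VM + storage pricing for the chosen cloud

def best_design(candidates, sla_latency_ms, budget_per_hr):
    # Keep only designs that satisfy both the latency SLA and the budget.
    feasible = [c for c in candidates
                if c.est_latency_ms <= sla_latency_ms
                and c.est_dollars_per_hr <= budget_per_hr]
    # Among feasible designs, minimize dollar cost, breaking ties on latency.
    return min(feasible,
               key=lambda c: (c.est_dollars_per_hr, c.est_latency_ms),
               default=None)

candidates = [
    Candidate("LSM on i3.large", 2.1, 0.31),
    Candidate("B-tree on m5.xlarge", 1.4, 0.38),
    Candidate("LSH-table on r5.large", 3.0, 0.25),
]
print(best_design(candidates, sla_latency_ms=2.5, budget_per_hr=0.35).name)
# → LSM on i3.large
```

The point of the sketch is the role of the models: because `est_latency_ms` and `est_dollars_per_hr` come from closed-form estimates rather than benchmarks, the loop can cover an enormous candidate set in seconds.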

Try Cosine!

From Cosine to Limousine

Limousine builds directly on Cosine and pushes the idea of self-designing storage further.

Cosine established the cloud-cost-optimized self-designing engine framework: search a huge design space, evaluate candidate engines quickly, and choose the best architecture for the workload, budget, and SLA.

Limousine extends that line of work by introducing a larger-than-memory setting where classical in-memory navigational structures can become too expensive. It expands the design space to blend learned, classical, and hybrid "clearned" (classical + learned) structures within a single engine.

Larger-than-Memory (Key Shift): Limousine targets the regime where RAM cost becomes the bottleneck for navigational structures.

Learned + Classical: Limousine expands the design space to mix learned, classical, and hybrid indexing structures within one self-designing engine.
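One way to picture this mixing is as a constrained choice per navigational component: prefer the fastest structure, then trade speed for memory where the latency penalty per MB saved is smallest, until the configuration fits in RAM. The greedy rule, component names, and numbers below are hypothetical illustrations, not Limousine's actual model:

```python
# Hypothetical per-component options: structure -> (ram_mb, est_lookup_us).
options = {
    "fence pointers": {"classical": (512, 1.0), "learned": (64, 1.2)},
    "filters":        {"classical": (1024, 0.5), "learned": (256, 0.9)},
}

def pick_structures(options, ram_budget_mb):
    # Start from the fastest option for every component.
    choice = {k: min(v, key=lambda o: v[o][1]) for k, v in options.items()}

    def total_ram():
        return sum(options[k][c][0] for k, c in choice.items())

    # While over budget, swap the component whose smaller alternative costs
    # the least extra latency per MB of RAM saved.
    while total_ram() > ram_budget_mb:
        swaps = []
        for k, c in choice.items():
            cur_ram, cur_lat = options[k][c]
            for alt, (ram, lat) in options[k].items():
                if ram < cur_ram:
                    swaps.append(((lat - cur_lat) / (cur_ram - ram), k, alt))
        if not swaps:
            break  # nothing left to shrink
        _, k, alt = min(swaps)
        choice[k] = alt
    return choice

print(pick_structures(options, ram_budget_mb=1024))
```

With a 1024 MB budget, both toy components end up learned; with a generous budget the fastest classical structures survive. The real design question Limousine poses is exactly this tradeoff, made jointly with the rest of the engine design and cloud pricing.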

Publications

Subarna Chatterjee, Meena Jagadeesan, Wilson Qin, Stratos Idreos
Cosine: A Cloud-Cost Optimized Self-Designing Key-Value Storage Engine
Proceedings of the VLDB Endowment (PVLDB), 2022

Subarna Chatterjee, Mark F. Pekala, Lev Kruglyak, Stratos Idreos
Limousine: Blending Learned and Classical Indexes to Self-Design Larger-than-Memory Cloud Storage Engines
Proceedings of the ACM on Management of Data (SIGMOD), Vol. 2, No. 1, Article 47, February 2024

People