Data & AI Systems Laboratory
Designing systems that design systems for the AI era
Self-Designing AI

Making highly efficient data & AI

We build System Calculators that automatically invent, design, and implement data and AI systems for end-to-end AI solutions that meet tailored workload, cloud cost, and performance targets.

What is a “System Calculator”?

Systems are the foundation of the data and AI era. They determine how data is stored, how models are created, and how AI and context are managed. In short, systems define what is possible. But no single system can support the massive diversity of data and AI applications, and building new systems takes years. In fact, a single AI system type can have a design space of over 10^100 alternatives, yet practice still relies on a handful of “good templates,” each requiring years of manual tuning and suited to narrow scenarios.

DASlab pursues a fundamental shift: self-designing data and AI systems. We build System Calculators that unlock the design and implementation of data and AI systems by making it possible to understand and reason about the massive design space of systems, i.e., to navigate “all” possible ways to design a system.

We treat system design as a language:

  • primitives form an alphabet,
  • architectures are sentences, and
  • a calculator synthesizes new blueprints on demand, often designs that humans overlook.

01 — PRIMITIVES
The Alphabet
Decompose systems into their fundamental design atoms: the smallest decisions that shape how data is laid out, accessed, and processed.
02 — COMPOSITION
The Grammar
Define rules for how primitives combine into coherent architectures, from model training pipelines to full data and inference engines.
03 — SYNTHESIS
The Calculator
Build math- and ML-driven algorithms that navigate this design space, finding optimal designs for specific workloads, hardware, and constraints.
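
The three steps above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the primitive names, the single grammar rule, the cost formulas, and the function names (`enumerate_designs`, `cost`, `synthesize`) are hypothetical and are not taken from any DASlab system. It enumerates a tiny design space from primitives, filters it with a composition rule, and picks the cheapest design for a given workload by exhaustive search, which only works because the space here has a handful of points rather than 10^100.

```python
import math
from dataclasses import dataclass
from itertools import product

# Hypothetical primitives (the "alphabet"): one entry per design decision.
PRIMITIVES = {
    "layout": ["sorted", "unsorted"],
    "index": ["none", "hash"],
    "buffer": ["none", "write_buffer"],
}


@dataclass(frozen=True)
class Design:
    layout: str
    index: str
    buffer: str


def is_valid(d: Design) -> bool:
    # Hypothetical grammar rule: a write buffer is only allowed
    # in front of a sorted layout.
    return d.buffer == "none" or d.layout == "sorted"


def enumerate_designs() -> list[Design]:
    # Compose every combination of primitives, then keep the valid "sentences".
    candidates = (Design(*choice) for choice in product(*PRIMITIVES.values()))
    return [d for d in candidates if is_valid(d)]


def cost(d: Design, workload: dict[str, float], n: int = 1_000_000) -> float:
    # Toy per-operation cost model (assumed, in abstract "steps").
    log_n = math.log2(n)
    probe = 1.0 if d.buffer == "write_buffer" else 0.0  # reads also check the buffer

    if d.index == "hash":
        point = 1.0
    elif d.layout == "sorted":
        point = log_n
    else:
        point = float(n)
    point += probe

    # Range reads need order; a hash index does not help them.
    range_read = log_n + probe if d.layout == "sorted" else float(n)

    write = n / 2 if d.layout == "sorted" else 1.0  # keep order vs. append
    if d.index == "hash":
        write += 1.0  # maintain the index
    if d.buffer == "write_buffer":
        write /= 100.0  # amortize over an assumed batch of 100 writes

    return (workload["point"] * point
            + workload["range"] * range_read
            + workload["write"] * write)


def synthesize(workload: dict[str, float]) -> Design:
    # The "calculator": exhaustive search over the (tiny) valid design space.
    return min(enumerate_designs(), key=lambda d: cost(d, workload))


if __name__ == "__main__":
    print(synthesize({"point": 0.5, "range": 0.0, "write": 0.5}))
    print(synthesize({"point": 0.0, "range": 0.9, "write": 0.1}))
```

Under these assumed costs, a workload of point reads and writes selects an unsorted layout with a hash index, while a range-heavy workload selects a sorted layout with a write buffer. A realistic calculator would replace the exhaustive search and the hand-written formulas with the math- and ML-driven navigation described above.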

Let us calculate. — Leibniz

Centuries ago, Leibniz imagined a universal calculus where arguments could be resolved through formal primitives and calculation, not endless debates. Inspired by this vision, we bring “let us calculate” to computer systems, synthesizing designs from primitives and grammars for each workload instead of hand-crafted engineering.

Active Research Projects

Self-designing Large Model Training: Grammars for distributed-training algorithms

Large model training algorithms that automatically extract every flop and byte from modern accelerators.

Self-designing Image AI Storage: Grammars for Vision Pipelines

Co-design of image storage and neural networks, unlocking order-of-magnitude speedups across vision pipelines.

Self-designing Context Management: Grammars for Memory in Long-context AI

Systems that self-design how they store, compress, retrieve, and reason over long-range context.

Self-designing Inference Compilers: Grammars for LLM operators

Self-designing execution plans for LLM inference under workload, latency, and resource constraints.

Self-designing Small Models & Small Agents: Grammars for Small LLM Fine-Tuning

LLMs and agents with just enough reasoning power to complete their target use cases perfectly at lower cost.

People



Join us

Interested in doing research on big data systems with DASlab? We are always looking for strong graduate and undergraduate students to join us.

How to apply

Selected Publications

PROJECT 01

The Data Calculator

Data structures | 10^48 designs explored | Seconds to invent

Key Publications

Stratos Idreos, Kostas Zoumpatianos, Subarna Chatterjee, Wilson Qin, Abdul Wasay, Brian Hentschel, Mike Kester, Niv Dayan, Demi Guo, Minseo Kang, Yiyou Sun. IEEE Data Engineering Bulletin, 2019.
Stratos Idreos, Kostas Zoumpatianos, Brian Hentschel, Michael S. Kester, Demi Guo. ACM SIGMOD, 2018.

PROJECT 02

Storage Engines for the Cloud

NoSQL | 10^100 candidate designs | 1000× faster

Key Publications

Subarna Chatterjee, Mark Pekala, Lev Kruglyak, Stratos Idreos. ACM SIGMOD, 2024.
Subarna Chatterjee, Meena Jagadeesan, Wilson Qin, Stratos Idreos. PVLDB, 2022.
Stratos Idreos, Niv Dayan, Wilson Qin, Mali Akmanalp, Sophie Hilgard, Andrew Ross, James Lennon, Varun Jain, Harshita Gupta, David Li, Zichen Zhu. CIDR, 2019.

PROJECT 03

Neural Network Systems

Key Publications

Sanket Purandare, Abdul Wasay, Animesh Jain, Stratos Idreos. MLSys, 2023.
Abdul Wasay, Brian Hentschel, Yuze Liao, Sanyuan Chen, Stratos Idreos. MLSys, 2020.

PROJECT 04

LegoAI & TorchTitan

Training large models | Max hardware utilization | Auto-scaling

Key Publications

Wanchao Liang, Tianyu Liu, Less Wright, Will Constable, Andrew Gu, Chien-Chin Huang, Iris Zhang, Wei Feng, Howard Huang, Junjie Wang, Sanket Purandare, Gokul Nadathur, Stratos Idreos. ICLR, 2025.
Sanket Purandare (under the supervision of Stratos Idreos). Doctoral Dissertation, Harvard University, 2025.

PROJECT 05

The Image AI Calculator

Image AI | Storage/NN co-design | 10× faster

Key Publications

Utku Sirin, Victoria Kauffman, Aadit Saluja, Florian Klein, Jeremy Hsu, Stratos Idreos. CIDR, 2025.

PROJECT 06

LSM-based Key-Value Stores for Big Data Applications

Data structures | 10^48 designs explored | Seconds to invent

Key Publications

Niv Dayan, Stratos Idreos. ACM SIGMOD, 2019.
Niv Dayan, Manos Athanassoulis, Stratos Idreos. ACM SIGMOD, 2017. Best Paper Award.

PROJECT 07

Filters

Key Publications

Kyle Deeds, Brian Hentschel, Stratos Idreos. PVLDB, 2021.
Siqiang Luo, Subarna Chatterjee, Rafael Ketsetsidis, Niv Dayan, Wilson Qin, Stratos Idreos. ACM SIGMOD, 2020.
Niv Dayan, Manos Athanassoulis, Stratos Idreos. ACM Transactions on Database Systems, 2018.
Brian Hentschel, Michael S. Kester, Stratos Idreos. ACM SIGMOD, 2018.

The Periodic Table Philosophy

A periodic table for systems design

We map data-structure design into a structured space of primitives, so we can reason about what exists, predict what’s missing, and ultimately calculate new designs tailored to workload and hardware.

  • Explain existing designs through shared first principles.
  • Predict tradeoffs without implementing every candidate.
  • Discover new designs by exploring the gaps in the space.

Periodic Table of Data Structures

Courses

Big data systems sit in the critical path of everything we do: in business, in science, and in everyday life. The lab's courses offer a comprehensive introduction to modern data systems and a research-oriented roadmap towards building systems that "scale up" and "scale out".

Undergraduate Research

So far, 11 undergraduate DASlab teams have made it to the finals of the ACM SIGMOD Undergraduate Research Competition. We won first place six times: in 2016, 2017, 2018, 2019, 2020, and 2022. In 2020, we won both first and second place. In 2021, we won third place.

If you are a Harvard undergrad interested in research with DASlab, taking CS165 and CS265 is the first step.

ACM SIGMOD Undergraduate Research Competition Finalists and Winners

Sponsors

Contact

How to find us

Stratos' Office

4.411 SEC
(Science and Engineering Complex)
150 Western Ave
Boston, MA 02134

The Lab

4.435 SEC
(Science and Engineering Complex)
150 Western Ave
Boston, MA 02134