Stop paying the JPEG tax. A self-designing storage format that adapts to your AI task —
because images are seen by algorithms, not eyes.
Image AI is everywhere — and it is expensive. The bottleneck is not the neural network; it is the data.
Every year, billions of cameras capture trillions of images. Hospitals use image AI to detect disease. Governments use it to monitor infrastructure. Farmers use it to protect crops. The benefits are immense — but so is the cost. Inference cost alone accounts for nearly 90% of AI spending at companies like Amazon and Google.
The root cause is hiding in plain sight: JPEG. Nearly all images today are stored as JPEG, a format designed in the 1990s for the human eye. It compresses images as aggressively as possible while keeping them visually pleasing to people. But during AI inference, images are "seen" by neural networks, not humans. JPEG's assumptions are wrong for AI, and the result is wasted storage, wasted bandwidth, and wasted compute.
The deeper problem is that inference time is a sum of parts: disk I/O, CPU decoding, PCIe transfer, and GPU execution all contribute. As AI models become more efficient, reducing GPU FLOPs by up to 98%, the other parts don't shrink, so end-to-end inference time barely improves: the storage format determines how much data must be moved and processed at every step.
Our thesis: no single storage format is sufficient. Efficient image AI requires tailoring storage to the specific dataset, model, hardware, and performance budget. The Image Calculator is our answer.
Evaluated across diverse datasets, models, and hardware —
consistently outperforming JPEG and its modern variants.
Instead of a fixed format, the Image Calculator constructs a massive design space and finds the optimal storage format for your specific AI task.
The Image Calculator is a storage-format generator. It takes your dataset, AI model, hardware, and performance budget as input and outputs the most efficient storage format for your task. Unlike JPEG — which makes one set of compromises for every problem — the Image Calculator makes different, optimal decisions for every scenario.
It works by decomposing image storage into four fundamental design primitives: subsampling (how to reduce pixels), block size (how to partition the image), DCT coefficient selection (which frequency components to keep), and quantization (how aggressively to reduce precision). Each primitive has a carefully analyzed domain, and their combinations define a design space of ~6,000 storage formats — reduced from a practically infinite space via sensitivity analysis.
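As a sketch, this combinatorial construction is a Cartesian product over the four primitive domains. The domain values below are illustrative toys, not the carefully analyzed domains the Image Calculator actually uses (whose product is what yields the ~6,000 formats):

```python
from itertools import product

# Hypothetical domains for the four primitives (illustrative values only).
SUBSAMPLING = ["4:4:4", "4:2:2", "4:2:0"]       # chroma subsampling modes
BLOCK_SIZE = [8, 16, 32]                         # DCT block side length
COEFF_COUNTS = [1, 3, 6, 10, 15, 21, 28, 36]     # low-frequency coefficients kept
QUANT_LEVELS = [25, 50, 75, 90]                  # quantization aggressiveness

def design_space():
    """Enumerate every candidate storage format as a 4-tuple of primitive choices."""
    return list(product(SUBSAMPLING, BLOCK_SIZE, COEFF_COUNTS, QUANT_LEVELS))

formats = design_space()
print(len(formats))  # 3 * 3 * 8 * 4 = 288 candidates in this toy space
```

The real design space is larger because each domain is richer; sensitivity analysis is what prunes the practically infinite space of all possible domain values down to the candidates worth enumerating.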
The key insight: AI models need only low-frequency information. High-frequency DCT coefficients contribute little to model accuracy but consume significant storage and compute. By removing them, the Image Calculator achieves scalable image representation — smaller images that are faster to read, decode, transfer, and execute on GPU.
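To see why dropping high-frequency DCT coefficients shrinks an image, here is a minimal pure-Python sketch: transform an 8x8 block, keep only the low-frequency triangle, and invert. The block contents and the cutoff (u + v < 4, keeping 10 of 64 coefficients) are illustrative choices, not the Image Calculator's actual policy:

```python
import math

N = 8  # DCT block size

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows are frequencies)."""
    return [[(math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n))
             * math.cos((2 * x + 1) * u * math.pi / (2 * n))
             for x in range(n)] for u in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

C = dct_matrix(N)
block = [[x + y for x in range(N)] for y in range(N)]  # toy 8x8 pixel block

coeffs = matmul(matmul(C, block), transpose(C))  # forward 2D DCT
# Keep only the low-frequency triangle u + v < 4: 10 of 64 coefficients.
kept = [[coeffs[u][v] if u + v < 4 else 0.0 for v in range(N)] for u in range(N)]
recon = matmul(matmul(transpose(C), kept), C)    # inverse 2D DCT

# The truncated block stores 10 numbers instead of 64, at a small
# reconstruction error that model accuracy is largely insensitive to.
err = max(abs(recon[y][x] - block[y][x]) for y in range(N) for x in range(N))
```

Fewer stored coefficients mean proportionally fewer bytes to read, decode, transfer, and feed to the GPU.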
6,000 candidate storage formats constructed from four fundamental primitives.
Performance models using sampling, interpolation, and transfer learning — 2.7–21x faster than brute force.
Frequency-domain images occupy memory in proportion to their stored size, so GPU execution time shrinks along with the data.
What-if analysis: explore time, accuracy, and storage trade-offs in real time.
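A toy illustration of the sampling-plus-interpolation idea behind the performance models: measure a handful of candidate formats and estimate the rest by piecewise-linear interpolation rather than brute-forcing every candidate (the actual models also use transfer learning across hardware; all numbers below are made up):

```python
# Toy performance model: sample a few formats, interpolate the rest.
def interpolate_latency(samples, x):
    """Piecewise-linear latency estimate at point x from measured samples."""
    pts = sorted(samples.items())
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)
    raise ValueError("x outside the sampled range")

# Made-up measurements: kept DCT coefficients -> inference latency (ms).
measured = {10: 2.0, 40: 5.0, 64: 8.0}
estimate = interpolate_latency(measured, 25)  # estimated without measuring: 3.5
```

Measuring 3 points instead of 55 is where the speedup over brute force comes from in this sketch.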
What if multiple AI applications could share the same image data, each reading only what it needs? Frequency-Store makes this possible with the first column-store for images.
Images stored column-by-column by frequency component, not file-by-file.
A family of storage formats where every format is a subset of a richer one — share data across apps.
Over 95% of inference cost is data movement. Read only the columns you need.
Image frequency coefficients have varying distributions — store each column with the right data width.
Today's image storage is designed for a single user reading a single file. But modern AI deployments run many applications simultaneously over the same data, each with different resource budgets and accuracy requirements. Frequency-Store is the first storage system built for this reality.
The core idea: images are broken into frequency components (columns) and stored column-by-column rather than file-by-file. An application that needs only low-frequency data reads just those columns — skipping everything else. Two applications with different accuracy budgets automatically share the columns they have in common. This eliminates redundant data movement, which accounts for over 95% of image AI inference cost on networked storage.
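A minimal sketch of the column-per-frequency read path, using a hypothetical in-memory layout (names and values are illustrative, not Frequency-Store's on-disk format):

```python
# Hypothetical columnar layout: one column per frequency component,
# ordered from low to high frequency. Values are illustrative only.
columns = {f"freq_{i}": [i * 10 + j for j in range(4)] for i in range(16)}

def read_columns(store, needed):
    """Fetch only the requested frequency columns; skip everything else."""
    return {name: store[name] for name in needed}

# An accuracy-tolerant app reads just the 4 lowest-frequency columns,
# moving 4/16 = 25% of what a file-per-image layout would force it to read.
low_freq_app = read_columns(columns, [f"freq_{i}" for i in range(4)])
```

A second app with a richer accuracy budget would read a longer prefix of the same columns, so the low-frequency columns are fetched once and shared rather than duplicated per application.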
Frequency-Store introduces shareable storage formats: a family of formats where each is a strict subset of the next, ordered by frequency content. Any format can be derived from the richest available copy — no duplication required. Combined with multimodal columnar encoding (different bit widths for different frequency ranges), Frequency-Store achieves up to 11x speedup and 2.2x better compression versus state-of-the-art storage.
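The nesting property of shareable formats can be pictured as a prefix relation over frequency-ordered columns; the helper below is a hypothetical sketch, not Frequency-Store's API:

```python
# Shareable formats as a nested family: format k keeps the k lowest-frequency
# columns, so every poorer format is a prefix of a richer one (illustrative).
def derive(richest, k):
    """Derive format k from the richest stored copy: take its first k columns."""
    return richest[:k]

richest = list(range(64))     # 64 frequency columns, ordered low to high
fmt_a = derive(richest, 10)   # app A's format
fmt_b = derive(richest, 28)   # app B's richer format

# A's format is a strict prefix of B's: all 10 of A's columns are shared
# with B, and both are derived from the single richest copy on disk.
assert fmt_a == fmt_b[:10]
```

Because any format is derivable by slicing the richest copy, no image ever needs to be stored twice, which is where the compression gains over per-application copies come from.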