Spiral: A Machine-Scale Database Built on Vortex for the AI Era
Added September 11, 2025
Spiral is a new database designed for machine-scale AI workloads, addressing the limitations of lakehouse-era tools that were optimized for human-facing analytics. Built on the open Vortex columnar format, it claims order-of-magnitude speedups over Parquet and supports direct S3-to-GPU decoding, unified governance, and a single API that handles everything from small embeddings to large videos. The company has raised $22M to bring this "Third Age" architecture (throughput-first, object-store native, security-unified) to teams in vision, robotics, and multimodal AI.
Key Points
- AI has created a Third Age of data where machines require high-throughput, fine-grained access to entire datasets; legacy lakehouse/warehouse stacks optimized for human outputs are insufficient.
- The 1KB–25MB data range (e.g., embeddings, small images, large documents) is poorly served by Parquet on object storage: per-request latency dominates transfer time, GPUs sit starved for input, and teams build complex, costly pipelines to compensate.
- Security failures (overbroad access, leaked credentials, weak auditability) stem from the same architectural gaps as the performance problems; speed and security are not a trade-off, and both require the right primitives.
- Vortex, an open columnar format donated to the Linux Foundation, delivers Parquet-like compression with 10–20x faster scans, 5–10x faster writes, and 100–200x faster random reads, and is designed for direct S3-to-GPU decoding.
- Spiral, built on Vortex, is an object-store–native database with unified governance and one API for all data types, engineered to saturate GPUs and eliminate the false choice between inlining data and storing pointers.
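The small-object latency problem behind these points can be made concrete with back-of-envelope arithmetic. The sketch below is illustrative only: the ~30 ms time-to-first-byte and ~90 MB/s per-stream bandwidth are assumed S3-like figures, not numbers from Spiral's announcement, and the model covers serial (non-pipelined) fetches.

```python
# Back-of-envelope: effective throughput when fetching objects of various
# sizes serially from object storage. Each request pays first-byte latency
# plus transfer time at the per-stream bandwidth. Figures are assumptions.

def effective_throughput_mb_s(object_size_mb, latency_s, bandwidth_mb_s):
    """Effective MB/s for serial fetches of objects of a given size."""
    transfer_s = object_size_mb / bandwidth_mb_s
    return object_size_mb / (latency_s + transfer_s)

# Assumed S3-like numbers: ~30 ms time-to-first-byte, ~90 MB/s per stream.
for size_mb in (0.001, 0.1, 1, 25):  # 1 KB embedding .. 25 MB video clip
    tput = effective_throughput_mb_s(size_mb, latency_s=0.03, bandwidth_mb_s=90)
    print(f"{size_mb:>7} MB objects -> {tput:6.1f} MB/s effective")
```

At the small end of the range, latency dominates and effective throughput collapses to well under 1 MB/s per stream, which is why sub-file random access and request batching matter so much for feeding GPUs.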
Sentiment
Mixed, leaning skeptical: genuine interest in a GPU-friendly Parquet alternative, but significant pushback on hype, missing benchmarks, and unclear positioning.
In Agreement
- The core problem is real: GPUs are often underutilized because storage and CPU decode cannot feed data fast enough; improving end-to-end throughput matters.
- A columnar format designed for machine consumption (GPU-ready decoding, better random access, S3-friendly batching/streaming) could deliver significant gains.
- Spiral/Vortex’s emphasis on random reads and fast scans aligns with AI workloads that process large inputs into large outputs.
- Replacing or improving on Parquet’s performance characteristics—especially for sub-file random access—would be welcome in many data engineering workflows.
- Open-sourcing Vortex via the Linux Foundation and publishing a repo/docs gives credibility and a path for experimentation.
Opposed
- The announcement and websites are heavy on hype and light on concrete technical details, benchmarks, and reproducible evidence.
- “AI scale” and “Third Age of data” read as marketing jargon; many users aren’t convinced they need this scale or these features.
- Unclear product positioning (OLTP vs. OLAP) and commercialization strategy raise concerns about eventual proprietary forks and lock-in.
- Even with a better format, fundamental bottlenecks like the CPU–GPU bus (PCIe) remain; claims about saturating GPUs may be overstated.
- Potential complexity (e.g., embedded WASM encoders, C++ integration) is a turnoff; some prefer simpler formats and features (like multiple tables in a single file).
- Site reliability and accessibility issues (WebGL-only landing pages) and vague claims (e.g., 100x improvements without linked metrics) reduce trust.
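The PCIe objection above can also be sanity-checked with rough numbers. This sketch simply compares approximate peak rates of each stage of a storage-to-GPU pipeline; the figures (a 100 Gb/s NIC, PCIe 4.0 x16, HBM-class GPU memory) are generic public peak rates chosen for illustration, not measurements of any Spiral deployment.

```python
# Rough bottleneck check for a storage -> host -> GPU pipeline.
# All figures are approximate peak rates in GB/s (assumptions).
links_gb_s = {
    "S3 over 100 Gb/s NIC": 12.5,   # network link, 100 Gbit/s ≈ 12.5 GB/s
    "PCIe 4.0 x16": 31.5,           # host-to-GPU bus
    "GPU HBM": 2000.0,              # on-device memory bandwidth
}

# The pipeline runs at the rate of its slowest link.
bottleneck = min(links_gb_s, key=links_gb_s.get)
print(f"Bottleneck: {bottleneck} at {links_gb_s[bottleneck]} GB/s")
```

Under these assumed numbers the network, not PCIe, is the binding constraint, which cuts both ways: a faster format cannot exceed the slowest link, but PCIe is not necessarily the first wall a pipeline hits.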