Spiral: A Machine-Scale Database Built on Vortex for the AI Era

Added Sep 11, 2025
Article: Positive · Community: Negative/Mixed

Spiral is a new database designed for machine-scale AI workloads, addressing the limitations of lakehouse-era tools that were optimized for human-facing analytics. Built on the open Vortex columnar format, it claims order-of-magnitude speedups over Parquet and supports direct S3-to-GPU decoding, unified governance, and a single API that handles data from small embeddings to large videos. The company has raised $22M to bring this "Third Age" architecture (throughput-first, object-store native, security-unified) to teams in vision, robotics, and multimodal AI.

Key Points

  • AI has created a Third Age of data where machines require high-throughput, fine-grained access to entire datasets; legacy lakehouse/warehouse stacks optimized for human outputs are insufficient.
  • The 1KB–25MB data range (e.g., embeddings, small images, large documents) is poorly served by Parquet on object storage, causing massive latency overhead, GPU starvation, and complex, costly pipelines.
  • Security failures (overbroad access, leaked credentials, weak auditability) stem from the same architectural gaps as performance issues; speed and security are not trade-offs but both require the right primitives.
  • Vortex, an open columnar format donated to the Linux Foundation, delivers Parquet-like compression with 10–20x faster scans, 5–10x faster writes, and 100–200x faster random reads, and is designed for direct S3-to-GPU decoding.
  • Spiral, built on Vortex, is an object-store–native database with unified governance and one API for all data types, engineered to saturate GPUs and eliminate the false choice between inlining data and storing pointers.

Sentiment

The community is predominantly skeptical. While the underlying Vortex format generates genuine technical curiosity, the marketing-heavy launch with vague buzzwords and lack of concrete technical detail draws significant criticism. Multiple commenters express frustration at being unable to determine what Spiral actually is or does from the blog post and landing pages. The overall tone suggests cautious interest in the technology undermined by distrust of the presentation.

In Agreement

  • The Vortex file format addresses real performance limitations of Parquet, particularly for random access and GPU-direct data decoding that bypasses CPU bottlenecks in AI data pipelines.
  • GPU utilization is genuinely limited by storage and CPU decoding bottlenecks, and a format designed for direct S3-to-GPU data transfer could meaningfully improve AI training throughput.
  • Donating Vortex to the Linux Foundation demonstrates commitment to open standards and prevents future re-licensing, which builds trust in the ecosystem.
  • The AnyBlox paper from TUM provides academic validation of Vortex's approach to composable data architectures.
  • The 'uncanny valley' between small and large data sizes is a real pain point that current systems handle poorly.

Opposed

  • The blog post and landing pages are all marketing bluster with no technical substance—no accessible benchmarks, no clear product description, and no honest discussion of tradeoffs.
  • 'AI scale' is a meaningless buzzword reminiscent of MongoDB's 'web scale' marketing, and the 'three eras' framing is historically oversimplified and self-serving.
  • Existing tools like Lance, DuckDB, and Iceberg could achieve similar benefits by simply adding Vortex format support, making a whole new database unnecessary.
  • The CPU may not actually be the primary bottleneck between S3 and GPU—network bandwidth and cost are more likely limiting factors for most users.
  • Engineering announcements should honestly discuss what the product is bad at, not just make grandiose claims of being revolutionary while professing skepticism of revolutionary claims.