Airweave: Open-source semantic search across all your apps for agents
Article: Positive
Community: Positive, Mixed

Airweave unifies data from 25+ apps into a searchable knowledge base for AI agents, exposed via REST and MCP. It manages auth, extraction, embeddings, incremental updates, and versioning, with SDKs for Python and TypeScript. Deploy in the cloud or self-host with Docker, backed by a modern stack (FastAPI, PostgreSQL, Qdrant, React/TS).
Key Points
- Transforms multiple apps and data sources into a unified, semantically searchable knowledge base for agents, accessible via REST API or MCP.
- Offers both managed cloud and easy self-hosting with Docker; local dashboard on port 8080 and Swagger API docs on port 8001.
- Supports 25+ integrations and manages auth, extraction, embedding, incremental syncing (content hashing), and versioning end-to-end.
- Provides official Python and TypeScript SDKs for creating collections, syncing data, and performing searches.
- Built with React/TypeScript (frontend) and FastAPI (backend), with PostgreSQL and Qdrant for storage; MIT-licensed and open to community contributions.
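The incremental syncing via content hashing mentioned above can be illustrated with a minimal sketch. This is not Airweave's actual code: the function names `content_hash` and `plan_sync`, the choice of SHA-256, and the dictionary layout are all assumptions made for illustration. The idea is that only items whose hash changed since the last sync need to be re-extracted and re-embedded.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a stable SHA-256 hex digest of an item's content (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(source_items: dict[str, str], stored_hashes: dict[str, str]) -> dict[str, list[str]]:
    """Decide what to do with each item by comparing hashes.

    source_items:  item id -> current content fetched from the source app
    stored_hashes: item id -> hash recorded during the previous sync
    """
    plan: dict[str, list[str]] = {"insert": [], "update": [], "delete": [], "skip": []}
    for item_id, text in source_items.items():
        old_hash = stored_hashes.get(item_id)
        if old_hash is None:
            plan["insert"].append(item_id)   # never seen before: embed and store
        elif old_hash != content_hash(text):
            plan["update"].append(item_id)   # content changed: re-embed
        else:
            plan["skip"].append(item_id)     # unchanged: no embedding cost
    # Anything recorded earlier but missing from the source is removed.
    plan["delete"] = [i for i in stored_hashes if i not in source_items]
    return plan
```

With this approach, an unchanged item costs one hash comparison per sync instead of a fresh embedding call, which is the main saving an incremental sync aims for.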
Sentiment
The HN community is generally supportive and constructive toward Airweave. Most commenters ask genuine technical questions rather than expressing hostility. The main concerns about RBAC, pricing, and privacy are raised as constructive feedback rather than dismissals. The appearance of the Onyx co-founder offering congratulations reinforces the positive tone.
In Agreement
- The unified search layer across multiple SaaS apps is a genuinely useful capability for agent developers
- The Cursor integration demo impressed commenters and showed practical value
- The distinction between Airweave as dev infrastructure and Onyx/Glean as end-user apps is meaningful and well-positioned
- Building fine-grained search rather than thin MCP wrappers addresses a real gap in the agent tooling ecosystem
Opposed
- RBAC and permissions are fundamentally hard to solve; per-user syncs are a workaround, and 'search anything' apps are inherently leaky unless they index on the fly using passthrough credentials
- The pricing model is complex and prohibitive for building multi-user products; true usage-based pricing is needed
- Sending user data to external servers poses serious privacy risks given the industry's track record of data breaches
- OpenAI and Anthropic building similar connectors into their products may commoditize this capability over time
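
The RBAC objection above can be made concrete with a small sketch of why a pre-built index can leak. Everything here is hypothetical and unrelated to Airweave's real code: `IndexedDoc`, both search functions, and the `can_read` callback are illustrative names. The sketch assumes the index stores a snapshot of each document's access list at sync time, and contrasts it with asking the source system at query time, which is roughly what passthrough credentials achieve.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDoc:
    doc_id: str
    text: str
    allowed_users: frozenset[str]  # ACL snapshot captured at sync time (assumption)


def search_with_snapshot_acl(index: list[IndexedDoc], user: str, term: str) -> list[str]:
    """Filter hits using the ACL stored in the index; stale if the source changed."""
    return [d.doc_id for d in index if term in d.text and user in d.allowed_users]


def search_with_live_check(
    index: list[IndexedDoc],
    user: str,
    term: str,
    can_read: Callable[[str, str], bool],
) -> list[str]:
    """Filter hits by asking the source system at query time (passthrough-style)."""
    return [d.doc_id for d in index if term in d.text and can_read(user, d.doc_id)]


# The document was shared with bob when it was synced...
index = [IndexedDoc("q3-plan", "q3 revenue plan", frozenset({"alice", "bob"}))]
# ...but the source app has since revoked bob's access.
live_acl = {"q3-plan": {"alice"}}


def can_read(user: str, doc_id: str) -> bool:
    return user in live_acl.get(doc_id, set())


stale_hits = search_with_snapshot_acl(index, "bob", "revenue")
live_hits = search_with_live_check(index, "bob", "revenue", can_read)
```

In this toy setup the snapshot filter still returns the document to bob until the next sync, while the live check does not, at the cost of a permission lookup against the source on every query.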