CauseNet: An Open 11M-Relation Causality Graph from the Web

Added Sep 4, 2025

CauseNet is a large, open-domain graph of over 11M claimed causal relations extracted from web sources, with detailed provenance and an estimated 83% precision. It ships in Full, Precision, and Sample editions, with example code for loading into Neo4j and datasets for training a concept spotter. Initial gains are demonstrated for causal question answering (QA), and the resource targets future applications in reasoning and argumentation.

Key Points

  • CauseNet aggregates over 11 million claimed causal relations into an open-domain causality graph with an estimated 83% extraction precision.
  • Data are mined from multiple web sources (ClueWeb12, Wikipedia sentences, lists, and infoboxes) and each relation includes detailed provenance metadata.
  • Three dataset editions balance coverage and quality: Full, Precision subset, and a small Sample without provenance; example code supports Neo4j loading.
  • A sequence-tagging concept spotter identifies multi-word causal concepts; associated training/evaluation datasets are publicly available (80/10/10 splits).
  • The resource supports causal QA and broader reasoning tasks, with initial QA gains demonstrated and permissive licensing encouraging community use and extension.
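
To make the idea of claimed relations with provenance concrete, here is a minimal in-memory sketch. It does not reproduce CauseNet's actual file schema or its Neo4j loader: the `Source` and `CausalRelation` classes, the source-kind labels, the reference strings, and the `min_sources` filter are all hypothetical names chosen for illustration. The filter is one plausible way to trade coverage for quality, not necessarily how the Precision edition is built.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Source:
    """Where a claim was found (field names are illustrative assumptions)."""
    kind: str       # e.g. "clueweb12_sentence" or "wikipedia_infobox"
    reference: str  # hypothetical pointer to the page or sentence


@dataclass
class CausalRelation:
    """One claimed cause -> effect edge plus its provenance."""
    cause: str
    effect: str
    sources: list[Source] = field(default_factory=list)


def build_graph(relations):
    """Index relations as graph[cause][effect] -> CausalRelation."""
    graph = defaultdict(dict)
    for rel in relations:
        graph[rel.cause][rel.effect] = rel
    return graph


def claimed_effects(graph, cause, min_sources=1):
    """Return the sorted effects claimed for `cause`, keeping only
    relations backed by at least `min_sources` provenance records."""
    return sorted(
        effect
        for effect, rel in graph.get(cause, {}).items()
        if len(rel.sources) >= min_sources
    )


# Toy data, invented for this example.
relations = [
    CausalRelation("smoking", "lung cancer", [
        Source("wikipedia_sentence", "example-ref-1"),
        Source("clueweb12_sentence", "example-ref-2"),
    ]),
    CausalRelation("smoking", "stress", [
        Source("clueweb12_sentence", "example-ref-3"),
    ]),
]
graph = build_graph(relations)

print(claimed_effects(graph, "smoking"))
print(claimed_effects(graph, "smoking", min_sources=2))
```

Raising `min_sources` keeps only claims that several provenance records agree on, which is the kind of auditing that per-relation provenance makes possible.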

Sentiment

Mostly skeptical to negative, with some cautious optimism about niche or downstream uses when paired with richer causal modeling, uncertainty, and LLM-based filtering.

In Agreement

  • As a seed resource of claimed causal statements with provenance, it can aid exploration, contradiction mapping, hypothesis generation, and basic causal question answering.
  • It could augment LLMs and GraphRAG systems, with models helping refine, annotate, and filter the ontology into something more useful.
  • In limited domains or with added qualifiers (conditions, confidence, population), causal graphs can provide value.
  • Open semantic web/knowledge graph experiments remain promising; provenance is useful for auditing and improvement.
  • Contradictory links can signal areas needing higher-resolution causal chains and deeper investigation.

Opposed

  • This resembles past ontology projects like Cyc: brittle, labor-intensive, and unsuccessful at broad real-world reasoning.
  • Edges are too vague and unqualified; cause→effect pairs without mechanisms, conditions, or strengths are practically useless and produce nonsensical multi-hop paths.
  • It captures beliefs, correlations, or definitions rather than verified causation, risking amplification of misinformation (e.g., “vaccines→autism”).
  • Extraction appears regex/pattern-based; the claimed precision is doubtful, and tautologies/definition-edges (e.g., “influenza virus causes influenza”) slip in.
  • Reliance on Wikipedia and generic web text undermines reliability; causality cannot be deduced this way without rigorous methodology.
  • Causality is high-dimensional and context-dependent; vector/LLM approaches may represent it better than symbolic triples.
  • Proper causal analysis needs uncertainty modeling, causal diagrams/DoWhy, and operationalized variables—absent here.
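
The multi-hop objection can be illustrated with a small sketch. It assumes each edge carries a confidence value in [0, 1], which the criticism above says the graph lacks, and it scores a path as the product of its edge confidences. That scoring rule is a crude independence assumption made for this example, not a method from CauseNet or from the discussion. The `causal_paths` function and the toy edges are hypothetical.

```python
def causal_paths(edges, start, max_hops=3):
    """Yield (path, score) for every cycle-free path of 1..max_hops
    edges starting at `start`.

    `edges` maps cause -> {effect: confidence}. The score of a path is
    the product of its edge confidences (an assumed, simplistic rule).
    """
    def walk(node, path, score):
        for nxt, conf in edges.get(node, {}).items():
            if nxt in path:
                continue  # skip cycles such as a -> b -> a
            new_path = path + [nxt]
            new_score = score * conf
            yield new_path, new_score
            # len(new_path) - 1 is the number of edges used so far
            if len(new_path) - 1 < max_hops:
                yield from walk(nxt, new_path, new_score)

    yield from walk(start, [start], 1.0)


# Toy edges with invented confidences.
edges = {
    "rain": {"wet roads": 0.9},
    "wet roads": {"accidents": 0.6},
    "accidents": {"traffic jams": 0.7},
}

for path, score in causal_paths(edges, "rain"):
    print(" -> ".join(path), round(score, 3))
```

Even with fairly confident individual edges, the score drops with every hop. Without any such qualifier, a three-hop chain looks exactly as trustworthy as a single directly stated claim.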