Epicure: Mapping the Culinary and Chemical Geometry of Food

Epicure is a suite of ingredient embedding models trained on a multilingual corpus of more than 4 million recipes. By combining recipe co-occurrence data with chemical compound information from FlavorDB, the researchers built three distinct models that map relationships between ingredients. The models support analysis of an ingredient by its practical culinary context, by its underlying molecular chemistry, or by a position between the two.

Key Points

Development of a multilingual recipe corpus containing 4.14 million recipes across seven languages.
Implementation of an LLM-augmented pipeline to normalize diverse ingredient strings into 1,790 canonical entries.
Creation of three distinct embedding models (Cooc, Chem, Core) using Metapath2Vec to explore the spectrum between culinary usage and chemical composition.
Integration of FlavorDB chemical compound data with recipe-based co-occurrence graphs to provide a multi-modal understanding of ingredients.
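
The key points above combine two kinds of edges: ingredient pairs that co-occur in recipes, and ingredient-to-compound links. The following is a minimal sketch of that idea, not the authors' pipeline. The recipes, the compound IDs, and every function name are invented for illustration. The walk follows one possible metapath (ingredient, compound, ingredient), and which metapaths each Epicure model actually uses is not stated in this summary.

```python
import random
from collections import defaultdict
from itertools import combinations

# Toy data, invented for illustration only (not from the Epicure corpus).
RECIPES = [
    ["tomato", "basil", "garlic"],
    ["tomato", "garlic", "onion"],
    ["basil", "garlic", "pine nut"],
]
# Hypothetical ingredient -> compound mapping standing in for FlavorDB.
COMPOUNDS = {
    "tomato": {"c1", "c2"},
    "basil": {"c2", "c3"},
    "garlic": {"c4"},
    "onion": {"c4", "c5"},
    "pine nut": {"c3"},
}


def cooccurrence_counts(recipes):
    """Count how many recipes contain each unordered ingredient pair."""
    counts = defaultdict(int)
    for recipe in recipes:
        # sorted(set(...)) removes duplicates and gives each pair one canonical order.
        for a, b in combinations(sorted(set(recipe)), 2):
            counts[(a, b)] += 1
    return dict(counts)


def compound_index(compounds):
    """Invert the ingredient -> compound mapping to compound -> ingredients."""
    index = defaultdict(set)
    for ingredient, compound_ids in compounds.items():
        for compound_id in compound_ids:
            index[compound_id].add(ingredient)
    return index


def metapath_walk(start, hops, compounds, index, rng):
    """Random walk alternating ingredient -> compound -> ingredient."""
    walk = [start]
    current = start
    for _ in range(hops):
        options = sorted(compounds.get(current, ()))
        if not options:
            break  # ingredient with no known compounds ends the walk
        compound_id = rng.choice(options)
        current = rng.choice(sorted(index[compound_id]))
        walk += [compound_id, current]
    return walk


counts = cooccurrence_counts(RECIPES)
index = compound_index(COMPOUNDS)
walk = metapath_walk("tomato", 3, COMPOUNDS, index, random.Random(0))
```

In a Metapath2Vec-style setup, many such walks would then be fed to a skip-gram model to produce the embeddings; that training step is omitted here.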

Sentiment

The overall sentiment is mixed and slightly skeptical. Hacker News commenters are interested in the computational gastronomy idea and see practical value in ingredient embeddings, but most reject the grand framing and repeatedly narrow the claim: the work is a useful ingredient map, not a representation of cooking itself.

In Agreement

Ingredient embeddings can be a useful resource for flavor pairing, substitutions, and exploratory culinary maps.
The normalization pipeline and multilingual ingredient mapping are valuable for people building recipe, food, and cooking tools.
Cross-cultural co-occurrence patterns and chemical-flavor relationships can reveal why familiar pairings work and suggest unexpected combinations.
Specialized food models and interfaces could make recipe search, pantry-aware ideation, and recipe visualization better than generic recipe pages.
LLMs and related tools can help in the kitchen when users provide clear constraints, pantry context, cuisine expectations, and technique requirements.
Compact recipe schematics, dependency graphs, and visual maps are promising ways to make cooking processes easier to scan and coordinate.
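
The substitution and pairing use case mentioned in this list usually comes down to a nearest-neighbour lookup in embedding space. A minimal sketch, assuming cosine similarity as the measure; the three-dimensional vectors below are invented for illustration and are not taken from any Epicure model:

```python
import math

# Invented vectors for illustration only.
EMBEDDINGS = {
    "butter": [0.9, 0.1, 0.0],
    "margarine": [0.8, 0.2, 0.1],
    "olive oil": [0.5, 0.5, 0.2],
    "lemon": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(name, embeddings, k=2):
    """Return the k ingredients most similar to `name`, best first."""
    query = embeddings[name]
    scored = [
        (other, cosine(query, vector))
        for other, vector in embeddings.items()
        if other != name
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


candidates = nearest("butter", EMBEDDINGS)
```

With these toy vectors, `candidates` ranks margarine ahead of olive oil for butter. As the objections below note, a high similarity score only suggests a candidate; it says nothing about ratios, technique, or whether the swap works in a given dish.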

Opposed

The headline and framing are misleading because the work maps ingredients, not the full practice of cooking.
Preparation methods, ratios, timing, heat control, sequence, smell, texture, nutrition, culture, and experience are central to cooking and are not represented by ingredient co-occurrence alone.
The corpus appears uneven across languages and cuisines, making broad claims about human cooking feel unjustified.
Canonical ingredient labels can erase important distinctions among varieties, local names, and taxonomic differences that matter to actual cooking.
LLM-generated recipes inherit noisy source data and cannot taste, so they still require human judgment and repeated cooking to become reliable.
Automating cooking or treating cuisine as a compact abstraction risks flattening the human, cultural, and expressive parts of food.
The project needs stronger empirical validation, such as having generated recipes judged independently rather than relying on attractive embeddings or demos.