AI Carb Counting: A Dangerous Gamble for Insulin Dosing

A study of nearly 27,000 AI queries revealed that leading models provide inconsistent and often dangerously inaccurate carbohydrate estimates from food photos. While some models like Claude are more consistent, all exhibit systematic biases and provide unreliable confidence scores that do not reflect actual accuracy. Because a single outlier estimate can lead to a severe hypoglycemic emergency, the author warns against using AI autonomously for insulin calculations.

Key Points

AI models are stochastically inconsistent, often providing widely different carbohydrate estimates for the exact same photo and prompt.
There is a systematic bias toward overestimating carbohydrates, which increases the clinical risk of insulin overdose and hypoglycemia.
AI-generated confidence scores are misleading and have little to no correlation with the actual accuracy of the carbohydrate estimate.
Even a consistent model can be 'precisely wrong': in one case, multiple models independently converged on the same incorrect value for a simple cheese sandwich.
The variation in estimates is large enough to be clinically dangerous, with some models producing outliers that could lead to fatal insulin doses.
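
To make the link between estimate variation and dosing risk concrete, here is a minimal sketch. It assumes the common meal-bolus arithmetic (carbohydrate grams divided by an insulin-to-carb ratio). The function names, the ratio of 10 g per unit, and every number below are hypothetical illustrations, not figures from the study.

```python
from statistics import mean, stdev

def spread(estimates_g):
    """Return (mean, coefficient of variation) of repeated carb estimates in grams."""
    m = mean(estimates_g)
    return m, stdev(estimates_g) / m

def bolus_error_units(estimate_g, true_g, grams_per_unit):
    """Insulin units over (+) or under (-) the intended dose due to a carb error."""
    return (estimate_g - true_g) / grams_per_unit

# Hypothetical values for illustration only; NOT data from the study.
TRUE_CARBS_G = 30.0
GRAMS_PER_UNIT = 10.0   # assumed insulin-to-carb ratio; varies per person
runs = [30.0, 32.0, 28.0, 30.0, 80.0]  # five estimates of the same photo

avg, cv = spread(runs)
worst = max(runs, key=lambda g: abs(g - TRUE_CARBS_G))
extra = bolus_error_units(worst, TRUE_CARBS_G, GRAMS_PER_UNIT)
print(f"mean={avg:.1f} g, CV={cv:.2f}, worst-case dose error={extra:+.1f} units")
```

In this made-up example, four of the five runs land close to the true value, yet the single 80 g outlier alone would translate into 5 extra units of insulin at the assumed ratio.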

Sentiment

The community is broadly sympathetic to the article's core conclusion that AI should not be trusted for insulin-dosing carb estimates, but many commenters are frustrated with the study's methodology and presentation. A significant faction argues the study proves something obvious and would have been more valuable if it tested commercial apps directly. The overall mood is one of agreement with the warning but criticism of how the case was made.

In Agreement

The study serves an important public health function by quantifying what technical people consider obvious but millions of non-technical app users do not understand
Estimating carbs from photos is fundamentally an ill-posed problem: hidden ingredients, portion sizes, and preparation methods are invisible in images
The lack of correlation between AI confidence scores and actual accuracy is particularly dangerous because users cannot filter out bad estimates themselves
The study replicated real production prompts from the iAPS open-source insulin delivery app, making it a valid test of how these tools are actually used
AI companies bear responsibility for marketing their models as capable of everything without adequate limitation disclosures
LLMs should be trained to say 'I don't know' rather than confidently answering questions they cannot reliably answer
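
The point about confidence scores can be checked with a simple correlation between the reported confidence and the absolute error of each estimate. The sketch below is one way to do that, not the study's method. The `pearson` helper and the sample values are hypothetical and chosen only to show what "no correlation" looks like.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical values for illustration only; NOT data from the study.
confidence = [0.9, 0.8, 0.9, 0.8]        # model-reported confidence per estimate
abs_error_g = [5.0, 5.0, 25.0, 25.0]     # |estimate - true carbs| in grams

r = pearson(confidence, abs_error_g)
print(f"correlation between confidence and error: {r:.2f}")
```

A useful confidence score would give a clearly negative correlation (higher confidence, lower error). A value near zero, as in this constructed example, means a user cannot rely on the score to decide which estimates to discard.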

Opposed

The study is 'water is wet': anyone with a basic understanding of LLMs knows you cannot get reliable numerical outputs from a photo alone without grounding tools
Testing raw frontier models rather than the commercial apps that claim to do carb counting makes the study less useful and actionable
A proper system would use food databases, barcode scanning, weight measurements, and multi-model consensus rather than a single LLM call
For general calorie counting and weight loss, AI estimates are good enough since the real benefit comes from mindfulness about food consumption, not precision
Future models with better reasoning, tool use, and fine-tuning could potentially solve this problem, so the study only proves current models cannot do it
The variance issue could be mitigated by averaging multiple runs or using temperature 0, making the problem more tractable than presented
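
The mitigation proposed by commenters (aggregating several runs) can be sketched as follows. This is a hypothetical illustration, not something the article or any app is known to implement. `aggregate_estimates`, the 10 g disagreement threshold, and all sample values are assumptions. The sketch uses the median rather than the mean, since a median is less sensitive to a single outlier.

```python
from statistics import median

def aggregate_estimates(estimates_g, max_spread_g=10.0):
    """Return the median estimate, or None when the runs disagree too much.

    max_spread_g is an assumed threshold on (max - min) in grams.
    """
    if not estimates_g:
        return None
    if max(estimates_g) - min(estimates_g) > max_spread_g:
        return None  # runs disagree; defer to the user instead of guessing
    return median(estimates_g)

# Hypothetical values for illustration only; NOT data from the study.
TRUE_CARBS_G = 30.0
noisy_runs = [30.0, 32.0, 28.0, 80.0]   # one outlier
biased_runs = [45.0, 46.0, 44.0]        # tight cluster, but all too high

print(aggregate_estimates(noisy_runs))    # rejected because of the outlier
result = aggregate_estimates(biased_runs)
print(result, "vs true", TRUE_CARBS_G)    # accepted, yet still wrong
```

The second case shows the limit of this approach: aggregation reduces run-to-run variance but cannot correct a systematic bias, which is the 'precisely wrong' failure described in the key points.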