A Skeptic’s Guide to Running Local LLMs on macOS

A skeptical but practical guide to running LLMs locally on Apple Silicon Macs using llama.cpp or LM Studio. It explains why running locally matters (experimentation, privacy, ethics), how to pick models (size, runtime, quantization, vision/reasoning support), and how to use tools safely via MCP servers. The author stresses fact-checking, avoiding anthropomorphism, and using compaction to manage context.
Key Points
- Run LLMs locally to experiment freely, protect sensitive data, and avoid funding companies with questionable practices.
- Two main options on macOS: llama.cpp (open-source, flexible) and LM Studio (closed-source, easier UI with guardrails and MCP/tooling).
- Choose models based on RAM-constrained size, correct runtime/format (GGUF vs. MLX), 4-bit quantization, and whether you need vision or reasoning.
- Use tools/MCPs cautiously (confirm tool calls, beware data exfiltration); they’re powerful but quickly pollute context.
- LLMs are helpful for summarization and brain-dumps but hallucinate; always fact-check and avoid anthropomorphizing.
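The "RAM-constrained size" and "4-bit quantization" points above reduce to back-of-envelope arithmetic. A minimal sketch, where the overhead factor and OS headroom are illustrative assumptions rather than figures from the article:

```python
def estimated_model_ram_gb(params_billion: float, bits_per_weight: int = 4,
                           overhead_factor: float = 1.2) -> float:
    """Rough RAM estimate for a quantized model.

    params_billion   -- model size in billions of parameters
    bits_per_weight  -- quantization width (4-bit is the common sweet spot)
    overhead_factor  -- illustrative fudge for KV cache and runtime buffers
    """
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb * overhead_factor

def fits_in_ram(params_billion: float, mac_ram_gb: int,
                reserved_for_os_gb: int = 8) -> bool:
    """Leave headroom for macOS and other apps (assumption: ~8 GB)."""
    return estimated_model_ram_gb(params_billion) <= mac_ram_gb - reserved_for_os_gb

# A 4-bit 8B model: 8e9 weights * 0.5 bytes * 1.2 overhead ≈ 4.8 GB
print(round(estimated_model_ram_gb(8), 1))  # → 4.8
print(fits_in_ram(8, 16))    # 8B model on a 16 GB Mac: True
print(fits_in_ram(70, 32))   # 70B model on a 32 GB Mac: False
```

This is why model pickers tend to steer 16 GB Macs toward the 7B-8B range: the weights fit, but larger models collide with the unified memory shared by the OS.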
Sentiment
The community reception is broadly positive about local LLMs as a concept and appreciates the article's practical, skeptical perspective. Enthusiasm for the topic is tempered by realistic acknowledgment that local models lag behind frontier cloud services in quality and that the hardware requirements remain steep. The Apple CEO/strategy debate generated the most heat but is tangential to the article's core message. Overall, the discussion is constructive and information-rich, with many commenters sharing practical tips and model recommendations.
In Agreement
- Running local LLMs on Apple Silicon feels magical in its simplicity — download a file and get a working AI assistant
- Apple Silicon's unified memory architecture gives it a real advantage for local inference compared to traditional GPU setups
- Privacy is a legitimate and important benefit of running models locally, especially for personal journals and sensitive data
- LM Studio and llama.cpp are solid tool recommendations for beginners and power users respectively
- The article's skeptical, measured tone is refreshing compared to typical AI hype; consistently fact-checking and acknowledging limitations is the right approach
- Small local models are genuinely useful for summarization, brainstorming, and tasks where perfect accuracy isn't critical
Opposed
- Local models still hallucinate badly and require meticulous verification, which for many use cases makes them more time-consuming than doing the task manually
- Prompt processing speed on Apple Silicon is a major hidden bottleneck — demo tweets showing fast token generation hide the minutes-long wait to ingest context
- The cost of high-end hardware for local inference ($5-12k) is hard to justify versus cloud subscriptions, potentially taking years to break even
- Apple's locked-down ANE prevents full utilization of the hardware's potential for LLM workloads
- Most consumers don't care about privacy enough to run local models, as proven by the success of Facebook, TikTok, and cloud AI services
- Battery drain makes running LLMs on laptops impractical for mobile use without a power outlet
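The prompt-processing complaint above is easy to quantify: time-to-first-token scales with context length divided by prompt-ingestion throughput. A minimal sketch; the throughput figure is a hypothetical assumption, not a measurement from the discussion:

```python
def time_to_first_token_s(context_tokens: int, prompt_tok_per_s: float) -> float:
    """Seconds spent ingesting the prompt before any output token appears."""
    return context_tokens / prompt_tok_per_s

# Hypothetical numbers: a 32k-token context ingested at 300 prompt-tokens/s
wait = time_to_first_token_s(32_768, 300.0)
print(f"{wait:.0f} s (~{wait / 60:.1f} min)")  # → 109 s (~1.8 min)
```

This is the gap demo tweets hide: generation speed (tokens out per second) looks snappy, but stuffing a large document into context can stall for minutes before the first token appears.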