Can I Run AI: The Local LLM Hardware Compatibility Guide

Can I Run AI is a comprehensive compatibility database that matches local hardware specs with the requirements of various Large Language Models. It provides detailed VRAM estimates and performance grades for models ranging from lightweight edge AI to massive frontier systems. The tool serves as a technical guide for users looking to deploy open-weight AI models locally on GPUs or mobile chips.
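For intuition, the back-of-the-envelope arithmetic a tool like this likely relies on can be sketched in a few lines of Python. The constants below (bytes per weight, KV-cache size per token, overhead factor) are illustrative assumptions, not the site's actual method.

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: int = 16,
                     context_tokens: int = 8192,
                     kv_bytes_per_token: float = 0.5e6,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate for dense transformer inference: weights plus KV cache,
    inflated by an overhead factor for activations and framework buffers."""
    weight_bytes = params_billion * 1e9 * (bits_per_weight / 8)
    kv_cache_bytes = context_tokens * kv_bytes_per_token  # ~0.5 MB/token is a rough placeholder
    return (weight_bytes + kv_cache_bytes) * overhead / 1e9

# A 7B model with an 8K context: roughly 22 GB at 16-bit weights vs ~9 GB at 4-bit
print(f"{estimate_vram_gb(7):.1f} GB vs {estimate_vram_gb(7, bits_per_weight=4):.1f} GB")
```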
Key Points
- Hardware-Specific Benchmarking: The tool allows users to select specific GPUs or SoCs to see real-time estimates of how different LLMs will perform on their local machines.
- Comprehensive Model Database: It catalogs a wide spectrum of models, from ultra-tiny 0.8B parameter models for edge devices to massive 1T parameter Mixture-of-Experts (MoE) models.
- Performance Grading System: A simplified S-to-F grading system helps users quickly identify whether a model is a 'tight fit' or 'too heavy' for their current VRAM and memory bandwidth (a rough sketch of such a grading appears after this list).
- Technical Metadata: Each model entry provides deep technical insights, including quantization levels, active vs. total parameters, and estimated tokens per second.
- Task-Based Filtering: Users can sort and filter models based on specific capabilities such as reasoning, coding, vision, or multilingual support.
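The grading and tokens-per-second figures mentioned above can be approximated with equally simple heuristics. The cut-offs and the bandwidth-bound decode formula below are assumptions for illustration, not the site's published algorithm.

```python
def grade_fit(required_gb: float, available_gb: float) -> str:
    """Map VRAM headroom to a letter grade; the thresholds are illustrative."""
    ratio = required_gb / available_gb
    if ratio <= 0.5:
        return "S"       # comfortable headroom
    if ratio <= 0.8:
        return "A"
    if ratio <= 1.0:
        return "C"       # tight fit
    return "F"           # too heavy

def dense_tokens_per_second(params_billion: float, bits_per_weight: int,
                            bandwidth_gb_s: float) -> float:
    """Bandwidth-bound decode estimate: each generated token streams the full set
    of weights from memory once, so throughput <= bandwidth / weight bytes."""
    return bandwidth_gb_s / (params_billion * bits_per_weight / 8)

# An 8B model quantized to 4-bit on an RTX 4090 (24 GB, ~1008 GB/s):
print(grade_fit(required_gb=6.0, available_gb=24))              # -> "S"
print(round(dense_tokens_per_second(8, 4, 1008)), "tok/s max")  # -> ~252 tok/s
```

Real throughput lands well below the bandwidth bound once compute and sampling overhead are counted, which is presumably why the site grades on both VRAM fit and memory bandwidth.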
Sentiment
HN was broadly enthusiastic about the concept, but technically experienced users were frustrated by inaccurate estimates and missing hardware. The tone was constructive: criticism was framed as suggestions for improvement. Most users saw value in the idea even while pointing out gaps. The local-vs-cloud economics debate was substantive but not particularly heated.
In Agreement
- Small local models like Qwen3.5:9B are genuinely useful for embedded tasks such as OCR, information extraction, email categorization, and log parsing, where cloud dependency is undesirable (a minimal example of this kind of task follows this list)
- The concept of a hardware compatibility guide for local LLMs fills a real need, particularly for purchase decisions comparing hardware options
- Privacy and offline access are legitimate and compelling reasons to run models locally, regardless of economic comparisons to cloud APIs
- AMD's Ryzen AI Max+ 395 (Strix Halo) is a powerful and underappreciated platform for local LLM inference, competitive with Apple Silicon
- For frontier-level coding and reasoning tasks, cloud models remain substantially better than what can practically run locally on consumer hardware
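As a concrete illustration of the embedded use cases in the first point above, here is a minimal categorization call against a locally served model. It assumes an Ollama-style server on its default port; the model tag is a placeholder for whatever small model is actually installed, not a recommendation from the thread.

```python
import json
import urllib.request

def categorize_email(body: str, model: str = "qwen3:8b") -> str:
    """Ask a locally hosted model (Ollama-style /api/generate endpoint) for a single label."""
    prompt = ("Classify this email as one of: billing, support, spam, personal.\n"
              "Reply with only the label.\n\n" + body)
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()

# Example input is made up; swap the model tag for any small model you have pulled.
print(categorize_email("Your invoice is overdue. Please pay by Friday."))
```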
Opposed
- The site's performance estimates are substantially inaccurate for MoE models, which it treats as dense models, so real-world performance far exceeds the tool's pessimistic predictions (see the sketch after this list)
- Critical hardware is missing from the database: RTX Pro 6000, NVIDIA Spark, Radeon VII, RTX 5060/5060Ti, Ryzen AI Max+ 395, GH200, and more
- The site doesn't account for quantized models, displaying full-precision memory requirements that make many models appear incompatible when quantized versions work fine
- The site lists a non-existent 'M4 Ultra' chip and has incorrect memory ceilings for real chips (e.g., M3 Ultra capped at 192GB instead of 512GB)
- Local inference economics rarely justify the hardware cost compared to cloud APIs, which offer higher capability, faster speeds, and often lower total cost per token
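The MoE and quantization complaints reduce to two corrections of the same arithmetic: memory footprint scales with total parameters at the quantized bit width (every expert must be resident), while per-token decode cost scales only with the active parameters. A rough sketch of that distinction, with made-up numbers rather than the site's data:

```python
def moe_estimates(total_params_b: float, active_params_b: float,
                  bits_per_weight: int, bandwidth_gb_s: float) -> dict:
    """Separate the two quantities commenters say the site conflates:
    - memory footprint: all experts must be resident, so it scales with TOTAL parameters
    - decode throughput: only routed experts are read per token, so it scales with ACTIVE parameters
    Quantization (bits_per_weight) shrinks both."""
    bytes_per_param = bits_per_weight / 8
    memory_gb = total_params_b * bytes_per_param             # weights only; KV cache excluded
    tok_per_s = bandwidth_gb_s / (active_params_b * bytes_per_param)
    return {"memory_gb": memory_gb, "tokens_per_s_upper_bound": tok_per_s}

# A hypothetical 1T-total / 32B-active MoE at 4-bit on a ~400 GB/s unified-memory machine:
print(moe_estimates(1000, 32, 4, 400))
# -> about 500 GB of weights, but decode only streams ~16 GB of active weights
#    per token, i.e. roughly 25 tok/s rather than dense-1T speeds.
```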