The LLM Architecture Gallery: Mapping the Evolution of Open-Weight Models

This technical gallery provides a detailed comparison of prominent open-weight LLM architectures released between 2024 and 2026. It tracks the evolution of model design from standard dense decoders to sparse Mixture-of-Experts (MoE) designs and hybrid systems such as Mamba-transformer blends. By documenting specific parameters and attention mechanisms, the resource highlights the industry's intense focus on inference efficiency and long-context performance.
Key Points
- The industry is rapidly transitioning from traditional dense architectures to sparse Mixture-of-Experts (MoE), which decouples total parameter count from per-token inference cost by activating only a small subset of experts per token.
- Advanced attention mechanisms such as Multi-Head Latent Attention (MLA) and sliding-window attention are becoming standard for reducing KV cache overhead in long-context models.
- A new wave of hybrid architectures is emerging, blending transformer blocks with alternative layers like Mamba-2, DeltaNet, or Lightning Attention for improved scaling.
- Normalization and stability techniques, including QK-Normalization and variations in residual layouts (pre-norm vs. post-norm), remain critical differentiators in modern training recipes.
- The gallery emphasizes transparency by providing direct links to tech reports, configuration files, and from-scratch implementation code for open-weight models.
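The MoE trade-off in the first point can be made concrete with simple parameter arithmetic. The sketch below counts FFN-only parameters for a top-k routed MoE; the layer counts and dimensions are illustrative placeholders, not taken from any specific model in the gallery.

```python
# Illustrative MoE parameter counting (hypothetical config, FFN layers only).
def moe_param_counts(n_layers, d_model, d_ff, n_experts, top_k):
    """Rough FFN parameter count for a top-k routed MoE.

    Each expert is modeled as a plain 2-matrix FFN (gated variants and
    attention/embedding parameters are omitted for simplicity).
    """
    expert_params = 2 * d_model * d_ff           # up + down projection
    total = n_layers * n_experts * expert_params  # stored on disk / in memory
    active = n_layers * top_k * expert_params     # multiplied per token
    return total, active

total, active = moe_param_counts(n_layers=32, d_model=4096, d_ff=14336,
                                 n_experts=8, top_k=2)
print(f"total FFN params:  {total / 1e9:.1f}B")
print(f"active per token:  {active / 1e9:.1f}B")
```

With 8 experts and top-2 routing, the active-to-total ratio is 1:4, which is exactly the "large model at small-model inference cost" effect the summary describes.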
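The KV-cache point is also easy to quantify. The following back-of-the-envelope sketch compares full attention with a sliding window; all shapes (layers, KV heads, head dim, window size) are assumed example values, not figures from the gallery.

```python
# Approximate KV-cache sizing with and without a sliding window (assumed shapes).
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   window=None, bytes_per_elem=2):
    """KV-cache size in bytes; a sliding window caps the cached positions.

    bytes_per_elem=2 assumes fp16/bf16 cache entries.
    """
    cached = seq_len if window is None else min(seq_len, window)
    # Factor of 2 covers both keys and values.
    return 2 * n_layers * n_kv_heads * head_dim * cached * bytes_per_elem

full = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                      seq_len=128_000)
swa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                     seq_len=128_000, window=4096)
print(f"full attention:     {full / 2**30:.1f} GiB")
print(f"4k sliding window:  {swa / 2**30:.2f} GiB")
```

At 128k context, capping the cache at a 4k window shrinks it by roughly 31x in this toy configuration; MLA attacks the same cost by compressing keys and values into a low-rank latent instead of truncating positions.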
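QK-Normalization, mentioned among the stability techniques, can be sketched in a few lines: queries and keys are RMS-normalized per head before the dot product, which bounds attention logits even when activations grow large during training. This is a minimal NumPy illustration with random data, not any model's actual implementation (the learnable scale is omitted).

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    """RMSNorm over the last (head_dim) axis; learnable gain omitted."""
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

# QK-Norm: normalize queries and keys *before* the dot product, so logit
# magnitude stays bounded regardless of activation scale.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 64)) * 50.0   # deliberately large activations
k = rng.normal(size=(4, 64)) * 50.0
logits_raw = q @ k.T / np.sqrt(64)
logits_qk = rms_norm(q) @ rms_norm(k).T / np.sqrt(64)
print(np.abs(logits_raw).max())   # grows with activation scale
print(np.abs(logits_qk).max())    # bounded near sqrt(head_dim)
```

After QK-Norm each head vector has unit RMS, so every logit is at most sqrt(head_dim) in magnitude, keeping softmax inputs in a stable range.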
Sentiment
Overwhelmingly positive toward the resource itself, with strong appreciation for the quality and presentation. The intellectual discussion leans toward a gentle counterpoint: while architecture matters, the gallery highlights how convergent modern designs are, and the real story of LLM progress lies in scale and training. This is more of a 'yes, and...' nuance than genuine disagreement.
In Agreement
- The gallery is an excellent, well-executed reference for understanding the internal architecture of modern LLMs, comparable to the Neural Network Zoo for earlier architectures.
- Architectural details like attention mechanisms and normalization techniques matter for practical concerns such as context window behavior, inference efficiency, and prompt pattern effectiveness.
- The resource fills an important gap for practitioners and researchers wanting a modular understanding of how real-world models are engineered.
Opposed
- The gallery inadvertently reveals how little fundamental architectural innovation has occurred since GPT-2 — modern models are still stacked attention and feed-forward layers, with the real gains coming from scaling and training methods such as RLVR (reinforcement learning with verifiable rewards).
- Competitiveness in LLMs comes from scale, data quality, and fine-tuning rather than architecture, making the architectural differences cataloged less important than they might appear.
- The gallery would be more useful with evolutionary context — a family tree or influence layout showing how architectures descended from each other, rather than presenting them as independent entries.