Transformer Models

Architecture and applications of transformer neural networks, including BERT, GPT, and their variants for various machine learning tasks.

Reading List

Under the Hood

TimesFM: Google's Foundation Model for Time-Series Forecasting

Mar 31, 2026

Google Research's TimesFM is a pretrained decoder-only foundation model that brings large-scale transformer efficiency to time-series forecasting.

Foundation Models, Time-Series Forecasting, Transformer Models, Google, Open Source

Under the Hood

MSA: Scaling LLM Context to 100M Tokens via Sparse Latent Memory

Mar 24, 2026

MSA is an end-to-end trainable framework that enables LLMs to process 100 million tokens efficiently using sparse attention and latent memory.

LLM Context Management, Retrieval-Augmented Generation, AI Architecture, LLM Inference, Transformer Models

Under the Hood

The LLM Architecture Gallery: Mapping the Evolution of Open-Weight Models

Mar 16, 2026

A comprehensive technical reference gallery documenting the architectural evolution and specifications of modern open-weight large language models.

AI Architecture, Foundation Models, Mixture of Experts, LLM Inference, Transformer Models

Under the Hood

Turning BERT’s MLM Into a Text Diffusion Generator

Oct 20, 2025

BERT-style MLM is a single-step text diffusion process, and extending it to multiple masking steps turns RoBERTa into a workable text generator.

Diffusion Models, Natural Language Processing, Text Generation, Transformer Models

Under the Hood

When ‘Seahorse + Emoji’ Hits an Empty Token: Why LLMs Invent the Seahorse Emoji

Oct 6, 2025

Models compose “seahorse + emoji,” but with no matching token the unembedding snaps to a nearby emoji, causing confident errors and occasional feedback loops.

AI Hallucinations, AI Interpretability, Transformer Models, Tokenization

Under the Hood

SimpleFold: Scalable Flow-Matching Transformers for Protein Folding

Sep 27, 2025

A large-scale, transformer-only, flow-matching approach makes protein folding simpler while staying competitive and practical.

AI for Science, Computational Biology, Transformer Models, Open Source

Under the Hood

Why Embeddings Got Bigger—and Where Efficiency Pulls Them Next

Sep 5, 2025

Embeddings grew larger with Transformers and APIs, but new efficiency techniques and infrastructure mean the future is about smarter dimensions, not just larger ones.

Vector Embeddings, Transformer Models, Vector Databases, Natural Language Processing

Under the Hood

Inside a Tiny GPT: A Visual Walkthrough of Autoregressive Prediction

Sep 5, 2025

A visual, end-to-end demo of a tiny GPT that turns tokens into embeddings, runs them through transformer layers, and autoregressively predicts the next token to solve a simple sorting task.

Transformer Models, LLM Inference, Interactive Web Tools, AI Interpretability