Model Quantization

Techniques for reducing the numerical precision of AI model weights (for example, from 16-bit floating point to 8-bit or 4-bit integers) to shrink memory footprint and accelerate inference, enabling large models to run on consumer hardware.
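
As a rough illustration of what reducing precision means in practice, the sketch below applies symmetric per-tensor int8 quantization to a short list of weights. This is one simple scheme among many, not a method prescribed by this page; the function names `quantize_int8` and `dequantize_int8` and the sample values are hypothetical, chosen only for the example.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of floats to integer codes in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # A single shared scale maps the largest magnitude onto 127.
    # Fall back to 1.0 for empty or all-zero input to avoid dividing by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the int8 range.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.031, 1.0]  # made-up values for illustration
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
```

Each code fits in one byte instead of the four bytes of a 32-bit float, which is where the memory saving comes from. The cost is a rounding error of at most half the scale per weight, so `restored` only approximates `weights`.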

Reading List