Inference Optimization

Techniques for running models efficiently at inference time, from quantization to batching and sampling strategies.
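As an illustrative sketch of the first technique mentioned, the following shows symmetric per-tensor int8 weight quantization: floats are mapped to the integer range [-127, 127] by a single scale factor, and dequantized by multiplying back. The function names are hypothetical, not from any particular library.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    # Symmetric per-tensor quantization: one scale maps the float
    # range [-max|w|, max|w|] onto the int8 range [-127, 127].
    scale = float(np.max(np.abs(weights))) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale works
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate float weights; rounding error is at most scale/2.
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
```

Storing `q` instead of `w` cuts weight memory to a quarter of float32; the per-element error is bounded by half the scale, which is why quantization typically costs little accuracy at inference time.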