Sequence & State-Space Models

Emerging architectural alternatives to transformers for processing long sequences efficiently, including state-space models and mixture-of-experts models.