Transformer & Attention Internals

How the transformer architecture works internally, from attention mechanisms to positional encoding and encoder-decoder designs.
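As a quick orientation before the details, here is a minimal NumPy sketch of scaled dot-product attention, the core operation inside every transformer layer. The function name, array shapes, and toy data are illustrative choices, not taken from any particular library; this is a sketch of the standard softmax(QK^T / sqrt(d_k)) V formulation, not a production implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output row is a weighted average of value rows

# Toy example: 3 query positions, 4 key/value positions, width 8 (all illustrative)
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)  # shape (3, 8)
```

The 1/sqrt(d_k) scaling keeps the dot products from growing with the head width, which would otherwise push the softmax into near-one-hot saturation; the sections that follow build on this single operation.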