Here’s a small library I made to explain the Transformer architecture in PyTorch: willGuimont/transformers. The code is heavily commented and should be easy to follow.

The repository also includes implementations of some popular transformer-based models, such as the Vision Transformer.