Definition:
Transformer /ˈtræns.fɔː.mə/ noun — In deep learning, a Transformer is a neural network architecture that processes sequential data using a mechanism called self-attention. Self-attention lets the model capture relationships between all elements of a sequence simultaneously, regardless of how far apart they are.
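Concretely, the standard scaled dot-product form of self-attention computes, for query, key, and value matrices Q, K, V obtained by learned linear projections of the input sequence:

$$\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

where dₖ is the key dimension; the softmax weights determine how strongly each position attends to every other position.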
Introduced in the landmark 2017 paper “Attention Is All You Need” by Vaswani et al., Transformers have since become the dominant architecture for tasks in:
- Natural language processing (e.g., BERT, GPT, T5)
- Machine translation
- Text summarization and question answering
- Speech and audio modeling
- Vision tasks (e.g., Vision Transformers or ViTs)
Key components of a Transformer (see the code sketch after this list) include:
- Self-attention mechanisms: weigh the relevance of each token to every other token in context
- Multi-head attention: captures information from several representation subspaces in parallel
- Positional encoding: injects information about token order, since attention alone is order-agnostic
- Feed-forward layers and residual connections: support stable, deep representation learning
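A minimal sketch of how these components combine into one encoder block, written in PyTorch. This is illustrative only: the hyperparameters, the use of nn.MultiheadAttention, and the pre-norm layout (the original paper applied layer normalization after each sub-layer instead) are assumptions, not a definitive implementation.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """One pre-norm Transformer encoder block: self-attention + feed-forward."""

    def __init__(self, d_model: int = 512, n_heads: int = 8,
                 d_ff: int = 2048, dropout: float = 0.1):
        super().__init__()
        # Multi-head self-attention over the whole sequence
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        # Position-wise feed-forward network
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model). Attention sub-layer with residual.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)   # queries = keys = values = h
        x = x + self.dropout(attn_out)
        # Feed-forward sub-layer with residual.
        x = x + self.dropout(self.ff(self.norm2(x)))
        return x

# Usage: in a full model, token embeddings plus positional encodings
# would be fed in; here a random tensor stands in for that input.
x = torch.randn(2, 16, 512)                # (batch, seq_len, d_model)
out = TransformerBlock()(x)                # same shape: (2, 16, 512)
```

Note that positional information must be added to the input embeddings before the first block, since nothing inside the block itself depends on token order.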
Because Transformers rely on neither recurrence nor convolution, they parallelize well across sequence positions and scale to very large models. This has enabled large-scale pretraining and set new performance benchmarks across numerous AI domains.

