Transformer attention is typically multi-head to:
Answer options
A. Reduce model parameters
B. Capture different relations using different projection subspaces
C. Remove positional info
D. Enforce Gaussian priors
Correct answer: Capture different relations using different projection subspaces
Explanation
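Multi-head attention runs h attention heads in parallel. Each head i has its own learned projections W_i^Q, W_i^K and W_i^V, which map the d_model-dimensional inputs into a smaller d_k-dimensional subspace (typically d_k = d_model / h). Each head then computes scaled dot-product attention in that subspace. Because every head uses different projections, each can learn to attend to a different kind of relation between tokens. For example, one head might focus on adjacent tokens while another links a pronoun to the noun it refers to. The head outputs are concatenated and mixed by an output projection W^O:

MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O, where head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)

The other options do not hold. With d_k = d_model / h, multi-head attention has the same number of projection parameters as a single full-width head, so it does not reduce parameters (A). Positional information comes from positional encodings and is not removed by splitting attention into heads (C). Multi-head attention also imposes no Gaussian prior on anything (D).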
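To make the correct answer concrete, here is a minimal pure-Python sketch of multi-head self-attention. It assumes no bias terms, small random Gaussian weights, and d_k = d_model / h as in the original Transformer. All helper names (`matmul`, `make_multihead_params`, `multihead_attention`, `count_params`) are hypothetical and are not from any library. The sketch shows each head projecting the same input into its own subspace, and it also counts projection weights to illustrate why option A is wrong.

```python
import math
import random

# Pure-Python sketch of multi-head self-attention (no NumPy, no biases).
# All helper names below are hypothetical, chosen for illustration only.

def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix, both lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def softmax(row):
    # Subtract the row maximum for numerical stability before exponentiating.
    mx = max(row)
    exps = [math.exp(v - mx) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    # Small Gaussian initialisation; an assumption, not a prescribed scheme.
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)

def make_multihead_params(d_model, num_heads, seed=0):
    """Random (W_q, W_k, W_v) per head, each d_model x d_k, plus W_o."""
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_k = d_model // num_heads
    rng = random.Random(seed)
    heads = [tuple(rand_matrix(d_model, d_k, rng) for _ in range(3))
             for _ in range(num_heads)]
    w_o = rand_matrix(num_heads * d_k, d_model, rng)
    return heads, w_o

def multihead_attention(x, heads, w_o):
    """Self-attention where every head works in its own projected subspace."""
    head_outputs = []
    for w_q, w_k, w_v in heads:
        # Project the same input into this head's d_k-dim subspace and
        # compute an attention pattern there, independently of other heads.
        head_outputs.append(attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)))
    # Concatenate head outputs token by token, then mix them with W_o.
    concat = [[val for h in head_outputs for val in h[i]] for i in range(len(x))]
    return matmul(concat, w_o)

def count_params(d_model, num_heads):
    """Projection weights only: h * 3 * (d_model * d_k) for Q/K/V plus W_o."""
    d_k = d_model // num_heads
    return num_heads * 3 * d_model * d_k + (num_heads * d_k) * d_model

if __name__ == "__main__":
    x = rand_matrix(4, 8, random.Random(1))            # 4 tokens, d_model = 8
    heads, w_o = make_multihead_params(d_model=8, num_heads=2)
    out = multihead_attention(x, heads, w_o)
    print(len(out), len(out[0]))                       # 4 8
    print(count_params(512, 1), count_params(512, 8))  # 1048576 1048576
```

Setting `num_heads=1` gives ordinary single-head attention with the same projection budget. The benefit of multiple heads is therefore not fewer parameters but several independent attention patterns, one per learned subspace.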