Linformer (2020) is built on the observation that the self-attention matrix is approximately low-rank and so contains a lot of redundancy. The authors introduce two learned projection matrices, $E$ and $F$. These matrices project the Key ($K$) and Value ($V$) inputs along the sequence-length dimension, rather than the feature dimension, reducing them from the typical $n \times d$ to $k \times d$, where $k$ is a much smaller fixed-size number. The Query matrix ($Q$) keeps its original size so that the output sequence maintains the correct length. The attention calculation ends up being:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q\,(E K)^{\top}}{\sqrt{d}}\right)(F V)$$
Because $k$ is a fixed constant much smaller than $n$, the complexity drops from $O(n^2)$ to $O(nk)$. Since $k$ does not grow with the input, this is effectively linear complexity, $O(n)$.
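A minimal NumPy sketch of this projection trick, for a single head: `E` and `F` are drawn randomly here rather than learned, and the variable names (`linformer_attention`, `K_proj`, `V_proj`) are illustrative, not from the paper's code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linformer_attention(Q, K, V, E, F):
    """Linformer-style attention with sequence-length projections.

    Q, K, V: (n, d) query/key/value matrices.
    E, F:    (k, n) projections applied along the sequence axis,
             compressing K and V from (n, d) down to (k, d).
    Returns an (n, d) output, the same shape as standard attention.
    """
    d = Q.shape[-1]
    K_proj = E @ K                        # (k, d): keys compressed to k rows
    V_proj = F @ V                        # (k, d): values compressed to k rows
    scores = Q @ K_proj.T / np.sqrt(d)    # (n, k) score matrix instead of (n, n)
    weights = softmax(scores, axis=-1)    # softmax over the k projected positions
    return weights @ V_proj               # (n, d)

# Example: n = 1024 tokens, d = 64 features, k = 128 projected positions.
n, d, k = 1024, 64, 128
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
E, F = (rng.standard_normal((k, n)) / np.sqrt(n) for _ in range(2))
out = linformer_attention(Q, K, V, E, F)
print(out.shape)  # (1024, 64)
```

The score matrix is $n \times k$ rather than $n \times n$, which is where the $O(nk)$ cost comes from; in the actual model, $E$ and $F$ are trained parameters (optionally shared across heads and layers).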
Tags: AI