Linformer (2020) is built on the observation that the self-attention matrix is approximately low-rank and so contains a lot of redundancy. The authors introduce two learned projection matrices, $E$ and $F$. These matrices project the Key ($K$) and Value ($V$) inputs along the sequence-length dimension, rather than the feature dimension, reducing them from the typical $n \times d$ to $k \times d$, where $k$ is a much smaller fixed-size number. The Query matrix ($Q$) keeps its original size so that the output sequence maintains the correct length. The attention calculation ends up being:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q\,(E K)^{\top}}{\sqrt{d}}\right)(F V)$$
Because $k$ is a fixed constant much smaller than $n$, the complexity drops from $O(n^2)$ to $O(nk)$. Since $k$ does not grow with the input, this is effectively linear complexity, $O(n)$.
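A minimal NumPy sketch of this projection trick, for a single head: `E` and `F` are drawn randomly here rather than learned, and the variable names (`linformer_attention`, `K_proj`, `V_proj`) are illustrative, not from the paper's code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linformer_attention(Q, K, V, E, F):
    """Linformer-style attention with sequence-length projections.

    Q, K, V: (n, d) query/key/value matrices.
    E, F:    (k, n) projections applied along the sequence axis,
             compressing K and V from (n, d) down to (k, d).
    Returns an (n, d) output, the same shape as standard attention.
    """
    d = Q.shape[-1]
    K_proj = E @ K                        # (k, d): keys compressed to k rows
    V_proj = F @ V                        # (k, d): values compressed to k rows
    scores = Q @ K_proj.T / np.sqrt(d)    # (n, k) score matrix instead of (n, n)
    weights = softmax(scores, axis=-1)    # softmax over the k projected positions
    return weights @ V_proj               # (n, d)

# Example: n = 1024 tokens, d = 64 features, k = 128 projected positions.
n, d, k = 1024, 64, 128
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
E, F = (rng.standard_normal((k, n)) / np.sqrt(n) for _ in range(2))
out = linformer_attention(Q, K, V, E, F)
print(out.shape)  # (1024, 64)
```

The score matrix is $n \times k$ rather than $n \times n$, which is where the $O(nk)$ cost comes from; in the actual model, $E$ and $F$ are trained parameters (optionally shared across heads and layers).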
Tags: AI