Integrable Differentials
Tian et al (2019) FCOS
Digital Camera Noise
Vasu et al (2023) MobileOne
Kirillov et al (2023) Segment Anything
Dosovitskiy et al (2021) An Image is Worth 16x16 Words
An Intuitive Analogy to Attention Operation
Kwon et al (2023) PagedAttention
Ainslie et al (2023) GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Shazeer (2019) Fast Transformer Decoding: One Write-Head is All You Need
Normalization Zoo