Integrable Differentials
Tian et al (2019) FCOS
Digital Camera Noise
Vasu et al (2023) MobileOne
Kirillov et al (2023) Segment Anything
Dosovitskiy et al (2021) An Image is Worth 16x16 Words
An Intuitive Analogy to Attention Operation
Kwon et al (2023) PagedAttention
Ainslie et al (2023) GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Shazeer (2019) Fast Transformer Decoding: One Write-Head is All You Need
Normalization Zoo