Johnny's Software Lab