Ash's Blog
Fork Union: Beyond OpenMP in C++ and Rust?
Calling CUDA in 3000 Words
The Longest Nvidia PTX Instruction
Hiding x86 Port Latency for 330 GB/s/core Reductions 🫣
Parsing JSON in C & C++: Singleton Tax
10x Faster C++ String Split, 16 Years Later
The Next 31 Years of Developing Unum
Understanding SIMD: Infinite Complexity of Trivial Problems
5x Faster Set Intersections: SVE2, AVX-512, & NEON
35% Discount on Keyword Arguments in Python
NumPy vs BLAS: Losing 90% of Throughput
The Painful Pitfalls of C++ STL Strings 🧵
USearch Molecules: 28 Billion Chemical Embeddings on AWS
Binding a C++ Library to 10 Programming Languages
Python, C, Assembly - 2'500x Faster Cosine Similarity
GCC Compiler vs Human - 119x Faster Assembly 💻🆚🧑‍💻
Accelerating JavaScript arrays by 10x for Vector Search
Our CPython bindings got 5x faster without PyBind11
SciPy distances... up to 200x faster with AVX-512 & SVE
Combinatorial Stable Marriages for DBMS Semantic Joins
StringZilla: 5x faster strings with SIMD & SWAR
Abusing Vector Search for Texts, Maps, and Chess
Counting Strings in C++: 30x Throughput Difference
We went through life with a smile
Mastering C++ with Google Benchmark ⏱️
Failing to Reach DDR4 Bandwidth
Crushing CPUs with 879 GB/s Reductions in CUDA
Apple to Apple Comparison: M1 Max vs Intel
Hyperscaler Shopping List: 2022 Data Center Tech Frenzy
Only 1% of Software Benefits from SIMD Instructions
Artsakh Must Be Independent 🗺️
The 7 Sins of Turkish Autocracy 🇹🇷
Armenia, Azerbaijan, Turkey. Who's the Aggressor?
Come to Armenia 🇦🇲
Positive Outlook on the COVID-19 Crisis
Building AI Safely
What's Wrong with WWDC 2016 Keynote?
Hey, I'm Ash!
Talks & Lectures