Wojciech Muła --- website
Change case of UTF-32-encoded strings
LoongArch64 subjective highlights
SIMD binary heap operations
AVX512: printing u64 as binary
Drawing trees
Building full-text search in Javascript
SIMD parallel bits deposit/extract
Dividing unsigned 16-bit numbers
Dividing unsigned 8-bit numbers
Myriad sequences of RISC-V code
RISC-V Vector Extension overview
Simple suggestions using popcount
AVX-512 conflict detection without resolving conflicts
Modern perfect hashing for strings
SIMD-ized faster parse of IPv4 addresses
SWAR find any byte from set
AVX512: finding first byte in lanes
Finding lowest common ancestor of two nodes
Faster fractional exponents
Converting binary fraction to ratio
AVX512: count trailing zeros
AVX512: check if value belongs to a set
AVX512: generating constants
AVX512: histogram of sixteen nibbles
Faster hack
Fast parsing HTTP verbs
DDoS-ed by a service running on AWS
AVX512VBMI2 and packed varuint format
Parsing hex numbers with validation
Bit test and reset vs compilers
Conversion uint32 into decimal without division nor multiplication
How to check if any word is zero
Autovectorization status in MSVC in 2021
Counting byte in byte stream with AVX512BW instructions
How to detect if all bytes in SIMD register are the same?
Autovectorization status in GCC & Clang in 2021
Use AVX512 to calculate binomial coefficient
Use AVX512 Galois field affine transformation for bit shuffling
AVX512 8-bit positional population count procedure
SIMDization of switch statements
Malloc internal memory fragmentation footprint
Auto-vectorization status in GCC, Clang, ICC and MSVC
SIMDized counting byte in byte stream
std::function and overloaded functions
pyahocorasick stabilisation story
C++ --- how to read a file into a string
AVX512VBMI --- remove spaces from text
Python --- file modification time perils
SIMDized sum of all bytes in the array --- part 2: signed bytes
How many uops are there?
A short report from code::dive 2018
Speeding up multiple vector operations using SIMD
SIMD --- why you shouldn't use static vector constants
SIMDized sum of all bytes in the array
SIMDized check which bytes are in a set
Finding index of the minimum value using SIMD instructions
AVX512 mask registers support in compilers
AVX512 implementation of JPEG zigzag transformation
Be careful with directory_iterator
Parsing series of integers with SIMD
Accidental recursion
Is sorted using SIMD instructions
When lock does not lock --- C++ story
An awful part of C++17
Intersection of ordered sets
SSE/AVX: absolute value of difference of unsigned integers
Is power of two --- BMI1 version
A short report from code::dive 2017
ARM Neon and Base64 encoding & decoding
AVX512 --- first bit set in a large array
SWAR check if all chars are digits
Population count using XOP instructions
SIMD-friendly algorithms for substring searching
What does AVX512 conflict detection do?
Detecting bit patterns with series of zeros followed by ones
Byte-wise alignr in AVX512F
GNU std::string::find is very slow
Sorting an AVX512 register
AVX512F base64 coding and decoding
SIMD bit mask
Building a bitmask
Base64 encoding & decoding using AVX512BW instructions
Implementing byte-wise lookup table with PSHUFB
Base64 decoding with SIMD instructions
Base64 encoding with SIMD instructions
Speeding up letter case conversion
Fast conversion of floating-point values to string
Base64 encoding --- implementation study
Benefits from the obsession
Implicit conversion --- the enemy
Another C++ nasty feature
Short report from code::dive 2015
Boolean function for the rescue
Tricky mistake
Speeding up bit-parallel population count
SIMD-ized searching in unique constant dictionary
SIMD: detecting a bit pattern
Compiler warnings are your future errors
AVX512: ternary functions evaluation
SSE/AVX2: Generating mask where n leading (trailing) bytes are set
Not everything in AVX2 is 256-bit
Using SSE to convert from hexadecimal ASCII to number
Parsing decimal numbers --- part 2: SSE
Parsing decimal numbers --- part 1: SWAR
Using PEXT to convert from hexadecimal ASCII to number
Using PEXT to convert from binary ASCII to number
Conversion numbers to octal representation
Determining if an integer is a power of 2 --- part 2
Conditionally fill word (for limited set of input values)
Small win over compiler
Interpolation search revisited
Software emulation of PDEP
Conversion numbers to hexadecimal representation
Conversion numbers to binary representation
C++ bitset vs array
Quick and dirty ad-hoc git hosting
Is const-correctness paranoia good?
Scalar version of SSE move mask instruction
SIMD-friendly Rabin-Karp modification
C++ standard inaccuracy
Integer log 10 of an unsigned integer --- SIMD version
Mask for zero/non-zero bytes
GCC --- asm goto
Slow-paths in GNU libc strstr
Penalties of errors in SSE floating point calculations
x86 - ISA where 80% of instructions are unimportant
I accidentally created an infinite loop
Calculate floor value without FPU/SSE instruction
Convert float to int without FPU/SSE
fopen a directory
x86 extensions are useless
Problems with PDO for PostgreSQL on 32-bit machines
Encoding array of unsigned integers
FBSTP --- the most complex instruction in x86 ISA
Short story about PostgreSQL SUM function
PostgreSQL --- faster reads from static tables
PostgreSQL: printf in PL/pgSQL
SSE: trie lookup speedup
Detecting intersection of convex polygons in 2D
PHP quirk
Average of two unsigned integers
Speeding up LIKE '%text%' queries (at least in PostgreSQL)
SSE: conversion integers to decimal representation
Traversing DAGs
DAWG as dictionary? Yes!
Python: C extensions --- sequence-like object
Efficient trie representation
Python: test if object is iterable
Traversing tree without stack
Branchless set mask if value greater or how to print hex values
Speedup reversing table of bytes
Determining if an integer is a power of 2
Branchless conditional exchange
STL: map with string as key --- access speedup
Fill word with selected bit
Branchless signum
Transpose bits in byte using SIMD instructions
PostgreSQL: get selected rows with given order
Join locate databases
SSE4.1: PHMINPOSUW --- insertion sort
SSSE3: PMADDUBSW and image crossfading
SSE: conversion uint32 to float
Floating point tricks
RDTSC on Core2
PABSQ --- absolute value of two signed 64-bit numbers
GCC asm constraints
SSSE3/SSE4: alpha blending --- operator over
SSE4: greater/less or equal relations for unsigned bytes/words
16bpp/15bpp to 32bpp pixel conversions --- different methods
SSE: modify 32bpp images with lookup tables
SSE4 string search --- modification of Karp-Rabin algorithm
SSSE3: fast popcount
SSSE3: printing hex values