Wojciech Muła --- website

Posts

Change case of UTF-32-encoded strings

LoongArch64 subjective highlights

SIMD binary heap operations

AVX512: printing u64 as binary

Drawing trees

Building full-text search in Javascript

SIMD parallel bits deposit/extract

Dividing unsigned 16-bit numbers

Dividing unsigned 8-bit numbers

Myriad sequences of RISC-V code

RISC-V Vector Extension overview

Simple suggestions using popcount

AVX-512 conflict detection without resolving conflicts

Modern perfect hashing for strings

SIMD-ized faster parse of IPv4 addresses

SWAR find any byte from set

AVX512: finding first byte in lanes

Finding lowest common ancestor of two nodes

Faster fractional exponents

Converting binary fraction to ratio

AVX512: count trailing zeros

AVX512: check if value belongs to a set

AVX512: generating constants

AVX512: histogram of sixteen nibbles

Faster hack

Fast parsing HTTP verbs

DDoS-ed by a service running on AWS

AVX512VBMI2 and packed varuint format

Parsing hex numbers with validation

Bit test and reset vs compilers

Conversion uint32 into decimal without division nor multiplication

How to check if any word is zero

Autovectorization status in MSVC in 2021

Counting byte in byte stream with AVX512BW instructions

How to detect if all bytes in SIMD register are the same?

Autovectorization status in GCC & Clang in 2021

Use AVX512 to calculate binomial coefficient

Use AVX512 Galois field affine transformation for bit shuffling

AVX512 8-bit positional population count procedure

SIMDization of switch statements

Malloc internal memory fragmentation footprint

Auto-vectorization status in GCC, Clang, ICC and MSVC

SIMDized counting byte in byte stream

std::function and overloaded functions

pyahocorasick stabilisation story

C++ --- how to read a file into a string

AVX512VBMI --- remove spaces from text

Python --- file modification time perils

SIMDized sum of all bytes in the array --- part 2: signed bytes

How many uops are there?

A short report from code::dive 2018

Speeding up multiple vector operations using SIMD

SIMD --- why you shouldn't use static vector constants

SIMDized sum of all bytes in the array

SIMDized check which bytes are in a set

Finding index of the minimum value using SIMD instructions

AVX512 mask registers support in compilers

AVX512 implementation of JPEG zigzag transformation

Be careful with directory_iterator

Parsing series of integers with SIMD

Accidental recursion

Is sorted using SIMD instructions

When lock does not lock --- C++ story

An awful part of C++17

Intersection of ordered sets

SSE/AVX: absolute value of difference of unsigned integers

Is power of two --- BMI1 version

A short report from code::dive 2017

ARM Neon and Base64 encoding & decoding

AVX512 --- first bit set in a large array

SWAR check if all chars are digits

Population count using XOP instructions

SIMD-friendly algorithms for substring searching

What does AVX512 conflict detection do?

Detecting bit patterns with series of zeros followed by ones

Byte-wise alignr in AVX512F

GNU std::string::find is very slow

Sorting an AVX512 register

AVX512F base64 coding and decoding

SIMD bit mask

Building a bitmask

Base64 encoding & decoding using AVX512BW instructions

Implementing byte-wise lookup table with PSHUFB

Base64 decoding with SIMD instructions

Base64 encoding with SIMD instructions

Speeding up letter case conversion

Fast conversion of floating-point values to string

Base64 encoding --- implementation study

Benefits from the obsession

Implicit conversion --- the enemy

Another C++ nasty feature

Short report from code::dive 2015

Boolean function for the rescue

Tricky mistake

Speeding up bit-parallel population count

SIMD-ized searching in unique constant dictionary

SIMD: detecting a bit pattern

Compiler warnings are your future errors

AVX512: ternary functions evaluation

SSE/AVX2: Generating mask where n leading (trailing) bytes are set

Not everything in AVX2 is 256-bit

Using SSE to convert from hexadecimal ASCII to number

Parsing decimal numbers --- part 2: SSE

Parsing decimal numbers --- part 1: SWAR

Using PEXT to convert from hexadecimal ASCII to number

Using PEXT to convert from binary ASCII to number

Conversion numbers to octal representation

Determining if an integer is a power of 2 --- part 2

Conditionally fill word (for limited set of input values)

Small win over compiler

Interpolation search revisited

Software emulation of PDEP

Conversion numbers to hexadecimal representation

Conversion numbers to binary representation

C++ bitset vs array

Quick and dirty ad-hoc git hosting

Is const-correctness paranoia good?

Scalar version of SSE move mask instruction

SIMD-friendly Rabin-Karp modification

C++ standard inaccuracy

Integer log 10 of an unsigned integer --- SIMD version

Mask for zero/non-zero bytes

GCC --- asm goto

Slow-paths in GNU libc strstr

Penalties of errors in SSE floating point calculations

x86 - ISA where 80% of instructions are unimportant

I accidentally created an infinite loop

Calculate floor value without FPU/SSE instruction

Convert float to int without FPU/SSE

fopen a directory

x86 extensions are useless

Problems with PDO for PostgreSQL on 32-bit machines

Encoding array of unsigned integers

FBSTP --- the most complex instruction in x86 ISA

Short story about PostgreSQL SUM function

PostgreSQL --- faster reads from static tables

PostgreSQL: printf in PL/pgSQL

SSE: trie lookup speedup

Detecting intersection of convex polygons in 2D

PHP quirk

Average of two unsigned integers

Speeding up LIKE '%text%' queries (at least in PostgreSQL)

SSE: conversion integers to decimal representation

Traversing DAGs

DAWG as dictionary? Yes!

Python: C extensions --- sequence-like object

Efficient trie representation

Python: test if object is iterable

Traversing tree without stack

Branchless set mask if value greater or how to print hex values

Speedup reversing table of bytes

Determining if an integer is a power of 2

Branchless conditional exchange

STL: map with string as key --- access speedup

Fill word with selected bit

Branchless signum

Transpose bits in byte using SIMD instructions

PostgreSQL: get selected rows with given order

Join locate databases

SSE4.1: PHMINPOSUW --- insertion sort

SSSE3: PMADDUBSW and image crossfading

SSE: conversion uint32 to float

Floating point tricks

RDTSC on Core2

PABSQ --- absolute value of two signed 64-bit numbers

GCC asm constraints

SSSE3/SSE4: alpha blending --- operator over

SSE4: greater/less or equal relations for unsigned bytes/words

16bpp/15bpp to 32bpp pixel conversions --- different methods

SSE: modify 32bpp images with lookup tables

SSE4 string search --- modification of Karp-Rabin algorithm

SSSE3: fast popcount

SSSE3: printing hex values