RSS.Social

spatters.ca

Posts

Improving FP16/16 matmul accuracy with two-stage accumulation

Implementing a fast Tensor Core matmul on the Ada Architecture