Beating NumPy's Matrix Multiplication in 150 lines of C code
This blog post has been archived and is no longer accessible. For the updated version with improved performance, please check Beating OpenBLAS in FP32 Matrix Multiplication.