Welcome to the blog! Here, we discuss optimization techniques and high-performance computing.
A deep dive into optimizing matrix multiplication on modern CPUs with tiling, loop unrolling, parallelization, and AVX-512 SIMD.
A deep dive into matrix multiplication on A100 GPUs, covering naive, shared-memory, and WMMA Tensor Core approaches.
We explore a roofline analysis of matrix multiplication. The roofline plot shows whether a program is memory-bound (limited by memory bandwidth) or compute-bound (limited by peak computational throughput).
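To make the memory-bound versus compute-bound distinction concrete, here is a minimal sketch of the roofline model in Python. It assumes the idealized case where each of the three n x n matrices crosses the memory bus exactly once, and the function names and hardware figures (1000 GFLOP/s peak, 100 GB/s bandwidth) are hypothetical placeholders rather than measurements from any real machine.

```python
def matmul_intensity(n, bytes_per_element=4):
    """Arithmetic intensity (FLOPs per byte) of an n x n matrix multiplication.

    Assumption: A, B and C each move through memory exactly once,
    which is the best case and ignores cache misses.
    """
    flops = 2 * n**3                            # one multiply and one add per inner step
    bytes_moved = 3 * n**2 * bytes_per_element  # read A, read B, write C
    return flops / bytes_moved


def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def classify(intensity, peak_gflops, bandwidth_gbs):
    """Compare intensity with the ridge point where the two roofs meet."""
    ridge = peak_gflops / bandwidth_gbs
    return "compute-bound" if intensity >= ridge else "memory-bound"


if __name__ == "__main__":
    PEAK_GFLOPS = 1000.0    # hypothetical peak compute throughput
    BANDWIDTH_GBS = 100.0   # hypothetical memory bandwidth
    for n in (6, 600):
        ai = matmul_intensity(n)
        bound = attainable_gflops(ai, PEAK_GFLOPS, BANDWIDTH_GBS)
        print(n, ai, bound, classify(ai, PEAK_GFLOPS, BANDWIDTH_GBS))
```

Under these assumptions the intensity grows linearly with n, so small problems sit under the sloped memory roof and large ones reach the flat compute roof. Real kernels usually move more data than this ideal count, which pushes them to the left on the plot.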
We explore the anti-diagonal parallelization technique for the Needleman-Wunsch algorithm.
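As a rough illustration of the idea, the sketch below fills the Needleman-Wunsch score matrix one anti-diagonal at a time. Every cell with the same i + j depends only on the two previous anti-diagonals, so the inner loop has no dependencies between its iterations and could be handed to parallel workers. This version runs that loop serially; the function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example.

```python
def nw_score_antidiagonal(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of a and b, computed in anti-diagonal order."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]

    # Boundary cells: aligning a prefix against the empty string costs only gaps.
    for i in range(1, n + 1):
        H[i][0] = i * gap
    for j in range(1, m + 1):
        H[0][j] = j * gap

    # Interior cells (i, j) with i + j == d form one anti-diagonal.
    for d in range(2, n + m + 1):
        i_lo = max(1, d - m)   # keeps j = d - i at most m
        i_hi = min(n, d - 1)   # keeps j = d - i at least 1
        # Iterations of this loop are independent of each other:
        # they read only anti-diagonals d - 1 and d - 2.
        for i in range(i_lo, i_hi + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,     # gap in b
                H[i][j - 1] + gap,     # gap in a
            )
    return H[n][m]


if __name__ == "__main__":
    print(nw_score_antidiagonal("ACGT", "ACGT"))
```

The anti-diagonals are short near the corners of the matrix and longest in the middle, so the available parallelism ramps up and then back down as d increases.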