



In MobileAI CVPRW 2022 
Paper 
Poster 
C++ Implementation 


We present a novel approach for accelerating convolutions during inference on CPU-based architectures. The most common method of computation involves packing the image into the columns of a matrix (im2col) and performing general matrix multiplication (GEMM) with a matrix of weights. This results in two main drawbacks: (a) im2col requires a large memory buffer and can experience inefficient memory access, and (b) while GEMM is highly optimized for scientific matrix multiplication, it is not well suited for convolutions. We propose an approach that takes advantage of scalar-matrix multiplication and reduces memory overhead. Our experiments with commonly used network architectures demonstrate a significant speedup compared to existing indirect methods. 
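As a point of reference, the im2col + GEMM baseline the abstract describes can be sketched as follows. This is illustrative NumPy, not the paper's C++ implementation; it assumes a single-channel input, stride 1, and no padding, and the function names are our own:

```python
import numpy as np

def im2col(image, kh, kw):
    """Pack each kh x kw patch of `image` into one column of a matrix.

    This is the large intermediate buffer the abstract refers to:
    it holds kh*kw copies' worth of the input data.
    """
    h, w = image.shape
    oh, ow = h - kh + 1, w - kw + 1          # output spatial size
    cols = np.empty((kh * kw, oh * ow))
    for i in range(kh):
        for j in range(kw):
            # each shifted view of the input becomes one row of the buffer
            cols[i * kw + j] = image[i:i + oh, j:j + ow].ravel()
    return cols

def conv_im2col(image, kernel):
    """Convolution (as cross-correlation) expressed as a single GEMM."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    # (1 x kh*kw) @ (kh*kw x oh*ow) -> one GEMM call over the packed buffer
    return (kernel.ravel() @ im2col(image, kh, kw)).reshape(oh, ow)
```

Note that `im2col` duplicates each input element up to kh·kw times, which is the memory-overhead drawback (a) above.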
Figure: Execution times (in seconds). 

The 2D output of the convolution of an input tensor I of size h × w with a kernel of size k_h × k_w can be viewed as a summation of k_h · k_w shifted versions of the input tensor I: the corresponding submatrices of size h′ × w′, each multiplied by its corresponding kernel coefficient. Therefore, we consecutively extract the submatrices T_j 
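The summation above can be sketched directly: the output is the sum, over kernel positions (i, j), of the scalar K[i, j] times the shifted h′ × w′ submatrix of I. This is a minimal NumPy reading of the idea (the actual method is the optimized C++ implementation linked above), again assuming a single-channel input, stride 1, and no padding:

```python
import numpy as np

def conv_smm(image, kernel):
    """Convolution as a sum of scalar-matrix products over shifted submatrices."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1   # h'
    ow = image.shape[1] - kw + 1   # w'
    out = np.zeros((oh, ow))
    for i in range(kh):
        for j in range(kw):
            # scalar-matrix multiply-accumulate over a shifted view of I;
            # no im2col packing buffer is materialized
            out += kernel[i, j] * image[i:i + oh, j:j + ow]
    return out
```

Each term touches the input in-place through a strided view, so the memory overhead of packing is avoided.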

Try our code! 


SMMConv: Scalar Matrix Multiplication with Zero Packing for Accelerated Convolution, A. Ofir and G. Ben-Artzi, IEEE 
Acknowledgements 