SMM-Conv Accelerated Convolution

SMM-Conv: Scalar Matrix Multiplication with Zero Packing for Accelerated Convolution

Amir Ofir

Gil Ben-Artzi

Ariel University, Israel

In Mobile-AI CVPRW 2022

Paper

Poster

C++ Implementation

Highlights

Acceleration of convolutions on CPU-based architectures.
Reduction of memory overhead.
A parallel version is also available.

Abstract

We present a novel approach for accelerating convolutions during inference for CPU-based architectures. The most common method of computation involves packing the image into the columns of a matrix (im2col) and performing general matrix multiplication (GEMM) with a matrix of weights. This results in two main drawbacks: (a) im2col requires a large memory buffer and can experience inefficient memory access, and (b) while GEMM is highly optimized for scientific matrix multiplications, it is not well suited for convolutions. We propose an approach that takes advantage of scalar-matrix multiplication and reduces memory overhead. Our experiments with commonly used network architectures demonstrate a significant speedup compared to existing indirect methods.
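To make the memory overhead of the im2col baseline concrete, here is a minimal pure-Python sketch of im2col followed by a matrix product. It is an illustration only, not the authors' code or an optimized GEMM: it assumes a single channel, unit stride, no padding, and the cross-correlation convention common in deep learning. The function names `im2col` and `conv2d_im2col` are hypothetical.

```python
def im2col(image, k_h, k_w):
    """Pack every k_h x k_w patch of `image` into one column of a matrix."""
    h, w = len(image), len(image[0])
    out_h, out_w = h - k_h + 1, w - k_w + 1  # assumes stride 1, no padding
    # One row per kernel position, one column per output pixel.
    cols = [[0.0] * (out_h * out_w) for _ in range(k_h * k_w)]
    for i in range(k_h):
        for j in range(k_w):
            row = cols[i * k_w + j]
            for y in range(out_h):
                for x in range(out_w):
                    row[y * out_w + x] = image[y + i][x + j]
    return cols, out_h, out_w


def conv2d_im2col(image, kernel):
    """Cross-correlation as a (1 x K) by (K x N) matrix product."""
    k_h, k_w = len(kernel), len(kernel[0])
    cols, out_h, out_w = im2col(image, k_h, k_w)
    weights = [kernel[i][j] for i in range(k_h) for j in range(k_w)]
    flat = [sum(wgt * cols[r][n] for r, wgt in enumerate(weights))
            for n in range(out_h * out_w)]
    # Reshape the flat result back into out_h rows of out_w values.
    return [flat[y * out_w:(y + 1) * out_w] for y in range(out_h)]


image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
kernel = [[1.0, 0.0], [0.0, -1.0]]
cols, out_h, out_w = im2col(image, 2, 2)
input_elems = len(image) * len(image[0])
buffer_elems = len(cols) * len(cols[0])  # k_h * k_w * out_h * out_w
result = conv2d_im2col(image, kernel)
```

For this 4 × 4 input and 2 × 2 kernel, the packed buffer holds 36 values against 16 in the input, because overlapping patches are copied repeatedly. This duplication is the overhead that drawback (a) refers to.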

Performance

Execution times (in seconds)

Method

The 2D output of convolving an input tensor 𝐼 of size ℎ × 𝑤 with a kernel of size 𝑘_ℎ × 𝑘_𝑤 can be viewed as the sum of 𝑘_ℎ · 𝑘_𝑤 shifted versions of 𝐼: each shift selects a sub-matrix of size ℎ′ × 𝑤′, which is multiplied by the corresponding kernel coefficient. Therefore, we consecutively extract the sub-matrices 𝑇_𝑗^𝑐, 𝑗 ∈ [𝑘_𝑤], where 𝑐 is the channel index. Each 𝑇_𝑗^𝑐 consists of all the rows of 𝐼 and 𝑤′ of its columns, 𝐼[𝑐, 1:ℎ, 𝑗:𝑗+𝑤′−1]. We then multiply each ℎ′ × 𝑤′ sub-matrix of 𝑇_𝑗^𝑐 by the corresponding kernel weight and sum the results.
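The idea of summing scalar-weighted, shifted sub-matrices can be sketched in a few lines of pure Python. This is an assumption-laden illustration, not the released C++ implementation: it handles a single channel, assumes unit stride and no padding (so ℎ′ = ℎ − 𝑘_ℎ + 1 and 𝑤′ = 𝑤 − 𝑘_𝑤 + 1), uses the cross-correlation convention, and does not attempt the paper's packing or memory layout. The names `conv2d_scalar_matrix` and `conv2d_direct` are hypothetical; the latter is only a reference for checking the result.

```python
def conv2d_scalar_matrix(image, kernel):
    """Accumulate kernel[i][j] * (shifted h' x w' sub-matrix) over all i, j."""
    h, w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h, out_w = h - k_h + 1, w - k_w + 1  # assumes stride 1, no padding
    out = [[0.0] * out_w for _ in range(out_h)]
    for j in range(k_w):
        # T_j: all h rows of the input, columns j .. j + w' - 1 (0-based).
        t_j = [row[j:j + out_w] for row in image]
        for i in range(k_h):
            weight = kernel[i][j]
            # Rows i .. i + h' - 1 of T_j form one shifted sub-matrix.
            for y in range(out_h):
                src, dst = t_j[i + y], out[y]
                for x in range(out_w):
                    dst[x] += weight * src[x]  # scalar times matrix element
    return out


def conv2d_direct(image, kernel):
    """Reference sliding-window cross-correlation, used only for comparison."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    return [[sum(kernel[i][j] * image[y + i][x + j]
                 for i in range(k_h) for j in range(k_w))
             for x in range(out_w)] for y in range(out_h)]


image = [[float(5 * r + c) for c in range(5)] for r in range(4)]
kernel = [[1.0, 2.0, 0.0], [-1.0, 0.0, 3.0]]
smm_out = conv2d_scalar_matrix(image, kernel)
ref_out = conv2d_direct(image, kernel)
```

Each 𝑇_𝑗 is a plain column slice of the input that is reused for all 𝑘_ℎ kernel rows, so the working buffer is at most the size of the input rather than 𝑘_ℎ · 𝑘_𝑤 copies of the output-sized patches.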

Try our code!

Scalability

Paper

SMM-Conv: Scalar Matrix Multiplication with Zero Packing for Accelerated Convolution, A. Ofir, G. Ben-Artzi
IEEE
