LayerDropBack: A Universally Applicable Approach for Accelerating Training of Deep Networks

Evgeny Hershkovitch Neiterman
Gil Ben-Artzi
School of Computer Science, Ariel University

Paper

Code



Highlights

  • Significant speedup: Reduces training time substantially, with mean speedups ranging from 16.93% for fine-tuning to 23.97% for training from scratch.
  • Universal applicability: Compatible with various architectures (ViT, Swin Transformer, EfficientNet, DLA) without requiring modifications.
  • Forward consistency: Preserves forward pass integrity and ensures identical network behavior during training and inference.
  • Performance preservation: Maintains or enhances model accuracy while significantly reducing training time.

Abstract

Training very deep convolutional networks is challenging, requiring significant computational resources and time. Existing acceleration methods often depend on specific architectures or require network modifications. We introduce LayerDropBack (LDB), a simple yet effective method to accelerate training across a wide range of deep networks. LDB introduces randomness only in the backward pass, maintaining the integrity of the forward pass and guaranteeing that the same network is used during both training and inference. LDB can be seamlessly integrated into the training process of any model without altering its architecture, making it suitable for various network topologies. Our extensive experiments across multiple architectures (ViT, Swin Transformer, EfficientNet, DLA) and datasets (CIFAR-100, ImageNet) show significant training time reductions of 16.93% to 23.97%, while preserving or even enhancing model accuracy. Code is available at https://github.com/neiterman21/LDB.


Method



LDB strikes a balance between full network training and the regularization effect of partial backpropagation. It alternates between epochs of standard Stochastic Gradient Descent (SGD) and epochs of a modified SGD that applies layer-wise dropout during backpropagation only, with the batch size and learning rate adjusted on drop epochs to stabilize the dropping process.
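
The alternation can be made concrete with a short sketch. The following is a minimal PyTorch-style illustration, not the authors' released implementation: the drop_p probability, the per-layer selection over model.children(), the every-other-epoch schedule, and the doubled learning rate on drop epochs are assumptions made for clarity; see the repository linked above for the actual code.

import random

import torch
import torch.nn as nn


def ldb_epoch(model, loader, optimizer, loss_fn, drop_p=0.5):
    # Randomly select layers whose parameters receive no gradient this epoch.
    # drop_p and the per-child selection are illustrative assumptions.
    handles = []
    for layer in model.children():
        if random.random() < drop_p:
            for p in layer.parameters():
                # Zero the gradient of dropped layers in the backward pass.
                handles.append(p.register_hook(torch.zeros_like))
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)  # full forward pass: no layer is skipped
        loss.backward()              # dropped layers get zero gradients
        optimizer.step()
    for h in handles:                # restore normal backprop afterwards
        h.remove()


def sgd_epoch(model, loader, optimizer, loss_fn):
    # Plain SGD epoch: every layer is updated as usual.
    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()


def train(model, loader, drop_loader, loss_fn, epochs, lr=0.1, drop_lr=0.2):
    # drop_loader and drop_lr stand in for the adjusted batch size and
    # learning rate used to stabilize drop epochs (hypothetical values).
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    drop_opt = torch.optim.SGD(model.parameters(), lr=drop_lr)
    for epoch in range(epochs):
        if epoch % 2 == 0:
            sgd_epoch(model, loader, opt, loss_fn)
        else:
            ldb_epoch(model, drop_loader, drop_opt, loss_fn)

Because the dropout is applied through gradient hooks rather than by skipping layers, the forward computation is identical on every epoch and at inference time, which is what preserves forward consistency.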


Results