CTrGAN: Cycle Transformers GAN for Gait Transfer

Shahar Mahpod, Noam Gaash, Hay Hoffman and Gil Ben-Artzi
Ariel University, Israel

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023

Paper

GitHub
(Coming soon ...)


CTrGAN transfers the poses of an unseen source to the target while maintaining the natural gait of the target. From left to right: (a) The source's image is converted to (b) DensePose's [20] IUV format. (c) Our model translates the IUV of the source to the most natural corresponding IUV pose of the target by synthesizing a novel pose. (d) The generated pose is very similar (but not identical) to an existing real pose in the dataset. (e) The generated pose is rendered to a corresponding RGB image of the target.
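
The pipeline in this caption can be summarized in a few lines. This is a minimal sketch only: `densepose`, `generator`, and `renderer` are hypothetical placeholders for the DensePose extractor, our pose-translation model, and the rendering network, not the released API.

```python
def transfer_gait(source_frames, densepose, generator, renderer):
    # (a) -> (b): estimate a DensePose IUV map for every source frame
    source_iuv = [densepose(frame) for frame in source_frames]
    # (b) -> (c): translate the source IUV sequence into the target's
    # most natural corresponding poses (a novel pose is synthesized)
    target_iuv = generator(source_iuv)
    # (c) -> (e): render each generated pose as an RGB image of the target
    return [renderer(pose) for pose in target_iuv]
```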


Abstract

We introduce a novel approach for gait transfer from unconstrained videos in-the-wild. In contrast to motion transfer, the objective here is not to have the target imitate the source's motions, but rather to replace the walking source with the target while transferring the target's typical gait. Our approach can be trained only once with multiple sources and is able to transfer the gait of the target from unseen sources, eliminating the need to retrain for each new source independently. Furthermore, we propose novel metrics for gait transfer based on gait recognition models that make it possible to quantify the quality of the transferred gait, and show that existing techniques yield a discrepancy that can be easily detected. We introduce Cycle Transformers GAN (CTrGAN), which consists of an encoder and a decoder, both Transformers, where the attention is on the temporal domain between complete images rather than the spatial domain between patches. Using a widely-used gait recognition dataset, we demonstrate that our approach is capable of producing over an order of magnitude more realistic personalized gaits than existing methods, even when used with sources that were not available during training. As part of our solution, we present a detector that determines whether a video is real or was generated by our model.
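
To make the temporal-attention idea concrete, here is a minimal PyTorch sketch (an illustrative assumption, not the paper's implementation): each complete frame is embedded into a single token and self-attention runs across time, instead of across spatial patches as in a standard Vision Transformer.

```python
import torch
import torch.nn as nn

class TemporalSelfAttention(nn.Module):
    """Self-attention across time: one token per complete frame."""

    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, frame_tokens):
        # frame_tokens: (batch, T, d_model) -- each of the T tokens embeds a
        # whole frame, so attention weights relate frames to frames over
        # time, not patches to patches within a frame.
        out, _ = self.attn(frame_tokens, frame_tokens, frame_tokens)
        return out
```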

Architecture



The generators of CTrGAN are based on Transformers. The inputs to each generator are IUVA gait images from the training set and Keys; the outputs are natural gait poses.
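
A hedged sketch of how such a generator could consume IUVA frames together with Keys: encoded source frames act as queries that cross-attend over a bank of encoded key poses of the target. The class name `CTrGenerator`, the toy linear encoders/decoder, and all layer sizes are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class CTrGenerator(nn.Module):
    """Toy generator: source IUVA frames cross-attend over target key poses."""

    def __init__(self, in_dim=4 * 64 * 64, d_model=256, n_heads=8):
        super().__init__()
        self.frame_enc = nn.Linear(in_dim, d_model)  # toy IUVA frame encoder
        self.key_enc = nn.Linear(in_dim, d_model)    # toy key-pose encoder
        self.cross = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.pose_dec = nn.Linear(d_model, in_dim)   # toy pose decoder

    def forward(self, iuva_frames, key_poses):
        # iuva_frames: (B, T, in_dim) flattened IUVA gait images (queries)
        # key_poses:   (B, K, in_dim) flattened key poses of the target
        q = self.frame_enc(iuva_frames)
        kv = self.key_enc(key_poses)
        fused, _ = self.cross(q, kv, kv)   # frames attend over the keys
        return self.pose_dec(fused)        # synthesized gait poses
```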



CTrGAN consists of two cyclically connected branches, each containing feature encoders and decoders, and Transformers that perform self- and cross-attention between the features.
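
The cyclic connection can be sketched as a CycleGAN-style training step. The names (`G_st`, `G_ts`, `D_t`), loss terms, and weighting below are assumptions for illustration; the actual objective may differ.

```python
import torch
import torch.nn.functional as F

def cycle_step(G_st, G_ts, D_t, source_seq, lam=10.0):
    """One generator update for the source -> target branch."""
    fake_target = G_st(source_seq)        # branch 1: source -> target poses
    recon_source = G_ts(fake_target)      # branch 2: map back to the source
    cycle_loss = F.l1_loss(recon_source, source_seq)
    logits = D_t(fake_target)             # discriminator on generated poses
    adv_loss = F.binary_cross_entropy_with_logits(
        logits, torch.ones_like(logits))  # fool the target discriminator
    return adv_loss + lam * cycle_loss
```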


Results

Qualitative results (columns, left to right): source object; source pose (IUV); object re-targeted with V2V only; pose re-targeted with CTrGAN; object re-targeted with CTrGAN + V2V.




Gait Recognition Tools as a Gait Transfer Metric



GaitSet's distance matrix for one subject in the training set without (A) and with (B) applying our method. The darker the color, the lower the value. Before deploying our model, GaitSet easily distinguishes between the generated and real gaits and can identify the true sources. After applying our approach, GaitSet identifies the generated gait as the real gait of subject three in most cases. A minimal sketch of this evaluation appears after the panels below.

A. Without using CTrGAN


B. Using CTrGAN
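
Below is a minimal sketch of how such a metric can be computed: embed real and generated gait sequences with a pretrained gait-recognition network and compare pairwise distances. `gait_model` is a placeholder for a network such as GaitSet, and the Euclidean distance is an assumption.

```python
import torch

def gait_distance_matrix(gait_model, generated_seqs, real_seqs):
    """Pairwise distances between generated and real gait embeddings."""
    # Embed each gait sequence into a single descriptor vector.
    gen_emb = torch.stack([gait_model(s) for s in generated_seqs])
    real_emb = torch.stack([gait_model(s) for s in real_seqs])
    # Rows: generated sequences, columns: real subjects. A low value means
    # the recognizer attributes the generated gait to that real subject.
    return torch.cdist(gen_emb, real_emb)
```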



Paper


CTrGAN: Cycle Transformers GAN for Gait Transfer, Shahar Mahpod, Noam Gaash, Hay Hoffman, Gil Ben-Artzi
arXiv



Acknowledgements

The website template is from here.