Deep Cross Modal Learning for Caricature Verification and Identification (CaVINet)

Jatin Garg*
Skand Vishwanath Peri*
Himanshu Tolani*
Narayanan.C.Krishnan
Indian Institute of Technology, Ropar, India
* equal contribution
In ACM Multimedia, 2018
[Download Paper]
[Github Code]


Learning from different modalities is a challenging task that involves determining a shared space that bridges them. In this paper, we address the problem of cross modal face verification and recognition between the caricature and visual image modalities. A caricature is an image that exaggerates the facial features of a person. Because caricatures vary so widely, building vision models that recognize and verify data from this modality is extremely challenging. Visual images, which contain far fewer distortions, can act as a bridge for analyzing the caricature modality. To advance research in this field, we have created a large, publicly available Caricature-VIsual dataset [CaVI] with images from both modalities. The dataset captures the rich variations in the caricatures of an identity. This paper presents the first cross modal architecture able to handle the extreme distortions present in caricatures, using a deep learning network that learns similar representations across the modalities. We use two convolutional networks along with transformations that are subject to orthogonality constraints to capture the shared and modality-specific representations. In contrast to prior research, our approach depends neither on manually extracted facial landmarks for learning the representations nor on the identity of the person for performing verification. The learned shared representation achieves 91% accuracy for verifying unseen images and 75% accuracy on unseen identities. Further, recognizing the identity in the image by knowledge transfer, using a combination of shared and modality-specific representations, yields an unprecedented 85% rank-1 accuracy for caricatures and 95% rank-1 accuracy for visual images.
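The identification results above are reported as rank-1 accuracy. As a minimal sketch, assuming the usual reading of that metric (a probe image counts as correct when its highest-scoring candidate identity is the true one), it can be computed as follows. The function name, argument layout, and toy scores are hypothetical and are not taken from the authors' code.

```python
def rank1_accuracy(scores, true_ids):
    """Fraction of probes whose highest-scoring identity is the true one.

    scores   : list of per-probe score lists, one score per candidate identity
               (hypothetical layout chosen for this illustration)
    true_ids : list holding the index of the correct identity for each probe
    """
    if len(scores) != len(true_ids):
        raise ValueError("scores and true_ids must have the same length")
    if not scores:
        raise ValueError("at least one probe is required")

    hits = 0
    for probe_scores, true_id in zip(scores, true_ids):
        # Index of the best-scoring identity; ties resolve to the lowest index.
        predicted = max(range(len(probe_scores)), key=probe_scores.__getitem__)
        hits += int(predicted == true_id)
    return hits / len(scores)


# Toy example: four probes scored against three candidate identities.
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.3, 0.5],
    [0.4, 0.5, 0.1],
]
toy_true_ids = [1, 0, 2, 0]
toy_result = rank1_accuracy(toy_scores, toy_true_ids)
```

In the toy data, the first three probes are ranked correctly and the last is not, so three of four probes count as hits.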



CaVI Dataset

The dataset is available at the link below. Note that it may be used for research purposes only.
[Link To Dataset]


CaVINet Architecture

A brief summary of our approach is shown below.
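Separately from the figure, the orthogonality constraint mentioned in the abstract can be illustrated with a small sketch. One common way to encourage a shared and a modality-specific transformation to capture different information is to penalize the squared Frobenius norm of the product of one matrix's transpose with the other. Whether CaVINet uses exactly this form is an assumption here, and the function names and toy matrices are hypothetical. The code is plain Python for readability, whereas a real model would compute the same quantity inside a deep learning framework.

```python
def transpose_times(a, b):
    """Return a^T b for matrices stored as lists of rows."""
    if len(a) != len(b):
        raise ValueError("both matrices must have the same number of rows")
    n_rows, cols_a, cols_b = len(a), len(a[0]), len(b[0])
    return [
        [sum(a[k][i] * b[k][j] for k in range(n_rows)) for j in range(cols_b)]
        for i in range(cols_a)
    ]


def orthogonality_penalty(shared, specific):
    """Squared Frobenius norm of shared^T specific.

    Each column is one projection direction. The penalty is zero exactly
    when every shared direction is orthogonal to every specific direction.
    (Assumed form of the constraint, for illustration only.)
    """
    product = transpose_times(shared, specific)
    return sum(value * value for row in product for value in row)


# Toy 3-dimensional feature space.
shared_toy = [[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]]      # spans the first two axes
specific_toy = [[0.0],
                [0.0],
                [1.0]]         # spans the third axis: no overlap
overlapping_toy = [[1.0],
                   [0.0],
                   [0.0]]      # reuses the first axis: overlap

no_overlap_penalty = orthogonality_penalty(shared_toy, specific_toy)
overlap_penalty = orthogonality_penalty(shared_toy, overlapping_toy)
```

Adding a term of this kind to the training loss pushes the two transformations toward non-overlapping subspaces, which is the intuition behind separating shared from modality-specific representations.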





Paper and Bibtex

[Paper]  [ArXiv]

Citation
 
Jatin Garg*, Skand Vishwanath Peri*, Himanshu Tolani*, Narayanan.C.Krishnan
Deep Cross Modal Learning for Caricature Verification and Identification (CaVINet)
In Proceedings of the 2018 ACM Conference on Multimedia.

[Bibtex]
  @inproceedings{CaVInetACMMM18,
      Author = {Garg, Jatin and
      Peri, Skand Vishwanath and Tolani, Himanshu and
      Krishnan, Narayanan C.},
      Title = {Deep Cross Modal Learning for Caricature Verification and Identification (CaVINet)},
      Booktitle = {Proceedings of the 2018 ACM Conference on Multimedia},
      Year = {2018},
      publisher = {ACM}
  }


Acknowledgements

The authors gratefully acknowledge NVIDIA for the hardware grant. This research is supported by the Department of Science and Technology, India, under grant YSS/2015/001206.

