The Exponential Progress of Contrastive Learning in Self-Supervised Tasks
This blog was originally published by Ta-Ying Cheng on Towards Data Science.
After several years of research focused on supervised image recognition, many have now turned to far less explored territory: performing the same tasks in a self-supervised manner. One of the cornerstones that led to the dramatic advancements in this seemingly impossible task is the introduction of contrastive learning losses. This blog dives into some of the recently proposed contrastive losses that have pushed the results of unsupervised learning to heights similar to supervised learning.
One of the earliest contrastive learning losses proposed was the InfoNCE loss by Oord et al. Their paper Representation Learning with Contrastive Predictive Coding proposed the following loss:

$$\mathcal{L}_N = -\mathbb{E}_X\left[\log \frac{f_k(x_{t+k}, c_t)}{\sum_{x_j \in X} f_k(x_j, c_t)}\right]$$

where the numerator is the score of the positive pair, and the denominator is the sum of the scores of the positive pair and all negative pairs. Ultimately, this simple loss forces the positive pair to have a greater score (pushing the ratio inside the log towards 1 and hence the loss towards 0), which pulls positive pairs together and pushes negative pairs further apart.
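As a rough illustration of how this loss behaves, here is a minimal NumPy sketch for a single query. It assumes each pair's score is the exponential of a real-valued logit; the function name `info_nce` and the example numbers are made up for this post and are not taken from the paper's code.

```python
import numpy as np

def info_nce(pos_logit, neg_logits):
    """InfoNCE for one query, assuming score = exp(logit) (an assumption).

    pos_logit:  logit of the positive pair.
    neg_logits: 1-D sequence of logits for the negative pairs.
    """
    logits = np.concatenate(([pos_logit], np.asarray(neg_logits, dtype=float)))
    logits = logits - logits.max()  # shift for numerical stability
    # log( exp(pos) / sum over positive and negatives of exp(.) )
    log_ratio = logits[0] - np.log(np.exp(logits).sum())
    return -log_ratio

# Positive clearly ahead of the negatives: ratio near 1, loss near 0.
loss_separated = info_nce(5.0, [-5.0, -5.0, -5.0])
# Positive indistinguishable from three negatives: ratio 1/4, loss log(4).
loss_confused = info_nce(0.0, [0.0, 0.0, 0.0])
```

The second case shows why more negatives make the task harder: with K indistinguishable negatives the loss starts at log(K + 1).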
SimCLR is one of the first papers to suggest using a contrastive loss for self-supervised image recognition through image augmentations.
By forming positive pairs from two augmentations of the same image, and negative pairs from augmentations of different images, we can allow models to learn features that distinguish between images without explicitly providing any ground truth labels.
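The pairing scheme can be sketched in a few lines of NumPy. Everything here is a simplified stand-in: `augment` is a toy flip-plus-noise transform rather than SimCLR's augmentation pipeline, the flattened pixels play the role of an encoder output, and the temperature of 0.5 is an arbitrary choice. The loss follows the normalized temperature-scaled cross entropy (NT-Xent) form used by SimCLR.

```python
import numpy as np

def augment(image, rng):
    """Toy augmentation (an assumption): random horizontal flip plus slight noise."""
    view = image[:, ::-1] if rng.random() < 0.5 else image
    return view + rng.normal(scale=0.01, size=image.shape)

def two_views(images, rng):
    """Return a (2N, H, W) batch where rows i and i + N come from the same image."""
    first = [augment(img, rng) for img in images]
    second = [augment(img, rng) for img in images]
    return np.stack(first + second)

def nt_xent(z, temperature=0.5):
    """NT-Xent loss for embeddings z of shape (2N, D), positives at i and i + N."""
    n = z.shape[0] // 2
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a view is never compared with itself
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob = sim[np.arange(2 * n), pos_idx] - log_denom
    return -log_prob.mean()

rng = np.random.default_rng(0)
images = rng.normal(size=(4, 8, 8))  # four fake 8x8 grayscale images
views = two_views(images, rng)       # shape (8, 8, 8), no labels involved
z = views.reshape(len(views), -1)    # stand-in for encoder(views)
loss = nt_xent(z)
```

Note that no label is ever passed in: the only supervision is the knowledge of which two views came from the same source image.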
Momentum Contrast (MoCo)
The InfoNCE loss above is defined on a mini-batch of one positive and a number of negatives. He et al. extended this concept by framing contrastive learning as a dictionary look-up: matching a given query with its best key. This intuition led to momentum contrast (MoCo), which is essentially a dictionary (memory bank) of keys stored across multiple batches, where the newest batch is added and the oldest batch is removed in a queue-like manner. The keys are produced by an encoder that is updated as a slow moving average of the query encoder. This makes training more stable: like momentum, it keeps the change in keys from one iteration to the next less drastic.
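The two mechanisms described here, a queue of keys and a slowly moving key encoder, can be sketched as follows. This is only an illustration under simplifying assumptions: encoder weights are represented by a single NumPy array, `KeyQueue` and `momentum_update` are hypothetical names, and the official MoCo code is organized differently.

```python
import numpy as np
from collections import deque

def momentum_update(key_params, query_params, m=0.999):
    """Move the key encoder a small step towards the query encoder."""
    return m * key_params + (1.0 - m) * query_params

class KeyQueue:
    """Holds the keys of the most recent `max_batches` mini-batches."""

    def __init__(self, max_batches):
        # A deque with maxlen drops the oldest batch automatically.
        self.batches = deque(maxlen=max_batches)

    def enqueue(self, keys):
        self.batches.append(np.asarray(keys))

    def all_keys(self):
        """All stored keys, oldest first, used as negatives for the current queries."""
        return np.concatenate(list(self.batches), axis=0)

queue = KeyQueue(max_batches=2)
for step in range(3):
    queue.enqueue(np.full((4, 3), float(step)))  # fake batch of 4 keys, dim 3
keys = queue.all_keys()  # batch 0 has been evicted; batches 1 and 2 remain

updated = momentum_update(np.zeros(3), np.ones(3), m=0.9)  # -> 0.1 everywhere
```

With a momentum close to 1, the key encoder barely moves per step, so keys enqueued a few iterations ago remain roughly comparable with freshly computed ones.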
Decoupled Contrastive Learning
Previous papers in contrastive learning required either large batch sizes or a momentum mechanism. The recent paper on decoupled contrastive learning (DCL) hopes to change this by making a simple change to the original InfoNCE loss: removing the positive pair from the denominator.
While seemingly simple, DCL actually allows better convergence and ultimately forms an even better baseline than the previous papers SimCLR and MoCo.
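To make the difference concrete, the sketch below puts the two losses side by side for a single query. It assumes the scores are exponentials of logits (for example, similarities divided by a temperature); the function names and numbers are illustrative and are not taken from the DCL authors' code.

```python
import numpy as np

def info_nce(pos_logit, neg_logits):
    """Positive pair appears in both the numerator and the denominator."""
    neg = np.asarray(neg_logits, dtype=float)
    return -pos_logit + np.log(np.exp(pos_logit) + np.exp(neg).sum())

def dcl(pos_logit, neg_logits):
    """Decoupled variant: the denominator sums over the negatives only."""
    neg = np.asarray(neg_logits, dtype=float)
    return -pos_logit + np.log(np.exp(neg).sum())

pos, negs = 2.0, [0.0, 0.0, 0.0]
loss_info_nce = info_nce(pos, negs)  # -2 + log(e^2 + 3)
loss_dcl = dcl(pos, negs)            # -2 + log(3)
```

One visible consequence is that the decoupled loss is no longer bounded below by zero, since the positive term and the negative term are now optimized separately.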
Testing Each Concept
The authors of the papers above have released their code. To test these concepts, one can simply download different datasets to see how well each unsupervised learning method works.
Our Open Datasets Community is particularly useful for retrieving datasets. The community organizes all the popular datasets (e.g., ImageNet, CIFAR100) so that you can easily find them and be redirected to their official websites. It is especially helpful when you are trying to build your own dataloaders.