Content
summary Summary

CLIPN teaches CLIP the "semantics of negations". This should help computer vision to recognize classes that were not part of the training data.

Ad

Computer vision models recognize objects in the images on which they were trained. In real-world applications, however, these models often encounter unknown objects outside their training data, leading to poor results. AI researchers have proposed several techniques to enable AI models to recognize when input is "out-of-distribution" (OOD)-that is, from unknown classes not seen during training. However, existing methods have limitations in identifying OOD examples that resemble known classes.

Researchers at the Hong Kong University of Science and Technology have now developed a new technique called CLIPN that aims to improve OOD detection by teaching the well-known CLIP model to reject unknown inputs. The basic idea is to use both positive and negative text cues along with user-defined training techniques to give CLIP a sense of when an input is OOD.

The challenge: hard-to-distinguish unknowns

Suppose a model has been trained on images of cats and dogs. If it is asked to process an image of a squirrel, the squirrel is an out-of-distribution class because it does not belong to the known classes of cats and dogs.

Ad
Ad

Many OOD detection methods evaluate how well an input matches known classes. However, these methods may misclassify the image of the squirrel as a cat or a dog because it has (some) visual similarities.

Image: Wang et al.

CLIPN therefore extends CLIP with new learnable "no" prompts and "no" text encoders to capture the semantics of negations. In this way, CLIP learns when and how to say "no" to recognize when an image falls outside its known classes. For example, the CLIPN technique teaches the model to say "No, that's not a cat/dog" in the case of the squirrel, marking the class as OOD.

In experiments, the team shows that CLIPN identifies OOD examples that standard CLIP does not. According to the researchers, CLIPN improves OOD detection in 9 reference datasets by up to nearly 12 percent compared to existing methods.

However, they say it is still unclear whether the method works on specialized datasets, such as medical or satellite images, and whether it is suitable for other applications, such as image segmentation.

Ad
Ad
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Support our independent, free-access reporting. Any contribution helps and secures our future. Support now:
Bank transfer
Summary
  • Researchers at the Hong Kong University of Science and Technology are developing CLIPN, a technique for recognizing unknown object classes outside the training data of computer vision models.
  • CLIPN extends the well-known CLIP model by adding learnable "no" prompts and "no" text encoders to capture the semantics of negations and thereby reject unknown classes.
  • The technique shows improvements in detecting out-of-distribution examples compared to existing methods, although it is still unclear whether it works for specialized datasets such as medical or satellite imagery.
Sources
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.