
With Segment Anything, Meta releases a powerful AI model for image segmentation that can serve as a central building block for future AI applications.

Meta's Segment Anything Model (SAM) has been trained on nearly 11 million images from around the world and a billion semi-automated segmentations. The goal was to develop a "foundation model" for image segmentation, and Meta says it has succeeded. Such foundation models are trained on large amounts of data, achieving generalized capabilities that allow them to be used in many specialized use cases with little or no training. The success of large pre-trained language models such as GPT-3 sparked the trend toward such models.

Video: Meta

Once trained, SAM can segment previously unseen objects in any image and can be controlled by various inputs: it can automatically scan the entire image, or users can mark areas to be segmented or click on specific objects. SAM should also be able to handle text prompts, since Meta integrates a CLIP model into the architecture alongside the Vision Transformer that first encodes the image.
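
To make these prompt modes concrete, here is a minimal sketch of a point-click prompt using the segment-anything package Meta has published. The image path, click coordinates, and checkpoint filename are placeholders, not values from the article:

```python
# Minimal sketch: prompting SAM with a single foreground click.
# "photo.jpg", the click coordinates, and the checkpoint name are placeholders.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)  # HxWx3 RGB

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
predictor.set_image(image)  # computes the ViT image embedding once per image

# One foreground click (label 1) on the object of interest
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,  # returns several candidate masks with quality scores
)
print(masks.shape, scores)  # e.g. (3, H, W) boolean masks and their scores
```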

Nvidia researcher Jim Fan calls SAM the "GPT-3 moment" in computer vision.

Meta's SAM for everything and the XR future

Meta sees many applications for SAM, such as serving as part of multimodal AI systems that understand both the visual and textual content of web pages, or segmenting small organic structures in microscopy images.

Video: Meta

In the XR domain, SAM could automatically segment the objects a person wearing an XR headset is looking at; the selected objects could then be converted into 3D objects by models such as Meta's MCC.

Video: Meta

SAM could also be used to aid scientific study of natural occurrences on Earth or even in space, for example, by localizing animals or objects to study and track in video. We believe the possibilities are broad, and we are excited by the many potential use cases we haven’t even imagined yet.

Meta

In the accompanying paper, the authors compare SAM to CLIP: like OpenAI's multimodal model, they say SAM is explicitly designed to serve as a building block in larger AI models, enabling numerous applications.

Segment Anything dataset and demo available

In one respect, Fan's GPT-3 comparison falls short: unlike OpenAI's language model, Meta's SAM is open source. In addition to the model, Meta is also releasing the SA-1B dataset it was trained on.

It contains six times more images and 400 times more segmentation masks than any previously available segmentation dataset. The data was collected in a human-machine collaboration: SAM was trained on human-annotated masks, then iteratively generated better and better segmentations itself, which were in turn corrected by human annotators.
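
Anyone who downloads SA-1B can work with the masks directly. The following is a rough sketch that assumes the per-image JSON annotations store their masks in COCO run-length encoding (RLE); the file name and exact JSON layout are assumptions here, so check the dataset documentation before relying on them:

```python
# Hedged sketch: decoding one SA-1B mask, assuming COCO RLE-encoded
# annotations. The file name and JSON layout are assumptions.
import json
from pycocotools import mask as mask_utils

with open("sa_000000.json") as f:            # hypothetical annotation file
    annotations = json.load(f)["annotations"]  # assumed key, see dataset docs

rle = annotations[0]["segmentation"]         # assumed: {"size": [H, W], "counts": ...}
if isinstance(rle["counts"], str):           # pycocotools expects bytes
    rle["counts"] = rle["counts"].encode("utf-8")

binary_mask = mask_utils.decode(rle)         # numpy array of shape (H, W)
print(binary_mask.shape, binary_mask.sum())  # mask size and number of mask pixels
```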

SAM is available on GitHub and can be tried out via a demo.
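
For those who would rather run the model locally than use the demo, the fully automatic mode described above looks roughly like this; the image path is a placeholder and the checkpoint name refers to a file from Meta's model zoo:

```python
# Sketch of the fully automatic mode: SAM scans the whole image and returns
# one mask per detected object. "photo.jpg" is a placeholder path.
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

masks = mask_generator.generate(image)
# Each entry is a dict containing the binary mask plus metadata such as
# its area, bounding box, and a predicted quality score.
print(len(masks), masks[0]["bbox"], masks[0]["predicted_iou"])
```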

Summary
  • Meta's Segment Anything Model (SAM) is a foundation model for image segmentation that can segment virtually any object in any image.
  • Meta sees applications for SAM in many areas, such as understanding web pages, XR headsets, and scientific research in biology or space.
  • Meta releases the model, the huge training dataset, and a demo.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.