Meta's JEPA architecture outperforms standard AI methods in noisy medical imaging
Date: 2026-03-12
Researchers have built an AI model for cardiac ultrasound based on Meta's JEPA architecture that outperforms common approaches like Masked Autoencoders and contrastive learning in their benchmarks.
Most AI models for image and video analysis either reconstruct masked pixels or learn by matching image-text pairs. Both approaches dominate computer vision. An international research team from the University of Toronto, the Vector Institute, and the University of Chicago now claims a third method can beat both: the JEPA architecture proposed by Yann LeCun and his team during his time at Meta.
Their model, EchoJEPA, was trained on 18 million ultrasound videos from 300,000 patients, according to the paper.
Standard approaches like Masked Autoencoders hide parts of an image and force the model to reconstruct the missing pixels as faithfully as possible. The model has to learn exactly what the image looks like, including all noise and artifacts. JEPA takes a different approach: it also masks parts of the image, but instead of reconstructing the actual pixels, it predicts an abstract representation of the hidden region, essentially a compressed summary of its content. The model doesn't need to know exactly what a patch looks like, just what it means.
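The difference between the two training targets can be illustrated with a toy sketch. This is not EchoJEPA's code: the `average_pool` function is a hypothetical stand-in for a learned target encoder, and the alternating ±0.2 "speckle" is an assumption chosen only to keep the example deterministic. The point is simply that a pixel-level loss penalizes a prediction for missing the noise, while a loss computed on a summarizing representation may not.

```python
def average_pool(patch, width=4):
    """Hypothetical stand-in for a learned target encoder: mean of each block."""
    return [sum(patch[i:i + width]) / width for i in range(0, len(patch), width)]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pixel_loss(predicted_pixels, hidden_patch):
    # Masked-autoencoder-style target: match every hidden pixel, noise included.
    return mse(predicted_pixels, hidden_patch)


def latent_loss(predicted_latent, hidden_patch):
    # JEPA-style target: match the representation of the hidden region.
    return mse(predicted_latent, average_pool(hidden_patch))


# Toy hidden region: two flat "structures" plus alternating +/-0.2 noise (assumed).
clean = [0.2] * 4 + [0.8] * 4
noise = [0.2 if i % 2 == 0 else -0.2 for i in range(len(clean))]
noisy = [c + n for c, n in zip(clean, noise)]

# A prediction that captures the structure perfectly but none of the noise.
px = pixel_loss(clean, noisy)                 # penalized for every noisy pixel
lt = latent_loss(average_pool(clean), noisy)  # noise averages out in this toy encoder

print(f"pixel loss:  {px:.4f}")
print(f"latent loss: {lt:.4f}")
```

In this toy setup the structure-only prediction still incurs a pixel loss, because it fails to reproduce the noise, while its latent loss is essentially zero. Real speckle does not cancel this neatly, and a real JEPA learns its encoders rather than using a fixed pooling step.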
Ultrasound is a stress test for vision models
Ultrasound images are full of noise. Speckle patterns, shadows, and intensity fluctuations obscure the actual cardiac anatomy. A model that has to reconstruct pixels inevitably learns this noise as well. JEPA is designed to sidestep this problem because, according to the researchers, it focuses on temporally stable structures like heart chambers and wall motion.
To isolate the effect of the architecture, the researchers trained a JEPA model and a pixel reconstruction model with identical data, identical size, and identical compute budget. The JEPA model performed 27 percent better at estimating cardiac pump function, according to the paper. For ultrasound view classification, it reached 79 percent accuracy with just one percent of labeled data, while the best alternative managed only 42 percent with all labeled data. Under simulated image corruptions, EchoJEPA's performance dropped by 2.3 percent, while competing models degraded by up to 16.8 percent.
Without any training on pediatric hearts, the model outperformed all baselines that had been explicitly fine-tuned for that task, according to the researchers.
A strong data point, but not yet proof
The results come from the researchers' own benchmarks. The strongest model was trained on proprietary data and isn't publicly available. Only a smaller variant trained on public data has been released. The robustness tests used synthetic corruptions, not real clinical conditions. Whether the sometimes dramatic advantages hold up in practice remains to be seen.
That said, the controlled comparison using identical data, model size, and compute budget is methodologically sound and delivers more than anecdotal evidence. The paper represents one of the first large-scale real-world tests of the JEPA architecture outside of Meta's own benchmarks. Whether the approach proves superior in other domains, or whether cardiac ultrasound, with its high noise levels, is a particularly favorable edge case, remains an open question. V-JEPA 2, another JEPA-based model, has also shown promising results, though.
LeCun is not involved in EchoJEPA, but he's now using the ideas behind JEPA to build world models at his new AI startup AMI Labs. The company recently raised close to a billion dollars in Europe's largest seed funding round.
