LeJEPA: Self-Supervised Learning Gets a Theoretical Foundation

Audio overview of LeJEPA — how Balestriero and LeCun proved isotropic Gaussian embeddings are optimal and distilled it into a 50-line self-supervised method.

Self-supervised learning has worked remarkably well in practice, with methods like DINO and I-JEPA pushing the frontier. The problem: nobody fully understood why the specific combination of stop-gradients, EMA teachers, and asymmetric augmentation was necessary. Remove one piece and training collapses.

This episode covers LeJEPA, Balestriero and LeCun’s paper that provides a theoretical answer. The core result: embeddings that follow an isotropic Gaussian distribution provably minimise downstream prediction risk. From that, they derive SIGReg (Sketched Isotropic Gaussian Regularization), a differentiable regulariser built on the Epps-Pulley characteristic function test, and build a full self-supervised method in roughly 50 lines of PyTorch. No stop-gradient. No teacher network. No EMA schedule.
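
To make the idea behind the regulariser concrete, here is a minimal sketch in plain Python. It is not the paper's PyTorch code. It projects embeddings onto a few random unit directions and, for each projection, measures a weighted squared distance between the empirical characteristic function and that of a standard normal, which is the quantity an Epps-Pulley style test is built on. The function names (`epps_pulley_1d`, `sigreg_sketch`), the Gaussian weight, the integration range, and the number of directions are all assumptions chosen for illustration. A real training loss would also need to be differentiable, which this version is not.

```python
import cmath
import math
import random


def epps_pulley_1d(xs, t_max=3.0, num_points=17):
    """Weighted squared distance between the empirical characteristic
    function of xs and that of N(0, 1), via the trapezoidal rule.
    The range [-t_max, t_max] and grid size are illustrative assumptions."""
    n = len(xs)
    step = 2.0 * t_max / (num_points - 1)
    total = 0.0
    for k in range(num_points):
        t = -t_max + k * step
        # Empirical characteristic function at frequency t.
        ecf = sum(cmath.exp(1j * t * x) for x in xs) / n
        # Characteristic function of a standard normal.
        target = math.exp(-0.5 * t * t)
        # Gaussian weight over frequencies (an assumed choice).
        weight = math.exp(-0.5 * t * t)
        # Trapezoidal rule: endpoints count half.
        edge = 0.5 if k in (0, num_points - 1) else 1.0
        total += edge * weight * abs(ecf - target) ** 2 * step
    return total


def random_unit_vector(dim, rng):
    """Draw a direction uniformly on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sigreg_sketch(embeddings, num_directions=8, seed=0):
    """Average the 1D statistic over random projections (hypothetical helper)."""
    rng = random.Random(seed)
    dim = len(embeddings[0])
    stats = []
    for _ in range(num_directions):
        u = random_unit_vector(dim, rng)
        projected = [sum(a * b for a, b in zip(z, u)) for z in embeddings]
        stats.append(epps_pulley_1d(projected))
    return sum(stats) / len(stats)


if __name__ == "__main__":
    rng = random.Random(1)
    gaussian = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(256)]
    collapsed = [[0.0] * 4 for _ in range(256)]
    print("isotropic Gaussian:", sigreg_sketch(gaussian))
    print("collapsed to a point:", sigreg_sketch(collapsed))
```

Under these assumptions, embeddings drawn from an isotropic Gaussian score close to zero, while embeddings collapsed to a single point score noticeably higher. That illustrates how a penalty of this kind can discourage collapse without a stop-gradient or teacher network.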

The audio covers the theory, the implementation, and the competitive ImageNet results from a method that’s refreshingly principled.

Read the full post →