LeJEPA: Self-Supervised Learning Gets a Theoretical Foundation
Audio overview of LeJEPA: how Balestriero and LeCun proved that isotropic Gaussian embeddings are optimal for downstream tasks and distilled that result into a self-supervised method of roughly 50 lines.
Companion to the article "LeJEPA: Self-Supervised Learning Gets a Theoretical Foundation"
Self-supervised learning has worked remarkably well in practice, with methods like DINO and I-JEPA pushing the frontier. The problem: nobody fully understood why the specific combination of stop-gradients, EMA teachers, and asymmetric augmentation was necessary. Remove one piece and training collapses.
This episode covers LeJEPA, Balestriero and LeCun’s paper that provides a theoretical answer. The core result: isotropic Gaussian embeddings are provably optimal for downstream tasks. From that, they derive SIGReg (Sketched Isotropic Gaussian Regularization), a differentiable regulariser that pushes embeddings toward an isotropic Gaussian using the Epps-Pulley characteristic function test, and build a full self-supervised method in roughly 50 lines of PyTorch. No stop-gradient. No teacher network. No EMA schedule.
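To make the idea behind an Epps-Pulley style regulariser concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It projects embeddings onto random unit directions and, for each 1-D projection, measures how far the empirical characteristic function sits from that of a standard Gaussian. The function names (`epps_pulley_statistic`, `sigreg_sketch`), the integration grid, the Gaussian weighting, and the number of directions are all illustrative assumptions. This version only evaluates the statistic; the same arithmetic written with PyTorch tensors would be differentiable.

```python
import math
import random


def epps_pulley_statistic(samples, t_max=3.0, num_points=17):
    """Weighted squared gap between the empirical characteristic function
    of 1-D samples and the N(0, 1) characteristic function exp(-t^2 / 2)."""
    n = len(samples)
    step = 2.0 * t_max / (num_points - 1)
    total = 0.0
    for k in range(num_points):
        t = -t_max + k * step
        # Empirical characteristic function: mean of exp(i * t * x).
        real = sum(math.cos(t * x) for x in samples) / n
        imag = sum(math.sin(t * x) for x in samples) / n
        target = math.exp(-0.5 * t * t)  # CF of N(0, 1), purely real
        weight = math.exp(-0.5 * t * t)  # assumed Gaussian weighting
        # Trapezoidal rule: half weight at the two grid endpoints.
        trapezoid = 0.5 if k in (0, num_points - 1) else 1.0
        total += trapezoid * weight * ((real - target) ** 2 + imag ** 2) * step
    return total


def sigreg_sketch(embeddings, num_directions=16, seed=0):
    """Average the 1-D statistic over random unit directions (hypothetical helper)."""
    rng = random.Random(seed)
    dim = len(embeddings[0])
    total = 0.0
    for _ in range(num_directions):
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in direction))
        direction = [v / norm for v in direction]
        projected = [sum(e * d for e, d in zip(row, direction)) for row in embeddings]
        total += epps_pulley_statistic(projected)
    return total / num_directions


# Usage: Gaussian embeddings should score far lower than collapsed ones.
rng = random.Random(0)
gaussian = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(500)]
collapsed = [[0.0] * 8 for _ in range(500)]
gaussian_score = sigreg_sketch(gaussian)
collapsed_score = sigreg_sketch(collapsed)
print(gaussian_score, collapsed_score)
```

A collapsed representation, where every embedding is identical, gets a large penalty because its characteristic function is constant, while embeddings that are already isotropic Gaussian score close to zero. That is the sense in which a penalty of this kind can discourage collapse without a stop-gradient or teacher network.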
The audio covers the theory, the implementation, and the competitive ImageNet results from a method that’s refreshingly principled.