05/31/2026
Maybe the data-efficiency gap is not a scaling problem.
Maybe it is an objective problem.
A striking preprint by Daniel J. Korchinski, Alessandro Favero, and Matthieu Wyart gives this idea a sample-complexity theory. Its title sums it up:
Learn from your own latents and not from tokens.
The core problem is familiar:
modern generative models are extraordinary, but brutally data-hungry.
Frontier LLMs are trained on roughly 10¹³–10¹⁴ tokens.
Children do not.
So the question is not only:
How do we scale models?
It is:
What are we asking them to predict?
Most modern AI is trained to predict the visible surface: next tokens, masked tokens, pixels, noise.
That works.
But it may be statistically inefficient for learning hierarchy.
The authors study a tractable hierarchical grammar where visible tokens are generated from a hidden latent tree of depth L — a stylized model for the compositional structure of language and images.
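To make the setup concrete, here is a minimal Python sketch of a toy hierarchical grammar. It is an assumption in the spirit of the paper's model, not the authors' exact construction. It uses v symbols per level and m synonymous productions per symbol, and each production rewrites one symbol into s symbols one level down. The names make_grammar, generate and parse are hypothetical. A depth-L tree emits s**L visible tokens. Once the rules are known, climbing back to the root is trivial. The hard part is learning the rules from samples.

```python
import itertools
import random

# Hypothetical toy grammar (an assumption, not the paper's exact model):
# v symbols at every level, m alternative productions ("synonyms") per symbol,
# and every production rewrites one symbol into s symbols of the level below.


def make_grammar(v, m, s, depth, seed=0):
    """Return one rule table per level, ordered from the root down.

    rules[sym] is a list of m distinct s-tuples. No tuple is shared between
    symbols, so every tuple has a unique parent (unambiguous parsing).
    """
    if v * m > v ** s:
        raise ValueError("need v*m <= v**s for unambiguous rules")
    rng = random.Random(seed)
    all_tuples = list(itertools.product(range(v), repeat=s))
    grammar = []
    for _ in range(depth):
        chosen = rng.sample(all_tuples, v * m)  # distinct tuples, dealt out m per symbol
        grammar.append({sym: chosen[sym * m:(sym + 1) * m] for sym in range(v)})
    return grammar


def generate(grammar, root, rng):
    """Expand a root symbol level by level into s**depth visible tokens."""
    seq = [root]
    for rules in grammar:
        seq = [tok for sym in seq for tok in rng.choice(rules[sym])]
    return seq


def parse(grammar, tokens):
    """Climb back up: map each s-tuple to its unique parent, level by level."""
    seq = list(tokens)
    for rules in reversed(grammar):
        parent = {t: sym for sym, ts in rules.items() for t in ts}
        s = len(rules[0][0])
        seq = [parent[tuple(seq[i:i + s])] for i in range(0, len(seq), s)]
    return seq[0]


grammar = make_grammar(v=8, m=2, s=2, depth=3, seed=0)
tokens = generate(grammar, root=5, rng=random.Random(1))
print(len(tokens), tokens)     # 8 visible tokens (2**3)
print(parse(grammar, tokens))  # 5
```

The leaf count grows as s**L. Any statistic linking a single visible token to a deep latent is therefore spread across an exponentially growing number of leaves.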
The result reframes the debate:
token-level learning needs a number of samples exponential in L to recover the hidden tree.
latent prediction recovers it with sample complexity essentially constant in L, up to logarithmic factors.
In plain English:
predicting tokens forces the model to infer the hierarchy through the leaves.
predicting latents lets the model climb the tree.
Once one abstraction level is learned, it becomes the substrate for learning the next.
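The next sketch illustrates "climbing the tree" on a hypothetical hand-written depth-2 grammar. It uses a generic context-clustering idea, not the authors' clustering algorithm. In this toy grammar, two token pairs produced by the same latent have identical statistics for their neighbouring pair. Grouping pairs with identical context distributions therefore recovers the level-1 latents. Relabeling the pairs with their cluster ids turns those latents into the data for the next level. The code uses exact probabilities (the infinite-data limit). With finite samples, these distributions must be estimated, and that is where sample complexity enters.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical hand-written grammar (an assumption, chosen so the example is exact).
# Root symbol r -> pair (a, b) of level-1 latents; each latent -> pair of tokens.
ROOT_RULES = {0: [(0, 0), (0, 1)], 1: [(1, 1), (1, 2)], 2: [(2, 2), (2, 0)]}
LEVEL1_RULES = {0: [(0, 1), (2, 3)], 1: [(1, 0), (4, 5)], 2: [(5, 4), (3, 2)]}


def joint_left_right(root_rules, level1_rules):
    """Exact P(left token pair, right token pair): uniform root, uniform rule choice."""
    joint = defaultdict(Fraction)
    for prods in root_rules.values():
        p_prod = Fraction(1, len(root_rules) * len(prods))
        for a, b in prods:
            n_pairs = len(level1_rules[a]) * len(level1_rules[b])
            for left in level1_rules[a]:
                for right in level1_rules[b]:
                    joint[left, right] += p_prod / n_pairs
    return joint


def cluster_by_context(joint):
    """Group left pairs whose conditional distribution over right pairs is identical."""
    rows = defaultdict(dict)
    for (left, right), p in joint.items():
        rows[left][right] = p
    groups = defaultdict(list)
    for left, row in rows.items():
        total = sum(row.values())
        signature = frozenset((right, p / total) for right, p in row.items())
        groups[signature].append(left)
    return sorted(sorted(g) for g in groups.values())


def relabel(joint, clusters):
    """Replace each token pair by its cluster id: learned latents become next-level data."""
    latent = {pair: k for k, group in enumerate(clusters) for pair in group}
    return sorted({(latent[left], latent[right]) for left, right in joint})


joint = joint_left_right(ROOT_RULES, LEVEL1_RULES)
clusters = cluster_by_context(joint)
print(clusters)                  # [[(0, 1), (2, 3)], [(1, 0), (4, 5)], [(3, 2), (5, 4)]]
print(relabel(joint, clusters))  # [(0, 0), (0, 1), (1, 1), (1, 2), (2, 0), (2, 2)]
```

The relabeled latent pairs are exactly the root productions. In this toy, the level learned first is the substrate on which the level above becomes visible.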
This is why data2vec and JEPA-style objectives are so interesting.
They do not merely reconstruct the input.
They train a network to predict its own latent representation of a masked region or another view, where the target comes from a slowly updated (EMA) copy of the same network.
The target is no longer the surface.
The target is the model’s own emerging abstraction.
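The latent-prediction objective itself fits in a few lines. This is a minimal sketch with hypothetical names (encode, latent_prediction_loss, ema_update) and several simplifying assumptions. It uses a single linear encoder whose output at each position mixes in the sequence mean, squared error as the loss, and a fixed EMA decay. As in data2vec, the student encodes a masked view and the teacher, an exponential moving average of the student, encodes the full input. The loss is computed only at masked positions. Real systems use deep Transformers, and data2vec builds its targets by averaging and normalizing the teacher's top layers. The student's gradient step is omitted here.

```python
import random

DIM = 3  # latent width of the toy encoder (hypothetical)


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def encode(params, x):
    """Toy contextual encoder: h_t = W x_t + U mean(x).

    The U term lets every position see the whole sequence, so a masked
    position still gets a latent that depends on its context.
    """
    W, U = params
    mean = [sum(col) / len(x) for col in zip(*x)]
    ctx = matvec(U, mean)
    return [[a + b for a, b in zip(matvec(W, v), ctx)] for v in x]


def latent_prediction_loss(student, teacher, x, masked, mask_vec):
    """Student sees the masked view; teacher sees the full input.
    Squared error between their latents, averaged over masked positions."""
    corrupted = [mask_vec if t in masked else v for t, v in enumerate(x)]
    pred = encode(student, corrupted)
    target = encode(teacher, x)  # treated as a constant target (stop-gradient in a real framework)
    errors = [sum((p - q) ** 2 for p, q in zip(pred[t], target[t])) for t in masked]
    return sum(errors) / len(errors)


def ema_update(teacher, student, tau):
    """Teacher weights follow the student: theta_T <- tau * theta_T + (1 - tau) * theta_S."""
    return tuple(
        [[tau * a + (1 - tau) * b for a, b in zip(row_t, row_s)]
         for row_t, row_s in zip(M_t, M_s)]
        for M_t, M_s in zip(teacher, student)
    )


def random_matrix(rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(DIM)]


rng = random.Random(0)
student = (random_matrix(rng), random_matrix(rng))
teacher = student  # start the teacher as a copy of the student
x = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(6)]
loss = latent_prediction_loss(student, teacher, x, masked={1, 4}, mask_vec=[0.0] * DIM)
print(f"latent prediction loss: {loss:.4f}")
# ... a gradient step on the student would go here (omitted) ...
teacher = ema_update(teacher, student, tau=0.99)
```

The design choice is that the regression target is not the input. It is the network's own slowly moving representation of it.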
The paper validates the theory three ways:
a hierarchical clustering algorithm
an end-to-end neural architecture trained by gradient descent
a sample-complexity analysis of data2vec, showing it implicitly performs hierarchical latent prediction
One implication is provocative:
if data2vec already discovers hierarchy implicitly, explicit stacking schemes such as H-JEPA may be partly redundant.
This is not “next-token prediction is dead.”
Next-token prediction built the current era.
But if the goal is biological-level data efficiency, surface reconstruction may be the expensive path.
The strategic frontier may be latent self-prediction:
models learning not only from what they see,
but from the abstractions they are forming.
Full credit to the authors:
Daniel J. Korchinski, Alessandro Favero, Matthieu Wyart.
Paper:
Learn from your own latents and not from tokens: A sample-complexity theory
https://arxiv.org/abs/2605.27734
I’m attaching the first page because the abstract is worth reading closely.
The future of data-efficient AI may not be more tokens.
It may be better targets.