Montreal.AI

Public intelligence for the AGI-first era.
Website: http://www.montreal.ai/

Frontier AI · AGI · Sovereign AI · AI Debate · Machine labor · Agents · Governance · Institutional memory

Official website: https://montreal.ai

Welcome to the www.Montreal.AI FB Page:


MONTRÉAL.AI (EST’B’D 2003) : Montreal AI-First Conglomerate SOLVING Toughest Challenges with Deep Learning. Under Montréal.AI’s umbrella, multiple companies are being structured to push boundaries and to orchestrate tangible benefits for everybody :

— Montréal.AI Academy
— Montréal.AI Consultancy
— Montréal.AI Institute
— Montréal.AI Law
— Montréal.AI VR

Montréal.AI is explicitly working toward the development of transformative AI.

05/31/2026

Maybe the data-efficiency gap is not a scaling problem.

Maybe it is an objective problem.

A striking preprint by Daniel J. Korchinski, Alessandro Favero, and Matthieu Wyart offers a sample-complexity theory for this shift:

Learn from your own latents and not from tokens.

The core problem is familiar:

modern generative models are extraordinary, but brutally data-hungry.

LLMs train on 10¹³–10¹⁴ tokens.

Children do not.

So the question is not only:

How do we scale models?

It is:

What are we asking them to predict?

Most of modern AI trains on the visible surface: next tokens, masked tokens, pixels, noise.

That works.

But it may be statistically inefficient for learning hierarchy.

The authors study a tractable hierarchical grammar where visible tokens are generated from a hidden latent tree of depth L — a stylized model for the compositional structure of language and images.

The result reframes the debate:

Token-level learning needs a number of samples exponential in L to recover the hidden tree.

Latent prediction recovers it with a sample complexity that is essentially constant in L, up to logarithmic factors.

In plain English:

predicting tokens forces the model to infer the hierarchy through the leaves.

predicting latents lets the model climb the tree.

Once one abstraction level is learned, it becomes the substrate for learning the next.

This is why data2vec and JEPA-style objectives are so interesting.

They do not merely reconstruct the input.

They train a network to predict its own latent representation of another view or masked region.

The target is no longer the surface.

The target is the model’s own emerging abstraction.
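A minimal sketch of that objective, under loud assumptions: the linear `encode`, the zero-masking, and the `decay` value are all hypothetical stand-ins, not the paper's setup or the actual data2vec implementation. It only shows the shape of the idea: the student sees a masked input and regresses onto a teacher's latent of the full input, and the teacher is a slow moving average of the student.

```python
import random

def encode(weights, x):
    """Toy linear encoder: one latent value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def ema_update(teacher, student, decay=0.99):
    """Move the teacher weights a small step toward the student weights."""
    return [[decay * t + (1 - decay) * s for t, s in zip(t_row, s_row)]
            for t_row, s_row in zip(teacher, student)]

def latent_loss(student, teacher, x, mask):
    """Mean squared error between the student's latents of the masked input
    and the teacher's latents of the full input."""
    masked_x = [0.0 if m else xi for xi, m in zip(x, mask)]
    prediction = encode(student, masked_x)
    target = encode(teacher, x)  # the target is a latent, not the raw input
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(prediction)

rng = random.Random(0)
student = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
teacher = [row[:] for row in student]  # teacher starts as a copy of the student
x = [1.0, 2.0, 3.0, 4.0]

loss_unmasked = latent_loss(student, teacher, x, [False, False, False, False])
loss_masked = latent_loss(student, teacher, x, [True, False, False, False])
teacher = ema_update(teacher, student)
```

With nothing masked the loss is zero here, because student and teacher start identical. Masking one input opens a gap the student has to close by predicting the hidden part in latent space.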

The paper validates the theory three ways:

a hierarchical clustering algorithm

an end-to-end neural architecture trained by gradient descent

a sample-complexity analysis of data2vec, showing it implicitly performs hierarchical latent prediction

One implication is provocative:

if data2vec already discovers hierarchy implicitly, explicit stacking schemes such as H-JEPA may be partly redundant.

This is not “next-token prediction is dead.”

Next-token prediction built the current era.

But if the goal is biological-level data efficiency, surface reconstruction may be the expensive path.

The strategic frontier may be latent self-prediction:

models learning not only from what they see,

but from the abstractions they are forming.

Full credit to the authors:
Daniel J. Korchinski, Alessandro Favero, Matthieu Wyart.

Paper:
Learn from your own latents and not from tokens: A sample-complexity theory
https://arxiv.org/abs/2605.27734

I’m attaching the first page because the abstract is worth reading closely.

The future of data-efficient AI may not be more tokens.

It may be better targets.

05/27/2026

Language models may not need longer context.

They may need sleep.

A fascinating new paper by Sangyun Lee, Sean McLeish, Tom Goldstein, and Giulia Fanti proposes one of the most biologically resonant ideas in long-context AI:

sleep-like memory consolidation.

The problem is subtle.

Transformers can store recent context in a KV cache, but attention scales poorly with length.

SSM-attention hybrids offer fixed-size fast-weight memories, but memory capacity is not the same as reasoning capacity.

A model may “store” evicted context and still fail to compute with it.

That distinction matters.

The authors show that when reasoning depth increases, vanilla SSM-attention hybrids can degrade even when the amount of information to store is held fixed.

So the bottleneck is not just memory.

It is the computation required to transform transient context into a useful internal state.

Their solution:

Let the model sleep.

When the context window fills, the model enters an offline consolidation phase. It performs N recurrent passes over the accumulated context, updates persistent fast weights inside its SSM blocks through a learned local rule, then clears the KV cache.

During wake-time inference, prediction remains single-pass.

Extra compute is paid during sleep, not at the moment of response.

That is the key architectural move:

attention for recent high-fidelity access
fast weights for compressed long-range memory
sleep for transforming memory into reasoning-ready state
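A toy sketch of the wake/sleep cycle described above. Everything in it is an assumption made for illustration: the class name `SleepyMemory`, the plain list used as a KV cache, and especially the delta rule, which stands in for the paper's learned local update rule.

```python
class SleepyMemory:
    """Toy wake/sleep loop: a bounded cache plus persistent fast weights."""

    def __init__(self, dim, window, sleep_passes, lr=0.1):
        self.dim = dim
        self.window = window              # cache size that triggers sleep
        self.sleep_passes = sleep_passes  # N recurrent passes per sleep phase
        self.lr = lr
        self.kv_cache = []
        self.fast_weights = [[0.0] * dim for _ in range(dim)]
        self.sleep_count = 0

    def recall(self, key):
        """Single-pass read from the fast weights (wake-time cost)."""
        return [sum(w * k for w, k in zip(row, key)) for row in self.fast_weights]

    def observe(self, key, value):
        """Wake phase: store the pair, and sleep once the window is full."""
        self.kv_cache.append((key, value))
        if len(self.kv_cache) >= self.window:
            self.sleep()

    def sleep(self):
        """Offline phase: replay the cache N times, then clear it."""
        for _ in range(self.sleep_passes):
            for key, value in self.kv_cache:
                recalled = self.recall(key)
                for i in range(self.dim):
                    error = value[i] - recalled[i]
                    for j in range(self.dim):
                        # Delta rule: a hypothetical stand-in for the learned rule.
                        self.fast_weights[i][j] += self.lr * error * key[j]
        self.kv_cache.clear()
        self.sleep_count += 1

def recall_error(sleep_passes):
    memory = SleepyMemory(dim=2, window=2, sleep_passes=sleep_passes)
    memory.observe([1.0, 0.0], [1.0, 0.0])
    memory.observe([0.0, 1.0], [0.0, 1.0])  # fills the window, triggers sleep
    return abs(1.0 - memory.recall([1.0, 0.0])[0]), memory
```

In this toy, a longer sleep leaves a smaller recall error after the cache is gone, while `recall` itself stays a single pass.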

The biological analogy is more than branding.

In animals, sleep helps consolidate short-term memories into longer-term cortical structure.

Here, “sleep” consolidates evicted context into fast weights before the model wakes up and continues.

The paper tests this on controlled synthetic tasks, including cellular automata and multi-hop graph retrieval, plus GSM-Infinite, a realistic math reasoning benchmark.

Across these settings, increasing sleep duration improves performance most clearly on examples requiring deeper reasoning.

The deeper lesson:

Long-context AI is not only about retaining more tokens.

It is about allocating enough computation to metabolize context.

A static cache remembers.

A sleeping model reorganizes.

If this line of work holds, future agents may not process endless streams in one uninterrupted wake state.

They may alternate between wakeful interaction and offline consolidation.

Read.
Sleep.
Compress.
Reason.
Wake.

Full credit to the authors:
Sangyun Lee, Sean McLeish, Tom Goldstein, Giulia Fanti.

Paper:
Language Models Need Sleep
https://arxiv.org/abs/2605.26099

I’m attaching the first page because the abstract is worth reading closely.

The future of long-horizon AI may not be infinite attention.

It may be models that know when to sleep.

05/25/2026

Mathematics may be entering a new regime:

not AI you believe,

AI you verify.

A major Google DeepMind paper presents AlphaProof Nexus, a framework for AI-driven formal proof search in Lean.

The point is not that an LLM can write convincing mathematical prose.

That has always been the weak version of the story.

The point is that the system must produce proof code that survives a formal verifier.

LLMs generate.
Lean checks.
Search continues.
Only machine-verified proofs remain.

That changes the epistemic contract.

In informal mathematics, an AI-generated proof can look elegant while hiding a fatal gap.

In Lean, every step must compile. No rhetoric. No handwaving. No “seems plausible.”

The authors report the first large-scale evaluation of this approach on open research-level problems.

Their most capable agent autonomously resolved 9 of 353 open Erdős problems, including two questions open for 56 years, at a per-problem inference cost of a few hundred dollars.

It also proved 44 of 492 OEIS conjectures and is being deployed in combinatorics, optimization, graph theory, algebraic geometry, and quantum optics.

The architecture is fascinating.

A mathematician provides a Lean formalization.
The agent refines proof sketches.
LLM subagents propose lemmas, decompositions, constructions, and edits.
Lean rejects invalid steps.
AlphaProof can be called as a focused prover.
An evolutionary population of proof sketches is ranked and reused.
The final output is a sorry-free Lean proof.
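The generate-and-check loop can be sketched in a few lines. This is not AlphaProof Nexus and it does not call Lean: as a stand-in, the "verifier" below checks integer factorizations, and the "generator" is a random guesser. The names `propose`, `verify`, and `search` are hypothetical.

```python
import random

def verify(n, candidate):
    """Trusted checker: accept only a genuine nontrivial factorization of n."""
    a, b = candidate
    return 1 < a < n and 1 < b < n and a * b == n

def propose(n, rng):
    """Unreliable generator: guesses a factor pair, and is usually wrong."""
    a = rng.randint(2, n - 1)
    return (a, n // a)

def search(n, budget, seed=0):
    """Keep generating until the checker accepts or the budget runs out."""
    rng = random.Random(seed)
    for attempt in range(1, budget + 1):
        candidate = propose(n, rng)
        if verify(n, candidate):
            return candidate, attempt
    return None, budget

factors, attempts = search(91, budget=10000)
nothing, _ = search(97, budget=200)  # 97 is prime, so no candidate can pass
```

The design point is the asymmetry: `propose` can be as sloppy as it likes, because nothing reaches the caller unless `verify` accepts it.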

This is not “chatbot solves math.”

It is closer to a new research instrument:

a search engine over formal proof space,
guided by generative models,
grounded by a compiler,
and audited by mathematics itself.

The deeper lesson is general:

AI systems become far more powerful when unreliable generation is wrapped in reliable verification.

For mathematics, the verifier is Lean.

For other domains, the frontier question becomes:

what is the equivalent of a compiler for truth?

Full credit to the authors:
George Tsoukalas, Anton Kovsharov, Sergey Shirobokov, Anja Surina, Moritz Firsching, Gergely Bérczi, Francisco J. R. Ruiz, Arun Suggala, Adam Zsolt Wagner, Eric Wieser, Lei Yu, Aja Huang, Miklós Z. Horváth, Andrew Ferrauiolo, Henryk Michalewski, Codrut Grosu, Thomas Hubert, Matej Balog, Pushmeet Kohli, Swarat Chaudhuri.

Paper:
Advancing Mathematics Research with AI-Driven Formal Proof Search
https://arxiv.org/abs/2605.22763

I’m attaching the first page because the abstract is worth reading closely.

The future of AI in mathematics may not be models we trust.

It may be agents whose work can be checked.

05/23/2026

The next clue in AI reasoning:

answers may be attractors.

A new paper from Benhao Huang, Zhengyang Geng, and Zico Kolter introduces Equilibrium Reasoners (EqR) — a sharp mechanistic view of test-time scaling in latent reasoning models.

The core idea is simple, but deep:

Reasoning is not only generation.
Reasoning can be convergence.

EqR repeatedly updates a latent state. The authors hypothesize that generalizable reasoning emerges when training shapes the model’s latent dynamics so that stable attractors correspond to valid solutions.

In other words, the answer is not merely “produced.”

It is reached.

This matters because test-time compute only helps when the model’s internal dynamics know how to use it. More iterations can improve reasoning — or make it worse — depending on whether the trajectory moves toward a solution-aligned attractor or falls into a spurious one.

EqR scales along two axes:

Depth: run more iterations so a trajectory can settle.

Breadth: run multiple stochastic trajectories from different initializations and select/aggregate the ones that converge best.
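Both axes can be illustrated with a toy fixed-point iteration. This is a sketch under stated assumptions, not the EqR model: the update `step` is a hand-picked contraction rather than a trained network, so it has a single attractor and no spurious ones, and the function names are hypothetical.

```python
import math
import random

def step(z, x):
    """One latent update. The 0.5 factor makes this map a contraction."""
    return [math.tanh(0.5 * zi + xi) for zi, xi in zip(z, x)]

def residual(z, x):
    """Fixed-point residual: how far one more update would move z."""
    return max(abs(a - b) for a, b in zip(step(z, x), z))

def run(z0, x, iterations):
    """Depth: iterate the update, then report the final state and residual."""
    z = list(z0)
    for _ in range(iterations):
        z = step(z, x)
    return z, residual(z, x)

def best_of(x, trajectories, iterations, seed=0):
    """Breadth: start from several random states, keep the best-converged one."""
    rng = random.Random(seed)
    results = []
    for _ in range(trajectories):
        z0 = [rng.uniform(-1.0, 1.0) for _ in x]
        results.append(run(z0, x, iterations))
    return min(results, key=lambda pair: pair[1])

x = [0.3, -0.7]
_, shallow = run([0.0, 0.0], x, 2)
_, deep = run([0.0, 0.0], x, 50)
best_state, best_residual = best_of(x, trajectories=4, iterations=50)
```

Here the residual shrinks as depth grows, and breadth simply picks the trajectory with the smallest residual. In a learned model with several attractors, that selection step is where convergence starts to act as a signal.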

The first-page figure captures the punchline beautifully: training is capped at 16 iterations, yet the learned dynamics extrapolate beyond 1,024 iterations at test time. As fixed-point residual falls, accuracy rises.

On Sudoku-Extreme, the paper reports a jump from 2.6% exact accuracy for feedforward models to over 99% with scalable latent reasoning — equivalent to unrolling up to ~40,000 layers. On Maze, EqR reaches 93.0%.

But the benchmark is not the most interesting part.

The most interesting part is the lens:

Correct answers must become stable.
They must be reachable.
And convergence itself can become a signal.

That gives the field a more precise language for test-time compute than “let the model think longer.”

Not longer text.
Not an external verifier.
Not task-specific search priors.

A learned attractor landscape.

This feels important because modern AI is moving from static inference toward adaptive computation. The question is no longer only “how much compute should we spend?”

It is:

What internal dynamics make extra compute useful?

Full credit to the authors:
Benhao Huang, Zhengyang Geng, Zico Kolter.

Paper:
Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning
https://arxiv.org/abs/2605.21488v1

I’m attaching the first page because Figure 1 is worth studying closely.

The future of reasoning may not only be models that generate better answers.

It may be models whose internal states learn where correct answers live — and how to converge there.

05/23/2026

The tokenizer is an architectural prior disguised as preprocessing.

And almost everyone has been treating it like plumbing.

A new paper by Jan Tempus, Philip Whittington, Craig W. Schmidt, Dennis Komm, and Tiago Pimentel changes the frame:

Tokenisation via Convex Relaxations

The core question:

What are the units a language model is allowed to think in?

Most systems still use greedy tokenizers like BPE or Unigram. They merge what looks best locally, step by step, until the vocabulary budget is spent.

That works surprisingly well.

But it is not global optimization.

ConvexTok asks: can tokenization be formulated as a real optimization problem?

The answer is yes.

The authors recast tokenization as a graph problem, express the compression objective as an integer program, relax it into a linear program, solve it with convex optimization tools, then round the fractional solution back into a usable tokenizer.
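To make the graph view concrete, here is a small sketch of one piece of it only: with a fixed vocabulary, encoding a string in the fewest tokens is a shortest path through a graph whose nodes are character positions and whose edges are vocabulary entries. This is an illustration under assumptions, not the ConvexTok formulation, which also has to choose the vocabulary and does so through the integer program and its relaxation. The function name `min_tokens` is hypothetical.

```python
def min_tokens(text, vocab):
    """Fewest-token segmentation of text, as a shortest path over positions."""
    n = len(text)
    infinity = float("inf")
    best = [infinity] * (n + 1)  # best[i]: fewest tokens covering text[:i]
    back = [None] * (n + 1)      # back[i]: start of the last token on that path
    best[0] = 0
    max_len = max(len(token) for token in vocab)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            # Each vocabulary entry matching text[start:end] is an edge of cost 1.
            if text[start:end] in vocab and best[start] + 1 < best[end]:
                best[end] = best[start] + 1
                back[end] = start
    if best[n] == infinity:
        raise ValueError("text cannot be segmented with this vocabulary")
    tokens = []
    position = n
    while position > 0:
        tokens.append(text[back[position]:position])
        position = back[position]
    return tokens[::-1]

vocab = {"a", "b", "c", "ab", "bc", "abc"}
tokens = min_tokens("abcab", vocab)
```

On this example the shortest path uses two tokens, `abc` followed by `ab`.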

That is the shift:

from greedy merges
to global structure

from heuristic vocabulary construction
to polyhedral optimization

from “this tokenizer seems good”
to “we can certify how close it is to optimal”

That last point matters most.

ConvexTok gives a lower bound on the best possible compression under the chosen objective. So the field can ask a rigorous question:

How much optimality are we leaving on the table?

At common vocabulary sizes, the authors empirically find ConvexTok within about 1% of optimal. It consistently improves intrinsic tokenization metrics and language-model bits-per-byte. Downstream task gains are less uniform, which is the right nuance: compression matters, but it is not the whole story.

This is not “BPE was foolish.”

Actually, one of the most interesting takeaways is that BPE is a strong greedy baseline, often close to optimal at larger vocabulary sizes.

But ConvexTok gives us something BPE cannot:

a certificate
a geometry
a measurable gap

Tokenization shapes sequence length, compression, vocabulary use, training dynamics, multilingual behavior, and the granularity of representation.

It is not a neutral front end.

It is one of the first inductive biases imposed on every language model.

Full credit to the authors:
Jan Tempus, Philip Whittington, Craig W. Schmidt, Dennis Komm, Tiago Pimentel.

Paper: Tokenisation via Convex Relaxations
https://arxiv.org/abs/2605.22821

I’m attaching the first page because the abstract is worth reading closely.

The next gains in AI may not only come from bigger models.

They may come from optimizing the layers we mistook for infrastructure.

05/23/2026

⚜️✨ MONTRÉAL.IA / MONTREAL.AI is live on Eventbrite.

Public Intelligence for the AGI-first → ASI-first Era.

Briefings. Debates. Archives. Public record.

MONTREAL.AI convenes the public-intelligence forum for frontier AI, sovereign intelligence, governance, safety, assurance, and institutional memory — alongside QUEBEC.AI, Québec’s sovereign AI flagship enterprise for AI-first transformation, sovereign AI infrastructure, autonomous agents, and strategic AI governance.

Follow for upcoming events:
https://MontrealAI.eventbrite.com

05/23/2026

Science has a hidden frontier.

Not the frontier of what is true.

The frontier of what is thinkable.

A remarkable new preprint by Alejandro H. Artiles, Martin Weiss, Levin Brinkmann, Iyad Rahwan, Bernhard Schölkopf, Christopher Pal, Hugo Larochelle, Anirudh Goyal, and Nasim Rahaman gives that frontier a name:

The Alien Space of Science.

The premise is powerful:

Some research directions are coherent with the literature, but cognitively unavailable to the people currently working in the field.

Not because researchers lack intelligence.

Because fields have structure.

They have preferred tools, shared intuitions, fashionable problems, common datasets, institutional incentives, collaborator networks, and inherited taste.

Those structures make some ideas easy to imagine.

And others almost invisible.

Modern LLMs inherit the same bias. Ask for “novel ideas,” and they often recombine the dense regions of the literature—the concepts the field already finds natural.

This paper targets the complementary region.

Not nonsense.
Not random novelty.
Not weirdness for its own sake.

Coherent directions outside the community prior.

The method is beautifully concrete.

The authors analyze 16,068 peer-reviewed LLM papers and distill them into 273 reusable “idea atoms.”

Then they learn two models:

Coherence: could this combination of atoms form a viable research direction?

Availability: is any existing author community positioned to produce it?

Alien sampling searches for the frontier between them:

high coherence
low availability
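A hedged sketch of that selection rule. The scoring formula, the coherence threshold, the three atoms, and the lookup-table "models" are all invented for illustration; the paper learns its coherence and availability models from the literature and may combine them differently.

```python
from itertools import combinations

def alien_candidates(atoms, coherence, availability, size=2, top_k=3,
                     min_coherence=0.5):
    """Rank atom combinations that are coherent but unlikely to be proposed."""
    scored = []
    for combo in combinations(sorted(atoms), size):
        c = coherence(combo)
        if c < min_coherence:
            continue  # drop incoherent combinations outright
        a = availability(combo)
        scored.append((c * (1.0 - a), combo))  # reward coherence, penalize availability
    scored.sort(reverse=True)
    return [combo for _, combo in scored[:top_k]]

# Hypothetical scores in [0, 1], keyed by alphabetically sorted atom pairs.
COHERENCE = {
    ("curriculum", "retrieval"): 0.9,
    ("curriculum", "sparsity"): 0.8,
    ("retrieval", "sparsity"): 0.3,
}
AVAILABILITY = {
    ("curriculum", "retrieval"): 0.9,
    ("curriculum", "sparsity"): 0.1,
    ("retrieval", "sparsity"): 0.05,
}

ranked = alien_candidates(
    ["retrieval", "sparsity", "curriculum"],
    coherence=COHERENCE.get,
    availability=AVAILABILITY.get,
)
```

In this toy, the pair that is coherent and rarely available ranks first, the coherent but crowded pair ranks second, and the least available pair is dropped for failing the coherence threshold.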

That is the move.

Separate “scientifically plausible” from “currently likely to be imagined.”

The results are striking: the sampler explores a 3.5–7× broader effective atom vocabulary than frontier LLM ideation baselines, without sacrificing coherence, and produces ideas that match or exceed those baselines under blind LLM, human, and downstream experimental evaluation.

This is not just an idea generator.

It is a blind-spot detector for science.

And the implication is bigger than one domain.

If AI-assisted discovery only accelerates what current communities already know how to ask, it increases throughput.

But if it can surface coherent directions that no existing community is naturally positioned to see, it changes the search space itself.

That is the difference between faster science and expanded science.

Full credit to the authors:
Alejandro H. Artiles, Martin Weiss, Levin Brinkmann, Iyad Rahwan, Bernhard Schölkopf, Christopher Pal, Hugo Larochelle, Anirudh Goyal, Nasim Rahaman.

Paper: The Alien Space of Science
https://arxiv.org/abs/2603.01092

I’m attaching the first page because the abstract is worth reading closely.

The frontier is not only what science knows.

It is what science is not yet organized to think.

Address

350 Prince-Arthur West, Suite #2105
Montreal, QC
H2X 3R4
