Key Takeaways
- Transformers primarily learn correlations, not causation, which limits their path to true intelligence.
- Achieving AGI requires models that can move beyond correlations to causal understanding, and that keep learning after training.
- Large language models generate text by predicting the next token from a probability distribution conditioned on the prompt.
- The context provided in a prompt strongly shapes a model's output, which makes prompt selection critical.
- Language models effectively operate on a very sparse matrix: most arbitrary combinations of tokens are gibberish, which keeps the problem tractable.
- In-context learning lets LLMs solve novel problems at inference time from a few examples.
- In-context learning resembles Bayesian updating: new evidence in the prompt adjusts the model's predictive distribution.
- Domain-specific languages (DSLs) can turn natural-language questions into precise queries over complex databases.
- The long-running debate between Bayesian and frequentist schools colors how new machine learning research is received.
- The "Bayesian wind tunnel" provides a controlled environment for testing and comparing machine learning architectures.
- Understanding the mechanics of LLMs is crucial for applying them effectively.
Guest intro
Vishal Misra is Professor of Computer Science and Electrical Engineering and Vice Dean of Computing and AI at Columbia University’s School of Engineering. He returns to the a16z Podcast to discuss his latest research revealing how transformers in LLMs update predictions in a precise, mathematically predictable manner as they process new information. His work highlights the gap to AGI, emphasizing the need for continuous post-training learning and causal understanding over pattern matching.
Understanding transformers and LLMs
- "Transformers update their predictions in a mathematically predictable way" — Vishal Misra
- LLMs primarily learn correlations rather than causations, which limits their intelligence.
- "Pattern matching is not intelligence; LLMs learn correlation, not causation" — Vishal Misra
- Achieving AGI requires models that can learn causations, not just correlations.
- "To get to AGI, we need the ability to keep learning after training" — Vishal Misra
- LLMs generate text by constructing a probability distribution for the next token.
- "Given a prompt, it’ll come up with a distribution of what the next token should be" — Vishal Misra
- Understanding the mechanics of LLMs is crucial for applying them effectively.
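The next-token mechanism described above can be sketched in a few lines: the model scores every token in its vocabulary, a softmax turns the scores into a probability distribution, and a token is chosen from that distribution. This is a minimal illustration with an invented toy vocabulary and invented logit values, not any real model's output.

```python
import math

# Toy vocabulary and unnormalized scores ("logits") a model might assign
# to the next token. All values here are invented for illustration.
vocab = ["Paris", "London", "a", "the", "banana"]
logits = [5.0, 2.0, 1.0, 0.5, -3.0]

def softmax(xs):
    m = max(xs)                              # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)                      # probability distribution over next token
next_token = vocab[probs.index(max(probs))]  # greedy decoding picks the mode
print(next_token)                            # prints "Paris"
```

In practice, decoders often sample from `probs` (possibly with temperature) rather than always taking the argmax, which is why the same prompt can yield different continuations.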
The role of context in language models
- The behavior of language models is influenced by the prior context provided in prompts.
- "Depending on whether you pick synthesis or shake, the next row looks very different" — Vishal Misra
- Contextual relevance in LLMs highlights the importance of prompt selection.
- Language models operate on a sparse matrix where many combinations of tokens are nonsensical.
- "Fortunately, this matrix is very sparse because an arbitrary combination of these tokens is gibberish" — Vishal Misra
- This sparsity keeps generation tractable: the model can ignore the vast space of meaningless token combinations.
- The same model can produce drastically different outputs depending on the prompt, so understanding how prompts steer generation is essential.
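One way to picture the sparse conditional structure described above is a table keyed on context: each context maps to a small distribution over plausible next tokens, and the overwhelming majority of contexts and continuations simply never appear. The contexts, tokens, and probabilities below are all invented for illustration.

```python
# A tiny, hand-built conditional next-token table (all values invented).
# Keys are contexts; values map candidate next tokens to probabilities.
# The table is sparse: contexts and continuations not listed carry no
# probability mass at all — they correspond to gibberish.
table = {
    ("the", "cricket"): {"match": 0.6, "bat": 0.3, "pitch": 0.1},
    ("the", "wind"):    {"tunnel": 0.5, "blew": 0.4, "speed": 0.1},
}

def next_token_dist(context):
    # Unseen contexts return an empty distribution, reflecting sparsity.
    return table.get(tuple(context), {})
```

Changing the context changes which row of the table is consulted, which is the sense in which prior context determines the next-token distribution.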
In-context learning and real-time problem solving
- In-context learning allows LLMs to learn and solve problems in real-time.
- "In-context learning is when you show the LLM something it has kind of never seen before" — Vishal Misra
- LLMs process and learn from new information through examples.
- In-context learning resembles Bayesian updating, adjusting probabilities with new evidence.
- "LLMs are doing something which resembles Bayesian updating" — Vishal Misra
- This mechanism is central to understanding LLM capabilities: adapting from a handful of examples at inference time is what enables real-time problem solving.
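The Bayesian-updating analogy for in-context learning can be made concrete with the simplest possible model: a Beta-Bernoulli update, where each in-context example acts as a piece of evidence that shifts the posterior. This is a sketch of the analogy only, not the actual mechanism inside a transformer; the prior and example sequence are invented.

```python
# Each "in-context example" is treated as an observation (1 = success,
# 0 = failure) that updates a Beta prior over an unknown rate.
def beta_update(alpha, beta, observation):
    # Conjugate update: successes increment alpha, failures increment beta.
    return (alpha + observation, beta + (1 - observation))

alpha, beta = 1.0, 1.0           # uniform Beta(1, 1) prior: no examples seen yet
for obs in [1, 1, 0, 1]:         # four in-context examples arrive one by one
    alpha, beta = beta_update(alpha, beta, obs)

posterior_mean = alpha / (alpha + beta)  # belief after the examples: 4/6
```

Just as each observation pulls the posterior mean toward the observed frequency, each example in a prompt pulls the model's next-token predictions toward the demonstrated pattern.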
Domain-specific languages and data accessibility
- Domain-specific languages (DSLs) convert natural language queries into a processable format.
- "I designed DSL, a domain-specific language, which converted queries about cricket stats" — Vishal Misra
- DSLs let users pose complex database queries in natural language.
- Building a DSL for a specific domain, such as cricket stats, shows how AI can be tailored to concrete applications.
- Querying complex databases is hard for non-experts; a DSL hides that complexity behind simpler interactions.
- This approach offers a practical technical solution to common data-accessibility problems.
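The DSL idea can be sketched as a mapping from a constrained natural-language question to a structured query. The pattern, field names, and example question below are entirely hypothetical; the episode does not describe the real DSL's syntax.

```python
import re

def to_query(question):
    # Hypothetical pattern: recognize one question shape about cricket
    # stats and emit a structured query a database layer could execute.
    m = re.match(r"how many runs did (\w+) score in (\d{4})\??", question.lower())
    if m:
        player, year = m.groups()
        return {"metric": "runs", "player": player, "year": int(year)}
    return None  # question falls outside the DSL's constrained grammar

query = to_query("How many runs did Tendulkar score in 1998?")
```

A real DSL would cover many such question shapes; the point is that constraining the input language makes the translation to a precise query tractable.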
Bayesian updating and statistical approaches in AI
- In-context learning in language models resembles Bayesian updating.
- "You see something, you see new evidence, you update your belief about what’s happening" — Vishal Misra
- Understanding Bayesian inference is crucial for grasping how LLMs process information.
- The distinction between Bayesian and frequentist approaches shapes how new AI models and research are perceived.
- "There have been camps of Bayesian and frequentist in probability and machine learning" — Vishal Misra
- The debate between these camps colors how new research is received.
- Bayesian updating provides a clear mechanism for in-context learning, linking a well-established statistical methodology to modern AI.
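The Bayesian-versus-frequentist distinction mentioned above can be shown on one toy dataset: both camps estimate a coin's bias from the same flips, but the Bayesian estimate blends the data with a prior. The numbers are invented for illustration.

```python
# Same data for both camps: 3 heads observed in 4 flips.
heads, flips = 3, 4

# Frequentist: the maximum-likelihood estimate is the raw frequency.
mle = heads / flips                      # 0.75

# Bayesian: start from a uniform Beta(1, 1) prior, update on the data,
# and report the posterior mean (Laplace's rule of succession).
alpha, beta = 1 + heads, 1 + (flips - heads)
posterior_mean = alpha / (alpha + beta)  # 4/6, pulled toward the prior's 0.5
```

With little data the two answers differ noticeably; as flips accumulate, the prior's influence fades and the estimates converge, which is one reason the philosophical debate persists mainly in small-evidence regimes.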
The Bayesian wind tunnel and model testing
- The Bayesian wind tunnel concept allows for testing machine learning architectures.
- "We came up with this idea of a Bayesian wind tunnel" — Vishal Misra
- This concept provides a controlled environment for evaluating models.
- The framework makes it possible to test and compare architectures such as transformers, Mamba, LSTMs, and MLPs on equal footing.
- The aerospace wind tunnel is the guiding analogy: a controlled setting for stress-testing a design before deployment.
- The Bayesian wind tunnel offers a novel framework for evaluating and improving machine learning models, and its controlled environment makes those assessments more reliable.
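The wind-tunnel idea can be sketched as follows: construct a synthetic task where the exact Bayesian answer is known in closed form, then score any trained architecture against that ground truth. The "candidate model" below is a deliberate stand-in (a raw frequency estimator), not a real transformer, and the whole setup is an illustrative assumption about the framework, not its published implementation.

```python
import random

random.seed(0)
true_p = 0.7                                  # hidden bias of a synthetic coin
data = [1 if random.random() < true_p else 0 for _ in range(20)]

def exact_posterior_predictive(seq):
    # Ground truth under a Beta(1, 1) prior: P(next = 1 | seq) = (heads + 1) / (n + 2).
    return (sum(seq) + 1) / (len(seq) + 2)

def candidate_model(seq):
    # Stand-in for a trained architecture: the raw empirical frequency.
    return sum(seq) / len(seq) if seq else 0.5

# "Wind tunnel" score: how far the candidate is from the exact Bayesian answer.
gap = abs(candidate_model(data) - exact_posterior_predictive(data))
```

Because the synthetic task's posterior is computable exactly, the gap is an unambiguous quality measure, which is what makes the controlled environment useful for comparing architectures.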

