Meta AI Chief Yann LeCun: LLMs Are Too Simplistic; the Future of AI Lies in the Real World
In the rapidly evolving field of Artificial Intelligence, much public attention remains fixed on Large Language Models (LLMs). However, Yann LeCun, Chief AI Scientist at Meta, is advocating for a shift in focus, asserting that the future of advanced AI lies beyond the current capabilities of LLMs.
LeCun, a pioneer in deep learning, has openly expressed his diminishing interest in LLMs, describing them as a “simplistic way of viewing reasoning.” While acknowledging their incremental improvements through increased data and computational power, he believes the truly transformative advances in AI will emerge from four critical areas:
Understanding the Physical World: Developing machines that can intuitively grasp the nuances of real-world physics and interactions.
Persistent Memory: Creating AI systems with the capacity for long-term, accessible memory.
Reasoning: Moving beyond current rudimentary forms of reasoning to more sophisticated, intuitive methods.
Planning: Enabling AI to plan sequences of actions to achieve specific goals, mirroring human cognitive processes.
LeCun suggests that the tech community, currently captivated by LLMs, will within the next five years shift its attention to these four areas, work that today surfaces mainly in what he calls “obscure academic papers.”
The Limitations of Token-Based Systems
A fundamental limitation of current LLMs, according to LeCun, stems from their token-based approach. Tokens, which typically represent a finite set of discrete possibilities (like words or sub-word units), are well-suited for language. However, the physical world is “high-dimensional and continuous.”
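To make that contrast concrete, here is a tiny PyTorch-flavored illustration; the vocabulary size and video resolution are arbitrary example figures, not values from LeCun. A language model chooses among a finite set of token ids at each step, while even one second of modest-resolution video is a continuous signal of millions of real values.

```python
import torch

# A language model's output space: one of ~100,000 discrete token ids per step.
vocab_size = 100_000
next_token_logits = torch.randn(vocab_size)   # finite, enumerable choices

# One second of 256x256 RGB video at 30 fps: a continuous, high-dimensional signal.
video_second = torch.rand(30, 3, 256, 256)
print(vocab_size, video_second.numel())       # 100,000 discrete options vs ~5.9 million real values
```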
Humans acquire “world models” early in life, enabling an intuitive understanding of cause and effect, for instance how pushing an object from different points yields different results. Replicating this intuitive grasp of physics with systems designed to predict discrete tokens is profoundly difficult. Attempts to train AI by predicting high-dimensional, continuous data such as video at the pixel level have largely proven inefficient: the model spends vast resources trying to predict details that are inherently unpredictable, making pixel-level reconstruction a wasteful endeavor.
Introducing Joint Embedding Predictive Architectures (JEPA)
LeCun posits that the solution lies in Joint Embedding Predictive Architectures (JEPA). Unlike generative models that attempt detailed pixel-level reconstruction, a JEPA focuses on learning “abstract representations” of data.
In a JEPA, an input (e.g., a video segment or an image) is processed by an encoder to create an abstract representation. A transformed or corrupted version of the input is also encoded. The system then makes predictions within this “representation space” (or latent space) rather than in the raw input space, akin to “filling in the blank” in an abstract, semantic manner. A key technical challenge is the “collapse problem,” in which the encoder ignores its input and produces constant, uninformative representations; JEPA training methods are designed to avoid it.
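As a rough illustration of the idea, here is a minimal PyTorch sketch of a joint embedding predictive setup. The network sizes, the frozen target branch, and the simple variance term used to discourage collapse are illustrative assumptions, not Meta’s actual training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyJEPA(nn.Module):
    """Minimal joint-embedding predictive sketch: predict in latent space, not pixel space."""
    def __init__(self, in_dim=784, emb_dim=128):
        super().__init__()
        # One encoder for the observed (context) view, one for the transformed/masked target view.
        self.context_encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))
        self.target_encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))
        # The predictor "fills in the blank" entirely in representation space.
        self.predictor = nn.Sequential(nn.Linear(emb_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, emb_dim))

    def forward(self, context_view, target_view):
        s_ctx = self.context_encoder(context_view)
        with torch.no_grad():                      # keep the target branch fixed in this toy example
            s_tgt = self.target_encoder(target_view)
        pred_loss = F.mse_loss(self.predictor(s_ctx), s_tgt)
        # Crude anti-collapse term: penalize embeddings whose per-dimension variance shrinks toward zero.
        var_loss = F.relu(1.0 - s_ctx.std(dim=0)).mean()
        return pred_loss + var_loss

# Usage: two views of the same input, e.g. an unmasked and a masked crop, flattened to vectors.
model = ToyJEPA()
loss = model(torch.randn(32, 784), torch.randn(32, 784))
loss.backward()
```

In practice, collapse is avoided with techniques such as exponential-moving-average target encoders or variance-covariance regularization; the single variance term above is only a stand-in.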
For agentic systems capable of reasoning and planning, a JEPA offers a powerful mechanism. A JEPA-based predictor could observe the current state of the world and anticipate the “next state given a hypothetical action,” enabling the system to plan sequences of actions toward desired outcomes. LeCun contrasts this with current “agentic reasoning systems” that generate numerous token sequences and then select the best one, a method he deems “completely hopeless” for anything beyond short sequences because the search space grows exponentially with length. True reasoning, he argues, occurs in an abstract mental state, not by “kicking tokens around.”
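A hedged sketch of what latent-space planning could look like: roll candidate action sequences forward with a learned predictor and keep the sequence whose predicted end state lands closest to a goal embedding. The random-shooting search and the toy linear world model below are stand-ins for illustration, not LeCun’s or Meta’s planner.

```python
import torch

def plan(world_model, s_now, s_goal, horizon=5, n_candidates=256, action_dim=4):
    """Pick the action sequence whose predicted latent end state is closest to the goal.

    world_model(s, a) -> next latent state; s_now and s_goal are latent vectors.
    Random-shooting search over sampled action sequences, purely illustrative.
    """
    candidates = torch.randn(n_candidates, horizon, action_dim)   # sample candidate action sequences
    s = s_now.expand(n_candidates, -1)
    for t in range(horizon):
        s = world_model(s, candidates[:, t])                      # predict the next state in latent space
    costs = ((s - s_goal) ** 2).sum(dim=-1)                       # distance of predicted end state to goal
    best = costs.argmin()
    return candidates[best, 0]                                    # execute the first action, then replan

# Usage with a toy linear world model standing in for a trained latent predictor.
latent_dim, action_dim = 16, 4
A = torch.randn(latent_dim, latent_dim) * 0.1
B = torch.randn(action_dim, latent_dim) * 0.1
toy_model = lambda s, a: s + s @ A + a @ B
first_action = plan(toy_model, torch.zeros(latent_dim), torch.ones(latent_dim))
```

Executing only the first action and then replanning from the new observation is the standard model-predictive-control pattern; with a differentiable world model, gradient-based optimization over the action sequence would be the more natural fit than random shooting.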
A practical example is Meta’s Video Joint Embedding Predictive Architecture (V-JEPA). Trained on short video segments, V-JEPA predicts the representations of full videos from masked versions, and in doing so learns to detect whether a clip is “physically possible or not.” By measuring its prediction error, it can flag “unusual” events, much as a baby is surprised by objects that appear to defy gravity.
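That “surprise” signal can be sketched as a thresholded prediction error in representation space. The error metric and the fixed threshold below are assumptions for illustration; a real system would calibrate the threshold on ordinary, physically plausible clips.

```python
import torch
import torch.nn.functional as F

def flag_unusual(pred_repr, observed_repr, threshold=0.5):
    """Flag a clip as 'unusual' when the latent-space prediction error is large.

    pred_repr: representation predicted from the masked/partial video.
    observed_repr: representation encoded from the full video.
    The 0.5 threshold is arbitrary; it would be tuned on normal clips in practice.
    """
    error = F.mse_loss(pred_repr, observed_repr).item()
    return error > threshold, error

# Usage: a high error suggests the clip violates the model's learned "physics".
is_unusual, err = flag_unusual(torch.randn(128), torch.randn(128))
```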
The Road to Advanced Machine Intelligence (AMI)
LeCun prefers the term Advanced Machine Intelligence (AMI) over Artificial General Intelligence (AGI), acknowledging the specialized nature of human intelligence. He estimates that a “good handle” on AMI at a small scale could be achieved within three to five years, with human-level AI potentially arriving within a decade. However, he cautions against historical over-optimism, dismissing the notion that merely scaling LLMs or generating thousands of token sequences will lead to human-level intelligence as “nonsense.”
A significant bottleneck is data. LLMs are trained on vast amounts of text, equivalent to hundreds of thousands of years of human reading. In contrast, a four-year-old child has taken in a comparable volume of data through vision in just 16,000 waking hours, highlighting the sheer bandwidth of visual input compared with text. This disparity underscores that human-level intelligence cannot be reached by training on text alone. The key to unlocking AMI, according to LeCun, is discovering the “good recipe” for training JEPA architectures at scale, similar to the foundational breakthroughs that enabled deep neural networks and transformers.
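A back-of-the-envelope version of that comparison, using approximate figures LeCun has cited in talks (the ~2 MB/s optic-nerve bandwidth and the token count are assumptions here, not exact values from the article):

```python
# Text: a frontier LLM is trained on roughly 30 trillion tokens at a few bytes each,
# which would take a human hundreds of thousands of years to read.
text_bytes = 30e12 * 3                      # ~0.9e14 bytes

# Vision: roughly 2 MB/s reaching the brain via the optic nerves,
# over a four-year-old's ~16,000 waking hours.
vision_bytes = 2e6 * 3600 * 16_000          # ~1.15e14 bytes

print(f"text ≈ {text_bytes:.1e} bytes, vision ≈ {vision_bytes:.1e} bytes")
# Comparable volumes of data, absorbed in four years of childhood rather than millennia of reading.
```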
AI’s Current Impact and Future Challenges
Despite the focus on future paradigms, LeCun emphasizes AI’s already immense positive impact. In science and medicine, AI is transforming drug design, protein folding, and medical imaging, reducing MRI scan times and pre-screening for tumors. In automotive, AI-powered driving assistance and emergency braking systems are significantly reducing collisions. AI is primarily serving as a “power tool,” augmenting human productivity and creativity across various domains.
However, widespread deployment faces challenges in “accuracy and reliability,” particularly in applications where mistakes can be critical, such as autonomous driving. LeCun notes that AI often falters not in basic techniques but in reliable integration. Yet, for many applications where errors are not disastrous (e.g., entertainment, education), AI that is “right most of the time” is already highly beneficial.
Regarding the “dark side” of AI like deepfakes, LeCun expresses optimism. Meta’s experience suggests no significant increase in nefarious generative content, despite LLM availability. He believes the “countermeasure against misuse is just better AI” – systems with common sense, reasoning capacity, and the ability to assess their own reliability.
The Indispensable Role of Open Source and Global Collaboration
A core tenet of LeCun’s philosophy is the absolute necessity of open-source AI platforms. He emphasizes that “good ideas come from the interaction of a lot of people and the exchange of ideas,” as no single entity holds a monopoly on innovation. Meta’s commitment to open-source, exemplified by PyTorch and LLaMA, fosters a thriving ecosystem of startups and enables global collaboration.
Open-source AI is crucial for the future because it allows for:
Diversity of AI Assistants: A handful of companies cannot provide the diversity of AI assistants needed for a future in which AI mediates nearly every digital interaction. Diverse assistants are required to understand varied languages, cultures, and value systems.
Distributed Training: No single entity will collect all the world’s data. Future models will be open-source foundation models trained in a distributed fashion, with global data centers accessing subsets of data to train a “consensus model.”
Fine-Tuning on Proprietary Data: Open-source models like LLaMA allow companies to download them and fine-tune them on their own proprietary data without uploading that data anywhere, supporting specialized vertical applications and startup business models (see the sketch after this list).
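As a hedged example of that last point, here is a minimal LoRA fine-tuning sketch using Hugging Face transformers and peft. The checkpoint name, dataset path, and hyperparameters are placeholders; the relevant property is that the proprietary corpus stays on local hardware and only small adapter weights are produced.

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

base = "meta-llama/Llama-2-7b-hf"                       # placeholder open-weights checkpoint
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token                           # LLaMA tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base)
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

# The proprietary corpus is read from local disk and never uploaded anywhere.
data = load_dataset("json", data_files="local_private_corpus.jsonl")["train"]
data = data.map(lambda ex: tok(ex["text"], truncation=True, max_length=512), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1, num_train_epochs=1),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),  # builds causal-LM labels from inputs
)
trainer.train()   # only the small LoRA adapter weights need to be stored or shared afterwards
```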
Hardware: Fueling the Next AI Revolution
The journey towards AMI and sophisticated world models will demand ever-increasing computational power. While GPUs have seen exponential advancements, the computational expense of reasoning in abstract space means that continuous hardware innovation is essential.
LeCun remains largely skeptical of neuromorphic hardware, optical computing, and quantum computing for general AI tasks in the near future, citing the digital semiconductor industry’s deep entrenchment. However, he sees promise in Processor-in-Memory (PIM) or analog/digital processor and memory technologies for specific “edge computation” scenarios, such as low-power visual processing in smart glasses. This approach mimics biological systems like the retina, which processes immense visual data at the sensor to compress it before transmission, highlighting that data movement, not just computation, often consumes the most energy.
Ultimately, LeCun envisions a future where AI systems serve as “power tools” that augment human capabilities, not replace them. Our relationship with future AI will be one of command, with humans directing a “staff of super-intelligent virtual people.” This collaborative future, driven by open research and open-source platforms, will leverage global contributions to create a diverse array of AI assistants that enhance daily life.