Why Teachers for AI Are the Next Big Thing

AI Edited Photo by Yan Krukau: https://www.pexels.com/photo/a-class-having-a-recitation-8199166/

Have you ever felt like your brand new, top-of-the-line smartphone is already starting to feel a little… slow? That’s kind of what’s happening in the world of Artificial Intelligence right now, according to a recent discussion. The dominant technology powering most of today’s impressive AI, the Transformer model, might be heading for early retirement, replaced by something far more efficient and powerful: subquadratic architectures.

Think of today’s AI models, built on Transformer technology. You talk to them and they are brilliant. They can access and process vast amounts of information to answer your questions and generate creative text. But when it comes to real-world, ongoing tasks, they can fall short. Why? Because, unlike a human hire who gains experience and becomes increasingly useful over time, these AI models have a tragically short lifespan in terms of continuous interaction. Their life experience, in essence, is often limited to a single chat session.

The core issue lies in how Transformer models process information. Imagine you meet someone new and they say “hi.” Now, picture yourself not just responding instinctively, but meticulously comparing that single “hi” to every single life experience you’ve ever had before deciding on the perfect reply. Sounds exhausting, right? That’s essentially what a Transformer does with every word in a sequence: it compares every single word with every single other word, including itself. This quadratic approach, where the computational effort grows with the square of the input length, works fine for shorter interactions. But as the conversation (or context window) grows, it becomes computationally overwhelming – leading to that metaphorical head explosion and the need to essentially restart the AI’s memory.
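To make that scaling concrete, here is a minimal NumPy sketch of the all-pairs comparison (a toy single-head attention where queries, keys, and values are all just the input itself; the function name is illustrative, not any library’s API). The score matrix has one entry per pair of tokens, so doubling the sequence length roughly quadruples the work.

```python
import numpy as np

def naive_self_attention(x: np.ndarray) -> np.ndarray:
    """Toy single-head self-attention: every token attends to every token.

    x has shape (seq_len, d_model). The score matrix has shape
    (seq_len, seq_len), so compute and memory grow with seq_len ** 2.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                          # pairwise comparisons
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over each row
    return weights @ x                                     # weighted mix of all tokens

# Doubling the sequence length roughly quadruples the number of score entries:
short = naive_self_attention(np.random.randn(512, 64))    # 512 * 512 scores
long = naive_self_attention(np.random.randn(1024, 64))    # 1024 * 1024 scores
```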

We’ve seen clever workarounds, like expanding the context window and using key/value caching so earlier tokens don’t have to be re-processed from scratch. Think of it like giving our overthinking “hi”-responder a notebook of summarized past experiences to consult. While helpful, it’s still a patch on the underlying architecture. As the original discussion points out, even a potential 10-million-token context window isn’t close to how humans handle a constant stream of sensory information and years of accumulated knowledge.
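For the curious, here is a rough, illustrative sketch of the caching idea (generic toy code, not any particular library’s API): past keys and values are stored so each new token only computes one new row of attention instead of re-encoding the whole history.

```python
import numpy as np

class KVCache:
    """Toy key/value cache: keep past keys and values so each new token
    attends over the cache instead of recomputing the full score matrix."""

    def __init__(self, d_model: int):
        self.keys = np.empty((0, d_model))
        self.values = np.empty((0, d_model))

    def step(self, q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
        # Append the new token's key/value, then attend over everything cached.
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])
        scores = self.keys @ q / np.sqrt(q.shape[-1])   # one row of scores, not a full matrix
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ self.values

cache = KVCache(d_model=64)
for _ in range(5):
    tok = np.random.randn(64)
    out = cache.step(q=tok, k=tok, v=tok)               # each step reuses all cached history
```

Note the catch: the cache itself still grows with the length of the conversation, which is why this is a workaround rather than a cure.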

This is where subquadratic architectures come in, promising a fundamental shift. Instead of remembering and comparing everything, these new designs aim to mimic the human brain’s remarkable ability to ignore, forget, compress, and reconstruct previous interactions. Imagine your brain filtering out the countless sights, sounds, and thoughts of a day, retaining only the truly relevant information. You don’t remember every single detail of this morning’s commute, but you recall the important bits that might inform future decisions. Subquadratic architectures strive for this kind of selective processing.
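One family of subquadratic designs (linear-attention and state-space-style recurrences) captures this with a fixed-size state that is partly forgotten and partly updated at every step. The sketch below is a generic toy of that idea under those assumptions, not a description of Titans or any specific Google architecture.

```python
import numpy as np

def compressive_state_update(tokens: np.ndarray, decay: float = 0.95) -> np.ndarray:
    """Toy fixed-size memory in the spirit of linear-attention / state-space models.

    Instead of storing every token, keep one (d, d) state matrix that is
    decayed (forgetting) and updated (compressing) at each step, so the cost
    per token stays constant no matter how long the stream gets.
    """
    d = tokens.shape[-1]
    state = np.zeros((d, d))
    for x in tokens:                                  # stream tokens one at a time
        state = decay * state + np.outer(x, x)        # forget a little, absorb the new token
    return state                                      # size never grows with sequence length

state = compressive_state_update(np.random.randn(100_000, 64))  # constant memory, any length
```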

The goal is to prevent the AI’s state (its accumulated understanding) from growing indefinitely, just as our brains don’t store every fleeting thought with equal weight. Google’s research, hinted at in the Titans paper, suggests they’re at the forefront of developing these new architectures. While the exact details remain under wraps (detailed papers on cutting-edge models are becoming less common), the direction is clear.

What does this mean for the future? The implications are huge.

Firstly, we could see vastly more capable and efficient AI models that can handle much longer and more complex interactions without needing constant resets. Imagine an AI that can truly learn and retain information across days, weeks, or even longer conversations, just like a human assistant would.

Secondly, there is the exciting possibility of more valuable, personalized models. If an AI has a virtually limitless context window, you could essentially educate your own instance, feeding it specific knowledge and experiences to make it uniquely tailored to your needs. This could even lead to a market for educated AI models.

We may find ourselves raising AI like our children in order to teach them (The Creator, 2023).

Finally, subquadratic architectures could pave the way for more distributed AI. Currently, training large AI models requires massive centralized computing power due to the need for rapid communication between processing units. With architectures that can handle much larger chunks of computation before needing to communicate, the physical proximity of these units becomes less critical. This could democratize AI training, allowing for contributions from a wider range of sources.
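As a loose analogy for that trade-off (in the spirit of local-SGD-style training; the class and function names below are made up for illustration, and this is not a claim about how subquadratic models will actually be trained), here is a toy where workers run many cheap local steps and only synchronize once per round, so slow links between distant machines matter far less.

```python
import numpy as np

class ShardWorker:
    """Illustrative worker holding one data shard (purely a stand-in for real data)."""
    def __init__(self, target: np.ndarray):
        self.target = target                           # this worker's local data summary

    def gradient(self, params: np.ndarray) -> np.ndarray:
        return params - self.target                    # gradient of a simple quadratic loss

def train_with_infrequent_sync(workers, local_steps=100, rounds=10, lr=0.01):
    """Trade communication for computation: many local steps, one sync per round."""
    params = np.zeros_like(workers[0].target)
    for _ in range(rounds):
        local = []
        for w in workers:                              # these could sit in distant data centers
            p = params.copy()
            for _ in range(local_steps):
                p -= lr * w.gradient(p)                # cheap local updates, no network traffic
            local.append(p)
        params = np.mean(local, axis=0)                # the only cross-worker communication
    return params

workers = [ShardWorker(np.full(8, v)) for v in (1.0, 2.0, 3.0)]
print(train_with_infrequent_sync(workers))             # converges near the shard average, 2.0
```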

I predict a significant shift within the next couple of years: by the end of 2025, I expect every major player to be working on subquadratic foundation models, and by the end of the following year, Transformer models could be largely a thing of the past.

So, just like we eventually traded in our clunky old phones for smarter, more efficient devices, the AI landscape is poised for a major upgrade. Subquadratic architectures aren’t just a minor improvement; they represent a fundamental rethinking of how AI processes information, promising a future of more capable, personalized, and potentially more distributed artificial intelligence.