Why Teachers for AI Are the Next Big Thing

The Transformer model, currently the dominant architecture in AI, faces an efficiency crisis. While effective at processing information and generating text, its compute cost, which grows quadratically with sequence length, limits its ability to handle continuous, real-world tasks. Unlike human learning, which accumulates experience, standard AI models typically reset after each interaction.
Transformer models process information by comparing every token in a sequence to every other token, so computational effort grows quadratically with input length. As the context window grows, the system becomes computationally overwhelmed, forcing a memory reset. Workarounds such as context-window expansion and caching offer only temporary patches.
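To make the quadratic cost concrete, here is a minimal NumPy sketch of unmasked self-attention (my own illustration, not code from the article; learned projections and multiple heads are omitted): the score matrix has one entry per pair of tokens, so doubling the input length roughly quadruples both compute and memory.

```python
# Illustrative sketch of why self-attention is quadratic: for n tokens,
# the score matrix is n x n, so cost grows with n**2.
import numpy as np

def naive_self_attention(x: np.ndarray) -> np.ndarray:
    """x has shape (n, d): n tokens, each a d-dimensional vector."""
    n, d = x.shape
    q, k, v = x, x, x  # real models use learned Q/K/V projections; omitted here
    scores = q @ k.T / np.sqrt(d)            # shape (n, n): every token vs. every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                        # shape (n, d)

# Doubling the context roughly quadruples the work done in the (n, n) step.
out = naive_self_attention(np.random.randn(1024, 64))
```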
Subquadratic architectures propose a fundamental shift. These designs mimic the human brain’s ability to filter, compress, and reconstruct interactions, retaining only relevant information. Google’s research on Titans suggests active development in this area.
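The sketch below illustrates the general shape of this idea under my own assumptions (it is not the Titans algorithm): each new token is compressed into a fixed-size memory state that can later be queried, so cost grows linearly with sequence length and the state never has to be thrown away.

```python
# Hedged sketch of a subquadratic, fixed-size memory: fold each token into
# a constant-size state instead of keeping every past token around.
import numpy as np

def update_memory(memory: np.ndarray, token: np.ndarray, decay: float = 0.95) -> np.ndarray:
    """Compress an incoming token into a (d, d) outer-product memory.

    The decay factor acts as a crude 'forget what is no longer relevant' filter.
    """
    return decay * memory + np.outer(token, token)

def read_memory(memory: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Reconstruct information relevant to a query from the compressed state."""
    return memory @ query

d = 64
memory = np.zeros((d, d))
for token in np.random.randn(10_000, d):   # 10k steps, constant memory footprint
    memory = update_memory(memory, token)
print(read_memory(memory, np.random.randn(d)).shape)  # (64,)
```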
These architectures promise three major advancements. First, models could handle longer, more complex interactions without constant resets, retaining information across extended periods. Second, effectively limitless context windows would allow personalized models that accumulate user-specific knowledge and experience.

Finally, subquadratic architectures facilitate distributed AI. Current large models require centralized computing because training depends on rapid communication between machines. Architectures that can process larger chunks of computation independently reduce reliance on physical proximity, potentially democratizing AI training.
A significant shift is likely within the next few years. Major laboratories are expected to transition to subquadratic foundation models, moving away from Transformer dominance. This represents a fundamental rethinking of information processing, leading to more capable, personalized, and distributed artificial intelligence.