Ever since deep learning burst into the mainstream in 2012, the hype around AI research has often outpaced its reality. Over the past year, though, a series of breakthroughs and major milestones suggests the technology may finally be living up to its promise.
Despite the obvious potential of deep learning, over the past decade the regular warnings about the dangers of runaway superintelligence and the prospect of technological unemployment were tempered by the fact that most AI systems were preoccupied with identifying images of cats or providing questionable translations from English to Chinese.
In the last year, however, there has been an undeniable step change in the capabilities of AI systems, in fields as varied as the creative industries, fundamental science, and computer programming. What’s more, these AI systems and their outputs are becoming increasingly visible and accessible to ordinary people.
Nowhere have the advances been more obvious than in the burgeoning field of generative AI, a catch-all term for a host of models muscling in on creative tasks.
This has been primarily thanks to a kind of model called a transformer, which was actually first unveiled by Google in 2017. Indeed, many of the AI systems that have made headlines this year are updates of models that their developers have been working on for some time, but the results they have produced in 2022 have blown previous iterations out of the water.
Most prominent among these is ChatGPT, an AI chatbot based on the latest version of OpenAI’s GPT-3 large language model. Released to the public at the end of November, the service has been wowing people with its uncanny ability to engage in natural-sounding conversations, answer complicated technical questions, and even produce convincing prose and poetry.
Earlier in the year, another OpenAI model called DALL-E 2 took the internet by storm with its ability to generate hyper-realistic images in response to prompts as bizarre as “a raccoon playing tennis at Wimbledon in the 1990s” and “Spider-Man from ancient Rome.” Meta took things a step further in September with a system that could produce short video clips from text prompts, and Google researchers have even managed to create an AI that can generate music in the style of an audio clip it is played.
The implications of this explosion in AI creativity and fluency are hard to measure right now, but it has already spurred predictions that the technology could replace traditional search engines, kill the college essay, and lead to the death of art.
This is as much due to the improving capabilities of these models as their increasing accessibility, with services like ChatGPT, DALL-E 2, and text-to-image generator Midjourney open to everyone for free (for now, at least). Going even further, the independent AI lab Stability AI has open-sourced its text-to-image model, Stable Diffusion, allowing anyone with a modestly powerful computer to run it themselves.
AI has also made progress in more prosaic tasks over the last year. In February, DeepMind unveiled AlphaCode, an AI-powered code generator that the company said could match the average programmer in coding competitions. In a similar vein, GitHub Copilot, an AI coding tool developed by GitHub and OpenAI, moved from a prototype to a commercial subscription service.
Another major bright spot for the field has been AI’s increasingly prominent role in fundamental science. In July, DeepMind announced that its groundbreaking AlphaFold AI had predicted the structure of almost every protein known to science, setting up a potential revolution in both the life sciences and drug discovery. The company also announced in February that it had trained its AI to control the roiling plasmas found inside experimental fusion reactors.
And while AI seems to be increasingly moving away from the kind of toy problems the field was preoccupied with over the past decade, it has also made major progress in one of the mainstays of AI research: games.
In November, Meta showed off an AI that ranked in the top 10 percent of players in the board game Diplomacy, which requires a challenging combination of strategy and natural language negotiation with other players. The same month, a team at Nvidia trained an AI to play the complex 3D videogame Minecraft using only high-level natural language instructions. And in December, DeepMind cracked the devilishly complicated game Stratego, which involves long-term planning, bluffing, and a healthy dose of uncertainty.
It’s not all been plain sailing, though. Despite the superficially impressive output of generative AI systems like ChatGPT, many have been quick to point out that they are highly convincing bullshit generators. They are trained on enormous amounts of text of variable quality from the internet. And ultimately all they do is guess what text is most likely to come after a prompt, with no capacity to judge the truthfulness of their output. This has raised concerns that the internet may soon be flooded with huge amounts of convincing-looking nonsense.
This was brought to light with the release of Meta’s Galactica AI, which was supposed to summarize academic papers, solve math problems, and write computer code for scientists to help speed up their research. The problem was that it would produce convincing-sounding material that was completely wrong or highly biased, and the service was pulled after just three days.
Bias is a significant problem for this new breed of AI, which is trained on vast tracts of material from the internet rather than the more carefully curated datasets previous models were fed. Similar problems have surfaced with ChatGPT, which despite filters put in place by OpenAI can be tricked into saying that only white and Asian men make good scientists. And popular AI image generation app Lensa has been called out for sexualizing women’s portraits, particularly those of women of Asian descent.
Other areas of AI have also had a less-than-stellar year. One of the most touted real-world use cases, self-driving cars, has seen significant setbacks, with the closure of Ford- and Volkswagen-backed Argo AI, Tesla fending off claims of fraud over its failure to deliver “full self-driving,” and a growing chorus of voices claiming the industry is stuck in a rut.
Despite the apparent progress that’s been made, there are also those, such as Gary Marcus, who say that deep learning is reaching its limits, as it’s not capable of truly understanding any of the material it’s being trained on and is instead simply learning to make statistical connections that can produce convincing but often flawed results.
But for those behind some of this year’s most impressive results, 2022 is simply a taste of what’s to come. Many predict that the next big breakthroughs will come from multi-modal models that combine increasingly powerful capabilities in everything from text to imagery and audio. Whether the field can keep up the momentum in 2023 remains to be seen, but either way this year is likely to go down as a watershed moment in AI research.
Image Credit: DeepMind / Unsplash