Many of the most exciting discoveries in science depend on highly specialised knowledge and on connections between far-flung facts. Scientists must combine deep analysis with broad reasoning strategies.
As in many information-rich tasks, researchers are looking to artificial intelligence (AI) systems to speed up their work. AI tools may be able to support key steps such as generating ideas, reviewing existing work and analysing data.
The latest systems use large language models (LLMs) to allow scientists to interact naturally and directly with the vast body of knowledge captured in words in the scientific literature.
But as two new systems described in papers just published in Nature show, when it comes to science, language alone can only go so far.
What AI is doing to science
A number of organisations, such as Sakana AI, are trying to automate the entire scientific process. To date, these efforts have largely focused on computer science, where “experiments” mainly involve designing and writing code.
However, the Agents4Science conference organised at Stanford last October showcased a broader range of AI-generated papers. They covered topics from mechanical engineering and protein design to a system called BadScientist which deliberately produced “convincing but unsound” research.
I have previously raised concerns about the impacts of AI scientists on the scientific ecosystem. Recent work validates these concerns, showing increased quantity but lower quality of both papers and peer reviews, identifying fabricated references in published works, finding fabricated and misleading images, and more.
What scientists are doing with AI
AI systems clearly can’t be trusted to conduct the full process of science on their own. But how about using AI to help scientists get more done more quickly?
This is the intent of the two new systems described in Nature: Robin, made by non-profit FutureHouse, and Co-Scientist, from Google DeepMind.
Both systems aim to accelerate scientific discovery, working in collaboration with a scientist. Both are also “multi-agent” AI systems, meaning they are built as a collection of specialised agents each targeting specific steps of the scientific discovery process, coordinated by a “supervisor” agent.
The agents that make up Co-Scientist aim to mirror abstract cognitive tasks. A “reflection agent”, for example, acts as a critical scientific peer reviewer, assessing the quality of a hypothesis. “Ranking agents” debate research hypotheses in “tournaments”, using multiple interacting LLMs to simulate a discussion about the relative merits of two hypotheses.
Robin’s agents, on the other hand, are tuned to specific tasks relevant to drug repurposing, which aims to identify existing drugs that could treat a given disease. One agent focuses on selecting experimental tests, while another analyses complex biomedical data.
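The “supervisor plus specialised agents” pattern described above can be sketched in a few lines of Python. This is only an illustration of the general architecture, not the design of either system: the names `Hypothesis`, `generation_agent`, `reflection_agent` and `Supervisor` are hypothetical, and the agents are simple stubs standing in for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Hypothesis:
    text: str
    notes: List[str] = field(default_factory=list)


def generation_agent(question: str) -> List[Hypothesis]:
    # Stub (assumption): a real agent would query an LLM and the literature.
    return [Hypothesis(f"Candidate {i} for: {question}") for i in range(1, 4)]


def reflection_agent(hypothesis: Hypothesis) -> Hypothesis:
    # Stub (assumption): a real agent would critique the hypothesis like a peer reviewer.
    hypothesis.notes.append("reviewed")
    return hypothesis


class Supervisor:
    """Coordinates the specialised agents, one step after another."""

    def __init__(
        self,
        generate: Callable[[str], List[Hypothesis]],
        review: Callable[[Hypothesis], Hypothesis],
    ) -> None:
        self.generate = generate
        self.review = review

    def run(self, question: str) -> List[Hypothesis]:
        # Step 1: ask the generation agent for candidate hypotheses.
        candidates = self.generate(question)
        # Step 2: pass each candidate to the reflection agent for review.
        return [self.review(candidate) for candidate in candidates]


supervisor = Supervisor(generation_agent, reflection_agent)
results = supervisor.run("Which existing drugs might treat disease X?")
for hypothesis in results:
    print(hypothesis.text, hypothesis.notes)
```

In this sketch the scientist still supplies the research question and reads the output. The supervisor only decides which agent runs when.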
How do the results stack up?
Co-Scientist assesses the quality of its generated proposals using the Elo rating system, which is best known for ranking chess players. Co-Scientist’s self-ratings of the novelty and impact of its outputs align quite well with the preferences of human experts and with judgements by other LLM systems.
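To give a feel for how Elo ratings work, here is a minimal sketch of the standard Elo update applied to pairwise contests between hypotheses. The starting rating of 1200, the K-factor of 32 and the hard-coded match outcomes are assumptions made for illustration. In Co-Scientist the winner of each contest comes from the simulated debates, and the parameters it uses are not described here.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(winner: float, loser: float, k: float = 32.0) -> tuple:
    """Return the new (winner, loser) ratings after one contest."""
    # The winner gains more when the win was unexpected.
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta


# Every hypothesis starts from the same rating (assumed value).
ratings = {"hypothesis_A": 1200.0, "hypothesis_B": 1200.0, "hypothesis_C": 1200.0}

# Placeholder outcomes as (winner, loser) pairs; a real system would
# obtain these from a judge, such as a debate between LLMs.
matches = [
    ("hypothesis_A", "hypothesis_B"),
    ("hypothesis_A", "hypothesis_C"),
    ("hypothesis_B", "hypothesis_C"),
]

for winner, loser in matches:
    ratings[winner], ratings[loser] = update_elo(ratings[winner], ratings[loser])

# Highest-rated hypothesis first.
ranking = sorted(ratings, key=ratings.get, reverse=True)
print(ranking)
```

Because each update moves the same number of points from loser to winner, the ratings only express relative standing within the pool of hypotheses being compared.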
In a drug repurposing experiment, Co-Scientist selected 30 drug candidates as promising treatments for a kind of cancer called acute myeloid leukaemia. Expert (human) oncologists refined the list, and five drugs were tested in the lab. Of these, three showed some positive results and one seemed to show particular promise.
Other experiments showed the potential of Co-Scientist to explore combinations of multiple drugs.
Notably, the predictions of Co-Scientist were not compared with the plethora of targeted computational and machine learning methods for drug repurposing that have been developed over decades of computational biology research. This means we don’t know whether the new general-purpose tool outperforms more specific AI approaches.
Both systems stop short of validating their hypotheses directly, which would involve real physical experiments. Both also rely heavily on human input to define the key scientific question, sense-check predictions, and prioritise predictions for further investigation.
Co-Scientist focuses primarily on generating hypotheses through elaborate reasoning agents, leaving validation and interpretation to subsequent steps. Robin goes a step further, using an agent to analyse data produced from real-world experiments.
Robin was used to propose 30 drug candidates for a condition called dry age-related macular degeneration. The top five were selected for testing.
Robin also made proposals for the experiments, with several suggestions overridden by the human scientists. Through several rounds of brainstorming and analysis, two drugs were identified as promising.
Testing of Robin’s individual agents showed that those that dug through earlier research were better at the task than general-purpose LLMs. The analytical agent did less well on questions about statistics and bioinformatics, and relied heavily on human-supplied prompts.
The limits of language alone
AI can help scientists to navigate the vast amount of documented knowledge humans have acquired over the millennia. For decades, computation has contributed to scientific progress by finding patterns in large datasets, integrating dispersed information, and driving new discoveries from existing literature.
New models such as Robin and Co-Scientist represent a shift towards working directly in the realm of the language of science, rather than the realm of raw data. This allows more natural collaborations between scientist and machine, through language-based “discussions”.
However, more natural doesn’t necessarily mean more effective. Language-based communication can be imprecise and ambiguous, whereas science must be specific.
Models that combine the best of these worlds are on the horizon. These aim to link structured quantitative data to the concepts and relationships that describe the core facts beneath it.
Such models ground scientific reasoning in the structure of knowledge. They allow scientific evidence ranging from genomic sequences and protein structures to cellular imaging to be connected.
Words are how science is communicated. AI tools that help make sense of the information hidden in all those words are surely valuable. But the complexity of the natural world means that AI (co-)scientists will only be truly effective when they can go beyond connecting words together, to modelling the full complexity of the systems those words describe.
