AI trained on primate DNA predicts genetic disease risk in humans

A group of international researchers has shed more light on genetic variants responsible for human disease by analyzing primate DNA data with a new AI algorithm.

First, the scientists analyzed more than 800 individual samples from 233 species of non-human primates representing all 16 families, from lemurs to gorillas. To interpret the data, they developed a new algorithm: PrimateAI-3D.

PrimateAI-3D is built on deep-learning language architectures similar to those used in ChatGPT, but designed to model genomic rather than linguistic sequences. The team used natural selection to train its parameters, showing it mutations that are ruled out as causes of disease because they are tolerated in our primate relatives. In this way, the algorithm learned to recognize benign genetic variants and, by elimination, the mutations likely to cause disease.
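The labeling idea behind that training signal can be sketched in a few lines of Python. This is a toy illustration only, not the PrimateAI-3D model or its code: the `Variant` type, the `label_variants` helper, and the gene names and variants are all hypothetical, and the sketch covers only the step of marking primate-tolerated variants as likely benign.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """A hypothetical protein-level variant (all values here are made up)."""
    gene: str
    position: int  # residue position in the protein
    ref: str       # reference amino acid
    alt: str       # substituted amino acid


def label_variants(candidates, primate_observed):
    """Mark a variant 'likely_benign' if it is tolerated in other primates.

    Anything not seen in primates stays 'unlabeled': it is not proven
    harmful, it simply remains a candidate for further scrutiny.
    """
    return {
        v: ("likely_benign" if v in primate_observed else "unlabeled")
        for v in candidates
    }


# Hypothetical variants observed in non-human primates.
primate_observed = {
    Variant("GENE_A", 12, "A", "V"),
    Variant("GENE_B", 87, "R", "K"),
}

# Hypothetical human variants to triage.
candidates = [
    Variant("GENE_A", 12, "A", "V"),
    Variant("GENE_A", 45, "G", "D"),
]

labels = label_variants(candidates, primate_observed)
for variant, label in labels.items():
    print(variant, label)
```

In a real system, labels like these would serve as training examples for a model that then scores variants it has never seen; the lookup above stands in for that far more complex step.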

Next, the scientists applied PrimateAI-3D to identify potentially harmful mutations in humans, using medical records and gene variant data from more than 400,000 people who donated samples to the UK Biobank project. They found that the algorithm showed “impressive improvements” in predicting people’s genetic risk for common diseases.

According to the researchers, the method identifies pathogenic mutations more accurately than existing techniques, an advantage they link to its ability to overcome the bias toward white European ancestry in genetic data.

“Even though we are 8 billion, our genetic diversity still resembles the original population of 10,000 common ancestors from which we all descend,” said Kyle Farh, co-author of the study and VP of Artificial Intelligence at Illumina, a company that collaborated on the research.

“There just isn’t enough information to extract from the human species. It became clear several years ago that, to really understand the human genome, the data in human genome sequencing was not enough,” he added.

Combining data on humans and non-human primates is key to this, especially since living primates share more than 90% of our DNA. Illumina research has shown that when natural selection tolerates a genetic variant in another primate, that variant is 99% unlikely to cause disease in humans.

The study’s findings could be used to support health research, for example by helping scientists prioritize the variants most likely to pose a risk to humans. They could also help conserve populations of other primates.

“I think we’re just getting started,” Farh noted. “There is a lot to learn here. And the idea that you can learn more about our own species from other species is, I think, very romantic.”

The full study has been published in the journal Science.