By Victoria Bela
Copyright scmp
Chinese researchers have harnessed artificial intelligence (AI) to uncover hidden protein features shared by unrelated species whose similar traits evolved independently.
From echolocation in bats and dolphins to the ability to fly, found in both birds and insects, convergent evolution – or the independent emergence of similar traits in unrelated species – has long been of interest to the scientific community.
The repeated emergence of the same functional trait offers an opportunity to investigate how genes and proteins relate to this process. Traditional methods look for similarities at individual sites in protein sequences rather than at more complex features, such as 3D structure.
With the help of an advanced AI protein language model, a team from the Chinese Academy of Sciences has been able to examine these complex higher-order features in proteins that evolved separately but serve a similar function.
“The findings emphasise an underrated sequence basis for functional trait convergence in evolution,” the team said in a paper published in the peer-reviewed journal Proceedings of the National Academy of Sciences on September 23.
Convergent evolution, also known as convergence, is the repeated, independent evolutionary emergence of the same trait in two or more species, thought to be driven by adaptation to similar environments or lifestyles.
Both bats and toothed whales are capable of echolocation, even though the ability was absent in the common ancestor of these distantly related lineages, suggesting that convergent evolution may be behind its emergence.
Interest in this phenomenon has prompted research into whether convergence in function is also reflected by convergence at the molecular level.
Protein higher-order features include properties beyond the order of the amino acids that make up its sequence, such as its 3D structure, its interactions with water and its charge.
These higher-order features are crucial to a protein’s function, as they play a direct role in its biological activity and interactions with other molecules.
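As a rough illustration of a feature that goes beyond site-by-site identity, the Python sketch below estimates a protein's net charge by counting charged residues. The function name `approx_net_charge` and the counting rule (+1 for lysine and arginine, -1 for aspartate and glutamate, everything else ignored) are simplifying assumptions for illustration only, not part of the study; real charge calculations depend on pH and on each residue's surroundings.

```python
def approx_net_charge(seq):
    """Hypothetical, rough net charge near neutral pH: +1 for each
    lysine (K) or arginine (R), -1 for each aspartate (D) or glutamate (E).
    Histidine, the chain ends and each residue's surroundings are
    ignored, so this is an illustration, not a real calculation."""
    seq = seq.upper()
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return positive - negative


if __name__ == "__main__":
    # Two made-up sequences that differ at every position.
    for s in ("MKRDLA", "AEKRGS"):
        print(s, approx_net_charge(s))  # both print a net charge of 1
```

The two invented sequences share no residue at any position, yet both come out at +1 under this crude rule, which is the kind of similarity a site-by-site comparison would miss.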
Current methods for investigating adaptive convergence at the molecular level focus on individual sites in protein sequences and fail to capture higher-order features. To overcome this constraint, the team turned to recently developed protein language models (PLMs).
PLMs generate numerical embeddings, which convert real-world data, in this case protein sequences, into number-based representations that can be compared with one another.
These embeddings summarise a protein's features as a whole, allowing direct comparisons between species.
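The idea of comparing proteins through embeddings can be sketched in a few lines of Python. Running a real PLM is beyond a short example, so `toy_embedding` below is a hypothetical stand-in that summarises a sequence as its amino-acid composition; a PLM would instead produce learned vectors, which are often averaged into a single vector per protein. Cosine similarity is used here as one common way to compare embeddings, not necessarily the measure the team used.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_embedding(seq):
    """Hypothetical stand-in for a PLM embedding: the fraction of each of
    the 20 standard amino acids in the sequence."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1
    return [c / total for c in counts]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def site_identity(a, b):
    """Fraction of positions holding the same residue. Assumes the two
    sequences are already aligned to the same length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Made-up sequences: the same residues in a different order.
    s1, s2 = "MKTAYIAK", "KAMTIYKA"
    print(site_identity(s1, s2))
    print(cosine_similarity(toy_embedding(s1), toy_embedding(s2)))
```

Because the two invented sequences hold no identical residue at any position but have the same composition, the sketch reports a site identity of 0.0 and a cosine similarity of about 1.0. A composition vector discards far more information than a PLM embedding; the point is only that a whole-protein summary can match where a site-by-site comparison does not.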
The team applied this model to reported cases of functionally convergent proteins and found that proteins with different amino acid sequences but similar functions tended to have similar embeddings.
They developed a statistical pipeline called “adaptive convergence by embedding of proteins” or ACEP, which allows for genome-wide detection of adaptive convergence in higher-order protein features.
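The article does not describe ACEP's statistics in detail, so the Python sketch below shows only one generic way such a test could be set up, under stated assumptions: each species' version of a protein has already been turned into an embedding; the similarity between the two convergent species is compared with the similarity of control species pairs that do not share the trait; and the Benjamini-Hochberg procedure adjusts for testing many genes at once. The function names (`convergence_p_value`, `benjamini_hochberg`), the species labels and the embedding values are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def convergence_p_value(embeddings, focal_pair, control_pairs):
    """Empirical p-value: how often a control species pair is at least as
    similar as the focal (convergent) pair. The +1 terms keep the
    estimate away from exactly zero."""
    a, b = focal_pair
    observed = cosine_similarity(embeddings[a], embeddings[b])
    null = [cosine_similarity(embeddings[x], embeddings[y]) for x, y in control_pairs]
    exceed = sum(s >= observed for s in null)
    return (exceed + 1) / (len(null) + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjustment monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Invented 3-dimensional embeddings of one protein in five species.
    emb = {
        "echolocating_bat": [0.9, 0.1, 0.2],
        "dolphin": [0.85, 0.15, 0.25],
        "non_echolocating_bat": [0.2, 0.9, 0.1],
        "cow": [0.1, 0.8, 0.4],
        "mouse": [0.3, 0.7, 0.6],
    }
    controls = [("non_echolocating_bat", "cow"), ("non_echolocating_bat", "mouse"),
                ("echolocating_bat", "cow"), ("dolphin", "mouse")]
    p = convergence_p_value(emb, ("echolocating_bat", "dolphin"), controls)
    print(p)  # 0.2: no control pair is as similar as the focal pair
    # Adjust this gene's p-value together with those of other (invented) genes.
    print(benjamini_hochberg([p, 0.01, 0.5]))
```

With only four control pairs the smallest attainable p-value is 1/5 = 0.2, which is why a genome-wide scan would need many controls; a careful test would likely also choose controls that match the convergent pair's evolutionary distance.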
This pipeline was applied to the case of echolocation in bats and toothed whales, as well as a plant metabolism pathway found in some plants in arid regions, and returned significant results for known candidate proteins and also found new candidate genes.
“In conclusion, PLM embeddings can indicate adaptive convergence of high-order protein features beyond site identities, demonstrating the power of deep learning tools for investigating the complex mapping between molecular sequences and functions,” the team said.
“With the promising capacity of deep learning models, innovative strategies may be developed to help us elucidate the genetic basis of phenotypic and functional evolution.”