A study by researchers at Human Longevity, Inc. (HLI) found that whole-genome sequencing data and machine learning can be used to predict individual faces and other physical traits.
Examples of real (Left) and predicted (Right) faces from the Human Longevity study predicting face and other physical traits from whole genome sequencing data.
Christoph Lippert, Ph.D., the lead author, and J. Craig Venter, Ph.D., the senior author, commented that the study offers novel methods for forensics and has significant implications for data privacy, de-identification, and adequately informed consent. They concluded that considerably more public deliberation is needed as more and more genomes are generated and stored in public databases.
The IRB-approved study comprised 1,061 participants aged 18 to 82 from diverse ethnic backgrounds, whose genomes were sequenced to a minimum depth of 30x. Phenotype data collected from these participants included eye and skin color, height, age, weight, 3-D facial images, and voice samples.
The researchers accurately predicted eye color, skin color, and sex, but had difficulty predicting other, more complex genetic traits. Although their predictive models were sound, larger cohorts are needed to improve prediction accuracy.
The team developed a novel algorithm, called a maximum entropy algorithm, that finds the optimal combination of predictive models for matching whole-genome sequencing data with phenotypic and demographic data. On average, the algorithm correctly identified 8 out of 10 participants of diverse ethnic backgrounds, and 5 out of 10 African American or European participants.
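To make the "out of 10" figures concrete, the following is a minimal Python sketch of this kind of identification test: a set of traits predicted from one genome is compared against the observed traits of everyone in a pool of ten, and the closest match is selected. It is not HLI's maximum entropy algorithm, which is not described here. The function names, the use of plain Euclidean distance, and the synthetic trait values are all assumptions made for illustration.

```python
import math
import random


def match_individual(predicted, pool):
    """Return the index of the pool entry closest to the predicted traits.

    Assumption: plain Euclidean distance stands in for the study's
    (unspecified here) way of combining several predictive models.
    """
    return min(range(len(pool)), key=lambda i: math.dist(predicted, pool[i]))


def identification_rate(predictions, observed, pool_size=10, trials=200, seed=0):
    """Estimate how often the true individual is picked out of a random pool.

    predictions[i] holds the traits predicted from genome i, and
    observed[i] holds the traits actually measured for the same person.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw a random pool; its first member is the person to identify.
        members = rng.sample(range(len(observed)), pool_size)
        pool = [observed[i] for i in members]
        if match_individual(predictions[members[0]], pool) == 0:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Synthetic data only: 100 hypothetical people with 3 numeric traits,
    # and predictions formed by adding noise to the measured values.
    rng = random.Random(1)
    observed = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(100)]
    predictions = [[t + rng.gauss(0, 0.5) for t in person] for person in observed]
    print(identification_rate(predictions, observed))
```

In this toy setup, noisier predictions or a more homogeneous pool make individuals harder to tell apart, which parallels the lower identification rate reported for participants drawn from a single ethnic group.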
Venter, HLI’s co-founder, stated: “We set out to do this study to prove that your genome codes for everything that makes you, you. This is clearly a proof of concept with a limited cohort but we believe that as we increase the numbers of people in this study and in the HLI database to hundreds of thousands we will be able to accurately predict all that can be predicted from individuals’ genomes.”
He further remarked that the scientific community and the general public have shown little concern about the need for policies and safeguards to protect the privacy of individuals' genomic data, and he called for better technical solutions, continued discussion, and in-depth analysis.
According to Lippert, a data scientist at HLI, the study demonstrates the effectiveness of imaging techniques for screening traits across large numbers of people. Machine learning plays a vital role in scientific discovery because it enables fully automated data interpretation.