Researchers develop new statistical method that generates significant genetic insights

Pleiotropy analysis, which provides insight into how individual genes influence multiple characteristics, has become increasingly valuable as medicine leans more heavily on genetic data to inform disease treatment. Privacy rules, however, make comprehensive pleiotropy analysis difficult, because individual patient data often cannot be easily or routinely shared between sites.

However, a statistical method called Sum-Share, developed at Penn Medicine, can pool summary information from many different sites to generate significant insights. In a test of the method, published in Nature Communications, Sum-Share's developers detected more than 1,700 DNA-level variations that could be associated with five different cardiovascular-related conditions. Had patient-specific information from just one site been used, as is currently the norm, only one such variation would have been detected.

"Full research of pleiotropy has been difficult to accomplish because of restrictions on merging patient data from electronic health records at different sites, but we were able to figure out a method that turns summary-level data into results that are exponentially greater than what we could accomplish with individual-level data currently available. With Sum-Share, we greatly increase our abilities to unveil the genetic factors behind health conditions that range from those dealing with heart health, as was the case in this study, to mental health, with many different applications in between."

Jason Moore, PhD, senior author; Director, Institute for Biomedical Informatics; Professor of Biostatistics, Epidemiology and Informatics

Sum-Share draws on biobanks that pool de-identified patient data, including genetic information, from electronic health records (EHRs) for research purposes. For their study, Moore, co-senior author Yong Chen, PhD, an associate professor of Biostatistics, lead author Ruowang Li, PhD, a postdoctoral fellow at Penn, and their colleagues used the eMERGE (Electronic Medical Records and Genomics) network to pull seven different sets of EHRs and ran them through Sum-Share to detect shared genetic effects across five cardiovascular-related conditions: obesity, hypothyroidism, type 2 diabetes, hypercholesterolemia, and hyperlipidemia.

With Sum-Share, the researchers found 1,734 different single-nucleotide polymorphisms (SNPs, which are differences in the building blocks of DNA) that could be tied to the five conditions. By contrast, using data from just one site's EHR, only one SNP could be tied to the conditions.

They also showed that Sum-Share produces identical results whether it is run on summary-level or individual-level data, making it a "lossless" method.
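
Why summary statistics can be lossless is easiest to see in a much simpler setting than the study's. The sketch below is not Sum-Share's actual model; it assumes an ordinary least-squares regression of one trait on one SNP's genotype dosage, where each site shares only five running totals (the hypothetical `SiteSummary` fields) rather than patient rows. Because those totals simply add across sites, the pooled summary yields the same slope as fitting on every patient row at once, up to floating-point rounding. The site data are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SiteSummary:
    """Hypothetical aggregate totals a site could share without revealing patient rows."""
    n: int = 0
    sx: float = 0.0   # sum of genotype dosages (0, 1 or 2 copies of the allele)
    sy: float = 0.0   # sum of trait values
    sxx: float = 0.0  # sum of squared dosages
    sxy: float = 0.0  # sum of dosage * trait products

    def __add__(self, other: "SiteSummary") -> "SiteSummary":
        # Sums are additive, so combining sites is element-wise addition.
        return SiteSummary(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                           self.sxx + other.sxx, self.sxy + other.sxy)


def summarize(rows: List[Tuple[float, float]]) -> SiteSummary:
    """Computed locally at a site; only the returned totals leave the site."""
    s = SiteSummary()
    for x, y in rows:
        s = s + SiteSummary(1, x, y, x * x, x * y)
    return s


def slope(s: SiteSummary) -> float:
    """Least-squares slope of trait on genotype, computed from the totals alone."""
    return (s.n * s.sxy - s.sx * s.sy) / (s.n * s.sxx - s.sx ** 2)


# Made-up (genotype dosage, trait value) pairs held at three sites.
sites = [
    [(0, 1.2), (1, 1.9), (2, 3.1)],
    [(1, 2.2), (2, 2.8), (0, 0.9), (1, 2.0)],
    [(2, 3.3), (0, 1.1)],
]

# Individual-level analysis: all patient rows pooled in one place.
pooled_rows = [row for site in sites for row in site]
individual_level = slope(summarize(pooled_rows))

# Summary-level analysis: each site shares totals, which are then added up.
summary_level = slope(sum((summarize(site) for site in sites), SiteSummary()))

print(individual_level, summary_level)
```

For these made-up rows both estimates equal 1.0 up to rounding. The study's model handles five conditions jointly, so this sketch only illustrates the lossless idea, not the method itself.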

To gauge Sum-Share's effectiveness, the team then compared its results with those of the previous leading method, PheWAS, which performs best when it can draw on whatever individual-level data different EHRs have made available. On a level playing field, with both methods given individual-level data, Sum-Share was statistically more powerful than PheWAS. And because Sum-Share's summary-level results have been shown to match its individual-level results, it appears to be the stronger method for identifying genetic factors shared across conditions.

"This was notable because Sum-Share enables lossless data integration, while PheWAS loses some information when integrating information from multiple sites," Li explained. "Sum-Share can also reduce the multiple hypothesis testing penalties by jointly modeling different characteristics at once."
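
The multiple-testing point can be put in rough numbers. The sketch below assumes a Bonferroni correction (an assumption; the article does not say which correction the authors used) and a hypothetical one million SNPs. Testing every SNP against each of the five conditions separately means five million tests, while one joint test per SNP means one million, so each individual test can clear a looser bar.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


ALPHA = 0.05
N_SNPS = 1_000_000  # hypothetical number of SNPs tested
N_TRAITS = 5        # the five conditions in the study

# One test per SNP-condition pair versus one joint test per SNP.
separate = bonferroni_threshold(ALPHA, N_SNPS * N_TRAITS)
joint = bonferroni_threshold(ALPHA, N_SNPS)

print(f"separate: {separate:.0e}, joint: {joint:.0e}")
```

Under these assumptions the joint threshold is 5e-08 against 1e-08 for separate tests, a five-fold smaller penalty.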

Currently, Sum-Share is designed mainly as a research tool, but its insights could potentially be used to improve clinical operations. Looking ahead, it may also help address some of the most pressing needs facing health care today.

"Sum-Share could be used for COVID-19 with research consortia, such as the Consortium for Clinical Characterization of COVID-19 by EHR (4CE)," Chen said. "These efforts use a federated approach where the data stay local to preserve privacy."

The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of News Medical.