Salk researchers create most detailed genetic atlas of cannabis to date

Cannabis has been a globally important crop for millennia. While best known today as marijuana for its psychoactive cannabinoid THC (tetrahydrocannabinol), historically, cannabis has been a cornerstone of human civilization, providing seed oil, textiles, and food for more than 10,000 years. Today, cannabis remains an understudied and underutilized resource, but United States legislation passed in 2014 and 2018 has re-energized cannabis crop development for medicinal, grain, and fiber applications.

Researchers from the Salk Institute have created the most comprehensive, high-quality, and detailed genetic atlas of cannabis to date. The team analyzed 193 different cannabis genomes (entire sets of genetic information), revealing an unprecedented diversity, complexity, and untapped opportunity within this foundational agricultural species. This landmark achievement was the result of a multi-year collaboration with Oregon CBD, Oregon State University, and the HudsonAlpha Institute for Biotechnology.

The findings, published in Nature on May 28, 2025, set the stage for transformative advances in cannabis-based agriculture, medicine, and industry.

"Cannabis is one of the most extraordinary plants on Earth. Despite its global importance as a source of medicine, food, seed oil, and fiber for at least the last 10,000 years, it remains one of the least developed major crops of modern times, largely due to a century of legal restrictions. Our team constructed the most complete genetic map, or pangenome, of the plant to date by analyzing nearly 200 diverse cannabis genomes, showing that we are just starting to see the full potential of this amazing plant. Those same legal restrictions spurred an underground breeding revolution, revealing cannabis's power as a chemical factory. With this new genomic blueprint, we can now apply modern breeding to unlock novel compounds and traits across agriculture, medicine, and biotechnology."

Todd Michael, senior author of the study and research professor at Salk

Background: Cannabis as a chemical powerhouse

Cannabis sativa, also known as hemp, is a flowering plant native to Asia. Cannabis has many unique features that have made it a prominent crop over the course of human history, like its ability to produce strong fibers for textiles or its medicinal qualities stemming from its being one of the few plants to make high quantities of cannabinoids. Innovators today suggest that cannabis oil could rival canola or soybean with the right breeding, or that cannabis derivatives could even be used as a sustainable alternative to jet fuel.

Cannabis is a chemical powerhouse. Terpenes and cannabinoids can make up more than 30% of its dry weight. The plant makes these small chemicals to protect against predators, but humans leverage them for mood-altering purposes. Terpenes create the exquisite aromas that attract us to fruits and flowers, while cannabinoids interact with the human body to provide many therapeutic properties. One cannabinoid, the non-psychedelic cannabidiol (CBD), expanded the public's view of cannabis when the "Charlotte's Web" strain was used to treat epileptic seizures. CBD, tetrahydrocannabinol (THC), and more than 100 other poorly studied cannabinoids have been used to treat a variety of ailments, including pain, arthritis, nausea, asthma, depression, and anxiety.

Importantly, the impact that selective breeding has had on cannabis's genomic diversity has remained a mystery. Solving this mystery has proven difficult since cannabis has a complicated genome. First, cannabis is among the fewer than 5% of plants to have distinct female and male sexes on separate plants. Second, cannabis genomes contain many transposable elements, which are repetitive stretches of DNA that can "jump" around the genome and are thus difficult to track.

Key discoveries: New, and surprisingly diverse, genetic patterns

Scientists use a technology called sequencing to determine the order of nucleotides along DNA strands; the bases of these nucleotides connect across DNA's double helix to form base pairs. Traditional short-read sequencing methods chop up the DNA to investigate it piece by piece, just a few hundred base pairs at a time. Newer long-read sequencing techniques can capture thousands of base pairs at once.
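
Read length matters most in repetitive DNA. As a minimal illustration in Python, the sketch below places a short and a long read on a toy genome that contains two copies of the same repeat. The sequences, the `count_placements` helper, and the use of exact string matching are all simplifying assumptions made for this example; real sequencing pipelines tolerate errors and are far more involved.

```python
def count_placements(genome: str, read: str) -> int:
    """Count the start positions in `genome` where `read` matches exactly."""
    return sum(
        1
        for start in range(len(genome) - len(read) + 1)
        if genome[start:start + len(read)] == read
    )


# Toy genome (hypothetical): two copies of one repeat, separated and
# flanked by short stretches of unique sequence.
REPEAT = "ACGTACGTAC"
genome = "TTGCA" + REPEAT + "GGATC" + REPEAT + "CCTAG"

# A short read that lies entirely inside the repeat.
short_read = REPEAT[2:8]

# A long read that covers the first repeat copy plus unique sequence on both sides.
long_read = genome[3:22]

short_hits = count_placements(genome, short_read)
long_hits = count_placements(genome, long_read)

print(f"short read ({len(short_read)} bases) fits {short_hits} places")
print(f"long read ({len(long_read)} bases) fits {long_hits} place")
```

In this toy case the short read fits equally well in both repeat copies, so its origin is ambiguous, while the long read reaches into unique flanking sequence and can be placed in only one location.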

"There are limits to what you can discover with short-read sequencing technologies, since those short genetic excerpts are impossible to stitch together in any meaningful way when looking at complex regions of the genome, especially repetitive DNA sequences," says co-first author Lillian Padgitt-Cobb, a postdoctoral researcher in Michael's lab. "We're among the first to harness this long-read technology at scale in the pangenome context, and with that comes all these insights into structural variation and gene ordering that can inform end-game decisions about breeding favorable traits into cannabis plants."

The study isn't the first to use long-read sequencing. In fact, Michael himself was the first researcher to generate a chromosome-level genome of cannabis using long-read sequencing back in 2018, which revealed complex genetic architecture where cannabinoids are synthesized, and explained the breeding history behind anti-epileptic Charlotte's Web. Where this new study stands apart is its completeness. It contains the most genomes yet, and is the first to include sex chromosomes and, relatedly, the first to have haplotype resolution.

Cannabis is a diploid plant. This means that, like humans, it contains two sets of chromosomes, one set inherited from a male plant and the other from a female plant. While most genomes published to date have decoded only one set of chromosomes, the team resolved both sets, an approach known as haplotype resolution. By looking at both chromosome sets, the researchers revealed an unprecedented amount of genetic variation, possibly up to 20 times that of humans.

"With this haplotype resolution," Padgitt-Cobb explains, "we can look at what was inherited from just one of the parent plants and start to understand the breeding and background of that plant."

The team's study collected genomes from 144 different cannabis plants from around the world to assemble 193 total genomes, 181 of which had never been catalogued before. The genome total is greater than the plant total owing to that haplotype resolution, since each plant that had both chromosome sets investigated produced two genome assemblies. Collectively, these many genomes make up the pangenome, which was analyzed to understand the full extent of genetic diversity within the cannabis species.
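
The relationship between the two totals can be worked through with simple arithmetic. The Python sketch below assumes that every plant contributed exactly one or exactly two assemblies, which the description above implies but does not state outright; the `split_assemblies` helper is hypothetical, and the resulting split is a back-of-the-envelope inference rather than a figure reported by the researchers.

```python
def split_assemblies(n_plants: int, n_assemblies: int) -> tuple[int, int]:
    """Return (plants with one assembly, plants with two assemblies).

    Assumes each plant yields exactly one or exactly two assemblies:
        single + double     = n_plants
        single + 2 * double = n_assemblies
    Subtracting the first equation from the second gives `double`.
    """
    double = n_assemblies - n_plants
    single = n_plants - double
    if double < 0 or single < 0:
        raise ValueError("counts do not fit the one-or-two assumption")
    return single, double


# Totals quoted in the article: 144 plants, 193 genome assemblies.
single, double = split_assemblies(144, 193)
print(f"{single} plants with one assembly, {double} plants with two")
```

Under that assumption, 49 plants would have had both chromosome sets resolved and the remaining 95 would have contributed a single assembly each.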

The high quality of the collected genomes allowed the researchers to resolve previously unseen genetic patterns, including the architecture of genes responsible for cannabinoid synthesis, and, by incorporating sex chromosomes, a first look at cannabis Y chromosomes.

Their first discovery was that there is unexpected diversity within the species. Across the pangenome, 23% of genes were found in every genome, 55% were nearly universal (seen in 95%–99% of genomes), 21% were in between 5% and 94% of genomes, and less than 1% were entirely unique. Some of the most universal genes were those that produce cannabinoids.
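
To make these percentages concrete, here is a minimal Python sketch of how genes in a pangenome might be binned by how many genomes carry them. The gene names, the 20-genome toy table, the `classify_gene` helper, and the extra "rare" bin (for genes found in more than one genome but fewer than 5%) are illustrative assumptions, not data or categories taken from the study.

```python
from collections import Counter


def classify_gene(n_present: int, n_genomes: int) -> str:
    """Bin a gene by the number of genomes in which it was found."""
    if n_present == n_genomes:
        return "in every genome"
    if n_present == 1:
        return "unique"
    # Integer arithmetic avoids floating-point edge cases at the cutoffs.
    if n_present * 100 >= 95 * n_genomes:
        return "nearly universal"
    if n_present * 100 >= 5 * n_genomes:
        return "variable"
    return "rare"  # assumed bin: more than one genome but under 5%


# Toy presence table (hypothetical): gene -> set of genomes that carry it.
genomes = [f"g{i:02d}" for i in range(20)]
presence = {
    "gene_a": set(genomes),       # found in all 20 genomes
    "gene_b": set(genomes[:19]),  # 19 of 20 genomes (95%)
    "gene_c": set(genomes[:8]),   # 8 of 20 genomes (40%)
    "gene_d": {genomes[3]},       # found in a single genome
}

labels = {
    gene: classify_gene(len(found_in), len(genomes))
    for gene, found_in in presence.items()
}
counts = Counter(labels.values())

for gene, label in labels.items():
    print(gene, "->", label)
```

Applied to a real presence table, tallying the labels with `counts` would give the kind of breakdown quoted above.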

While cannabinoid genes were consistent across genomes, genes related to fatty acid metabolism, growth, and defense were not. These variable genes are an untapped breeding pool, and their selective breeding could also make cannabis more robust in the field or improve the nutritional content of hemp oil to make it a competitor among existing seed oils. Notably, the research team discovered that structural variation in the fatty acid biosynthetic pathway contributes to the production of tetrahydrocannabivarin (THCV), a rare varin-type cannabinoid gaining attention for its non-psychoactive, energizing effects.

Looking more closely at cannabinoid genes across the pangenome, the researchers concluded that two genes, THCAS and CBDAS, are likely under strong selective pressure from human-directed breeding for THC and CBD content. Importantly, they found that cannabinoid genes are located in transposable elements. Selectively breeding for genes inside these "jumping" transposable elements has, in turn, created immense diversity among cannabis plants.

Looking ahead: Optimizing plants for health and industry

The researchers also identified interesting targets for agricultural optimization. First, by looking at the differences between European and Asian genomes, they concluded there is likely an ancient cannabis relative somewhere in Asia waiting to be discovered. This wild relative would likely carry novel genetic adaptations related to its unique environmental history, making it a wealth of information for breeding cannabis plants that are more resilient crops.

Finally, the novel insight into sex chromosomes revealed that there are genes only present in "father" plants that can be used to breed better-performing offspring. Modern marijuana breeding leverages "feminization," where farmers induce a female plant to make male flowers, entirely bypassing the Y chromosome. These new findings suggest that breeding programs may be missing valuable genetic diversity and trait potential encoded in those bypassed male genomes. Incorporating true male plants into breeding strategies could unlock overlooked genetic gains and expand opportunities for crop improvement.

"Over the last 10 years, breeders have already done a decent job of getting yields up and making cannabis an economically viable crop," says co-first author Ryan Lynch, a postdoctoral researcher in Michael's lab. "Once there's market interest there, paired with these new insights into cannabis genomes that can guide breeding efforts, I can see hemp and hemp oils really booming in both human health and industry applications."

In the short term, the team hopes the pangenome will serve as a dynamic resource for researchers around the world to build upon and use to inform cultivation strategies, helping to realize the untapped potential of cannabis as a valuable multi-use crop grown for fiber, seed oil, and medicine.

More about this paper

Other authors include Nolan Hartwick, Nicholas Allsing, Anthony Aylward, Allen Mamerto, Justine Kitony, Kelly Colt, Emily Murray, Tiffany Duong, and Heidi Chen of Salk; Andrea Garfinkel, Aaron Trippe, and Seth Crawford of Oregon CBD; Brian Knaus and Kelly Vining of Oregon State University; and Philip Bentz, Sarah Carey, and Alex Harkess of the HudsonAlpha Institute for Biotechnology.

The work was supported by the Tang genomics fund, National Science Foundation (NSF-IOS PRFB 2209290, IOS-PGRP CAREER 2236530), Bill and Melinda Gates Foundation (INV-040541), and US Department of Agriculture (USDA NIFA 2022-67012-38987, USDA NIFA 2023-67013-39620).

Journal reference:

Lynch, R. C., et al. (2025). Domesticated cannabinoid synthases amid a wild mosaic cannabis pangenome. Nature. doi.org/10.1038/s41586-025-09065-0.
