In a recent study posted to the bioRxiv* preprint server, researchers investigated whether low-complexity regions (LCRs), rather than single-nucleotide polymorphisms (SNPs), account for the current 2022 clade IIb monkeypox (MPX) virus (MPXV) outbreak.
The ongoing MPXV outbreak is caused by infection with subclade IIb MPXV. In contrast to MPX cases caused by clade I and clade IIa MPXV, the prognosis in the current outbreak is largely favorable, despite considerably more efficient human-to-human MPXV transmission. MPXV has evolved due to selective pressures from hosts and the loss of host-interacting genes. To date, SNPs have not provided a satisfactory genomic explanation for the increased transmissibility of MPXV.
About the study
The present study investigated whether LCR variations were mainly responsible for MPXV genome alterations and the unexpected MPXV 2022 outbreak epidemiology.
The clade IIb lineage B.1 MPXV sequences were assembled de novo using a mapping approach that combined shotgun metagenomics and short-read sequencing of ribonucleic acid (RNA) and deoxyribonucleic acid (DNA) extracted from swabs of vesicular lesions of MPX patients diagnosed between 18 May and 14 July 2022 in Spain.
LCR resolution was performed based on reference genome mapping, and in silico analyses were performed. The findings were applied to publicly available MPXV datasets of single-molecule raw reads (n=35) from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). A combination of different sequencing strategies was used to determine the actual LCR sequences.
The distribution of LCRs among the major functional groups of orthologous poxvirus genes (OPGs) and the extent of diversity among the 21 identified LCRs were compared. Allelic frequencies were characterized and comparatively assessed for all LCRs across all samples.
A high-quality sequence (HQS) of the MPXV genome, 353R, representing the current 2022 MPXV outbreak, was accurately determined based on LCR variations, with significant short tandem repeat (STR) variations in the LCRs. In the MPXV genome, LCR entropy was significantly higher than SNP entropy. In silico analyses indicated that the expression, translation, stability, or function of MPXV OPGs 153, 204, and 208 could be affected by a genomic evolutionary accordion involving rhythmic genome expansions and contractions.
A total of 48 MPXV genome sequences were determined with ≥10X read depth and 39,697,742 HQ reads for each swab. One contig, two contigs, three contigs, and one contig belonged to MPXV with 101%, 97%, and 97% coverage, respectively, of the MPXV-M5312_HM12_Rivers sequence. In total, 21 LCRs were identified, with LCR pairs 10/11 and 1/4 present as similar copies in reverse-complementary orientation.
LCR3 contained a tandem repeat (TR) with the sequence ATAT[ACATTATAT]n, and the analysis indicated that n=52. None of the publicly available MPXV orthopoxvirus genome sequences had a similarly long TR. Four clade IIb lineage B.1 MPXV genome sequences from the current 2022 outbreak showed n=54 to 62 LCR3 repeats, and the number of STRs differentiated these sequences from the subclade IIb lineage A genome sequences (12 to 42 STRs in LCR3 regions), indicating high genetic variability in LCR3.
Likewise, lineage-specific STR differences within subclade IIb MPXV were detected in the 1/4 LCR pair. The pair contained an STR with the sequence [AACTAACTTATGACTT]n, and the analysis indicated that n=16. LCR3 appeared to have lengthened since the viral spillover, whereas the 1/4 LCR pair appeared to have shortened, so that the genome behaved as an accordion over time.
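The repeat count n in a pattern such as ATAT[ACATTATAT]n can be illustrated with a short Python sketch. This is not the pipeline used in the study: the function name `count_tandem_repeats` and the toy sequence are hypothetical, and the sketch assumes an error-free, already assembled sequence with perfect repeat units.

```python
import re

def count_tandem_repeats(sequence: str, unit: str, prefix: str = "") -> int:
    """Return the largest number of consecutive, perfect copies of `unit`
    that directly follow `prefix` anywhere in `sequence` (0 if none)."""
    pattern = re.compile(f"{re.escape(prefix)}((?:{re.escape(unit)})+)")
    best = 0
    for match in pattern.finditer(sequence.upper()):
        # group(1) holds only the run of repeat units, without the prefix
        best = max(best, len(match.group(1)) // len(unit))
    return best

# Toy example (hypothetical sequence, not real MPXV data):
# the ATAT prefix followed by five copies of the LCR3 repeat unit.
toy_lcr3 = "GGC" + "ATAT" + "ACATTATAT" * 5 + "TTG"
copies = count_tandem_repeats(toy_lcr3, unit="ACATTATAT", prefix="ATAT")
print(copies)
```

Real reads contain sequencing errors and may not span the whole repeat, which is presumably why the study combined several sequencing strategies to resolve the LCRs; a perfect-match regular expression like this one would undercount in such cases.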
The clade IIb lineage B.1 MPXV_USA_2022_MA001 and 353R genomic sequences had 67 SNPs relative to the clade IIb lineage A reference isolate sequences. In addition, the 353R HQS contained two more SNP pairs in the left and right inverted terminal repeats (ITRs), resulting in a stop codon in the OPG015 gene. The MPXV_USA_2022_MA001 and 353R sequences also differed by two indels (insertions-deletions) of bases situated at positions 077,133 and 273,173, corresponding to LCR2 region and LCR5 region differences, respectively.
The 353R HQS differed in genome length by 1,338 base pairs (bp) and 1,342 bp from the MPXV M5312_HM12_Rivers and MPXV_USA_2022_MA001 sequences, respectively. MPXV LCRs were distributed non-randomly, with significant purifying selection against the introduction of LCRs in the central conserved sites. In the 353R HQS, LCRs 2, 5, 7, 10, 11, and 21 showed intrahost genomic diversity, with entropy values ranging between 0.2 and 1.7 and significantly greater diversity in LCRs than in SNPs. The average Euclidean distance between samples for LCRs ranged between 0.1 (LCR21) and 0.7 (LCR2), and the LCR differences were statistically significant.
The LCR 10/11 pair and LCR7 showed considerable intrahost variation and preponderant allelic differences between samples. LCRs 5, 6, and 7 were situated in a defined central conserved site of the MPXV genome between genomic positions 130,000 and 138,000, and the site comprised OPGs 152, 153, and 154. LCR7 was situated at the center of a functional open reading frame (ORF), whereas LCR3 and LCR21 were located in the promoter/start site, probably altering the ORF start site. The region between positions 170,000 and 180,000 included LCRs 2, 3, 19, 20, and 21 and was another site of functional impact.
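To make the two diversity measures concrete, here is a minimal Python sketch of entropy within a sample and Euclidean distance between samples. It rests on assumptions not stated in the article: that entropy means Shannon entropy (computed here in bits) over the allele frequencies of one LCR, and that distance is taken between the allele-frequency vectors of two samples. The read counts and function names are hypothetical.

```python
import math

def to_frequencies(allele_counts):
    """Convert {allele: read_count} into {allele: frequency}."""
    total = sum(allele_counts.values())
    return {a: c / total for a, c in allele_counts.items()} if total else {}

def shannon_entropy(allele_counts):
    """Shannon entropy (bits) of the allele distribution at one locus."""
    entropy = 0.0
    for p in to_frequencies(allele_counts).values():
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy

def euclidean_distance(freqs_a, freqs_b):
    """Euclidean distance between two allele-frequency dictionaries;
    an allele missing from one sample counts as frequency 0."""
    alleles = set(freqs_a) | set(freqs_b)
    return math.sqrt(sum((freqs_a.get(a, 0.0) - freqs_b.get(a, 0.0)) ** 2
                         for a in alleles))

# Hypothetical read counts per repeat-number allele at one LCR.
sample_a = {16: 50, 17: 50}   # two alleles at equal frequency
sample_b = {16: 100}          # a single allele, no intrahost diversity

print(shannon_entropy(sample_a))   # 1.0
print(shannon_entropy(sample_b))   # 0.0
print(euclidean_distance(to_frequencies(sample_a), to_frequencies(sample_b)))
```

Under these assumptions, a locus with a single allele has zero entropy, which matches the intuition behind the comparison: a fixed SNP contributes little entropy, whereas an LCR with several coexisting repeat-length alleles contributes more.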
Overall, the study findings showed that most of the MPXV genomic variability occurred in LCRs. Therefore, research emphasizing phenotypic MPXV differences should focus on LCR variations rather than SNP variations.
*bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.