On the Importance of Variation: A High-Resolution Map of Copy Number Variants in Arabidopsis

Linking genotype to phenotype is a major challenge in plant biology. Phenotypic variation among individuals of the same plant species arises from a vast array of genetic variation, including single nucleotide polymorphisms (SNPs) and small and large structural variants such as copy number variants (CNVs). CNVs are genetic polymorphisms in which a segment of the genome differs in copy number between individuals. CNVs may result from deletions, insertions, duplications, and larger amplifications in both coding and non-coding regions of the genome. They arise through diverse mechanisms, including unequal crossing over, DNA double-strand break repair, and transposon activity. CNVs may have facilitated the evolution and diversification of plant phenotypes, mostly by altering gene dosage. Nevertheless, a population-scale analysis of CNVs that resolves their diversity and their consequences for plant phenotypes has been lacking. In a new study, Zmienko et al. (2020) provide a population-scale map of DNA copy number variation in Arabidopsis thaliana, with the aim of facilitating the exploration of the genetic determinants of phenotypic variation in this model plant.

Taking advantage of the short-read whole-genome sequencing data released by the 1001 Genomes Consortium (2016), the authors called CNVs in 1,064 Arabidopsis accessions (Zmienko et al., 2020). CNVs were identified with several tools based on read-depth, read-pair, split-read, or hybrid strategies. Through this combined approach, the authors uncovered 19,003 CNVs in the Arabidopsis genome. In parallel, they expanded the repertoire of structural variation identified in Arabidopsis with 70,137 large indels that were called only by read-pair-based callers. The data are accessible through a user-friendly interface at http://athcnv.ibch.poznan.pl. After extensive benchmarking, the authors analyzed the distribution and genomic content of the CNVs and found that they overlap 18.3% of protein-coding genes, with a particular enrichment in defense- and stress-response genes. Most CNVs were located in genomic segments that are enriched in transposable elements (TEs) and associated with high levels of genomic rearrangement. In addition, TEs that overlap CNVs tend to lie farther from genes than TEs outside CNVs. Taken together, these two observations suggest that the distribution of CNVs is partly determined by the local genomic context (see Figure).

The authors then used the newly identified gene-associated CNVs as markers to infer Arabidopsis population structure. Compared with the classical SNP-based approach, the CNV-based approach performed better at identifying the global distribution of each accession but was less sensitive in detecting genetic subgroups. CNV markers therefore represent a valuable complement to SNP markers for population structure analysis. At the individual accession level, up to 26.9% of gene-associated CNVs showed variation within the population. The authors also investigated the consequences of changes in gene copy number for phenotypic variation. Using CNV markers in a genome-wide association study (GWAS), they found a strong association between resistance to Pseudomonas and changes in the copy number of the RPS5 and RPM1 resistance genes. This example demonstrates that CNVs can be used efficiently as markers in GWAS.

The availability of a copy number variant map for Arabidopsis thaliana opens unprecedented opportunities to assess the consequences of gene copy number variation for quantitative phenotypes in plants. The increasing availability and refinement of long-read DNA sequencing technologies (Michael et al., 2018; Jiao and Schneeberger, 2020) will help resolve particularly complex CNVs and provide further insight into how structural variation shapes the evolution of plant phenotypes.

Matthias Benoit

Howard Hughes Medical Institute, Cold Spring Harbor Laboratory

benoit@cshl.edu

ORCID: 0000-0002-3958-3173

 

REFERENCES

Jiao, W.B. and Schneeberger, K. (2020). Chromosome-level assemblies of multiple Arabidopsis genomes reveal hotspots of rearrangements with altered evolutionary dynamics. Nat. Commun. 11: 1–10.

Michael, T.P., Jupe, F., Bemm, F., Motley, S.T., Sandoval, J.P., Lanz, C., Loudet, O., Weigel, D., and Ecker, J.R. (2018). High contiguity Arabidopsis thaliana genome assembly with a single nanopore flow cell. Nat. Commun. 9: 1–8.

Zmienko, A., Marszalek-Zenczak, M., Wojciechowski, P., Samelak-Czajka, A., Luczak, M., Kozlowski, P., Karlowski, W.M., and Figlerowicz, M. (2020). AthCNV – a map of DNA copy number variations in the Arabidopsis thaliana genome. Plant Cell. DOI: https://doi.org/10.1105/tpc.19.00640.

1001 Genomes Consortium (2016). 1,135 Genomes Reveal the Global Pattern of Polymorphism in Arabidopsis thaliana. Cell 166: 481–491.