Gaps and complex structurally variant loci in phased genome assemblies
In: Genome Research, vol. 33 (2023-04-01), issue 4
academicJournal
pp. 496-510
There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still contains more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%], or regions enriched in GA/AT-rich DNA [27.4%]). Consequently, 1513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6-7 Mbp of DNA are misorientated per haplotype irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation.
Title: | Gaps and complex structurally variant loci in phased genome assemblies |
---|---|
Author / Contributor: | Porubsky, David ; Vollger, Mitchell R ; Harvey, William T ; Rozanski, Allison N ; Ebert, Peter ; Hickey, Glenn ; Hasenfeld, Patrick ; Sanders, Ashley D ; Stober, Catherine ; Human Pangenome Reference Consortium ; Korbel, Jan O ; Paten, Benedict ; Marschall, Tobias ; Eichler, Evan E ; Abel, Haley J ; Antonacci-Fulton, Lucinda L ; Asri, Mobin ; Baid, Gunjan ; Baker, Carl A ; Belyaeva, Anastasiya ; Billis, Konstantinos ; Bourque, Guillaume ; Buonaiuto, Silvia ; Carroll, Andrew ; Chaisson, Mark JP ; Chang, Pi-Chuan ; Chang, Xian H ; Cheng, Haoyu ; Chu, Justin ; Cody, Sarah ; Colonna, Vincenza ; Cook, Daniel E ; Cook-Deegan, Robert M ; Cornejo, Omar E ; Diekhans, Mark ; Doerr, Daniel ; Ebler, Jana ; Eizenga, Jordan M ; Fairley, Susan ; Fedrigo, Olivier ; Felsenfeld, Adam L ; Feng, Xiaowen ; Fischer, Christian ; Flicek, Paul ; Formenti, Giulio ; Frankish, Adam ; Fulton, Robert S ; Gao, Yan ; Garg, Shilpa ; Garrison, Erik ; Garrison, Nanibaa’ A ; Giron, Carlos Garcia ; Green, Richard E ; Groza, Cristian ; Guarracino, Andrea ; Haggerty, Leanne ; Hall, Ira M ; Haukness, Marina ; Haussler, David ; Heumos, Simon ; Hoekzema, Kendra ; Hourlier, Thibaut ; Howe, Kerstin ; Jain, Miten ; Jarvis, Erich D ; Ji, Hanlee P ; Kenny, Eimear E ; Koenig, Barbara A ; Kolesnikov, Alexey ; Kordosky, Jennifer ; Koren, Sergey ; Lee, HoJoon ; Lewis, Alexandra P ; Li, Heng ; Liao, Wen-Wei ; Lu, Shuangjia ; Lu, Tsung-Yu ; Lucas, Julian K ; Magalhães, Hugo ; Marco-Sola, Santiago ; Marijon, Pierre ; Markello, Charles ; Martin, Fergal J ; McCartney, Ann ; McDaniel, Jennifer ; Miga, Karen H ; Mitchell, Matthew W ; Monlong, Jean ; Mountcastle, Jacquelyn ; Munson, Katherine M ; Mwaniki, Moses Njagi ; Nattestad, Maria ; Novak, Adam M ; Nurk, Sergey |
Journal: | Genome Research, vol. 33 (2023-04-01), issue 4 |
Publication: | eScholarship, University of California, 2023 |
Media type: | academicJournal |
Pages: | 496-510 |