The Korean Reference Variome: KoVariome
KoVariome: Korean National Standard Reference Variome database of whole genomes with comprehensive SNV, indel, CNV, and SV analyses
High-coverage whole-genome sequencing data from a single ethnic group can provide a useful catalogue of population-specific genetic variations. Here, we report a comprehensive analysis of the Korean population and present the Korean National Standard Reference Variome (KoVariome). As part of the Korean Personal Genome Project (KPGP), we constructed the KoVariome database from 5.5 terabases of whole-genome sequence data from 50 healthy Korean individuals, with an average coverage depth of 31×. In total, KoVariome includes 12.7M single-nucleotide variants (SNVs), 1.7M short insertions and deletions (indels), 4K structural variations (SVs), and 3.6K copy number variations (CNVs). Among them, 2.4M (19%) SNVs and 0.4M (24%) indels were identified as novel. We also discovered selective enrichment of 3.8M SNVs and 0.5M indels in Korean individuals; when prioritizing disease-causing variants, these were used to filter out 1,271 coding SNVs that had not been removed using the 1,000 Genomes Project data. CNV analyses revealed gene losses related to bone mineral density and duplications of genes involved in brain development and fat reduction. KoVariome health records were used to identify novel disease-causing variants in the Korean population, demonstrating the value of high-quality ethnic variation databases for the accurate interpretation of individual genomes and the precise characterization of genetic variations.
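As a quick consistency check on the novel-variant fractions quoted above, the percentages can be recomputed from the rounded counts. This is only an illustrative sketch: the helper name `novel_percent` is hypothetical, and the inputs are the rounded figures from the abstract (in millions), not the underlying call sets.

```python
def novel_percent(novel_millions: float, total_millions: float) -> int:
    """Return the novel fraction as a whole-number percentage."""
    return round(100 * novel_millions / total_millions)


# Rounded counts taken from the abstract, in millions of variants.
snv_pct = novel_percent(2.4, 12.7)    # novel SNVs out of all SNVs
indel_pct = novel_percent(0.4, 1.7)   # novel indels out of all indels

print(f"Novel SNVs: {snv_pct}%")
print(f"Novel indels: {indel_pct}%")
```

Both values agree with the reported 19% and 24% once rounded to whole percentages.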