Imputation and investigation of sequence genotypes for 6,735,530 variants of 39,048 Holsteins

A. Al-Khudhair*,1, J.R. O'Connell2, D.J. Null1, and P.M. VanRaden1

1Animal Genomics and Improvement Laboratory, ARS, USDA, Beltsville, MD
2University of Maryland, School of Medicine, Baltimore, MD


2020 J. Dairy Sci. (?)
© American Dairy Science Association, 2020. All rights reserved.

ABSTRACT

Previous US studies of Holstein genotypes from run5 of the 1000 Bull Genomes Project used sequence variants in exons and very close to genes, whereas the current study of run7 genotypes also includes intronic and intergenic loci. After data cleaning and editing, sequence genotypes for 6,735,530 variants of 917 Holsteins were selected from the run7 raw data, along with array genotypes from the Council on Dairy Cattle Breeding (Bowie, MD) database, which included either 79,294 SNP from routine predictions or 643,059 SNP from high-density (HD) genotypes imputed using Findhap, version 3. A total of 39,048 Holstein bulls had either sequence or imputed HD genotypes, and all were imputed to sequence. Editing and imputation tests combining sequence and HD array genotypes revealed a higher genotype error rate for run7 genotypes than for the previous run5 genotypes. Genome-wide association was performed with deregressed milk and fat phenotypes of Holstein bulls using a mixed model framework. That framework included an intercept and a polygenic random effect estimated with a genetic relationship matrix constructed from the 79,294 markers in the imputed genotype file for the December 2019 US genomic evaluations. Residual error was modeled using a diagonal matrix with deregressed animal-specific reliabilities. Milk and fat had 488 and 603 markers, respectively, with P-values <1E-10. Known major loci, such as those in DGAT1, ABCG2, and β-casein, had the highest effects in official predictions; in contrast, nearby linked loci had higher effects in imputed HD or imputed sequence data. This indicates that using more variants does not ensure localizing causal variants; however, official predictions included about 800,000 genotyped and phenotyped cows that were not included in the HD or sequence studies. Phenotypic effects were also estimated by multiple regression for 13 traits, but convergence was incomplete.
Annotation of results and conditional analyses are underway to investigate whether intronic and intergenic loci also directly affect phenotypes of interest and to identify additional candidate loci to be included in future genotyping chips.
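
The mixed model used for the association analysis (an intercept, one tested marker, a polygenic effect with covariance given by a genomic relationship matrix, and a diagonal residual matrix driven by reliabilities) can be illustrated with a minimal sketch. It is not the authors' software. The following are assumptions that the abstract does not specify: the relationship matrix formula (centered allele counts scaled by twice the sum of p(1 - p)), the residual weighting (1 - rel) / rel, the use of fixed variance components `var_g` and `var_e` instead of estimated ones, and all function and variable names. Genotypes and phenotypes are simulated.

```python
import math
import numpy as np


def vanraden_grm(genotypes):
    """Genomic relationship matrix from 0/1/2 allele counts (animals x markers)."""
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0             # allele frequencies estimated from the sample
    Z = M - 2.0 * p                      # center each marker at its expected count
    scale = 2.0 * np.sum(p * (1.0 - p))  # puts G on the scale of a pedigree matrix
    return Z @ Z.T / scale


def single_marker_gls(y, x, G, reliability, var_g, var_e):
    """Test one marker by generalized least squares under V = var_g*G + var_e*D."""
    y = np.asarray(y, dtype=float)
    rel = np.asarray(reliability, dtype=float)
    # Assumed weighting: residual variance proportional to (1 - rel) / rel,
    # so animals with more reliable deregressed phenotypes get more weight.
    D = np.diag((1.0 - rel) / rel)
    V = var_g * G + var_e * D
    X = np.column_stack([np.ones_like(y), np.asarray(x, dtype=float)])
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    cov = np.linalg.inv(X.T @ Vinv_X)    # covariance of (intercept, marker effect)
    beta = cov @ (X.T @ Vinv_y)
    z = beta[1] / math.sqrt(cov[1, 1])
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal test
    return beta[1], p_value


# Simulated example: 200 animals, 500 markers, marker 0 has a true effect.
rng = np.random.default_rng(0)
n_animals, n_markers = 200, 500
freqs = rng.uniform(0.1, 0.9, size=n_markers)
M = rng.binomial(2, freqs, size=(n_animals, n_markers))
rel = rng.uniform(0.5, 0.99, size=n_animals)
y = 2.0 * M[:, 0] + rng.normal(size=n_animals)

G = vanraden_grm(M)
for j in (0, 1):
    effect, p_value = single_marker_gls(y, M[:, j], G, rel, var_g=0.3, var_e=0.7)
    print(f"marker {j}: effect={effect:.3f}, P={p_value:.2e}")
```

In a real analysis the variance components would be estimated (for example by REML) rather than fixed, and testing millions of imputed variants would call for factoring V once and reusing it across markers instead of solving the system again for each one.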

Keywords: genome-wide association, genotype, sequence imputation