Mixed-model GWAS on milk production traits of 1.16M genotyped Holstein cattle

S. Toghiani1, P.M. VanRaden1, K.L. Gaddis2, M.J. VandeHaar3, and R.J. Tempelman3

1Department of Animal Science, North Carolina State University, Raleigh, NC, USA
2Department of Animal and Avian Sciences, University of Maryland, College Park, MD, USA
3Animal Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD, USA
4Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA

2022 J. Dairy Sci.


Genome-wide association studies (GWAS) have been widely used to elucidate the genetic basis of complex traits. A mixed model is usually needed to account for sample relatedness and polygenic effects in GWAS, but applying it to large-scale samples is computationally challenging. Here we present a new solution to mixed-model GWAS, which we refer to as SSGP (https://github.com/jiang18/ssgp), and apply it to the largest GWAS to date on milk production traits, using data from the U.S. Council on Dairy Cattle Breeding. SSGP enables million-scale genomic restricted maximum likelihood estimation and accurate approximation of mixed-model association statistics. We used deregressed estimated breeding values and ~76K autosomal SNP genotypes of ~1.16M Holstein cattle in the mixed-model association analysis. The mixed model’s polygenic term was modeled with ~48K LD-pruned SNPs. This GWAS identified few new associations with milk production traits compared with our previous analysis of only 27K Holstein bulls. GWAS with subsamples of 50K, 100K, 150K, and 200K individuals showed that increasing the sample size has a larger effect on the p-values of significant SNPs than on those of non-significant SNPs; that is, non-significant SNPs rarely become significant as the sample size increases. In summary, this study suggests that the genetic effects of many SNPs on Holstein milk traits may be too small to detect individually.
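The subsampling pattern described above can be illustrated with a rough back-of-the-envelope sketch. It is not the SSGP method and uses no data from the study. It assumes a simple single-SNP test on unrelated individuals, where the expected 1-df chi-square statistic is approximately 1 + N·q², with q² the fraction of phenotypic variance explained by the SNP. The q² values, the function names, and the 5e-8 threshold are hypothetical choices made only for illustration.

```python
import math

def expected_chisq(n, q2):
    """Approximate expected 1-df chi-square for a SNP explaining a
    fraction q2 of phenotypic variance in n unrelated individuals
    (assumption: small q2, no relatedness or stratification)."""
    return 1.0 + n * q2

def p_from_chisq(x):
    """Upper-tail p-value of a 1-df chi-square statistic x."""
    return math.erfc(math.sqrt(x / 2.0))

# Subsample sizes taken from the abstract; effect sizes are hypothetical.
sizes = [50_000, 100_000, 150_000, 200_000]
threshold = 5e-8  # hypothetical genome-wide significance level

for label, q2 in (("larger-effect SNP", 1e-3), ("tiny-effect SNP", 1e-5)):
    for n in sizes:
        p = p_from_chisq(expected_chisq(n, q2))
        status = "significant" if p < threshold else "not significant"
        print(f"{label:18s} N={n:>7,d}  expected p={p:.2e}  {status}")
```

Under these assumptions, the expected p-value of the larger-effect SNP drops by many orders of magnitude as N grows, while the tiny-effect SNP stays far from the threshold at every subsample size, which is qualitatively consistent with the pattern reported above.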