Discovering ancestors and connecting relatives in large genomic databases

J.P. Nani1,*, L.R. Bacheller2, J.B. Cole1, and P.M. VanRaden1

1Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705-2350
2Council on Dairy Cattle Breeding, Bowie, MD 20716
*Corresponding author


2019 J. Dairy Sci. (?)
© American Dairy Science Association, 2019. All rights reserved.

ABSTRACT

Genomic evaluation has improved both plant and animal breeding by allowing more accurate estimation of an individual’s genetic potential. Because often only a small proportion of the population to be evaluated has been genotyped, genomic estimations rely heavily on complete pedigree information. Confirmation, discovery, and correction of parentage and connected relatives allow the creation of more complete pedigrees, which in turn increase both the number of usable phenotypic records and prediction accuracy. Previous methods accounted for parent-progeny conflicts using SNP. More recently, haplotype methods have allowed discovery of more distant relationships, such as maternal grandsires (MGS) and maternal great-grandsires (MGGS), with improved accuracy. However, discovered MGS and MGGS often went unused because no dam information was available to link them to the calf. An automated procedure to discover and fill missing maternal identification information was developed, thus allowing discovered MGS and MGGS to be used in imputation as well as in calculating breeding values for animals in the US dairy cattle database. An MGS was discovered for 295,136 animals with unknown dam, and an MGGS was discovered for 153,909 of these animals. A virtual maternal identification was added for animals with missing information. The effect of pedigree completion on progeny inbreeding, breeding values, and reliabilities was examined. Mean inbreeding of animals with missing maternal pedigree information was 6.69% before and 6.87% after pedigree assignment; expected future inbreeding was 7.24% before and 7.20% after assignment. Reliabilities for traditional breeding values increased from 26.6 to 32.6% for milk yield, 25.9 to 32.0% for fat yield, and 26.9 to 32.9% for protein yield; genomic reliabilities also increased slightly from 76.2 to 77.1% for milk, 76.0 to 76.9% for fat, and 76.3 to 77.3% for protein. The procedure developed for pedigree completion is a useful tool for improving accuracy of national and international evaluations and aiding producers in making better mating decisions.
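
As an illustration of the virtual maternal identification step, the following minimal Python sketch shows one way a discovered MGS and MGGS could be linked to a calf with an unknown dam. It is not the authors' implementation. The record layout, the function name add_virtual_dams, and the VDAM_/VGDAM_ identifier prefixes are all hypothetical. The sketch also assumes that the MGGS is the sire of the maternal granddam.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Animal:
    """Hypothetical pedigree record: an ID plus optional sire and dam IDs."""
    animal_id: str
    sire: Optional[str] = None
    dam: Optional[str] = None


def add_virtual_dams(
    pedigree: Dict[str, Animal],
    discovered: Dict[str, Tuple[Optional[str], Optional[str]]],
) -> Dict[str, Animal]:
    """Fill missing dams with virtual records that carry discovered ancestors.

    `discovered` maps a calf ID to (MGS ID, MGGS ID); either may be None.
    Assumption: the MGGS is the sire of the maternal granddam.
    """
    for calf_id, (mgs, mggs) in discovered.items():
        calf = pedigree[calf_id]
        # Only fill in animals whose dam is unknown and that have an MGS.
        if calf.dam is not None or mgs is None:
            continue

        granddam_id = None
        if mggs is not None:
            # Virtual granddam exists only to connect the MGGS.
            granddam_id = f"VGDAM_{calf_id}"  # hypothetical ID scheme
            pedigree[granddam_id] = Animal(granddam_id, sire=mggs)

        # Virtual dam connects the calf to its discovered MGS.
        dam_id = f"VDAM_{calf_id}"  # hypothetical ID scheme
        pedigree[dam_id] = Animal(dam_id, sire=mgs, dam=granddam_id)
        calf.dam = dam_id
    return pedigree
```

Under these assumptions, a calf with a known sire, an unknown dam, and a discovered MGS and MGGS gains two virtual female ancestors, so both discovered sires become reachable through the ordinary sire and dam fields.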

Key Words: ancestry discovery, pedigree, genomics, genotype, grandsire