Animal Improvement Programs: Software / findhap.f90



findhap.f90 Find haplotypes and impute genotypes using multiple chip sets

Downloads
Version 4 program, example files, and executable
(beta version: not quite ready for routine use on U.S. chip data, but performs better than version 3 for sequence data)
Version 3 program, example files, and executable
Version 2 program, example files, and executable (not maintained)

Inputs
genotypes.txt Format: animal# chip# #SNPs genotypes
Sort by animal#; genotype codes are 0, 1, 2, and 5 = missing
For fixed length input, set chip# to 1 and missing genotypes to 5
For variable length input, #SNPs and order must match chromosome.data
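As a rough illustration of the genotypes.txt record layout, the Python sketch below parses one line and checks it against the rules above. It assumes whitespace-separated fields with the genotypes written as one unbroken string of digits, one per SNP; `GenotypeRecord`, `parse_genotype_line`, and `is_sorted_by_animal` are hypothetical names, not part of findhap.

```python
from dataclasses import dataclass

# Codes allowed in genotypes.txt: 0, 1, 2, and 5 = missing.
VALID_CODES = {0, 1, 2, 5}


@dataclass
class GenotypeRecord:
    animal: int
    chip: int
    genotypes: list[int]


def parse_genotype_line(line: str) -> GenotypeRecord:
    """Parse one line, assuming the (hypothetical) layout
    'animal# chip# #SNPs genotypestring' with one digit per SNP."""
    animal, chip, nsnp, calls = line.split()
    codes = [int(c) for c in calls]
    if len(codes) != int(nsnp):
        raise ValueError(f"animal {animal}: expected {nsnp} SNPs, got {len(codes)}")
    if not set(codes) <= VALID_CODES:
        raise ValueError(f"animal {animal}: genotype code outside 0, 1, 2, 5")
    return GenotypeRecord(int(animal), int(chip), codes)


def is_sorted_by_animal(records: list[GenotypeRecord]) -> bool:
    """True if records are in ascending animal# order (repeats allowed)."""
    ids = [r.animal for r in records]
    return ids == sorted(ids)
```

For example, `parse_genotype_line("101 1 5 01255")` yields animal 101 on chip 1 with genotypes `[0, 1, 2, 5, 5]`, the last two SNPs missing.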
chromosome.data List of all SNPs used and which SNPs are on each chip
Sort by chromosome number and position within chromosome
X-specific chromosome last, after pseudo-autosomal "chromosome"
Y-specific SNPs not supported yet
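For variable-length input, each chip's genotypes follow the chromosome.data order but cover only that chip's SNPs. A minimal sketch of placing them into the full SNP list, assuming the chip columns of chromosome.data have already been reduced to one True/False flag per SNP (a hypothetical representation), with 5 marking SNPs the chip does not contain:

```python
MISSING = 5  # genotype code for "missing", as in genotypes.txt


def expand_to_full_panel(chip_genotypes: list[int], on_chip: list[bool]) -> list[int]:
    """Spread one chip's genotypes over the full chromosome.data SNP list.

    on_chip[i] is True when SNP i of chromosome.data is on this chip
    (hypothetical boolean view of the chip columns). The chip genotypes
    must already be in chromosome.data order.
    """
    if len(chip_genotypes) != sum(on_chip):
        raise ValueError("number of genotypes does not match SNPs on this chip")
    calls = iter(chip_genotypes)
    # Take the next chip genotype where the SNP is present, else mark missing.
    return [next(calls) if present else MISSING for present in on_chip]
```

For example, `expand_to_full_panel([2, 0, 1], [True, False, True, True, False])` returns `[2, 5, 0, 1, 5]`. A chip that contains every SNP (the fixed-length case) comes back unchanged.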
pedigree.file Format: sex  animal#  sire#  dam#  birthdate  animal ID  animal name
Sort in ascending birth date order
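One way to prepare pedigree.file is to sort records by birth date and then confirm that every known parent appears before its offspring. The sketch assumes birth dates are integers in YYYYMMDD form, unknown parents are coded 0, and the animal name is the only field that may contain spaces; these are assumptions rather than documented findhap rules, and the function names are hypothetical.

```python
from typing import NamedTuple


class PedRecord(NamedTuple):
    sex: str
    animal: int
    sire: int       # 0 = unknown (assumed)
    dam: int        # 0 = unknown (assumed)
    birthdate: int  # assumed YYYYMMDD
    animal_id: str
    name: str


def parse_pedigree_line(line: str) -> PedRecord:
    """Split 'sex animal# sire# dam# birthdate animalID name'; the name keeps spaces."""
    sex, animal, sire, dam, birth, animal_id, *rest = line.split(maxsplit=6)
    return PedRecord(sex, int(animal), int(sire), int(dam), int(birth),
                     animal_id, rest[0] if rest else "")


def sort_pedigree(records: list[PedRecord]) -> list[PedRecord]:
    """Ascending birth date; sorted() is stable, so ties keep input order."""
    return sorted(records, key=lambda r: r.birthdate)


def parents_precede_offspring(records: list[PedRecord]) -> bool:
    """False if any known sire or dam has not appeared earlier in the list
    (this also flags parents that are absent from the file)."""
    seen: set[int] = set()
    for r in records:
        for parent in (r.sire, r.dam):
            if parent != 0 and parent not in seen:
                return False
        seen.add(r.animal)
    return True
```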
findhap.options Program control file with user-defined options
sequences.readdepth (version 4 only) Format: animal#  chip#  #SNPs
Read counts for A and B alleles stored in 1-byte hexadecimal format
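The exact byte layout of sequences.readdepth is not spelled out here, so the sketch below assumes one byte per SNP written as two hex digits, the first giving the A-read count and the second the B-read count (0 to 15 each). It then scores the three genotypes with a generic binomial read-error model; this is a common textbook approach and not necessarily the likelihood calculation findhap version 4 uses. The function names and the `error` default of 0.01 are hypothetical.

```python
import math


def decode_read_counts(hex_string: str) -> list[tuple[int, int]]:
    """Return (A reads, B reads) per SNP.

    Hypothetical layout: two hex digits per SNP, first = A count, second = B count.
    """
    if len(hex_string) % 2:
        raise ValueError("expected two hex digits per SNP")
    return [(int(hex_string[i], 16), int(hex_string[i + 1], 16))
            for i in range(0, len(hex_string), 2)]


def genotype_log_likelihoods(a_reads: int, b_reads: int, error: float = 0.01) -> list[float]:
    """Log-likelihood of the reads for genotypes 0 = BB, 1 = AB, 2 = AA.

    Each read is assumed to show an A allele with probability error (BB),
    0.5 (AB), or 1 - error (AA). The binomial coefficient is dropped because
    it is identical for all three genotypes.
    """
    if not 0.0 < error < 1.0:
        raise ValueError("error must be between 0 and 1")
    # List index = number of A alleles, matching the 0/1/2 genotype codes.
    return [a_reads * math.log(p_a) + b_reads * math.log(1.0 - p_a)
            for p_a in (error, 0.5, 1.0 - error)]
```

Under these assumptions `decode_read_counts("30A2")` yields `[(3, 0), (10, 2)]`, and three A reads with no B reads give the highest log-likelihood to genotype 2 (AA).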

Outputs
hap.list List of all haplotypes found in each segment
hap.found Each animal's paternal and maternal haplotypes (2 lines/animal)
hap.inherit Tracks inheritance and crossovers for each parental chromosome
hap.filled Summarizes imputation quality for each animal
cross.overs Lists exact location of all detected crossovers
allele.frequency Estimated allele frequencies and missing rates for each SNP
genotypes.filled Imputed genotypes with codes: 0 = BB, 1 = AB, 2 = AA, 3 = B_, 4 = A_, 5 = __
Number of animals output may exceed input because of imputed dams
Remaining missing alleles in codes 3, 4, and 5 can be set using allele frequencies
haplotypes.txt Imputed haplotypes: SNP1 paternal maternal, SNP2 pat mat, etc., for each animal
No missing alleles, allowing genotypes to be formed simply as (pat + mat - 2)
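The two output codings can be turned into allele dosages (the number of A alleles, matching the genotypes.filled codes 0, 1, 2) as sketched below. It assumes haplotypes.txt alleles are coded 1 = B and 2 = A, which is consistent with the (pat + mat - 2) rule and the genotype codes above, and it fills codes 3, 4, and 5 with their expected dosage given the A-allele frequency. That is one simple way to use allele frequencies, not necessarily the one the author intends; the function names are hypothetical.

```python
def haplotypes_to_genotypes(pairs: list[tuple[int, int]]) -> list[int]:
    """Convert (paternal, maternal) allele pairs to genotype codes.

    Assumes alleles are coded 1 = B and 2 = A, so pat + mat - 2 gives
    0 = BB, 1 = AB, 2 = AA.
    """
    return [pat + mat - 2 for pat, mat in pairs]


def fill_partial_codes(codes: list[int], freq_a: list[float]) -> list[float]:
    """Replace codes 3 (B_), 4 (A_), and 5 (__) with expected A-allele counts.

    freq_a[i] is the frequency of allele A at SNP i, for example as read
    from allele.frequency. Known calls 0, 1, 2 are returned as floats.
    """
    if len(codes) != len(freq_a):
        raise ValueError("codes and frequencies must have the same length")
    expected = []
    for code, p in zip(codes, freq_a):
        if code in (0, 1, 2):
            expected.append(float(code))
        elif code == 3:  # one B known; the missing allele is A with probability p
            expected.append(p)
        elif code == 4:  # one A known; the missing allele adds p on average
            expected.append(1.0 + p)
        elif code == 5:  # both alleles missing
            expected.append(2.0 * p)
        else:
            raise ValueError(f"unexpected genotype code {code}")
    return expected
```

For example, `haplotypes_to_genotypes([(2, 2), (1, 2)])` returns `[2, 1]`, and a code 4 at a SNP with A-allele frequency 0.25 becomes 1.25.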

References
2014 VanRaden, P.M., and C. Sun. Fast imputation using medium- or low-coverage sequence data. Proc. 10th World Congr. Genet. Appl. Livest. Prod., Vancouver, Canada, Aug. 17–22, Comm. 179.
2013 VanRaden, P.M., D.J. Null, M. Sargolzaei, G.R. Wiggans, M.E. Tooker, J.B. Cole, T.S. Sonstegard, E.E. Connor, M. Winters, J.B.C.H.M. van Kaam, A. Valenti, B.J. Van Doormaal, M.A. Faust, and G.A. Doak. Genomic imputation and evaluation using high-density Holstein genotypes. J. Dairy Sci. 96:668–678.
2011 VanRaden, P.M., J.R. O'Connell, G.R. Wiggans, and K.A. Weigel. Genomic evaluations with many more genotypes. Genet. Sel. Evol. 43:10.
2010 VanRaden, P.M. Genomic evaluations with many more genotypes and phenotypes. Proc. 9th World Congr. Genet. Appl. Livest. Prod., Leipzig, Germany, Aug. 1–6, Comm. 27.
2010 VanRaden, P.M., J.R. O'Connell, G.R. Wiggans, and K.A. Weigel. Combining different marker densities in genomic evaluation. Interbull Bull. 42:113–118.

Version differences
4 vs. 3 Can input numbers of A and B allele reads from sequence data
Requires more memory and CPU time because of likelihood ratio calculations
3 vs. 2 Computing time reduced by using priors or imputing only new animals
Files hap.list and hap.found output results for multiple segment lengths to use as priors
Options file includes damout, listout, and errate for outputting imputed parents, outputting all steps or only the final step, and allowing errors within haplotypes
Option genout can output only the best call (0, 1, 2) or leave unresolved genotypes missing (0, 1, 2, 5) in genotypes.filled
2 vs. 1 Options file uses maxlen, minlen, and steps to divide long segments into shorter segments
Computing time increases with the number of steps used to go from maxlen to minlen
Population and pedigree haplotyping in one loop vs. 2 separate loops
Searches for great-grandparent haplotypes, not just genotyped parents and grandparents
Higher accuracy and/or fewer high-density genotypes required
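How findhap picks the intermediate segment lengths between maxlen and minlen is not described on this page. Assuming, purely for illustration, that the lengths are evenly spaced and that each step splits a chromosome into consecutive segments of that many SNPs, the sketch below shows why computing time grows with steps: every extra step adds another full pass over a new set of segments. `segment_lengths`, `split_into_segments`, and `total_segments` are hypothetical helpers.

```python
def segment_lengths(maxlen: int, minlen: int, steps: int) -> list[int]:
    """Hypothetical schedule: `steps` lengths spaced evenly from maxlen to minlen."""
    if steps < 1 or minlen > maxlen:
        raise ValueError("need steps >= 1 and minlen <= maxlen")
    if steps == 1:
        return [maxlen]
    width = (maxlen - minlen) / (steps - 1)
    return [round(maxlen - i * width) for i in range(steps)]


def split_into_segments(n_snps: int, length: int) -> list[range]:
    """Divide SNP indices 0..n_snps-1 into consecutive segments of `length`;
    the last segment keeps whatever remains."""
    return [range(start, min(start + length, n_snps))
            for start in range(0, n_snps, length)]


def total_segments(n_snps: int, lengths: list[int]) -> int:
    """Segments processed over all steps, a rough proxy for computing time."""
    return sum(len(split_into_segments(n_snps, length)) for length in lengths)
```

Under this assumption, `segment_lengths(600, 100, 3)` gives 600, 350, and 100, and a 1,000-SNP chromosome is then processed as 2 + 3 + 10 = 15 segments in total.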

License
Fortran program findhap.f90 is public domain and was developed with U.S. taxpayer funding. Accurate results are not guaranteed. Please report any bugs to Paul.VanRaden@ars.usda.gov. You may modify, improve, use, and redistribute the code to anyone for any purpose. Or, you can ask Paul to make changes that could benefit U.S. evaluations and other users.


Paul VanRaden
Animal Genomics and Improvement Laboratory
Agricultural Research Service, USDA

Last Modified: 08/18/2014