Similar literature
 20 similar articles found (search time: 22 ms)
1.
The objective of this paper was to investigate, for various scenarios at low and high marker density, the accuracy of imputing genotypes when using a multivariate mixed model framework using information from 2, 4, or 10 surrounding markers. This model predicts genotypes at a locus, using genotypes at nearby loci as correlated traits, and the additive genetic relationship matrix to use information from genotyped relatives. For 2 scenarios this method was compared with the population-based imputation algorithms FastPHASE and Beagle. Accuracies of imputation were obtained with Monte Carlo simulation and predicted with selection index theory, using input from the simulated data. Five different scenarios of missing genotypes were considered: 1) genotypes of some loci are missing due to genotyping errors, 2) juvenile selection candidates are genotyped using a smaller SNP panel, 3) some animals in the pedigree of a breeding population are not genotyped, 4) juvenile selection candidates are not genotyped, and 5) 1 generation of animals at the top of the pedigree is not genotyped. Surrounding marker information did not improve accuracy of imputation when animals whose genotypes were imputed were not genotyped for those surrounding markers. When those animals were genotyped for surrounding markers, results indicated a limited gain when linkage disequilibrium (LD) between SNP was low, but a substantial increase in accuracy when LD between SNP was high. For scenario 1, using 1 vs. 11 SNP, accuracy was, respectively, 0.75 and 0.81 at low, and 0.75 and 0.93 at high density. For scenario 2, using 1 vs. 11 SNP, accuracy was, respectively, 0.70 and 0.73 at low, and 0.71 and 0.84 at high density. Beagle outperformed the other methods at high SNP density, whereas the multivariate mixed model was clearly superior when SNP density was low and animals were genotyped with a reduced SNP panel.
The results showed that extending the univariate gene content method to a multivariate BLUP model with inclusion of surrounding marker information only yields greater imputation accuracy when the animals with imputed loci are at least genotyped for some SNP that are in LD with the SNP to be imputed. The equation derived from selection index theory accurately predicted the accuracy of imputation using the multivariate mixed model framework.

2.
A major obstacle in applying genomic selection (GS) to uniquely adapted local breeds in less-developed countries has been the cost of genotyping at high densities of single-nucleotide polymorphisms (SNP). Cost reduction can be achieved by imputing genotypes from lower to higher densities. Locally adapted breeds tend to be admixed and exhibit a high degree of genomic heterogeneity thus necessitating the optimization of SNP selection for downstream imputation. The aim of this study was to quantify the achievable imputation accuracy for a sample of 1,135 South African (SA) Drakensberger cattle using several custom-derived lower-density panels varying in both SNP density and how the SNP were selected. From a pool of 120,608 genotyped SNP, subsets of SNP were chosen (1) at random, (2) with even genomic dispersion, (3) by maximizing the mean minor allele frequency (MAF), (4) using a combined score of MAF and linkage disequilibrium (LD), (5) using a partitioning-around-medoids (PAM) algorithm, and finally (6) using a hierarchical LD-based clustering algorithm. Imputation accuracy to higher density improved as SNP density increased; animal-wise imputation accuracy defined as the within-animal correlation between the imputed and actual alleles ranged from 0.625 to 0.990 when 2,500 randomly selected SNP were chosen vs. a range of 0.918 to 0.999 when 50,000 randomly selected SNP were used. At a panel density of 10,000 SNP, the mean (standard deviation) animal-wise allele concordance rate was 0.976 (0.018) vs. 0.982 (0.014) when the worst (i.e., random) as opposed to the best (i.e., combination of MAF and LD) SNP selection strategy was employed. A difference of 0.071 units was observed between the mean correlation-based accuracy of imputed SNP categorized as low (0.01 < MAF ≤ 0.1) vs. high MAF (0.4 < MAF ≤ 0.5). Greater mean imputation accuracy was achieved for SNP located on autosomal extremes when these regions were populated with more SNP. 
The presented results suggested that genotype imputation can be a practical cost-saving strategy for indigenous breeds such as the SA Drakensberger. Based on the results, a genotyping panel consisting of ~10,000 SNP selected based on a combination of MAF and LD would suffice in achieving a <3% imputation error rate for a breed characterized by genomic admixture on the condition that these SNP are selected based on breed-specific selection criteria.
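The animal-wise allele concordance rate and the imputation error rate quoted above can be illustrated with a small Python sketch. It assumes genotypes coded as 0/1/2 copies of one allele and counts each unit of difference between the true and imputed genotype as one wrongly imputed allele; this coding, the function names, and the example genotypes are assumptions for illustration, not details taken from the study.

```python
def allele_concordance(true_gt, imputed_gt):
    """Share of alleles imputed correctly for one animal (genotypes coded 0/1/2)."""
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    # Each unit of difference between true and imputed genotype is one wrong allele.
    wrong_alleles = sum(abs(t - i) for t, i in zip(true_gt, imputed_gt))
    return 1.0 - wrong_alleles / (2 * len(true_gt))


def imputation_error_rate(true_gt, imputed_gt):
    """Complement of the allele concordance rate."""
    return 1.0 - allele_concordance(true_gt, imputed_gt)


# Hypothetical genotypes for one animal at ten imputed SNP.
true_gt = [0, 1, 2, 1, 0, 2, 2, 1, 0, 1]
imputed_gt = [0, 1, 2, 2, 0, 2, 1, 1, 0, 1]

concordance = allele_concordance(true_gt, imputed_gt)
error_rate = imputation_error_rate(true_gt, imputed_gt)
print(f"allele concordance = {concordance:.3f}, error rate = {error_rate:.3f}")
```

Under this definition, a panel meeting the <3% error target would need a mean animal-wise allele concordance above 0.97.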

3.
Using target and reference fattened steer populations, the performance of genotype imputation using lower‐density marker panels in Japanese Black cattle was evaluated. Population imputation was performed using BEAGLE software. Genotype information for approximately 40 000 single nucleotide polymorphism (SNP) markers by Illumina BovineSNP50 BeadChip was available, and imputation accuracy was assessed based on the average concordance rates of the genotypes, varying equally spaced SNP densities, and the number of individuals in the reference population. Two additional statistics were also calculated as indicators of imputation performance. The concordance rates tended to be lower for SNPs with greater minor allele frequencies, or those located near the ends of the chromosomes. Longer autosomes yielded greater imputation accuracies than shorter ones. When SNPs were selected based on linkage disequilibrium information, relative imputation accuracy was slightly improved. When 3000 and 10 000 equally spaced SNPs were used, the imputation accuracies were greater than 90% and approximately 97%, respectively. These results indicate that combining genotyping using a lower‐density SNP chip with genotype imputation based on a population of individuals genotyped using a higher‐density SNP chip is a cost‐effective and valid approach for genomic prediction.

4.
The objective of this study was to investigate the accuracy of genomic prediction of body weight and eating quality traits in a numerically small sheep population (Dorper sheep). Prediction was based on a large multi-breed/admixed reference population and using (a) 50k or 500k single nucleotide polymorphism (SNP) genotypes, (b) imputed whole-genome sequencing data (~31 million), (c) selected SNPs from whole-genome sequence data and (d) 50k SNP genotypes plus selected SNPs from whole-genome sequence data. Furthermore, the impact of using a breed-adjusted genomic relationship matrix on accuracy of genomic breeding value was assessed. The selection of genetic variants was based on an association study performed on imputed whole-genome sequence data in an independent population, which was chosen either randomly from the base population or according to higher genetic proximity to the target population. Genomic prediction was based on genomic best linear unbiased prediction (GBLUP), and the accuracy of genomic prediction was assessed according to the correlation between genomic breeding value and corrected phenotypes divided by the square root of trait heritability. The accuracy of genomic prediction was between 0.20 and 0.30 across different traits based on common 50k SNP genotypes, which improved on average by 0.06 (absolute value) when prioritized genetic markers from whole-genome sequence data were used. Using prioritized genetic markers from a genetically more related GWAS population resulted in slightly higher prediction accuracy (0.02 absolute value) compared to genetic markers derived from a random GWAS population.
Using high-density SNP genotypes or imputed whole-genome sequence data in GBLUP showed almost no improvement in genomic prediction accuracy; however, accounting for different marker allele frequencies in the reference population with a breed-adjusted GRM resulted in an average increase of 0.024 (absolute value) in the accuracy of genomic prediction.
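The accuracy measure described in this abstract, the correlation between genomic breeding values and corrected phenotypes divided by the square root of the trait heritability, can be sketched in Python as follows. The function names and the example numbers are hypothetical and only illustrate the arithmetic; they are not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def prediction_accuracy(gebv, corrected_phenotypes, heritability):
    """Correlation of GEBV with corrected phenotypes, scaled by sqrt(h2)."""
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must lie in (0, 1]")
    return pearson(gebv, corrected_phenotypes) / sqrt(heritability)


# Hypothetical validation animals (values invented for illustration only).
gebv = [0.20, -0.10, 0.40, 0.00, -0.30, 0.10]
corrected_phenotypes = [0.50, 0.30, 0.10, -0.40, -0.20, 0.60]
heritability = 0.25

accuracy = prediction_accuracy(gebv, corrected_phenotypes, heritability)
print(f"accuracy = {accuracy:.3f}")
```

Dividing by the square root of heritability rescales the correlation with phenotypes toward a correlation with true breeding values; with small validation sets the resulting estimate can fall outside the range from -1 to 1.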

5.
Boar reproductive traits are economically important for the pig industry. Here we conducted a genome‐wide association study (GWAS) for 13 reproductive traits measured on 205 F2 boars at day 300 using 60 K single nucleotide polymorphism (SNP) data imputed from a reference panel of 1200 pigs in a White Duroc × Erhualian F2 intercross population. We identified 10 significant loci for seven traits on eight pig chromosomes (SSC). Two loci surpassed the genome‐wide significance level, including one for epididymal weight around 60.25 Mb on SSC7 and one for semen temperature around 43.69 Mb on SSC4. Four of the 10 significant loci that we identified were consistent with previously reported quantitative trait loci for boar reproduction traits. We highlighted several interesting candidate genes at these loci, including APN, TEP1, PARP2, SPINK1 and PDE1C. To evaluate the imputation accuracy, we further genotyped nine GWAS top SNPs using PCR restriction fragment length polymorphism or Sanger sequencing. We found an average genotype concordance of 91.44%, an allelic concordance of 95.36% and an r² correlation of 0.85 between imputed and real genotype data. This indicates that our GWAS mapping results based on imputed SNP data are reliable, providing insights into the genetic basis of boar reproductive traits.

6.
The Genome Analysis Toolkit (GATK) is a popular set of programs for discovering and genotyping variants from next-generation sequencing data. The current GATK recommendation for RNA sequencing (RNA-seq) is to perform variant calling from individual samples, with the drawback that only variable positions are reported. Versions 3.0 and above of GATK offer the possibility of calling DNA variants on cohorts of samples using the HaplotypeCaller algorithm in Genomic Variant Call Format (GVCF) mode. Using this approach, variants are called individually on each sample, generating one GVCF file per sample that lists genotype likelihoods and their genome annotations. In a second step, variants are called from the GVCF files through a joint genotyping analysis. This strategy is more flexible and reduces computational challenges in comparison to the traditional joint discovery workflow. Using a GVCF workflow for mining SNP in RNA-seq data provides substantial advantages, including reporting homozygous genotypes for the reference allele as well as missing data. Taking advantage of RNA-seq data derived from primary macrophages isolated from 50 cows, the GATK joint genotyping method for calling variants on RNA-seq data was validated by comparing this approach to a so-called "per-sample" method. In addition, pair-wise comparisons of the two methods were performed to evaluate their respective sensitivity, precision and accuracy using DNA genotypes from a companion study including the same 50 cows genotyped using either genotyping-by-sequencing or the Bovine SNP50 Beadchip (imputed to the Bovine high density). Results indicate that both approaches are very close in their capacity to detect reference variants and that the joint genotyping method is more sensitive than the per-sample method. Given that the joint genotyping method is more flexible and technically easier, we recommend this approach for variant calling in RNA-seq experiments.

7.
Reliable genomic prediction of breeding values for quantitative traits requires the availability of a sufficient number of animals with genotypes and phenotypes in the training set. As of 31 October 2016, there were 3,797 Brangus animals with genotypes and phenotypes. These Brangus animals were genotyped using different commercial SNP chips. Of them, the largest group consisted of 1,535 animals genotyped by the GGP‐LDV4 SNP chip. The remaining 2,262 genotypes were imputed to the SNP content of the GGP‐LDV4 chip, so that the number of animals available for training the genomic prediction models was more than doubled. The present study showed that the pooling of animals with either original or imputed 40K SNP genotypes substantially increased genomic prediction accuracies on the ten traits. By supplementing imputed genotypes, the relative gains in genomic prediction accuracies on estimated breeding values (EBV) were from 12.60% to 31.27%, and the relative gain in genomic prediction accuracies on de‐regressed EBV was slightly smaller (i.e. 0.87%–18.75%). The present study also compared the performance of five genomic prediction models and two cross‐validation methods. The five genomic models predicted EBV and de‐regressed EBV of the ten traits similarly well. Of the two cross‐validation methods, leave‐one‐out cross‐validation maximized the number of animals at the stage of training for genomic prediction. Genomic prediction accuracy (GPA) on the ten quantitative traits was validated in 1,106 newly genotyped Brangus animals based on the SNP effects estimated in the previous set of 3,797 Brangus animals, and the resulting accuracies were slightly lower than GPA in the original data. The present study was the first to leverage currently available genotype and phenotype resources in order to harness genomic prediction in Brangus beef cattle.

8.
Missing genotypes are a common feature of high density SNP datasets obtained using SNP chip technology and this is likely to decrease the accuracy of genomic selection. This problem can be circumvented by imputing the missing genotypes with estimated genotypes. When implementing imputation, the criteria used for SNP data quality control and whether to perform imputation before or after data quality control need to be considered. In this paper, we compared six strategies of imputation and quality control using different imputation methods, different quality control criteria and by changing the order of imputation and quality control, against a real dataset of milk production traits in Chinese Holstein cattle. The results demonstrated that, no matter what imputation method and quality control criteria were used, strategies with imputation before quality control performed better than strategies with imputation after quality control in terms of accuracy of genomic selection. The different imputation methods and quality control criteria did not significantly influence the accuracy of genomic selection. We concluded that performing imputation before quality control could increase the accuracy of genomic selection, especially when the rate of missing genotypes is high and the reference population is small.

9.
Genomic prediction has become the new standard for genetic improvement programs, and currently, there is a desire to implement this technology for the evaluation of Angus cattle in Brazil. Thus, the main objective of this study was to assess the feasibility of evaluating young Brazilian Angus (BA) bulls and heifers for 12 routinely recorded traits using single-step genomic BLUP (ssGBLUP) with and without genotypes from American Angus (AA) sires. The second objective was to obtain estimates of effective population size (Ne) and linkage disequilibrium (LD) in the Brazilian Angus population. The dataset contained phenotypic information for up to 277,661 animals belonging to the Promebo breeding program, pedigree for 362,900, of which 1,386 were genotyped for 50k, 77k, and 150k single nucleotide polymorphism (SNP) panels. After imputation and quality control, 61,666 SNPs were available for the analyses. In addition, genotypes from 332 American Angus (AA) sires widely used in Brazil were retrieved from the AA Association database to be used for genomic predictions. Bivariate animal models were used to estimate variance components, traditional EBV, and genomic EBV (GEBV). Validation was carried out with the linear regression method (LR) using young-genotyped animals born between 2013 and 2015 without phenotypes in the reduced dataset and with records in the complete dataset. Validation animals were further split into progeny of BA and AA sires to evaluate if their progenies would benefit by including genotypes from AA sires. The Ne was 254 based on pedigree and 197 based on LD, and the average LD (±SD) and distance between adjacent single nucleotide polymorphisms (SNPs) across all chromosomes were 0.27 (±0.27) and 40743.68 bp, respectively. Prediction accuracies with ssGBLUP outperformed BLUP for all traits, improving accuracies by, on average, 16% for BA young bulls and heifers. 
The GEBV prediction accuracies ranged from 0.37 (total maternal for weaning weight and tick count) to 0.54 (yearling precocity) across all traits, and dispersion (LR coefficients) fluctuated between 0.92 and 1.06. Inclusion of genotyped sires from the AA improved GEBV accuracies by 2%, on average, compared to using only the BA reference population. Our study indicated that genomic information could help us to improve GEBV accuracies and hence genetic progress in the Brazilian Angus population. The inclusion of genotypes from American Angus sires heavily used in Brazil just marginally increased the GEBV accuracies for selection candidates.

10.
The objective of this study was to evaluate, using three different genotype density panels, the accuracy of imputation from lower‐ to higher‐density genotypes in dairy and beef cattle. High‐density genotypes consisting of 777 962 single‐nucleotide polymorphisms (SNP) were available on 3122 animals comprising 269, 196, 710, 234, 719, 730 and 264 Angus, Belgian Blue, Charolais, Hereford, Holstein‐Friesian, Limousin and Simmental bulls, respectively. Three different genotype densities were generated: low density (LD; 6501 autosomal SNPs), medium density (50K; 47 770 autosomal SNPs) and high density (HD; 735 151 autosomal SNPs). Imputation from lower‐ to higher‐density genotype platforms was undertaken within and across breeds exploiting population‐wide linkage disequilibrium. The mean allele concordance rate per breed from LD to HD when undertaken using a single breed or multiple breed reference population varied from 0.956 to 0.974 and from 0.947 to 0.967, respectively. The mean allele concordance rate per breed from 50K to HD when undertaken using a single breed or multiple breed reference population varied from 0.987 to 0.994 and from 0.987 to 0.993, respectively. The accuracy of imputation was generally greater when the reference population consisted solely of the breed to be imputed than when the reference population comprised multiple breeds, although the impact was less when imputing from 50K to HD compared to imputing from LD.

11.
The Algorithm for Proven and Young (APY) enables the implementation of single‐step genomic BLUP (ssGBLUP) in large, genotyped populations by separating genotyped animals into core and non‐core subsets and creating a computationally efficient inverse for the genomic relationship matrix (G). As APY became the choice for large‐scale genomic evaluations in BLUP‐based methods, a common question is how to choose the animals in the core subset. We compared several core definitions to answer this question. Simulations comprised a moderately heritable trait for 95,010 animals and 50,000 genotypes for animals across five generations. Genotypes consisted of 25,500 SNP distributed across 15 chromosomes. Genotyping errors and missing pedigree were also mimicked. Core animals were defined based on individual generations, equal representation across generations, and at random. For a sufficiently large core size, core definitions had the same accuracies and biases, even if the core animals had imperfect genotypes. When genotyped animals had unknown parents, accuracy and bias were significantly better (p ≤ .05) for random and across generation core definitions.

12.
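To investigate the effects of different reference-population selection methods and of reference-population size on the accuracy of genotype imputation, this study compared five selection methods: maximizing the expected genetic relationship for matrix A (RELA), minimizing the target population genetic variance for matrix A (MCA), the highest mean kinship coefficients (KIN), random selection (RAN), and common ancestor (CA). Dwarf yellow-feathered broilers were used as the experimental population and were genotyped with the chicken 600K SNP array (Affymetrix Axion HD genotyping array); body weight was recorded at 45, 56, 70, 84, and 91 days of age for 435 male progeny. Beagle software was used to impute low-density SNP array data to high-density SNP array data, and the effects of selection method and reference-population size on imputation accuracy, as well as the genomic prediction accuracy obtained with the imputed data, were compared. The results showed that Beagle 4.0 combined with pedigree information gave the best imputation, followed by Beagle 4.0 alone, while Beagle 5.1 performed worst. Imputation accuracy was highest when the reference population was selected with MCA and lowest with RAN; the differences among MCA, RELA, and CA were small. Compared with the other methods, selecting reference individuals with MCA and imputing low-density to high-density array data gave higher prediction accuracy in genomic selection, differing only marginally from the accuracy obtained with true high-density array data. As reference-population size increased, imputation accuracy also increased, but the rate of increase gradually declined and eventually levelled off. In summary, a reference population can be built with a selection method and its size controlled so as to ensure the accuracy of genotype imputation and genomic prediction while saving costs; this study provides a technical reference for the application of genotype imputation in livestock and poultry breeding.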

13.
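This study aimed to evaluate the practicality of low-density liquid-phase chips in production and to reduce breeding costs. A total of 3,761 healthy Large White pigs, approximately 160 days old and weighing about 110 kg, were used. One hundred pigs were randomly drawn as the imputation (target) population; based on the marker information of the 10K chip, markers were extracted from the 50K chip to generate 10K chip data for these animals. From the remaining animals, 800, 2,000, or 3,600 individuals were then randomly drawn as the reference population, and Beagle 4.1 was used to impute the 100 target animals to the 50K chip, with 10 replicates. Imputation accuracy was evaluated by genotype concordance and the genotype correlation coefficient. The results showed that the mean linkage disequilibrium (r²) of the 10K and 50K chips was 0.227 and 0.258, respectively, a small difference. A minor allele frequency (MAF) of 0.05 was the inflection point for imputation accuracy: after markers with MAF < 0.05 were removed, imputation accuracy increased markedly. Imputation accuracy rose with reference-population size: as the reference population grew from 800 to 3,600 animals, accuracy increased from 0.90 to 0.95, and the standard deviation across the 10 replicates fell from 0.006 to 0.002. With smaller reference populations, imputation accuracy fluctuated considerably among chromosomes; as the reference population grew, accuracy differed little among chromosomes. These results indicate that imputing pig liquid-phase chip data from 10K to 50K is feasible and can be applied on a large scale in genomic selection to reduce breeding costs.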
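The MAF threshold of 0.05 reported in this abstract can be illustrated with a minimal Python sketch that computes the minor allele frequency of each marker and keeps only markers at or above the threshold. The 0/1/2 genotype coding, the function names, and the three example markers are assumptions made for illustration; the abstract does not describe how the filter was implemented.

```python
def minor_allele_frequency(genotypes):
    """MAF of one SNP from genotypes coded as 0/1/2 copies of one allele."""
    if not genotypes:
        raise ValueError("at least one genotype is required")
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1.0 - freq)


def filter_by_maf(snp_genotypes, threshold=0.05):
    """Keep only SNP whose MAF is at or above the threshold."""
    return {
        snp: genotypes
        for snp, genotypes in snp_genotypes.items()
        if minor_allele_frequency(genotypes) >= threshold
    }


# Hypothetical markers genotyped on 20 animals each.
snp_genotypes = {
    "snp_a": [0] * 19 + [1],                        # MAF = 1/40 = 0.025
    "snp_b": [0, 1, 2, 1, 0, 1, 1, 0, 2, 1] * 2,    # MAF = 18/40 = 0.45
    "snp_c": [2] * 16 + [1] * 4,                    # MAF = 4/40 = 0.10
}

kept = filter_by_maf(snp_genotypes)
print(sorted(kept))
```

In this example only `snp_a` falls below the threshold and is dropped before imputation accuracy would be evaluated.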

14.
The influence of genotype imputation using low‐density single nucleotide polymorphism (SNP) marker subsets on the genomic relationship matrix (G matrix), genetic variance explained, and genomic prediction (GP) was investigated for carcass weight and marbling score in Japanese Black fattened steers, using genotype data of approximately 40,000 SNPs. Genotypes were imputed using equally spaced SNP subsets of different densities. Two different linear models were used. The first (model 1) incorporated one G matrix, while the second (model 2) used two different G matrices constructed using the selected and remaining SNPs. When using model 1, the estimated additive genetic variance was always larger when using all SNPs obtained via genotype imputation than when using only equally spaced SNP subsets. The correlations between the genomic estimated breeding values obtained using genotype imputation with at least 3,000 SNPs and those using all available SNPs without imputation were higher than 0.99 for both traits. While additive genetic variance was likely to be partitioned with model 2, it did not enhance the accuracy of GP compared with model 1. These results indicate that genotype imputation using an equally spaced low‐density panel of an appropriate size can be used to produce a cost‐effective, valid GP.

15.
Deoxyribonucleic acid-based tests were used to assign paternity to 625 calves from a multiple-sire breeding pasture. There was a large variability in calf output and a large proportion of young bulls that did not sire any offspring. Five of 27 herd sires produced over 50% of the calves, whereas 10 sires produced no progeny and 9 of these were yearling bulls. A comparison was made between the paternity results obtained when using a DNA marker panel with a high (0.999), cumulative parentage exclusion probability (P(E)) and those obtained when using a marker panel with a lower P(E) (0.956). A large percentage (67%) of the calves had multiple qualifying sires when using the lower resolution panel. Assignment of the most probable sire using a likelihood-based method based on genotypic information resolved this problem in approximately 80% of the cases, resulting in 75% agreement between the 2 marker panels. The correlation between weaning weight, on-farm EPD based on pedigrees inferred from the 2 marker panels was 0.94 for the 24 bulls that sired progeny. Partial progeny assignments inferred from the lower resolution panel resulted in the generation of EPD for bulls that actually sired no progeny according to the high-P(E) panel, although the Beef Improvement Federation accuracies of EPD for these bulls were never greater than 0.14. Simulations were performed to model the effect of loci number, minor allele frequency, and the number of offspring per bull on the accuracy of genetic evaluations based on parentage determinations derived from SNP marker panels. The SNP marker panels of 36 and 40 loci produced EPD with accuracies nearly identical to those EPD resulting from use of the true pedigree. 
However, in field situations where factors including variable calf output per sire, large sire cohorts, relatedness among sires, low minor allele frequencies, and missing data can occur concurrently, the use of marker panels with a larger number of SNP loci will be required to obtain accurate on-farm EPD.

16.
This study investigated the effect of including Nordic Holsteins in the reference population on the imputation accuracy and prediction accuracy for Chinese Holsteins. The data used in this study include 85 Chinese Holstein bulls genotyped with both 54K chip and 777K (HD) chip, 2862 Chinese cows genotyped with 54K chip, 510 Nordic Holstein bulls genotyped with HD chip, and 4398 Nordic Holstein bulls genotyped with 54K chip and with deregressed proofs for five milk production traits. Based on these data, the accuracy of imputation from 54K to HD marker data and the accuracy of genomic predictions in Chinese Holstein were assessed. The allele correct rate increased by around 2.7 and 1.7% in imputation from the 54K to the HD marker data for Chinese Holstein bulls and cows, respectively, when the Nordic HD‐genotyped bulls were included in the reference data for imputation. However, the prediction accuracy was improved only slightly when using the marker data imputed based on the combined HD reference data, compared with using the marker data imputed based on the Chinese HD reference data only. On the other hand, when using the combined reference population including 4398 Nordic Holstein bulls, the accuracy of genomic predictions increased by 6.5 percentage points together with a reduction of prediction bias. The HD markers did not outperform the 54K markers in genomic prediction based on the present data. The results indicate that for Chinese Holsteins, it is necessary to genotype more individuals with 54K chip to increase reference population rather than increasing marker density.

17.
Linkage disequilibrium (LD) plays an important role in genomic selection and mapping of quantitative trait loci (QTL). This study investigated the pattern of LD and effective population size (Ne) in Gir cattle selected for yearling weight. For this purpose, 173 animals with imputed genotypes (from 18 animals genotyped with the Illumina BovineHD BeadChip and 155 animals genotyped with the Bovine LDv4 panel) were analysed. The LD was evaluated at distances of 25–50 kb, 50–100 kb, 100–500 kb and 0.5–1 Mb. The Ne was estimated based on 5 past generations. The r² values (a measure of LD) were, respectively, .35, .29, .18 and .032 for the distances evaluated. The LD estimates decreased with increasing distance of SNP pairs and LD persisted up to a distance of 100 kb (r² = .29). The Ne was greater in generations 4 and 5 (24 and 30 animals, respectively) and declined drastically after the last generation (12 animals). The results showed high levels of LD and low Ne, which were probably due to the loss of genetic variability as a consequence of the structure of the Gir population studied.
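The r² statistic and the distance classes used in this abstract can be illustrated with a short Python sketch. It uses the standard haplotype-based definition r² = D² / (pA(1 − pA) pB(1 − pB)) and assumes phased haplotypes coded 0/1; the study may have estimated r² differently (for example from unphased genotypes), and the half-open treatment of the bin boundaries, the function names, and the example haplotypes are assumptions for illustration.

```python
# Distance classes from the abstract, in base pairs (lower bound inclusive).
DISTANCE_BINS_BP = [
    (25_000, 50_000),
    (50_000, 100_000),
    (100_000, 500_000),
    (500_000, 1_000_000),
]


def ld_r2(haplotypes_a, haplotypes_b):
    """r2 between two biallelic loci from phased haplotype alleles coded 0/1."""
    n = len(haplotypes_a)
    p_a = sum(haplotypes_a) / n
    p_b = sum(haplotypes_b) / n
    p_ab = sum(a * b for a, b in zip(haplotypes_a, haplotypes_b)) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r2 is undefined for a monomorphic locus")
    return d * d / denom


def distance_bin(distance_bp):
    """Return the (lower, upper) distance class for a SNP pair, or None."""
    for lower, upper in DISTANCE_BINS_BP:
        if lower <= distance_bp < upper:
            return (lower, upper)
    return None


# Hypothetical phased haplotypes at two SNP located 30 kb apart.
snp_1 = [1, 1, 1, 1, 0, 0, 0, 0]
snp_2 = [1, 1, 1, 0, 0, 0, 0, 1]

r2 = ld_r2(snp_1, snp_2)
print(f"r2 = {r2:.2f} in distance class {distance_bin(30_000)}")
```

Averaging such pairwise values within each distance class gives the kind of per-distance summary reported above.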

18.
Background: Genome-wide association studies and genomic predictions are thought to be optimized by using whole-genome sequence (WGS) data. However, sequencing thousands of individuals of interest is expensive. Imputation from SNP panels to WGS data is an attractive and less expensive approach to obtain WGS data. The aims of this study were to investigate the accuracy of imputation and to provide insight into the design and execution of genotype imputation. Results: We genotyped 450 chickens with a 600 K SNP array, and sequenced 24 key individuals by whole genome re-sequencing. Accuracy of imputation from putative 60 K and 600 K array data to WGS data was 0.620 and 0.812 for Beagle, and 0.810 and 0.914 for FImpute, respectively. By increasing the sequencing cost from 24 X to 144 X, the imputation accuracy increased from 0.525 to 0.698 for Beagle and from 0.654 to 0.823 for FImpute. With fixed sequence depth (12 X), increasing the number of sequenced animals from 1 to 24 improved accuracy from 0.421 to 0.897 for FImpute and from 0.396 to 0.777 for Beagle. Using optimally selected key individuals resulted in a higher imputation accuracy compared with using randomly selected individuals as a reference population for resequencing. With fixed reference population size (24), imputation accuracy increased from 0.654 to 0.875 for FImpute and from 0.512 to 0.762 for Beagle as the sequencing depth increased from 1 X to 12 X. With a given total cost of genotyping, accuracy increased with the size of the reference population for FImpute, but the pattern was not valid for Beagle, which showed the highest accuracy at six fold coverage for the scenarios used in this study. Conclusions: In conclusion, we comprehensively investigated the impacts of several key factors on genotype imputation. Generally, increasing sequencing cost gave a higher imputation accuracy. But with a fixed sequencing cost, optimal imputation enhances the performance of WGP and GWAS.
An optimal imputation strategy should take size of reference population, imputation algorithms, marker density, and population structure of the target population and methods to select key individuals into consideration comprehensively. This work sheds additional light on how to design and execute genotype imputation for livestock populations.

19.
There is an increasing interest in using whole‐genome sequence data in genomic selection breeding programmes. Prediction of breeding values is expected to be more accurate when whole‐genome sequence is used, because the causal mutations are assumed to be in the data. We performed genomic prediction for the number of eggs in white layers using imputed whole‐genome resequence data including ~4.6 million SNPs. The prediction accuracies based on sequence data were compared with the accuracies from the 60 K SNP panel. Predictions were based on genomic best linear unbiased prediction (GBLUP) as well as a Bayesian variable selection model (BayesC). Moreover, the prediction accuracy from using different types of variants (synonymous, non‐synonymous and non‐coding SNPs) was evaluated. Genomic prediction using the 60 K SNP panel resulted in a prediction accuracy of 0.74 when GBLUP was applied. With sequence data, there was a small increase (~1%) in prediction accuracy over the 60 K genotypes. With both 60 K SNP panel and sequence data, GBLUP slightly outperformed BayesC in predicting the breeding values. Selection of SNPs more likely to affect the phenotype (i.e. non‐synonymous SNPs) did not improve the accuracy of genomic prediction. The fact that sequence data were based on imputation from a small number of sequenced animals may have limited the potential to improve the prediction accuracy. A small reference population (n = 1004) and possible exclusion of many causal SNPs during quality control can be other possible reasons for limited benefit of sequence data. We expect, however, that the limited improvement is because the 60 K SNP panel was already sufficiently dense to accurately determine the relationships between animals in our data.

20.
The aim of this study was to perform a Bayesian genome-wide association study (GWAS) to identify genomic regions associated with growth traits in Hereford and Braford cattle, and to select Tag-SNPs to represent these regions in low-density panels useful for genomic predictions. In addition, we propose candidate genes through functional enrichment analysis associated with growth traits using Medical Subject Headings (MeSH). Phenotypic data from 126,290 animals and genotypes for 131 sires and 3,545 animals were used. The Tag-SNPs were selected with the BayesB (π = 0.995) method to compose low-density panels. The number of Tag-SNPs ranged between 79 and 103 for the growth traits at weaning and between 78 and 100 for the yearling growth traits. The average proportion of variance explained by Tag-SNPs with BayesA was 0.29, 0.23, 0.32 and 0.19 for birthweight (BW), weaning weight (WW205), yearling weight (YW550) and postweaning gain (PWG345), respectively. For Tag-SNPs with the BayesA method, accuracy values ranged from 0.13 to 0.30 for k-means and from 0.30 to 0.65 for random clustering of animals to compose reference and validation groups. Although genomic prediction accuracies were higher with the full marker panel, predictions with low-density panels retained on average 76% of the accuracy obtained with BayesB with full markers for growth traits. The MeSH analysis was able to translate genomic information providing biological meanings of more specific gene products related to the growth traits. The proposed Tag-SNP panels may be useful for future fine mapping studies and for lower-cost commercial genomic prediction applications.
