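Study on the Strategies of Genotype Imputation
Cite this article: DENG Tianyu, DU Lixin, WANG Lixian, ZHAO Fuping. Study on the Strategies of Genotype Imputation[J]. Acta Veterinaria et Zootechnica Sinica, 2020, 51(9): 2068-2078.
Authors: DENG Tianyu  DU Lixin  WANG Lixian  ZHAO Fuping
Affiliation: Institute of Animal Science, Chinese Academy of Agricultural Sciences; Key Laboratory of Animal Genetics, Breeding and Reproduction (Poultry), Ministry of Agriculture, Beijing 100193, China
Funding: National Natural Science Foundation of China (31572357); National Swine Industry Technology System (CARS-35)
Abstract: Genomic data are increasingly used in livestock and poultry breeding. Genotype imputation is an important tool for processing genomic data, and the quality of its results directly affects downstream analyses, so a sound imputation strategy is needed to obtain good results. Using simulated data, this study examined how reference population size, the genetic relationship (distance) between the target and reference populations, the number (proportion) of target sites, minor allele frequency, and the imputation algorithm affect imputation performance. The number of target sites was significantly and positively correlated with imputation performance (P<0.05) and was the main factor affecting imputation accuracy. Reference population size was the main factor affecting the error rate of Beagle5.1, whereas the number of target sites was the main factor affecting the error rate of Minimac4. The genetic distance between the target and reference populations affected Beagle5.1 more strongly than Minimac4. In general, sites with a higher minor allele frequency had a higher imputation error rate. When the reference population was small and the number of target sites was large, Minimac4 ran faster than Beagle5.1, but this trend reversed as the reference population grew. For a given level of imputation quality, Beagle5.1 placed relatively lower demands on the factors studied here. Specifically, Beagle5.1 performed better when the target population had few sites and the reference population was large, while Minimac4 was better suited to a small reference population and a large number of target sites. This study formulated different strategies for different imputation purposes and provides a reference for genotype imputation standards.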

Keywords: genotype imputation  simulated data  reference population size  imputation algorithm  error rate
Received: 2020-03-12

Study on the Strategies of Genotype Imputation
DENG Tianyu, DU Lixin, WANG Lixian, ZHAO Fuping. Study on the Strategies of Genotype Imputation[J]. Acta Veterinaria et Zootechnica Sinica, 2020, 51(9): 2068-2078.
Authors:DENG Tianyu  DU Lixin  WANG Lixian  ZHAO Fuping
Institution:Key Laboratory of Animal Genetics, Breeding and Reproduction (Poultry) of Ministry of Agriculture, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
Abstract:Genomic data are increasingly used in livestock breeding. Genotype imputation is an important tool for handling missing values in genotypic data, and the quality of the imputation results directly affects subsequent analyses. To obtain good imputation results, a comprehensive imputation strategy is needed. We used simulated data to examine the effects of several factors on genotype imputation: reference population size, the genetic relationship (distance) between the target population and the reference population, the number (proportion) of target sites, minor allele frequency (MAF), and the imputation algorithm. The number of target sites was the main factor affecting imputation accuracy and was significantly and positively correlated with imputation quality (P<0.05). Reference population size was the main factor affecting the imputation error rate of Beagle5.1, whereas the number of target sites was the main factor affecting the imputation error rate of Minimac4. Genetic distance between the target population and the reference population had a more pronounced effect on the imputation quality of Beagle5.1 than on that of Minimac4. In general, the imputation error rate was higher at sites with higher MAF. When the reference population was small and the number of target sites was large, Minimac4 was faster than Beagle5.1, but the trend reversed as the reference population size increased. For a given level of imputation quality, Beagle5.1 had relatively lower requirements for the above factors. Specifically, Beagle5.1 performed better when the number of target sites was low and the reference population was large, while Minimac4 was better suited to a small reference population and a larger number of target sites. In this study, different strategies were formulated for different imputation purposes, and the results provide a reference for setting genotype imputation standards.
Keywords:genotype imputation  simulation data  reference population size  imputation method  error rate  
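
The abstract reports imputation error rates and relates them to MAF, but does not state how either quantity was computed. The following is a minimal sketch of one common way to do it, not the authors' procedure. It assumes genotypes coded as 0/1/2 alternate-allele counts, and it assumes the error rate is the share of imputed genotypes that differ from the true ones. The function names, the bin edges, and the toy genotypes are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """MAF of one site from 0/1/2 alternate-allele counts (diploid individuals)."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1.0 - alt_freq)


def site_error_rate(true_g: Sequence[int], imputed_g: Sequence[int]) -> float:
    """Fraction of individuals whose imputed genotype differs from the true one."""
    mismatches = sum(t != i for t, i in zip(true_g, imputed_g))
    return mismatches / len(true_g)


def error_rate_by_maf_bin(
    true_sites: Sequence[Sequence[int]],
    imputed_sites: Sequence[Sequence[int]],
    edges: Sequence[float],
) -> Dict[Tuple[float, float], Optional[float]]:
    """Mean per-site error rate within each MAF bin [lo, hi).

    The last bin also includes its upper edge, so MAF = 0.5 is counted.
    Bins that contain no sites map to None.
    """
    bins = list(zip(edges[:-1], edges[1:]))
    per_bin: Dict[Tuple[float, float], List[float]] = {b: [] for b in bins}
    for true_g, imputed_g in zip(true_sites, imputed_sites):
        maf = minor_allele_frequency(true_g)
        for lo, hi in bins:
            is_last = hi == edges[-1]
            if lo <= maf < hi or (is_last and maf == hi):
                per_bin[(lo, hi)].append(site_error_rate(true_g, imputed_g))
                break
    return {b: (sum(v) / len(v) if v else None) for b, v in per_bin.items()}


# Toy data (hypothetical): 3 sites x 5 individuals, one row per site.
TRUE_SITES = [[0, 0, 0, 0, 1], [0, 1, 1, 2, 1], [2, 2, 1, 0, 1]]
IMPUTED_SITES = [[0, 0, 0, 0, 0], [0, 1, 2, 2, 1], [2, 1, 1, 0, 0]]
MAF_EDGES = (0.0, 0.05, 0.25, 0.5)  # assumed bin edges, not from the paper

result = error_rate_by_maf_bin(TRUE_SITES, IMPUTED_SITES, MAF_EDGES)
```

Here MAF is taken from the true genotypes so that the binning does not depend on imputation quality. Other accuracy measures, such as allelic concordance or the correlation between true and imputed dosages, would need a different `site_error_rate`.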