Similar literature
20 similar articles found (search time: 46 ms)
1.
The purpose of this study is to present guidelines for selecting statistical and computing algorithms for variance component estimation when computing involves software packages. Two major methods are considered: residual maximum likelihood (REML) and Bayesian estimation via Gibbs sampling. Expectation‐Maximization (EM) REML is regarded as a very stable algorithm that is able to converge when covariance matrices are close to singular, but it is slow. Convergence problems can nevertheless occur with random regression models, especially if the starting values are much lower than those at convergence. Average Information (AI) REML is much faster for common problems but it relies on heuristics for convergence, and it may be very slow or even diverge for complex models. REML algorithms for general models become unstable with a larger number of traits. REML by canonical transformation is stable in such cases but can support only a limited class of models. In general, REML algorithms are difficult to program. Bayesian methods via Gibbs sampling are much easier to program than REML, especially for complex models, and they can support much larger datasets; however, the termination criterion can be hard to determine, and the quality of estimates depends on a number of details. Computing speed varies with computing optimizations, with which some large data sets and complex models can be supported in a reasonable time; however, optimizations increase the complexity of programming and restrict the types of models applicable. Several examples from past research are discussed to illustrate the fact that different problems require different methods.

2.
Frequentist and Bayesian approaches to scientific inference in animal breeding are discussed. Routine methods in animal breeding (selection index, BLUP, ML, REML) are presented under the hypotheses of both schools of inference, and their properties are examined in both cases. The Bayesian approach is discussed in cases in which prior information is available, prior information is available under certain hypotheses, prior information is vague, and there is no prior information. Bayesian prediction of genetic values and Bayesian estimation of genetic parameters are presented. Finally, the frequentist and Bayesian approaches are compared from a theoretical and a practical point of view. Some problems for which Bayesian methods can be particularly useful are discussed. Both Bayesian and frequentist schools of inference are established, and now neither of them has operational difficulties, with the exception of some complex cases. There is software available to analyze a large variety of problems from either point of view. The choice of one school or the other should be related to whether there are solutions in one school that the other does not offer, to how easily the problems are solved, and to how comfortable scientists feel with the way they convey their results.

3.
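A Bayesian MCMC method was used to detect QTL by linkage analysis for binary discrete traits in outbred livestock and poultry populations with different family structures. The analysis adopted a mapping strategy based on a random model with IBD variance components, and used three MCMC sampling techniques (Gibbs sampling, Metropolis sampling and reversible jump MCMC sampling) to generate posterior samples of the corresponding QTL parameters, on which Bayesian statistical inference about the target parameters was based. The results showed that the Bayesian MCMC method could estimate the number of QTL accurately and gave satisfactory parameter estimates under the different family structures.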

4.
First parity calving difficulty scores from Italian Piemontese cattle were analysed using a threshold mixed effects model. The model included the fixed effects of age of dam and sex of calf and their interaction and the random effects of sire, maternal grandsire, and herd‐year‐season. Covariances between sire and maternal grandsire effects were modelled using a numerator relationship matrix based on male ancestors. Field data consisted of 23 953 records collected between 1989 and 1998 from 4741 herd‐year‐seasons. Variance and covariance components were estimated using two alternative approximate marginal maximum likelihood (MML) methods, one based on expectation‐maximization (EM) and the other based on Laplacian integration. Inferences were compared to those based on three separate runs or sequences of Markov Chain Monte Carlo (MCMC) sampling in order to assess the validity of approximate MML estimates derived from data with similar size and design structure. Point estimates of direct heritability were 0.24, 0.25 and 0.26 for EM, Laplacian and MCMC (posterior mean), respectively, whereas corresponding maternal heritability estimates were 0.10, 0.11 and 0.12, respectively. The covariance between additive direct and maternal effects was found to be not different from zero based on MCMC‐derived confidence sets. The conventional joint modal estimates of sire effects and associated standard errors based on MML estimates of variance and covariance components differed little from the respective posterior means and standard deviations derived from MCMC. Therefore, there may be little need to pursue computation‐intensive MCMC methods for inference on genetic parameters and genetic merits using conventional threshold sire and maternal grandsire models for large datasets on calving ease.

5.
In Chile, Mycobacterium avium subsp. paratuberculosis (Map) has been isolated on several occasions and clinical cases have been reported. Nevertheless, diagnostic tests have not yet been validated for this agent in the Chilean setting. The objective of the study was to validate a commercial ELISA for detecting Map-shedding dairy cows under the management conditions, prevalence and stages of infection found in southern Chile, utilising different statistical approaches. Blood and faeces were collected from 1333 lactating cows in 27 dairy herds (both large commercial and smallholder dairy farms) between September 2003 and August 2004. Within the herds up to a maximum of 100 dairy cows were selected based on age (≥3 years old) and, if present, clinical signs of a Map infection. In herds with fewer than 100 cows, all cows ≥3 years old were sampled. Blood samples were tested using a commercial ELISA kit (IDEXX Laboratories, Inc.). Faecal samples were cultured on Herrold's Egg Yolk Medium (HEYM). Latent class models (i.e. maximum likelihood (ML) methods and Bayesian inference) were used to determine the validity of the ELISA. Map was cultured from 54 (4.1%) cows and 10 (37.0%) herds, which were all large, commercial dairy herds. As a result of empty cells in the cross-tabulations, the ML model provided the same results as the validation with faecal culture as the gold-standard. In the Bayesian model, the Se and Sp of the ELISA were estimated to be 26% (95% CI: 18-35%) and 98.5% (95% CI: 97.4-99.4%), respectively. For faecal culture, the Se was 54% (95% CI: 46-62%) and the Sp was 100% (95% CI: 99.9-100%). Interestingly, the prevalence in the smallholder dairy farms was estimated to be 8% even though there were no faecal culture positive cows detected in those herds. There was no significant correlation between the two tests.
The advantage of Bayesian inference is that the Se and Sp of both tests are obtained in one model relative to the (latent) true disease status, the model can handle small datasets and empty cells, and the estimates can be corrected for the correlation between tests when the tests are not conditionally independent. Therefore, Bayesian analysis was the preferred method for Map, which lacks a gold-standard and usually has low cow-level prevalence.

6.
The performance of a new diagnostic test is frequently evaluated by comparison to a perfect reference test (i.e. a gold standard). In many instances, however, a reference test is less than perfect. In this paper, we review methods for estimation of the accuracy of a diagnostic test when an imperfect reference test with known classification errors is available. Furthermore, we focus our presentation on available methods of estimation of test characteristics when the sensitivity and specificity of both tests are unknown. We present some of the available statistical methods for estimation of the accuracy of diagnostic tests when a reference test does not exist (including maximum likelihood estimation and Bayesian inference). We illustrate the application of the described methods using data from an evaluation of a nested polymerase chain reaction and microscopic examination of kidney imprints for detection of Nucleospora salmonis in rainbow trout.

7.
Interval-censoring occurs in survival analysis when the time until an event of interest is not known precisely (and instead, only is known to fall into a particular interval). Such censoring commonly is produced when periodic assessments (usually clinical or laboratory examinations) are used to assess if the event has occurred. My objectives were to raise awareness about interval-censoring including its existence, the potential ramifications of ignoring its existence, the different types of interval-censored data, and the analytical methods to analyze such data (including availability in standard statistical software). Asynchronous interval-censored survival analysis was demonstrated by parametric evaluation of risk factors for the time to first detected shedding of Salmonella muenster (identified by repeated periodic fecal cultures) for a herd of dairy cows. These results were compared with those from survival analyses which ignored or approximated the interval-censoring.

Ignoring or approximating the asynchronous interval-censoring in the survival analysis generally resulted in the risk factors’ regression coefficients having the same signs and a decrease (often >50%) in their absolute size. All the standard errors from the three methods of approximating the interval-censoring were <40% of their interval-censored counterparts. The conclusions drawn from the asynchronous interval-censored analysis versus those from the approximations varied dramatically. (The general conclusion from the approximations was that none of the risk factors for this example warranted further consideration.) That ignoring or approximating the left- and interval-censored nature of the dependent variable resulted in biased results was consistent with the literature.

In the currently available asynchronous interval-censored models, the inclusion of time-dependent covariates that vary continuously is awkward. Statistical models for the semi-parametric estimation of asynchronous interval-censored survival analysis are not generally available in standard statistical software.


8.
This work focuses on the effects of a variable amount of genomic information in the Bayesian estimation of unknown variance components associated with single‐step genomic prediction. We propose a quantitative criterion for the amount of genomic information included in the model and use it to study the relative effect of genomic data on the efficiency of sampling from the posterior distribution of parameters of the single‐step model when conducting a Bayesian analysis in which unknown variances are estimated. The rate of change of estimated variances was dependent on the amount of genomic information involved in the analysis, but did not depend on the Gibbs updating schemes applied for sampling realizations of the posterior distribution. Simulation revealed a gradual deterioration of convergence rates for the location parameters when new genomic data were gradually added into the analysis. In contrast, the convergence of variance components showed continuous improvement under the same conditions. The sampling efficiency increased proportionally to the amount of genomic information. In addition, an optimal amount of genomic information in the variance–covariance matrix that guarantees the most (computationally) efficient analysis was found to correspond to a proportion of animals genotyped ***0.8. The proposed criterion yields a characterization of the expected performance of the Gibbs sampler if the analysis is subject to adjustment of the amount of genomic data and can be used to guide researchers on how large a proportion of animals should be genotyped in order to attain an efficient analysis.

9.
10.
Our aim was to assess the seroprevalence of Chlamydophila (Cd) abortus (Chlamydia psittaci serotype 1), denoted ovine enzootic abortion (OEA), in the Swiss sheep population. A competitive enzyme-linked immunosorbent assay (cELISA) was adapted for the investigation of pooled serum samples (pool approach) and receiver-operator characteristic (ROC) analysis was applied to define the cut-off of the pool approach. At a cut-off value of 30% inhibition, the flock-level pooled sensitivity and specificity were 92.9% and 97.6% when compared to classifying the flock based on individual-animal samples.

Subsequently, sera from 775 randomly selected flocks out of 11 cantons of Switzerland were investigated using the pool approach. The cantons included in the study represented 72% of the Swiss sheep flocks and 76% of the Swiss sheep population. Antibodies against Cd. abortus were found in almost 19% (144) of the 775 examined sheep flocks. Test prevalences were adjusted for the imperfect test characteristics using the Rogan–Gladen estimator and Bayesian inference. Seroprevalence was highest (43%) in the canton Graubünden. In the remaining 10 cantons the seroprevalence ranged from 2 to 29%. The cELISA in combination with testing pooled sera and statistical methods for true prevalence estimation provided a good survey tool at lower cost and in less time when compared to other approaches.
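
The Rogan–Gladen adjustment mentioned above can be sketched in a few lines. The estimator itself is standard: true prevalence = (apparent prevalence + Sp − 1) / (Se + Sp − 1). What is assumed here is the choice of inputs: the abstract does not say which sensitivity and specificity the authors plugged in, so the flock-level pooled values it reports (92.9% and 97.6%) are used purely for illustration, and the function name `rogan_gladen` is hypothetical.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Adjust an apparent (test) prevalence for imperfect Se and Sp.

    Standard Rogan-Gladen formula; the result is truncated to [0, 1]
    because the raw estimate can fall outside that range.
    """
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0.0:
        raise ValueError("Se + Sp must exceed 1 for the estimator to be usable")
    estimate = (apparent_prevalence + specificity - 1.0) / denominator
    return min(1.0, max(0.0, estimate))


# Inputs taken from the abstract; using the pooled flock-level Se/Sp
# here is an assumption made for illustration only.
apparent = 144 / 775  # flocks with antibodies / flocks examined
adjusted = rogan_gladen(apparent, sensitivity=0.929, specificity=0.976)
print(f"apparent prevalence: {apparent:.3f}")
print(f"adjusted prevalence: {adjusted:.3f}")
```

With these illustrative inputs the adjusted value comes out slightly below the apparent one, because the assumed false-positive rate (1 − Sp) removes more positives than the assumed false-negative rate adds back.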


11.
Between-holding contacts are more common over short distances and this may have implications for the dynamics of disease spread through these contacts. A reliable estimation of how contacts depend on distance is therefore important when modeling livestock diseases. In this study, we have developed a method for analyzing distance-dependent contacts and applied it to animal movement data from Sweden. The data were analyzed with two competing models. The first model assumes that contacts arise from a purely distance-dependent process. The second is a mixture model and assumes that, in addition, some contacts arise independent of distance. Parameters were estimated with a Bayesian Markov Chain Monte Carlo (MCMC) approach and the model probabilities were compared. We also investigated possible between-model differences in predicted contact structures, using a collection of network measures. We found that the mixture model was a much better model for the data analyzed. Also, the network measures showed that the models differed considerably in predictions of contact structures, which is expected to be important for disease spread dynamics. We conclude that a model with contacts being both dependent on, and independent of, distance was preferred for modeling the example animal movement contact data.

12.
13.
In order to use a drug in a food-producing animal, evidence has to be provided that after a certain withdrawal time, drug residues in tissues such as muscle meat, fat, liver and kidney are below a given maximum residue limit (MRL) for a majority of animals. Several statistical methods, both regression-based and nonparametric, have been proposed, each relying on different sets of assumptions, which may or may not hold for the specific data situation. The purpose of this paper is to enrich the range of methods, i.e. to provide approaches for situations where current methods are inappropriate. Bayesian methods, using Markov chain Monte Carlo, are proposed to derive inference on the parameters of interest.

14.
Genetic variation and covariation of liability to clinical mastitis in the course of first lactation in Norwegian Cattle (NRF) were investigated. The data consisted of 36,178 first-lactation cows with 354,506 clinical mastitis (absence=0 vs. presence=1) monthly records. A longitudinal binary data analysis was carried out using Bayesian threshold models and Markov chain Monte Carlo (MCMC) procedures. Liability was related to stage of lactation using random regression functions: the Ali–Schaeffer function (AS), the Wilmink function (W) and Legendre Polynomials of order 2, 3 or 4 (L2, L3, L4). Models were compared using a pseudo Bayes factor and an analysis of residuals. The MCMC scheme for the AS function did not converge after 20,000 iterations, and was therefore excluded from further analysis. The pseudo Bayes factor strongly favored the L4 model. Most posterior means of the residuals fell in the range from −0.2 to 0 when cows were healthy (a residual is negative when mastitis is absent and positive otherwise). The L4 model tended to have smaller residuals than the other three models when cows had mastitis. The posterior means of the herd variance and of the cow-specific variance were 0.0645 and 0.1084, respectively, for the fourth order Legendre polynomial. Heritability of liability to clinical mastitis was from 7% to 13% before calving, and ranged between 3% and 11% from calving to 260 days after calving. Most genetic correlations of liability to clinical mastitis between different days of first-lactation ranged from 0.4 to 0.7.

15.
Binary repeated measures data are commonly encountered in both experimental and observational veterinary studies. Among the wide range of statistical methods and software applicable to such data one major distinction is between marginal and random effects procedures. The objective of the study was to review and assess the performance of marginal and random effects estimation procedures for the analysis of binary repeated measures data. Two simulation studies were carried out, using relatively small, balanced, two-level (time within subjects) datasets. The first study was based on data generated from a marginal model with first order autocorrelation, the second on a random effects model with autocorrelated random effects within subjects. Three versions of the models were considered in which a dichotomous treatment was modelled additively, either between or within subjects, or modelled by a time interaction. Among the studied statistical procedures were: generalized estimating equations (GEE), Marginal Quasi Likelihood, likelihood based on numerical integration, penalized quasi-likelihood, restricted pseudo likelihood and Bayesian Markov Chain Monte Carlo. Results for data generated by the marginal model showed autoregressive GEE to be highly efficient when treatment was within subjects, even with strongly correlated responses. For treatment between subjects, random effects procedures also performed well in some situations; however, a relatively small number of subjects with a short time series proved a challenge for both marginal and random effects procedures. Results for data generated by the random effects model showed bias in estimates from random effects procedures when autocorrelation was present in the data, while the marginal procedures generally gave estimates close to the marginal parameters.

16.
Two analytical approaches were used to investigate the relationship between somatic cell concentrations in monthly quarter milk samples and subsequent, naturally occurring clinical mastitis in three dairy herds. Firstly, cows with clinical mastitis were selected and a conventional matched analysis was used to compare affected and unaffected quarters of the same cow. The second analysis included all cows, and in order to overcome potential bias associated with the correlation structure, a hierarchical Bayesian generalised linear mixed model was specified. A Markov chain Monte Carlo (MCMC) approach, that is Gibbs sampling, was used to estimate parameters.

The results of both the matched analysis and the hierarchical modelling suggested that quarters with a somatic cell count (SCC) in the range 41,000–100,000 cells/ml had a lower risk of clinical mastitis during the next month than quarters <41,000 cells/ml. Quarters with an SCC >200,000 cells/ml were at the greatest risk of clinical mastitis in the next month. There was a reduced risk of clinical mastitis between 1 and 2 months later in quarters with an SCC of 81,000–150,000 cells/ml compared with quarters below this level. The hierarchical modelling analysis identified a further reduced risk of clinical mastitis between 2 and 3 months later in quarters with an SCC of 61,000–150,000 cells/ml, compared to other quarters.

We conclude that low concentrations of somatic cells in milk are associated with increased risk of clinical mastitis, and that high concentrations are indicative of pre-existing immunological mobilisation against infection. The variation in risk between quarters of affected cows suggests that local quarter immunological events, rather than solely whole cow factors, have an important influence on the risk of clinical mastitis. MCMC proved a useful tool for estimating parameters in a hierarchical Bernoulli model. Model construction and an approach to assessing goodness of model fit are described.


17.
The Markov chain Monte Carlo (MCMC) strategy provides remarkable flexibility for fitting complex hierarchical models. However, when parameters are highly correlated in their posterior distributions and their number is large, a particular MCMC algorithm may perform poorly and the resulting inferences may be affected. The objective of this study was to compare the efficiency (in terms of the asymptotic variance of features of posterior distributions of chosen parameters, and in terms of computing cost) of six MCMC strategies to sample parameters using simulated data generated with a reaction norm model with unknown covariates as an example. The six strategies are single-site Gibbs updates (SG), single-site Gibbs sampler for updating transformed (a priori independent) additive genetic values (TSG), pairwise Gibbs updates (PG), blocked (all location parameters are updated jointly) Gibbs updates (BG), Langevin-Hastings (LH) proposals, and finally Langevin-Hastings proposals for updating transformed additive genetic values (TLH). The ranking of the methods in terms of asymptotic variance is affected by the degree of the correlation structure of the data and by the true values of the parameters, and no method comes out as an overall winner across all parameters. TSG and BG show very good performance in terms of asymptotic variance especially when the posterior correlation between genetic effects is high. In terms of computing cost, TSG performs best except for dispersion parameters in the low correlation scenario where SG was the best strategy. The two LH proposals could not compete with any of the Gibbs sampling algorithms. In this study it was not possible to find an MCMC strategy that performs optimally across the range of target distributions and across all possible values of parameters. However, when the posterior correlation between parameters is high, TSG, BG and even PG show better mixing than SG.
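
Why single-site Gibbs updates mix poorly under high posterior correlation can be illustrated with a toy example. The sketch below is not the reaction norm model from the abstract: it assumes a standard bivariate normal target with correlation `rho`, for which each full conditional is normal with mean `rho` times the other coordinate and variance 1 − rho². All function names are hypothetical.

```python
import random


def single_site_gibbs(rho, n_samples, seed=1):
    """Single-site Gibbs sampler for a standard bivariate normal target.

    Toy stand-in for two correlated parameters; returns the chain of x.
    """
    rng = random.Random(seed)
    conditional_sd = (1.0 - rho * rho) ** 0.5
    x, y = 0.0, 0.0
    chain = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, conditional_sd)  # draw x given y
        y = rng.gauss(rho * x, conditional_sd)  # draw y given x
        chain.append(x)
    return chain


def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation; values near 1 indicate slow mixing."""
    n = len(values)
    mean = sum(values) / n
    numerator = sum(
        (values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1)
    )
    denominator = sum((v - mean) ** 2 for v in values)
    return numerator / denominator


weak = lag1_autocorrelation(single_site_gibbs(rho=0.10, n_samples=5000))
strong = lag1_autocorrelation(single_site_gibbs(rho=0.95, n_samples=5000))
print(f"lag-1 autocorrelation, rho=0.10: {weak:.2f}")
print(f"lag-1 autocorrelation, rho=0.95: {strong:.2f}")
```

For this toy target the x chain behaves like an autoregressive process with coefficient rho², so successive draws become nearly redundant as the correlation approaches 1. Updating both coordinates jointly, or reparameterizing to independent coordinates, removes that dependence, which is the intuition behind the blocked and transformed strategies compared in the abstract.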

18.
In the analysis of large amounts of data to obtain BLUP, large sets of mixed model equations must be solved iteratively, which can involve considerable computing time. In real life, solutions are required only periodically for breeders to choose the best individuals, so that computing time is not usually a limiting factor. In simulation studies involving evaluation of individuals by BLUP, many rounds of evaluation are required for each simulated population. Since several or many replicates are usually required to obtain an accurate result from stochastic simulations, computing time can become a limiting factor. One of the factors that can drastically affect computing time in iterative methods is the criterion for ceasing iteration, or convergence criterion (CC). A disadvantage of iterative methods is that convergence can be slow or, under certain circumstances, may not occur at all. Nevertheless, when the system converges, the more stringent the CC, the more accurate the solutions. However, the more stringent the CC, the more iterations and hence the more computing time are required. The objectives of this study were to investigate how much response to selection is affected by the stringency of the CC and how much reranking occurs among selected individuals at different levels of the convergence criteria. These explorations provided a profile analysis of the computing time spent for each of the major subroutines in the BBSIM program.
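
The trade-off between the stringency of the convergence criterion and the number of iterations can be sketched on a tiny linear system. Everything below is an assumption made for illustration: the abstract does not say which solver or which definition of CC the BBSIM program uses, so Gauss-Seidel iteration and a relative squared change in the solutions are used as a common choice, and the 3×3 system is a made-up stand-in for mixed model equations.

```python
def gauss_seidel(A, b, cc=1e-8, max_iter=10_000):
    """Solve A x = b by Gauss-Seidel iteration.

    Stops when sum((new - old)^2) / sum(new^2) falls below `cc`
    (an assumed form of convergence criterion).
    Returns the solution vector and the number of iterations used.
    """
    n = len(b)
    x = [0.0] * n
    for iteration in range(1, max_iter + 1):
        squared_change = 0.0
        squared_norm = 0.0
        for i in range(n):
            off_diagonal = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new_value = (b[i] - off_diagonal) / A[i][i]
            squared_change += (new_value - x[i]) ** 2
            squared_norm += new_value ** 2
            x[i] = new_value
        if squared_norm > 0.0 and squared_change / squared_norm < cc:
            return x, iteration
    return x, max_iter


# Small diagonally dominant system (hypothetical coefficients).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]

loose_solution, loose_iterations = gauss_seidel(A, b, cc=1e-4)
strict_solution, strict_iterations = gauss_seidel(A, b, cc=1e-12)
print("iterations with CC = 1e-4: ", loose_iterations)
print("iterations with CC = 1e-12:", strict_iterations)
```

In a simulation with many replicates and many rounds of evaluation, the extra iterations bought by a stricter CC are multiplied accordingly, which is why the study asks how loose the criterion can be before selection decisions change.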

19.
Markov chain Monte Carlo (MCMC) enables fitting complex hierarchical models that may adequately reflect the process of data generation. Some of these models may contain more parameters than can be uniquely inferred from the distribution of the data, causing non‐identifiability. The reaction norm model with unknown covariates (RNUC) is a model in which unknown environmental effects can be inferred jointly with the remaining parameters. The problem of identifiability of parameters at the level of the likelihood and the associated behaviour of MCMC chains were discussed using the RNUC as an example. It was shown theoretically that when environmental effects (covariates) are considered as random effects, estimable functions of the fixed effects, (co)variance components and genetic effects are identifiable as well as the environmental effects. When the environmental effects are treated as fixed and there are other fixed factors in the model, the contrasts involving environmental effects, the variance of environmental sensitivities (genetic slopes) and the residual variance are the only identifiable parameters. These different identifiability scenarios were generated by changing the formulation of the model and the structure of the data and the models were then implemented via MCMC. The output of MCMC sampling schemes was interpreted in the light of the theoretical findings. The erratic behaviour of the MCMC chains was shown to be associated with identifiability problems in the likelihood, despite propriety of posterior distributions, achieved by arbitrarily chosen uniform (bounded) priors. In some cases, very long chains were needed before the pattern of behaviour of the chain may signal the existence of problems. The paper serves as a warning concerning the implementation of complex models where identifiability problems can be difficult to detect a priori. 
We conclude that it would be good practice to experiment with a proposed model and to understand its features before embarking on a full MCMC implementation.

20.
A Bayesian procedure is presented for detecting quantitative trait loci (QTL) affecting longitudinal traits. The statistical model assumes a QTL affecting the prior distribution of the parameters of a given production function, under a hierarchical Bayesian scheme. Marginal posterior distributions for the effects associated with the QTL are calculated using Markov chain Monte Carlo methods. Furthermore, the Bayesian analysis allows the use of some available relevant information that can improve the detection of the QTL substantially. To illustrate the procedure, an example of QTL detection using the Von Bertalanffy growth function is presented with an F2 pig population bred from Iberian boars and Landrace sows. Animals of the F2 population were genotyped for seven markers in chromosome 2 (SSC2). Two prior distributions for the mean effect of the parameters related to birth and adult weight were compared. On the one hand, vague prior distributions were used; on the other, univariate Gaussian distributions were assumed that ensure biologically meaningful adult and birth weights on the posterior growth curves. Results from the second prior distribution supported the presence of QTL, by showing that individuals with both alleles of Iberian origin had lower rates of maturation. In contrast, when vague priors were used, the procedure was not able to detect QTL.
