Similar literature
20 similar articles found (search time: 31 ms)
1.
Traditional analyses of capture–recapture data are based on likelihood functions that explicitly integrate out all missing data. We use a complete data likelihood (CDL) to show how a wide range of capture–recapture models can be easily fitted using readily available software JAGS/BUGS even when there are individual-specific time-varying covariates. The models we describe extend those that condition on first capture to include abundance parameters, or parameters related to abundance, such as population size, birth rates or lifetime. The use of a CDL means that any missing data, including uncertain individual covariates, can be included in models without the need for customized likelihood functions. This approach also facilitates modeling processes of demographic interest rather than the complexities caused by non-ignorable missing data. We illustrate using two examples, (i) open population modeling in the presence of a censored time-varying individual covariate in a full robust design, and (ii) full open population multi-state modeling in the presence of a partially observed categorical variable. Supplemental materials for this article are available online.

2.
When stones prevent the measurement of cone resistance, and missing values below the stones are ignored, then averages can be seriously underestimated. Methods are considered for correcting this bias and an algorithm is proposed in which missing observations are replaced by their expected values. A numerical example gives results in close agreement with those obtained using the optimal, but computationally expensive, method of maximum likelihood estimation. It is recommended that data from incomplete penetrations should not be discarded but should be used, preferably with the proposed algorithm, to reduce the bias in estimates of mean values.

3.
This study proposes a backpropagation (BP) neural network method for interpolating missing values in daily precipitation time series. First, the BP neural network is used to interpolate missing daily rainfall data at three selected stations in Yantai, Shandong, China. Then, the temporal and spatial variations in precipitation extremes across Shandong are analyzed using the complete daily rainfall dataset derived from accurate propagation at 24 meteorological stations. The results show that the long-term trends in five selected extreme precipitation indices calculated from interpolated daily rainfall data are generally consistent with those from the original nonmissing values. The spatial patterns of trends in precipitation extremes also show better performance for the BP neural network approach in interpolating gaps in daily rainfall. These results suggest that the BP neural network algorithm can fit the space-time variability of regional precipitation extremes well, provided that the correlation coefficients between the target stations with missing values and the reference stations with complete daily rainfall data are relatively large. These findings could be crucial for investigating the regional frequency of heavy rainfall and for water resource management.

4.
We develop a novel modeling strategy for analyzing data with repeated binary responses over time as well as time-dependent missing covariates. We assume that covariates are missing at random (MAR). We use the generalized linear mixed logistic regression model for the repeated binary responses and then propose a joint model for time-dependent missing covariates using information from different sources. A Monte Carlo EM algorithm is developed for computing the maximum likelihood estimates. We propose an extended version of the AIC criterion to identify the important factors that may explain the binary responses. A real plant dataset is used to motivate and illustrate the proposed methodology.

5.
OBJECTIVE: To investigate item non-response in a postal food-frequency questionnaire (FFQ), and to assess the effect of substituting/imputing missing values on dietary intake levels in the Norwegian Women and Cancer study (NOWAC). We adapted k nearest neighbours (KNN) imputation and, probably for the first time, applied it to FFQ data. DESIGN: Data from a recent reproducibility study were used. The FFQ was mailed twice (test-retest) about 3 months apart to the same subjects. Missing responses in the test FFQ were imputed using the null value (frequencies = null, amount = smallest), the sample mode, the sample median, KNN, and retest values. SETTING: A methodological substudy of NOWAC, a national population-based cohort. SUBJECTS: A random sample of 2000 women aged 46-75 years was drawn from the cohort in 2002 (response 75%). The imputation methods were compared for 1430 women who completed at least 50% of the test FFQ. RESULTS: We imputed 16% missing values in the overall test data matrix. Compared to null value imputation, the largest differences in estimated dietary intake were seen for KNN, and for food items with a high proportion of missing values. Imputation with retest values increased total energy intake, indicating that not all missing values are caused by respondents failing to specify no consumption, and that null value imputation may lead to underestimation and misclassification. CONCLUSION: Missing values in FFQs present a methodological challenge. We encourage the application and evaluation of newer imputation methods, including KNN, which may reduce imputation errors and give more accurate intake estimates.
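To make the KNN idea concrete, here is a minimal sketch of nearest-neighbour imputation on a small respondent-by-item matrix. It is not the NOWAC implementation: the function name `knn_impute`, the distance measure (root mean squared difference over shared items), the choice of averaging donor values, and the toy data are all assumptions made for illustration.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean answer of the k most similar respondents.

    rows: list of equal-length lists of numbers; None marks a missing answer.
    Similarity is judged only on items that both respondents answered.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # nothing in common, cannot compare
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other respondents who answered item j and are comparable.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # leave the gap if no donor exists
                completed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return completed


# Hypothetical frequency codes for four respondents and four food items.
answers = [
    [2, 1, 0, 3],
    [2, 1, None, 3],
    [0, 4, 5, 0],
    [2, 2, 1, 3],
]
print(knn_impute(answers, k=2)[1])
```

For ordinal frequency categories, taking the most common donor value instead of the mean may be the more natural choice; the sketch uses the mean only to keep the code short.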

6.
Dry deposition may be a substantial source of phosphorus (P) to the Florida Everglades. Dry deposition has been measured on a weekly basis in the region since 1987, but a significant amount of this data is missing (about 34%) due to instrumental failures and sample contamination. This study develops a statistical model of the P dry deposition time series to estimate missing data. The model is based on a multivariate stochastic time-series theory. Model parameters are calibrated using the expectation-maximization algorithm, which is efficient for data sets with many gaps. The pooled mean and standard deviation of the data before estimating the missing values was 88.4 ± 85.7 μg P m⁻² d⁻¹ and after estimating the missing values was 87.8 ± 82.4 μg P m⁻² d⁻¹. Model verification demonstrates that the calibrated models provide unbiased data estimates while preserving the statistics of the raw data. For each sampling site the mean and standard deviation before and after were quite similar. No trend with time was detected. The P deposition fluctuates seasonally (highest in October and lowest in June), but this fluctuation does not follow the seasonal pattern of Florida's rainfall. Random noise in the data, however, is significant and causes long-term fluctuations of the data. The data with gaps filled in are useful for computing the weekly P load distribution.

7.
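Effective estimation of missing values in point-source time series can improve data quality. To explore how well different estimation methods perform on missing values in point-source time series, and what influences their performance, we used long-term daily meteorological and hydrological observations (maximum air temperature, minimum air temperature, solar radiation, rainfall, and surface runoff) from a fixed monitoring site in a typical subtropical small watershed. With root mean square error (RMSE), mean absolute error (MAE), and the Pearson correlation coefficient (r) as performance indicators, we compared five methods, namely linear interpolation (LIM), K-nearest-neighbor interpolation (KNNM), spline interpolation (SIM), polynomial interpolation (PIM), and kernel density estimation (KDEM), and examined the main factors affecting their performance. The results show that: (1) LIM, SIM, and KDEM generally outperformed the other two methods; (2) for the meteorological data (maximum temperature, minimum temperature, and solar radiation), the five methods gave RMSE of 1.81-6.35, MAE of 1.30-4.20, and r of 0.70-0.98 (P<0.05), whereas for the hydrological data (rainfall and surface runoff) they gave RMSE of 12.54-26.28, MAE of 3.60-14.21, and r of 0.07-0.72, so all methods performed better on the meteorological data than on the hydrological data; (3) the coefficient of variation (CV) of these datasets was linearly correlated with the evaluation indicators (RMSE, MAE, and r) (P<0.05) and is an important factor affecting estimation performance.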
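The evaluation design described here (hide known values, estimate them, score the estimates) can be sketched for the simplest of the five methods, linear interpolation, together with the three indicators. The function names and the six-value series are hypothetical; the study's own data and software are not given in the abstract.

```python
import math


def linear_interpolate(series):
    """Fill None gaps on a straight line between the nearest observed neighbours.

    Gaps at the very start or end have a neighbour on one side only and stay None.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


def rmse(obs, est):
    """Root mean square error."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))


def mae(obs, est):
    """Mean absolute error."""
    return sum(abs(o - e) for o, e in zip(obs, est)) / len(obs)


def pearson_r(obs, est):
    """Pearson correlation coefficient between observed and estimated values."""
    n = len(obs)
    mean_o, mean_e = sum(obs) / n, sum(est) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(obs, est))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    sd_e = math.sqrt(sum((e - mean_e) ** 2 for e in est))
    return cov / (sd_o * sd_e)


# Hypothetical daily maximum temperatures; positions 1 and 3 are hidden on purpose.
truth = [10.0, 12.0, 15.0, 13.0, 11.0, 14.0]
gappy = [10.0, None, 15.0, None, 11.0, 14.0]
filled = linear_interpolate(gappy)
held_out = [1, 3]
obs = [truth[i] for i in held_out]
est = [filled[i] for i in held_out]
print(rmse(obs, est), mae(obs, est), pearson_r(obs, est))
```

Scoring only the deliberately hidden positions, rather than the whole series, keeps the untouched observations from flattering the error figures.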

8.
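Filling missing data in soil databases using pedotransfer functions
韩光中, 杨银华, 吴彬, 李山泉. 《土壤》 (Soils), 2019, 51(5): 1036-1041
Missing data are very common in soil survey research, and handling them improperly can undermine the reliability of the results. Pedotransfer functions (PTFs) are an effective means of filling in missing information in soil databases simply, quickly, and in bulk. However, few studies have analyzed and characterized the missing data in China's soil databases, and methods for filling such gaps urgently need standardizing. This paper analyzes the database of China's Second National Soil Survey, examines its missing-data characteristics, and predicts the soil properties with severe gaps, with the aim of providing a reference for future gap-filling work on soil databases. Overall, texture (sand, silt, and clay content), pH, organic matter, total nitrogen, total phosphorus, and total potassium are the most basic survey items, and these properties are the most complete. Available phosphorus, readily available potassium, and cation exchange capacity have some missing data. Alkali-hydrolyzable nitrogen, bulk density, gravel content, and the various forms of iron oxide have severe gaps. When filling missing data, we recommend considering model stability first and, as far as possible, predicting the missing values from soil properties that are relatively stable and well recorded. The Second National Soil Survey database largely lacks spatial attribute information, so simple and relatively stable regression models are preferable for gap filling. Pedotransfer functions obtained by regression analysis can fill missing bulk density, alkali-hydrolyzable nitrogen, and part of the cation exchange capacity data reasonably well. Nevertheless, because some soil property information is time-sensitive, the historical background of the data source should be taken into account when applying pedotransfer functions.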

9.
The identification of sea regimes from environmental multivariate time series is complicated by the mixed linear–circular support of the data, by the occurrence of missing values, by the skewness of some variables, and by the temporal autocorrelation of the measurements. We address these issues simultaneously by a hidden Markov approach, and segment the data into pairs of toroidal and skew-elliptical clusters by means of the inferred sequence of latent states. Toroidal clusters are defined by a class of bivariate von Mises densities, while skew-elliptical clusters are defined by mixed linear models with positive random effects. The core of the classification procedure is an EM algorithm accounting for missing measurements, unknown cluster membership, and random effects as different sources of incomplete information. Moreover, standard simulation routines allow for the efficient computation of bootstrap standard errors. The proposed procedure is illustrated for a multivariate marine time series, and identifies a number of wintertime regimes in the Adriatic Sea.

10.
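Long continuous data gaps are common in eddy-covariance carbon flux observations at forest flux stations. To assess the effectiveness of different gap-filling methods, we took a mixed Chinese cork oak (栓皮栎) plantation ecosystem in the low hilly area of North China as a case study. Half-hourly net ecosystem exchange (NEE) data from 1 March to 30 November 2017, processed and quality-controlled with EddyPro, served as the benchmark dataset. We randomly generated five types of gap datasets containing continuous gaps of 1, 3, 7, 15, and 31 d, with 10 replicates, and filled them using fixed-window mean diurnal variation (MDV), variable-window mean diurnal variation (MDC), look-up table (LUT), nonlinear regression (NLR), marginal distribution sampling (MDS), and artificial neural network (ANN) methods. The filled data were compared with the actual observations, and statistical parameters were analyzed to evaluate the accuracy and stability of each method and its range of applicability. The results show that, in the daytime, when continuous gaps were shorter than 15 d, the coefficient of determination (R²) between filled and observed data was relatively high for ANN and low for NLR, and the relative root mean square error (RRMSE) was low for LUT and high for NLR. For 15-d gaps, R² did not differ significantly among methods except that NLR's was significantly lower (P<0.05); LUT's RRMSE was significantly lower (P<0.05), with no significant differences among the other methods. For 31-d gaps, R² and RRMSE did not differ significantly among methods except that NLR's R² was significantly lower (P<0.05); the mean absolute error (MAE) of MDV showed many outliers, and MAE began to diverge among methods. As gap length increased, R² declined for all methods except MDV, and R² between filled and observed NEE differed significantly between the 1-d and 31-d scenarios (P<0.05); RRMSE increased for MDV and MDS, differing significantly between the 1-d and 31-d scenarios (P<0.05), while the differences for the other methods were comparatively insignificant. At night, across all gap scenarios, R² was high for ANN and low for LUT, a significant difference (P<0.05); LUT had the highest RRMSE, significantly different from the other methods (P<0.05). For continuous gaps longer than 31 d, RRMSE did not differ significantly among methods. Apart from LUT's significantly higher MAE (P<0.05), MAE did not differ appreciably among methods. As gap length increased, R² declined for MDC, MDS, and ANN, whereas R² for MDV and LUT showed no significant differences throughout; differences in RRMSE among methods did not change significantly. In reproducing the half-hourly diurnal course of NEE on typical sunny days, MDC performed comparatively well. In summary, NLR suits cases with complete meteorological data and continuous NEE gaps shorter than 7 d; MDV or MDC suits cases where meteorological data are unavailable or badly incomplete and NEE gaps are shorter than 15 d; LUT and MDS suit cases with few missing meteorological data and NEE gaps shorter than 15 d; ANN is the most widely applicable and can be used with few missing meteorological data and continuous NEE gaps of up to 31 d.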
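Of the six methods, fixed-window mean diurnal variation (MDV) is the easiest to illustrate, because it needs no meteorological drivers: a missing half-hour is replaced by the mean of the observations at the same time of day on neighbouring days. The sketch below assumes a symmetric window of `window_days` on each side and a flat list of half-hourly values; the window length used in the study is not stated in the abstract, and the function name `mdv_fill` is hypothetical.

```python
def mdv_fill(nee, slots_per_day=48, window_days=7):
    """Fill None entries with the mean of same-time-of-day values on nearby days.

    nee: flat list of flux values in time order, None marking a gap.
    slots_per_day: 48 for half-hourly data.
    window_days: days searched on each side of the gap (an assumed default).
    """
    filled = list(nee)
    for i, value in enumerate(nee):
        if value is not None:
            continue
        neighbours = []
        for offset in range(-window_days, window_days + 1):
            j = i + offset * slots_per_day  # same slot, `offset` days away
            if offset != 0 and 0 <= j < len(nee) and nee[j] is not None:
                neighbours.append(nee[j])
        if neighbours:  # a gap longer than the window stays unfilled
            filled[i] = sum(neighbours) / len(neighbours)
    return filled


# Toy series with two slots per day so the example stays readable.
toy = [1.0, -2.0, 3.0, None, 5.0, -4.0]
print(mdv_fill(toy, slots_per_day=2, window_days=1))
```

The unfilled-gap case in the code mirrors the limitation noted in the summary: once a continuous gap spans the whole window, there are no same-time neighbours left to average.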

11.
High nitrogen (N) input to rivers requires measures for the reduction of diffuse N pollution. Besides groundwater, artificial subsurface drainage systems are the main pathways of diffuse N input into rivers. Nevertheless, the N discharge via subsurface drainage systems is one of the main missing links for modeling, especially because of the lack of databases of subsurface drainage areas. We introduce a method to calculate the normally unknown proportions of drained areas in arable lands, improving the existing method by Behrendt et al. (2000). The method is applied to the catchment of the Middle Mulde river (area: 2,700 km²) in Saxony, Germany. The data records of the mesoscale soil mapping are allocated to the subsurface‐drainage areas digitized in representative areas using ARC/INFO GIS. In this way, it is possible to establish a differentiated record of the proportion of subsurface‐drainage area of each regional site type. The results were extrapolated to the entire area by transferring the proportions of subsurface‐drainage areas to areas where no information on drainage areas was available. The approach is well‐suited for future model approaches on a regional scale. By creating and integrating new data sets derived from modern GIS operations, the approach reduces the uncertainty of modeling N and water fluxes.

12.
Imputation is needed in almost all major surveys. Imputation tools are often adopted according to the convenience and the contexts of the surveys. Traditional hot-deck imputation needs extensive knowledge of the survey variables. Explicit model-based imputation needs a valid model for every survey variable. In large-scale national surveys, different groups of people with different backgrounds work on different stages of surveys and often the statistical estimation group has little or insufficient communication with the other groups. In such situations, it is difficult to use hot-deck imputation. On the other hand, because of the complex nature of the survey, finding a suitable model for every survey variable may not be easy and thus a nonparametric method, such as neural network imputation, may be attractive. One such large-scale national survey is the U.S. Department of Agriculture's National Resources Inventory Survey (NRI). By design, the survey has missing values. The missing values are imputed using a donor-based method. This article develops a neural network imputation model and compares its performance with that of the existing imputation method. The end result looks promising.

13.
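The WXGEN weather generator is a component module of the SWAT distributed watershed hydrological model, used for model prediction and forecasting and for generating simulated weather data when observations are missing. Based on 40 a of daily observations from five typical national meteorological stations in the upper reaches of the Yangtze River, we evaluated the accuracy of the daily and monthly meteorological parameters simulated by WXGEN in this region. The results show that WXGEN's simulation accuracy was essentially the same across the five stations; errors in the simulated daily weather data were large; and the simulated monthly weather data were better than the daily data. WXGEN fitted the distribution of monthly weather data well and is better suited to simulating monthly meteorological parameters in the upper Yangtze region. However, the errors in the simulated monthly data showed a strong seasonal pattern: simulated monthly rainfall and air temperature were lower than observed, whereas simulated solar radiation was higher than observed.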

14.
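Application of SPSS to the handling of missing data values
To address the missing-data problem that arises in grey-theory modeling, this paper proposes combining the missing-value handling module of SPSS with its robust estimation module, and treating the data according to their missingness mechanism. Handling the missing data in this way ensures that the model can fit the data normally. SPSS is used to estimate the missing values in a worked example, and the results after filling are evaluated, so as to obtain objective and accurate results.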

15.
An understanding of survival patterns is a fundamental component of animal population biology. Mark-recapture models are often used in the estimation of animal survival rates. Maximum likelihood estimation, via either analytic solution or numerical approximation, has typically been used for inference in these models throughout the literature. In this article, a Bayesian approach is outlined and an easily applicable implementation via Markov chain Monte Carlo is described. The method is illustrated using 13 years of mark-recapture data for fulmar petrels on an island in Orkney. Point estimates of survival are similar to the maximum likelihood estimates (MLEs), but the posterior variances are smaller than the corresponding asymptotic variances of the MLEs. The Bayesian approach yields point estimates of 0.9328 for the average annual survival probability and 14.37 years for the expected lifetime of the fulmar petrels. A simple modification that accounts for missing data is also described. The approach is easier to apply than augmentation methods in this case, and simulations indicate that the performance of the estimators is not significantly diminished by the missing data.
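The two point estimates can be cross-checked against each other. Under the common simplification of a constant annual survival probability φ and a continuous constant hazard, expected lifetime is -1/ln(φ). Whether the authors derived their figure this way is an assumption; a posterior summary of the lifetime need not equal the plug-in value exactly. The function name below is hypothetical.

```python
import math


def expected_lifetime(annual_survival):
    """Expected lifetime in years for a constant annual survival probability.

    Treats mortality as a continuous constant hazard, giving -1 / ln(phi).
    """
    if not 0.0 < annual_survival < 1.0:
        raise ValueError("annual survival must lie strictly between 0 and 1")
    return -1.0 / math.log(annual_survival)


# Plug in the reported average annual survival probability.
print(round(expected_lifetime(0.9328), 1))
```

Plugging in 0.9328 gives roughly 14.4 years, close to the reported 14.37, which suggests the two estimates are at least consistent with this relation.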

16.
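Evaluation of the applicability of the CLIGEN weather generator in the upper reaches of the Yangtze River
The CLIGEN weather generator is a component module of the WEPP soil erosion model, used for model prediction and forecasting and for generating simulated weather data when observations are missing. However, CLIGEN was developed for weather conditions in the United States, and its accuracy in other regions is uncertain. Based on 40 years of daily observations from five typical national meteorological stations in the upper Yangtze (Tuotuohe, Maerkang, Lijiang, Dujiangyan, and Shapingba), we evaluated the accuracy of the daily and monthly weather data simulated by CLIGEN in this region. The results show that the weather parameter inputs to CLIGEN strongly affect the simulation results; accuracy was broadly comparable across the five stations, which lie in different geomorphic regions; errors in the simulated daily weather data were large, and the simulated monthly data were better than the daily data; simulated monthly mean maximum and minimum temperatures were lower than observed; and the absolute and relative errors of some simulated parameters varied between months.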

17.
Modeling complex collective animal movement presents distinct challenges. In particular, modeling the interactions between animals and the nonlinear behaviors associated with these interactions, while accounting for uncertainty in data, model, and parameters, requires a flexible modeling framework. To address these challenges, we propose a general hierarchical framework for modeling collective movement behavior with multiple stages. Each of these stages can be thought of as processes that are flexible enough to model a variety of complex behaviors. For example, self-propelled particle (SPP) models (e.g., Vicsek et al. in Phys Rev Lett 75:1226–1229, 1995) represent collective behavior and are often applied in the physics and biology literature. To date, the study and application of these models has almost exclusively focused on simulation studies, with less attention given to rigorously quantifying the uncertainty. Here, we demonstrate our general framework with a hierarchical version of the SPP model applied to collective animal movement. This structure allows us to make inference on potential covariates (e.g., habitat) that describe the behavior of agents and rigorously quantify uncertainty. Further, this framework allows for the discrete time prediction of animal locations in the presence of missing observations. Due to the computational challenges associated with the proposed model, we develop an approximate Bayesian computation algorithm for estimation. We illustrate the hierarchical SPP methodology with a simulation study and by modeling the movement of guppies. Supplementary materials accompanying this paper appear online.

18.
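To address the generally low success rate of fitting crop fertilizer response models, we explored an optimized modeling strategy. Building on an integrated analysis of the applicability of four fitting methods, namely nonlinear least squares (NLS) for the ternary non-structural fertilizer response model and ordinary least squares (OLS), principal component regression (PCR), and feasible generalized least squares (FGLS) for the ternary quadratic polynomial model, we examined how to apply ternary fertilizer response models in combination, using results from 1,122 N-P-K field fertilizer trials on rice and open-field vegetables. The results show that the different functional forms of the ternary model and their fitting methods differ markedly in applicability. With OLS, the ternary quadratic polynomial model yielded a typical form in only 19.8% of cases on average. PCR, which overcomes multicollinearity, and FGLS, which overcomes heteroscedasticity, both raised the proportion of typical forms, while the non-structural model fitted by NLS, which overcomes both model specification bias and multicollinearity, raised it to 41.4%. Based on the applicability of each model and method, we propose a four-step modeling procedure for ternary fertilizer response models, which further raised the proportion of typical forms to 57.5%, with little difference among double-cropping rice, single-cropping rice, and open-field vegetables. The four-step procedure is therefore an effective technique for substantially improving the modeling success rate of ternary fertilizer response models.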

19.
In population viability analysis we are often faced with a lack of knowledge of survival rates in animal populations. In particular, survival of recruits is usually hard to assess. However, data on population structure might be considered as patterns that contain valuable information to estimate missing parameters indirectly. As an example for this pattern-oriented modelling and parameterization, pre-breeding survival rate of the endangered Lesser Spotted Woodpecker (Picoides minor) was determined here using data on population structure (e.g. sex ratio) and reproductive success at the population level (e.g. nesting success). Therefore, an individual-based model was developed simulating the population dynamics for two different populations that had been empirically studied at Lake Möckeln, Sweden, and Taunus, Germany. For both populations, a small range for pre-breeding survival rates could be identified wherein all simulated patterns corresponded best to the empirical values. Pre-breeding survival rate was found to be higher in the German scenario than in the Swedish and geographical variation in life-history traits is discussed as a possible reason. It is concluded that the pattern-oriented approach is a valuable method for estimating missing demographic parameters, even when using weak patterns from empirical investigations. Furthermore, it was shown that the use of multiple patterns is necessary for this purpose.

20.
Conservation planning at broad spatial scales facilitates coherence between local land management and objectives set at the state or provincial level. Habitat suitability models are commonly used to identify key areas for conservation planning. The challenge is that habitat suitability models are data hungry, which limits their applicability to species for which detailed information exists, but managers need to address the needs of all at-risk species. We propose a modeling approach useful for regional-scale conservation planning that accommodates limited species knowledge, and identifies what managers should aim for at the local scale. For twenty at-risk bird species, we built models to identify potential habitat using both literature information and empirical data. Species occupancy within potential habitat depends on the presence of intrinsic elements, which we identified for each species so that managers can enhance these elements as appropriate. For most species, the estimated amount of habitat needed to meet population targets was <10% of the mapped potential habitat, with notable exceptions for Northern Goshawk (Accipiter gentilis; 100%), Brown Thrasher (Toxostoma rufum; 63.7%), and Veery (Catharus fuscescens; 17.9%). Model validation showed that interior forest species models performed best. Our modeling framework allowed us to build potential habitat models to various endpoints for different species, depending on the information available, and revealed a number of species for which basic natural history data are missing. Our potential habitat models provide regional perspective and guide local habitat management, and assist in identifying research priorities.
