    Citation: SHENG Jianwei, QIAN Xiyuan. Parameters Estimation for the Zero-Inflated Logarithmic Series Distribution[J]. Journal of East China University of Science and Technology, 2019, 45(3): 507-510. DOI: 10.14135/j.cnki.1006-3080.20180419003

    Parameters Estimation for the Zero-Inflated Logarithmic Series Distribution

    Abstract: The logarithmic series distribution is a common long-tailed distribution that is widely applied to count data taking positive integer values, such as species abundance in a forest or the types of fish in a sea area. In practice, however, some count data consist largely of zeros, for which the logarithmic series distribution is unsuitable. To accommodate the excess zeros, this paper extends the logarithmic series distribution to a zero-inflated logarithmic series distribution within the framework of zero-inflated models. Three estimation methods, namely moment estimation, maximum likelihood estimation, and Bayesian estimation, are used to estimate the model parameters. For Bayesian estimation, the posterior distribution is sampled with the random-walk Metropolis algorithm because it cannot be obtained analytically. Simulated data from the zero-inflated logarithmic series distribution are generated by the Monte Carlo method, and the mean square error is used to compare the accuracy of the estimation methods. The results show that the Bayesian method is more accurate than the traditional methods when the sample size is small and comparable to them when the sample size is large, which suggests that the Bayesian method is advantageous when only a few samples are available. Finally, the model is fitted to the number of clinical readmissions within ninety days, a data set in which more than sixty percent of the values are zero, and the fit is fairly good.
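    As a rough illustration of the model described in the abstract, the sketch below assumes the usual zero-inflated form, P(X = 0) = phi and P(X = k) = (1 - phi) * (-1 / ln(1 - theta)) * theta^k / k for k >= 1. The abstract does not state the paper's parameterization, so this form, the symbols phi and theta, and the function names zils_pmf, sample_zils, and fit_mle are all assumptions made for illustration. Under this assumed form the likelihood separates, so the maximum likelihood estimate of phi is the observed share of zeros and theta is found by matching the mean of the positive counts. The moment and Bayesian (random-walk Metropolis) estimators mentioned in the abstract are not reproduced here.

```python
import math
import random


def zils_pmf(k, phi, theta):
    """P(X = k) under the ASSUMED zero-inflated logarithmic series form."""
    if k == 0:
        return phi
    return (1.0 - phi) * (-1.0 / math.log1p(-theta)) * theta**k / k


def sample_zils(n, phi, theta, rng):
    """Draw n values: a zero with probability phi, else a log-series value."""
    out = []
    for _ in range(n):
        if rng.random() < phi:
            out.append(0)
            continue
        # Inverse-CDF search over the logarithmic series part.
        u = rng.random()
        k = 1
        p = -theta / math.log1p(-theta)  # P(Y = 1)
        cdf = p
        while u > cdf and k < 10_000:  # cap guards against round-off
            p *= theta * k / (k + 1)  # P(Y = k + 1) from P(Y = k)
            k += 1
            cdf += p
        out.append(k)
    return out


def logseries_mean(theta):
    """Mean of the (non-inflated) logarithmic series distribution."""
    return -theta / ((1.0 - theta) * math.log1p(-theta))


def fit_mle(data):
    """Maximum likelihood estimates (phi_hat, theta_hat) for the assumed form."""
    positives = [x for x in data if x > 0]
    if not positives:
        raise ValueError("need at least one positive count to estimate theta")
    phi_hat = (len(data) - len(positives)) / len(data)
    target = sum(positives) / len(positives)
    # The score equation for theta reduces to logseries_mean(theta) = target;
    # the mean is increasing in theta, so bisection is enough.
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if logseries_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    return phi_hat, (lo + hi) / 2.0


if __name__ == "__main__":
    rng = random.Random(0)
    data = sample_zils(5000, phi=0.6, theta=0.7, rng=rng)
    print(fit_mle(data))
```

    Repeating the simulate-and-fit step over many replications and averaging the squared errors of the estimates would give a Monte Carlo mean square error of the kind the abstract uses for comparison.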

       
