何朝兵,杜保建,劉華文
(1. 安陽(yáng)師范學(xué)院數(shù)學(xué)與統(tǒng)計(jì)學(xué)院,中國(guó) 安陽(yáng) 455000; 2. 山東大學(xué)數(shù)學(xué)學(xué)院,中國(guó) 濟(jì)南 250100)
左截?cái)嘤覄h失數(shù)據(jù)下Pareto分布形狀參數(shù)多變點(diǎn)的貝葉斯估計(jì)
何朝兵1*,杜保建1,劉華文2
(1. 安陽(yáng)師范學(xué)院數(shù)學(xué)與統(tǒng)計(jì)學(xué)院,中國(guó) 安陽(yáng)455000; 2. 山東大學(xué)數(shù)學(xué)學(xué)院,中國(guó) 濟(jì)南250100)
摘要通過(guò)添加缺損的壽命變量數(shù)據(jù)得到了左截?cái)嘤覄h失數(shù)據(jù)下Pareto分布相對(duì)簡(jiǎn)單的似然函數(shù),給出了形狀參數(shù)變點(diǎn)位置和其他參數(shù)的滿(mǎn)條件分布.利用MCMC方法對(duì)參數(shù)的滿(mǎn)條件分布進(jìn)行了抽樣,把Gibbs樣本的均值作為參數(shù)的貝葉斯估計(jì).隨機(jī)模擬的結(jié)果表明各參數(shù)貝葉斯估計(jì)的精度都較高.
關(guān)鍵詞似然函數(shù);滿(mǎn)條件分布;MCMC方法;Gibbs抽樣;Metropolis-Hastings算法
Bayesian Estimation of Shape Parameter of Pareto Distribution with
Multiple Change Points for Left Truncated and Right Censored Data
HEChao-bing1*,DUBao-jian1,LIUHua-wen2
(1. School of Mathematics and Statistics, Anyang Normal University, Anyang 455000, China;
2. School of Mathematics, Shandong University, Ji’nan 250100, China)
AbstractBy filling in the missing data of the life variable, the relatively simple likelihood function of Pareto distribution for left truncated and right censored data is obtained. The full conditional distributions of change-point positions of shape parameter and other parameters are given. Each parameter is sampled from the full conditional distributions by MCMC method. The means of Gibbs samples are taken as Bayesian estimations of the parameters. The random simulation results show that Bayesian estimations of the parameters are fairly accurate.
Key wordslikelihood function; full conditional distribution; MCMC method; Gibbs sampling; Metropolis-Hastings algorithm
近年來(lái)變點(diǎn)問(wèn)題成為統(tǒng)計(jì)學(xué)中比較熱門(mén)的研究方向,它廣泛應(yīng)用于金融、經(jīng)濟(jì)和地震預(yù)測(cè)等領(lǐng)域.目前變點(diǎn)分析方法主要有極大似然法、貝葉斯法和非參數(shù)方法等.關(guān)于變點(diǎn)問(wèn)題的研究可參看文獻(xiàn)[1-4].而貝葉斯計(jì)算方法中的MCMC方法,使變點(diǎn)分析變得非常方便.Pareto分布在生存分析和可靠性理論等方面仍然具有很多應(yīng)用價(jià)值.關(guān)于Pareto分布的研究可參看文獻(xiàn)[5-7].當(dāng)進(jìn)行壽命試驗(yàn)時(shí),經(jīng)常會(huì)出現(xiàn)左截?cái)嘤覄h失數(shù)據(jù),對(duì)此模型的研究可參看文獻(xiàn)[8-10].關(guān)于左截?cái)嗷蛴覄h失數(shù)據(jù)下變點(diǎn)問(wèn)題的研究有一些成果,可參看文獻(xiàn)[11-13],但對(duì)數(shù)據(jù)既左截?cái)嘤钟覄h失情形下變點(diǎn)問(wèn)題的研究還不多見(jiàn).本文主要利用MCMC方法研究左截?cái)嘤覄h失數(shù)據(jù)下Pareto分布形狀參數(shù)多變點(diǎn)的貝葉斯估計(jì).
1左截?cái)嘤覄h失數(shù)據(jù)試驗(yàn)?zāi)P?/p>
設(shè)(X,Y,T)是一連續(xù)型隨機(jī)變量,壽命變量X的分布函數(shù)為F(x;θ)=P(X≤x),密度函數(shù)為f(x;θ),這里θ是未知參數(shù);Y是一右刪失隨機(jī)變量,分布函數(shù)為G(y),密度函數(shù)為g(y);T是一左截?cái)嚯S機(jī)變量,分布函數(shù)為H(t),密度函數(shù)為h(t),且Y,T的分布與參數(shù)θ無(wú)關(guān).假定X,Y,T是相互獨(dú)立非負(fù)隨機(jī)變量.對(duì)于n個(gè)受試樣品(產(chǎn)品壽命),左截?cái)嘤覄h失數(shù)據(jù)的試驗(yàn)?zāi)P褪莾H在Zi≥Ti時(shí)得到觀察數(shù)據(jù)(Zi,Ti,δi),而在Zi Zi=Xi∧Yi=min(Xi,Yi),δi=I(Xi≤Yi),i=1,2,…,n. 下面求樣本的似然函數(shù). 為了研究方便,引入示性變量νi=I(min(Xi,Yi)≥Ti),i=1,2,…,n.則基于觀察數(shù)據(jù){(zi,ti,δi):νi=1,1≤i≤n}的似然函數(shù)為 2左截?cái)嘤覄h失數(shù)據(jù)下Pareto分布的似然函數(shù) 若X的密度函數(shù)為f(x)=θbθx-(θ+1),x≥b>0,θ>0,則稱(chēng)X服從尺度參數(shù)為b,形狀參數(shù)為θ的Pareto分布,記為X~Pa(b,θ),易知X的分布函數(shù)為F(x)=1-bθx-θ,x≥b. 左截?cái)嘤覄h失數(shù)據(jù)試驗(yàn)?zāi)P椭?假設(shè)產(chǎn)品壽命X~Pa(b,θ),且b已知. 令δ,z,ν分別表示由δi,zi,νi組成的向量,則似然函數(shù)為 (1) 由(1)式易知,左截?cái)嘤覄h失數(shù)據(jù)下Pareto分布的似然函數(shù)比較復(fù)雜. 下面添加部分缺損的Xi的值,以獲得較簡(jiǎn)單的似然函數(shù).具體如下: 當(dāng)νi=0時(shí),添加數(shù)據(jù)Z1i=Xi=z1i. 在νi=0,即min(Xi,Yi) ψ(x;θ)=[u(θ)]-1f(x;θ)P({Y>x,T>x}∪{Y≤x,T>Y})= 可以利用篩選法隨機(jī)產(chǎn)生Z1i的值z(mì)1i.z1i抽樣的具體步驟如下: (1)由均勻分布U(0,1)抽取u,由f(x;θ)抽取x; 令αi=I(zi≥b),令α,w分別表示由αi,z1i組成的向量,則添加數(shù)據(jù)后的似然函數(shù)為 (2) 3左截?cái)嘤覄h失數(shù)據(jù)下Pareto分布形狀參數(shù)多變點(diǎn)的貝葉斯估計(jì) Pareto分布形狀參數(shù)多變點(diǎn)模型如下: (3) 其中θ1,θ2,θ3兩兩不相等,1≤k1 下面利用貝葉斯方法對(duì)對(duì)變點(diǎn)位置k1,k2及參數(shù)θ1,θ2,θ3進(jìn)行估計(jì). 令D1={1,2,…,k1},D2={k1+1,…,k2},D3={k2+1,…,n}. 記β=(k1,k2,θ1,θ2,θ3),由(2)式和(3)式可得此變點(diǎn)問(wèn)題的似然函數(shù)為 其中nm1=n1(Dm),nm2=n2(Dm),sm=s(Dm),m=1,2,3. 下面確定各參數(shù)的先驗(yàn)分布. (2) 取θi的先驗(yàn)分布為伽瑪分布Ga(ci,di),ci,di已知,即 假設(shè)(k1,k2),θ1,θ2,θ3相互獨(dú)立,則 L(β|z,w,α,ν,δ)∝π(k1,k2)π(θ1)π(θ2)π(θ3)L(z,w,α,ν,δ|β) 其中z-1i={z1j:j≠i}. 記π(θ1|·)=π(θ1|k1,k2,θ2,θ3,z,w,α,ν,δ). 下面求各參數(shù)的滿(mǎn)條件分布. 利用篩選法隨機(jī)抽取z1i;采用θ1,θ2,θ3直接對(duì)Gibbs抽樣;利用Metropolis-Hastings算法對(duì)k1,k2進(jìn)行抽樣.下面寫(xiě)出MCMC方法的具體步驟. 4隨機(jī)模擬 下面進(jìn)行隨機(jī)模擬試驗(yàn).取受試樣品的個(gè)數(shù)n=200. 假設(shè)Pareto分布的尺度參數(shù)8是已知的,取θ1,θ2,θ3的先驗(yàn)分布分別為Ga(3.5,6),Ga(20,1.6),Ga(17,2.6);右刪失變量Yi~Pa(10,2),左截?cái)嘧兞縏i~Pa(6,10).則(k1,k2,θ1,θ2,θ3)的真實(shí)值為(80,150,0.7,13,6). 使用R軟件進(jìn)行MCMC模擬,主要對(duì)參數(shù)k1,k2進(jìn)行估計(jì)分析.在模擬過(guò)程中,先進(jìn)行10 000次Gibbs預(yù)迭代,以確保抽樣的收斂性,然后丟棄最初的預(yù)迭代,再進(jìn)行10 000次Gibbs迭代.迭代從第10 001次開(kāi)始至第20 000次的R程序的運(yùn)行結(jié)果見(jiàn)表1. 表1 參數(shù)k1, k2,θ1,θ2,θ3的貝葉斯估計(jì) 變點(diǎn)位置參數(shù)的Gibbs抽樣迭代過(guò)程見(jiàn)圖1和圖2. 圖1 參數(shù)k1的Gibbs抽樣迭代過(guò)程 圖2 參數(shù)k2的Gibbs抽樣迭代過(guò)程Fig.1 Gibbs sampling iterations of k1 Fig.2 Gibbs sampling iterations of k2 在模型的分析過(guò)程中,MCMC的收斂性診斷很重要,模擬時(shí)可以對(duì)參數(shù)進(jìn)行多層鏈?zhǔn)降治?即輸入多組初始值,形成多層迭代鏈,當(dāng)抽樣收斂時(shí),迭代圖形重合.在模擬過(guò)程中,輸入兩組初始值分別進(jìn)行10 000次迭代,變點(diǎn)位置參數(shù)的多層迭代鏈軌跡見(jiàn)圖3和圖4. 圖3 k1的多層迭代鏈軌跡 圖4 k2的多層迭代鏈軌跡Fig.3 Multiple iterative chains trace of k1 Fig.4 Multiple iterative chains trace of k2 最后,進(jìn)行模擬結(jié)果分析.首先,由表1可以看出位置參數(shù)的貝葉斯估計(jì)與真值的相對(duì)誤差不超過(guò)4%,其他參數(shù)估計(jì)的相對(duì)誤差不超過(guò)9%,整體上估計(jì)的精度較高,效果較好;其次,要判斷所產(chǎn)生的馬爾科夫鏈?zhǔn)欠袷諗?由圖3和圖4可以發(fā)現(xiàn),變點(diǎn)位置參數(shù)的兩條迭代鏈都趨于重合,收斂性較好.綜上分析,可以看出通過(guò)MCMC方法模擬所產(chǎn)生的效果較好. 參考文獻(xiàn): [1]PAGE E S. Continuous inspection schemes[J]. Biometrika, 1954,41(1):100-115. [2]CS?RG? M, HORVTH L. Limit theorems in change-point analysis[M]. New York: Wiley, 1997. [3]CHERNOFF H, ZACKS S. Estimating the current mean of a normal distribution which is subjected to changes in time[J]. Ann Math Stat, 1964,35(3):999-1018. [4]FEARNHEAD P. Exact and efficient Bayesian inference for multiple changepoint problems[J]. Stat Comput, 2006,16(2):203-213. [5]ARNOLD B C. Pareto distribution[M]. New York: John Wiley & Sons, Inc, 1985. [6]CASTILLO J, DAOUDI J. Estimation of the generalized Pareto distribution[J]. Stat Probab Lett, 2009,79(5):684-688. [7]KAMAL M, ZARRIN S, ISLAM A U. Accelerated life testing design using geometric process for Pareto distribution[J]. Int J Adv Stat Probab, 2013,1(2):25-31. [8]BALAKRISHNAN N, MITRA D. Likelihood inference for lognormal data with left truncation and right censoring with an illustration[J]. J Stat Plann Inference, 2011,141(11):3536-3553. [9]COSSLETT S R. Efficient semiparametric estimation of censored and truncated regressions via a smoothed self-consistency equation[J]. Econometrica, 2004,72(4):1277-1293. [10]GROSS S T, LAI T L. Nonparametric estimation and regression analysis with left-truncated and right-censored data[J]. J Am Stat Assoc, 1996,91(435):1166-1180. [11]ANTONIADIS A, GRéGOIRE G. Nonparametric estimation in change point hazard rate models for censored data: A counting process approach[J]. J Nonparametr Stat, 1993,3(2):135-154. [12]ANTONIADIS A, GIJBELS I, MACGIBBON B. Nonparametric estimation for the location of a change-point in an otherwise smooth hazard function under random censoring[J]. Scand J Stat, 2000,27(3):501-519. [13]GIJBELS I, GüRLER ü. Estimation of a change point in a hazard function based on censored data[J]. Lifetime Data Anal, 2003,9(4):395-411. (編輯胡文杰) *通訊作者,E-mail:chaobing5@163.com 基金項(xiàng)目:國(guó)家自然科學(xué)基金資助項(xiàng)目(61174099); 河南省教育廳科學(xué)技術(shù)研究重點(diǎn)項(xiàng)目(14B110011) 收稿日期:2014-02-28 中圖分類(lèi)號(hào)O213.2; O212.8 文獻(xiàn)標(biāo)識(shí)碼A 文章編號(hào)1000-2537(2015)03-0080-05 DOI:10.7612/j.issn.1000-2537.2015.03.015































湖南師范大學(xué)自然科學(xué)學(xué)報(bào)2015年3期