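Application of random survival forests in the analysis of small cell lung cancer prognosis
XIE Ruifei¹, WU Bo²▲
1. Department of Information, Hangzhou Tumor Hospital, Hangzhou 310002, Zhejiang, China; 2. Department of Radiotherapy, Taizhou Central Hospital, Taizhou 318000, Zhejiang, China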
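[Abstract] Objective To identify gene variables intrinsically associated with small cell lung cancer (SCLC), so as to help clinicians formulate individualized treatment plans, prolong patient survival and improve patients' prognosis and quality of life. Methods A total of 117 patients with SCLC were enrolled, with 41 000 gene variables and 8 general characteristics. Random survival forests (RSF) were applied to gene expression profiles combined with prognostic data to find, among all gene variables, those closely associated with SCLC. Results General characteristics and EGFR, K-ras and p53 status showed no significant differences in prognosis. Among the top 12 selected genes, FTCD, BTC, PSMC4 and SLC43A1 were closely associated with SCLC, while UCHL5 and PSMC4 with PSMD7, and PCSK4 and VPS13D with VPS13A, showed regulatory dependencies. Conclusion Random survival forests can efficiently identify the key genes closely associated with prognosis.
[Key words] Small cell lung cancer; Random survival forests; Gene expression profile; Survival analysis; Gene regulation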
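Worldwide, lung cancer is one of the most common malignant tumors, with high mortality and poor prognosis [1-3]. Small cell lung cancer (SCLC) has an even worse prognosis than non-small cell lung cancer (NSCLC). In China, more than 80% of SCLC cases have a 5-year survival rate of no more than 10% [4,5]. Identifying the genes and molecules involved in the initiation and progression of SCLC is therefore particularly important for diagnosis and treatment [2,6,7].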
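In recent years, translational medicine has received increasing attention, and more and more researchers are working on genomics. Combining high-dimensional genomic data with survival information gives researchers a new perspective on individual biological processes and on the onset, progression and prognosis of disease. Random survival forests (RSF) [8-13] can effectively incorporate survival information into the analysis of high-dimensional genomic data, extract prognosis-related gene variables, and thereby help clinicians deliver individualized treatment [14].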
1.1 Clinical data
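The data were collected from 117 patients with SCLC and cover 41 000 genes; general characteristics are listed in Table 1. EGFR status was strongly correlated with sex, and K-ras status with sex and T stage.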
1.2 Random survival forests
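Random survival forests extend random forests to survival analysis. For each survival tree, N samples are drawn at random with replacement from the original data (bootstrap sampling) to grow the tree, and the roughly 37% of samples not drawn (the out-of-bag samples) are used to test it.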
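The 37% figure follows from bootstrap sampling: a given row is missed by one draw with probability 1 − 1/N, so it is left out of all N draws with probability (1 − 1/N)^N, which approaches 1/e ≈ 0.368 as N grows. The Python sketch below, which assumes N = 117 as in this cohort and uses hypothetical function names, simulates the in-bag/out-of-bag split and compares the simulated out-of-bag share with that formula. It illustrates only the sampling step, not a full forest.

```python
import random


def bootstrap_split(n, rng):
    """Draw n row indices with replacement (in-bag) and return the rows never drawn (out-of-bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    oob = [i for i in range(n) if i not in drawn]
    return in_bag, oob


def mean_oob_fraction(n=117, n_trees=2000, seed=0):
    """Average share of the n samples that are out-of-bag over n_trees bootstrap draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        _, oob = bootstrap_split(n, rng)
        total += len(oob) / n
    return total / n_trees


if __name__ == "__main__":
    # Analytic out-of-bag share (1 - 1/n)**n, which tends to 1/e for large n.
    print("simulated:", round(mean_oob_fraction(), 3))
    print("analytic: ", round((1 - 1 / 117) ** 117, 3))
```

Each tree therefore sees about two thirds of the patients, and the remaining third gives an honest test set for that tree without setting data aside in advance.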
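Suppose node h contains n(h) samples, and let (T1, δ1), …, (Tn, δn) denote their survival times and censoring indicators, where δi = 0 means individual i was right-censored at time Ti and δi = 1 means individual i died at time Ti. Given a variable Xj (j = 1, 2, …, m), the survival data at node h can be divided into two groups according to Xj ≤ c and Xj > c. At each node of each tree, RSF randomly selects M variables as candidate splitting variables and chooses the split that maximizes the survival difference between the two daughter nodes. Nodes are split with the log-rank splitting rule, and survival functions are estimated with the Kaplan-Meier estimator. To retain only the few most important gene variables, variables can be screened by variable importance (VIMP); a larger VIMP indicates stronger predictive ability. The procedure is as follows: Step 1, remove missing data; Step 2, fit a Cox model to each gene; Step 3, keep the gene variables with P < 0.005; Step 4, apply RSF to the general clinical characteristics together with the finally selected gene variables, and rank all variables by importance.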
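The log-rank splitting rule can be sketched as follows. For a candidate cut c on Xj, the node is divided into a left daughter (Xj ≤ c) and a right daughter (Xj > c), and the standardized log-rank statistic compares observed with expected deaths in the left daughter at every distinct death time; the split with the largest absolute value is kept. This minimal Python sketch is an illustration, not the authors' software: the function names, the minimum daughter size `min_node`, and the exhaustive search over observed values of Xj are hypothetical choices, and `m` in `split_node` plays the role of the M randomly chosen candidate variables in the text.

```python
import math
import random


def logrank_statistic(times, events, left):
    """Standardized log-rank statistic for a split into left (True) and right (False) daughters.

    times[i] is follow-up time; events[i] is 1 for death and 0 for right-censoring.
    """
    num, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        y = len(at_risk)                          # number at risk in the parent node
        y1 = sum(1 for i in at_risk if left[i])   # number at risk in the left daughter
        d = sum(1 for i in at_risk if times[i] == t and events[i] == 1)
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] == 1 and left[i])
        num += d1 - y1 * d / y                    # observed minus expected deaths on the left
        if y > 1:
            var += (y1 / y) * (1 - y1 / y) * ((y - d) / (y - 1)) * d
    return num / math.sqrt(var) if var > 0 else 0.0


def best_split(x, times, events, min_node=2):
    """Try every cut-point c of one variable (x <= c goes left); keep the largest |statistic|."""
    best_c, best_stat = None, 0.0
    for c in sorted(set(x))[:-1]:
        left = [xi <= c for xi in x]
        n_left = sum(left)
        if n_left < min_node or len(x) - n_left < min_node:
            continue
        stat = abs(logrank_statistic(times, events, left))
        if stat > best_stat:
            best_c, best_stat = c, stat
    return best_c, best_stat


def split_node(columns, times, events, m, rng, min_node=2):
    """RSF-style split: draw m candidate variables at random, return (j, c, |statistic|) of the best."""
    best = (None, None, 0.0)
    for j in rng.sample(range(len(columns)), m):
        c, stat = best_split(columns[j], times, events, min_node)
        if c is not None and stat > best[2]:
            best = (j, c, stat)
    return best
```

A negative statistic means the left daughter had fewer deaths than expected, i.e. better survival; the split search only needs its magnitude.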

Table 1 General characteristics
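The general clinical characteristics and EGFR, K-ras and p53 mutations were analyzed with Kaplan-Meier curves and the log-rank test (Figure 1, inside back cover). As Figure 1 shows, sex, age, EGFR, K-ras and p53 status made no significant difference to prognosis in patients with SCLC. For T stage, N stage and clinical stage, only the comparisons T1 vs T4, N0 vs N2 and stage Ⅰ vs stage Ⅲ reached P < 0.001, indicating statistically significant differences.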
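Figure 2 (inside back cover) and Table 2 show that, during model building, the error rate stabilizes as the number of survival trees increases. Ranked by their influence on prognosis, the top 12 variables are FTCD, UCHL5, RANBP9, YWHAQ, LOC151878, PPP2R5C, C20orf96, NFKBIB, BTC, SUMO3, PSMC4 and C6orf64. According to the GeneCards database, FTCD, BTC, PSMC4 and SLC43A1 are closely associated with tumors, while UCHL5 and PSMC4 with PSMD7, and PCSK4 and VPS13D with VPS13A, show regulatory dependencies (Table 3). The genes with regulatory relationships to PCSK4 and PSMC are shown in Figure 3 (inside back cover); these genes interact with and regulate one another, jointly influencing tumorigenesis and tumor evolution.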
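The text does not state how the RSF error rate or VIMP was computed. A common convention in random survival forests, assumed here, is to report the error rate as 1 − Harrell's concordance index (C) and to measure a variable's importance as the increase in that error when the variable's values are scrambled. In a real forest this is done on each tree's out-of-bag samples, and permutation is only one of several ways to perturb a variable. The Python sketch below shows the two ingredients on toy data, with a hypothetical fixed risk function standing in for a fitted forest.

```python
import random


def harrell_c(times, events, risk):
    """Harrell's concordance: share of comparable pairs where the earlier death has the higher risk.

    A pair (i, j) is comparable when i died and times[i] < times[j]; tied risks count as 0.5.
    Pairs tied on time are skipped in this simplified version.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if events[i] != 1:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def permutation_vimp(rows, times, events, predict, j, n_repeats=20, seed=0):
    """Mean increase in error (1 - C) after shuffling column j; larger means more important."""
    rng = random.Random(seed)
    base_err = 1.0 - harrell_c(times, events, [predict(r) for r in rows])
    total = 0.0
    for _ in range(n_repeats):
        col = [r[j] for r in rows]
        rng.shuffle(col)
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, col)]
        err = 1.0 - harrell_c(times, events, [predict(r) for r in permuted])
        total += err - base_err
    return total / n_repeats


if __name__ == "__main__":
    # Hypothetical example: column 0 drives risk, column 1 is unused noise.
    rows = [[4.0, 0.3], [3.0, 0.9], [2.0, 0.1], [1.0, 0.5]]
    times, events = [1, 2, 3, 4], [1, 1, 1, 1]

    def risk_model(row):
        return row[0]  # stands in for a fitted forest's risk score

    for j in range(2):
        print(j, round(permutation_vimp(rows, times, events, risk_model, j), 3))
```

Shuffling column 0 breaks the link between risk score and survival and raises the error, whereas shuffling the unused column 1 leaves the error unchanged and yields a VIMP of exactly zero.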
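To further test whether the identified sensitive genes and clinical characteristics affect prognosis, univariate and multivariate Cox regression analyses were performed (Table 4). In the univariate analysis, only T, N, FTCD, UCHL5, BTC, PSMC4, PCSK4 and SLC43A1 were statistically significant. In the subsequent multivariate analysis, T, N, FTCD, UCHL5 and PSMC4 had P < 0.05, indicating that they jointly affect patient prognosis.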
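By combining gene expression profiles with prognostic data, the RSF method can effectively screen out genes closely associated with lung cancer, helping clinicians tailor treatment, improve patients' quality of life and prolong survival.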
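Random survival forests effectively combine machine learning with clinical survival data and can quickly identify the key genes closely associated with prognosis. Because random forests take the joint effects of multiple genes into account during feature selection, the selected genes tend to be strongly correlated or mutually regulated, which lays a foundation for later analysis of gene regulatory relationships and construction of gene regulatory networks.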
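Numerous studies have shown that mutation status at EGFR [27], K-ras [28] and p53 [29] affects the choice of treatment for NSCLC. For example, in EGFR-mutant disease, the EGFR tyrosine kinase inhibitors (EGFR-TKIs) gefitinib and erlotinib significantly improve survival benefit in NSCLC and have been approved by the FDA for the treatment of advanced NSCLC. In this study, however, EGFR, K-ras and p53 were not among the prognosis-sensitive genes selected by the random survival forest, and Kaplan-Meier and univariate analyses again confirmed that these three mutation sites did not affect patient prognosis. Using EGFR, K-ras and p53 status to choose treatment for SCLC patients may therefore be of limited value.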

Table 2 Variable importance ranking

Table 3 Bioinformatics of the sensitive genes

Table 4 Univariate and multivariate analyses of clinical characteristics and sensitive genes
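For the other prognosis-sensitive genes that entered the random survival forest, univariate and multivariate Cox regression showed that T stage, N stage, FTCD, UCHL5 and PSMC4 were statistically significant and strongly associated with tumor prognosis. T stage is determined mainly by tumor size, and N stage by the location and extent of lymph node metastasis; the smaller the tumor and the more limited the nodal spread (i.e. the lower the T and N stage), the better the prognosis. The hazard ratios associated with mutations at FTCD, UCHL5 and PSMC4 were 1.569, 2.194 and 2.314, respectively, indicating a marked association with outcome in SCLC patients. Previous work has shown that FTCD knockdown reduces the effect of HIF-1α under hypoxia and enhances chemosensitivity in HepG2 cells, that FTCD and HIF regulate each other, and that FTCD may serve as a target gene in the treatment of liver cancer [30]. Randles et al. [31] showed that UCHL5 affects the cell cycle: loss of UCHL5 protein arrests cells in the G0/G1 phase. PSMC4 encodes one of the ATPase subunits; this subunit has been shown to interact with an orphan member of the nuclear hormone receptor superfamily that is highly expressed in the liver, and two transcript variants of the gene have been identified [32]. Clinicians could use FTCD, UCHL5 and PSMC4 to build a prognostic prediction model, and could give patients with FTCD, UCHL5 or PSMC4 mutations specific targeted drugs to inhibit tumor progression and improve prognosis.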
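Under a proportional-hazards model without interaction terms, hazard ratios combine multiplicatively, so a simple prognostic index can be written as the sum of the log hazard ratios of the altered genes. The Python sketch below illustrates this with the three hazard ratios quoted above. It assumes, without confirmation from the text, that these are multivariate estimates, that each gene enters as a 0/1 alteration indicator, and that T and N stage are held fixed; the names `HAZARD_RATIOS`, `linear_predictor` and `relative_hazard` are hypothetical. It is an arithmetic illustration, not a validated prediction model.

```python
import math

# Hazard ratios quoted in the text (assumed here to be multivariate estimates).
HAZARD_RATIOS = {"FTCD": 1.569, "UCHL5": 2.194, "PSMC4": 2.314}


def linear_predictor(status):
    """Sum of log hazard ratios for the genes flagged as altered (status[gene] is 0 or 1)."""
    return sum(math.log(hr) * status.get(gene, 0) for gene, hr in HAZARD_RATIOS.items())


def relative_hazard(status):
    """Hazard relative to an otherwise comparable patient with none of the three alterations."""
    return math.exp(linear_predictor(status))


if __name__ == "__main__":
    profiles = ({}, {"FTCD": 1}, {"UCHL5": 1, "PSMC4": 1}, {"FTCD": 1, "UCHL5": 1, "PSMC4": 1})
    for profile in profiles:
        print(profile, round(relative_hazard(profile), 2))
```

Under these assumptions, a patient flagged for all three genes would have 1.569 × 2.194 × 2.314 ≈ 7.97 times the hazard of an otherwise comparable patient with none of them.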
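Random survival forests can quickly and efficiently identify genes strongly associated with prognosis. This can further advance precision medicine for SCLC: pinpointing its causes and therapeutic targets, classifying different disease states and processes precisely, and ultimately achieving individualized precision treatment for SCLC patients, improving the effectiveness of diagnosis, treatment and prevention.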
[1]Menachery A,Burt J,Chappell S,et al.Dielectrophoretic characterization and separation of metastatic variants of small cell lung cancer cells[J].Une,2016,(3):386-389.
[2]Mitsudomi T.Molecular epidemiology of lung cancer and geographic variations with special reference to EGFR mutations[J].Transl Lung Cancer Res,2014,3(4):205-211.
[3]Jung KW,Won YJ,Kong HJ,et al.Cancer statistics in Korea:Incidence,mortality,survival,and prevalence in 2012[J].Cancer Res Treat,2015,47(2):127-141.
[4]Chen W,Zheng R,Zeng H,et al.Epidemiology of lung cancer in China[J].Thorac Cancer,2015,6(2):209-215.
[5]Zhou C.Lung cancer molecular epidemiology in China:Recent trends[J].Transl Lung Cancer Res,2014,3(5):270-279.
[6]Chen W,Zheng R,Zeng H,et al.Geographic distribution and epidemiology of lung cancer during 2011 in Zhejiang province of China[J].Asian Pac J Cancer Prev,2014,15(13):5299-5303.
[7]Blakely CM,Pazarentzos E,Olivas V,et al.NF-kappaB-activating complex engaged in response to EGFR oncogene inhibition drives tumor cell survival and residual disease in lung cancer[J].Cell Rep,2015,11(1):98-110.
[8]Miao F,Cai YP,Zhang YX,et al.Risk prediction of one-year mortality in patients with cardiac arrhythmias using random survival forest[J].Comput Math Methods Med,2015,2015:303250.
[9]Marino SR,Lin S,Maiers,et al.Identification by random forest method of HLA class I amino acid substitutions associated with lower survival at day 100 in unrelated donor hematopoietic cell transplantation[J].Bone Marrow Transplant,2012,47(2):217-226.
[10]Buhnemann C,Li S,Yu H,et al.Quantification of the heterogeneity of prognostic cellular biomarkers in ewing sarcoma using automated image and random survival forest analysis[J].PLoS One,2014,9(9):e107105.
[11]Choi JY,Kim SK,Lee WH,et al.A survival prediction model of rats in hemorrhagic shock using the random forest classifier[J].Conf Proc IEEE Eng Med Biol Soc,2012,2012:5570-5573.
[12]Shim JH,Jun MJ,Han S,et al.Prognostic nomograms for prediction of recurrence and survival after curative liver resection for hepatocellular carcinoma[J].Ann Surg,2015,261(5):939-946.
[13]Biesbroek S,van der A DL,Brosens MC,et al.Identifying cardiovascular risk factor-related dietary patterns with reduced rank regression and random forest in the EPIC-NL cohort[J].Am J Clin Nutr,2015,102(1):146-154.
[14]Kasinski AL,Kelnar K,Stahlhut C,et al.A combinatorial microRNA therapeutics approach to suppressing non-small cell lung cancer[J].Oncogene,2015,34(27):3547-3555.
[15]Seimiya M,Tomonaga T,Matsushita K,et al.Identification of novel immunohistochemical tumor markers for primary hepatocellular carcinoma;clathrin heavy chain and formiminotransferase cyclodeaminase[J].Hepatology,2008,48(2):519-530.
[16]Kawaguchi M,Hosotani R,Kogire,et al.Auto-induction and growth stimulatory effect of betacellulin in human pancreatic cancer cells[J].Int J Oncol,2000,16(1):37-41.
[17]Yamamoto T,Akisue T,Marui T,et al.Expression of betacellulin,heparin-binding epidermal growth factor and epiregulin in human malignant fibrous histiocytoma[J].Anticancer Res,2004,24(3b):2007-2010.
[18]Moon WS,Park HS,Yu KH,et al.Expression of betacellulin and epidermal growth factor receptor in hepatocellular carcinoma:Implications for angiogenesis[J].Hum Pathol,2006,37(10):1324-1332.
[19]Watanabe T,Shintani A,Nakata M,et al.Recombinant human betacellulin:Molecular structure,biological activities,and receptor interaction[J].J Biol Chem,1994,269(13):9966-9973.
[20]O-charoenrat P,Modjtahedi H,Rhys-Evans P,et al.Epidermal growth factor-like ligands differentially up-regulate matrix metalloproteinase 9 in head and neck squamous carcinoma cells[J].Cancer Res,2000,60(4):1121-1128.
[21]Sakon M,Kishimoto S,Aoki T,et al.A patient with HCC successfully treated by ethanol injection therapy with etoposide[J].Gan To Kagaku Ryoho,1996,23(11):1585-1587.
[22]Lu Z,Hu X,Li Y,et al.Human papillomavirus 16 E6 oncoprotein interferences with insulin signaling pathway by binding to tuberin[J].J Biol Chem,2004,279(34):35664-35670.
[23]Szabo A,Perou CM,Karaca M,et al.Statistical modeling for selecting housekeeper genes[J].Genome Biol,2004,5(8):R59.
[24]Kokkinakis DM,Liu X,Chada S,et al.Modulation of gene expression in human central nervous system tumors under methionine deprivation-induced stress[J].Cancer Res,2004,64(20):7513-7525.
[25]Bassi DE,Mahloogi H,Klein-Szanto AJ.The proprotein convertases furin and PACE4 play a significant role in tumor progression[J].Mol Carcinog,2000,28(2):63-69.
[26]Cole KA,Chuaqui RF,Katz K,et al.cDNA sequencing and analysis of POV1(PB39):A novel gene up-regulated in prostate cancer[J].Genomics,1998,51(2):282-287.
[27]Paez JG,Jänne PA,Lee JC,et al.EGFR mutations in lung cancer:Correlation with clinical response to gefitinib therapy[J].Science,2004,304(5676):1497-1500.
[28]Johnson L,Mercer K,Greenbaum D,et al.Somatic activation of the K-ras oncogene causes early onset lung cancer in mice[J].Nature,2001,410(6832):1111-1116.
[29]Denissenko MF,Pao A,Tang MS,et al.Preferential formation of benzo[a]pyrene adducts at lung cancer mutational hotspots in P53[J].Science,1996,274(5286):430-432.
[30]Yu Z,Ge Y,Xie L,et al.Using a yeast two-hybrid system to identify FTCD as a new regulator for HIF-1α in HepG2 cells[J].Cell Signal,2014,26(7):1560-1566.
[31]Randles L,Anchoori RK,Roden RB,et al.Proteasome ubiquitin receptor hRpn13 and its interacting deubiquitinating enzyme Uch37 are required for proper cell cycle progression[J].J Biol Chem,2016,M115.694588.
[32]Choi HS,Seol W,Moore DD.A component of the 26S proteasome binds an orphan member of the nuclear hormone receptor superfamily[J].J Steroid Biochem Mol Biol,1996,56(6):23-30.
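CLC number: R734
Document code: A
Article ID: 1673-9701(2016)17-0004-05
Funding: Zhejiang Provincial Department of Science and Technology Public Welfare Technology Research Program, Social Development (2015C33268); Zhejiang Provincial Medical and Health Science and Technology Program (2014KYA181); Hangzhou Health Science and Technology Plan (General) Project (2014A33)
▲ Corresponding author
(Received 2016-04-29)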