Comparison of Five Artificial Intelligence Techniques in Primary-Care Ultrasound Screening for Breast Cancer

2022-05-03 22:47:03 LU Jinghui, ZHANG Hongyan, WANG Yajuan, ZHANG Nan

China Medicine and Pharmacy, 2022, Issue 7


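[Abstract] Objective To use artificial intelligence (AI) techniques to assist primary-care ultrasound screening for breast cancer and to provide empirical evidence for digital health services at the primary-care level. Methods Breast ultrasound images were collected from women who underwent physical examination at the Anzhen Community Health Service Center, Chaoyang District, Beijing, between March 2019 and March 2021, and from women screened at the center under the Beijing two-cancer screening program since its launch in 2004. A total of 271 images of breast nodules graded BI-RADS category 3 or above were identified. After review of the patients' follow-up records, 60 images of lesions pathologically diagnosed as breast cancer at higher-level hospitals and 60 images of nodules pathologically confirmed as benign at higher-level hospitals were selected. Using Python (random sampling without replacement), 50 breast cancer cases and 50 benign nodule cases were drawn to form the experimental group; the remaining 10 breast cancer cases and 10 benign nodule cases formed the test group. Regions of interest (ROI) were marked, and image texture features were extracted and selected. AI models were built with five methods: support vector machine (SVM), random forest (RF), Naïve Bayes (NB), neural network (NN), and extreme gradient boosting (XGB). Model performance was evaluated with ROC curves and compared between models. Results In the experimental group, the AUC values (95% CI) of the ROC curves for the five models (RF, SVM, NB, NN, XGB) were 0.806 (0.743-0.869), 0.835 (0.777-0.898), 0.859 (0.852-0.939), 0.843 (0.779-0.906), and 0.906 (0.871-0.942), respectively. The difference between the NN and XGB models was not statistically significant (P > 0.05); both performed significantly better than the other three AI models (P < 0.05). In the test group, the AUC values (95% CI) of the ROC curves for the five models (RF, SVM, NB, NN, XGB) were 0.973 (0.912-1.000), 0.867 (0.689-1.000), 0.880 (0.726-1.000), 0.893 (0.751-1.000), and 0.960 (0.875-1.000), respectively. Pairwise comparison of the five AI models showed no statistically significant differences (P > 0.05). Conclusion All five AI models can assist ultrasound screening for breast cancer. The NN and XGB models performed best and can assist the ultrasound diagnosis of breast cancer.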

[Key words] Ultrasound; Artificial intelligence; Breast cancer; Texture features; Digital health care

[CLC number] R445.1 [Document code] A [Article ID] 2095-0616(2022)07-0163-05

Comparison of five artificial intelligence technologies in the ultrasonic screening of breast cancer at the grass-roots level

LU Jinghui1, ZHANG Hongyan2, WANG Yajuan3, ZHANG Nan1

1. Department of General Medicine, Anzhen Community Health Service Center, Chaoyang District, Beijing 100011, China; 2. Department of Ultrasound, Anzhen Community Health Service Center, Chaoyang District, Beijing 100011, China; 3. Department of Maternal and Child Health Care, Anzhen Community Health Service Center, Chaoyang District, Beijing 100011, China

[Abstract] Objective To use artificial intelligence technology to assist ultrasound screening for breast cancer at the grass-roots level, and to provide empirical evidence for digital medical services at the grass-roots level. Methods Breast ultrasound images were collected from women who underwent physical examination at the Anzhen Community Health Service Center, Chaoyang District, Beijing, from March 2019 to March 2021, and from women who received two-cancer screening in our unit since the two-cancer screening project was implemented in Beijing in 2004. A total of 271 breast nodule images rated as Breast Imaging Reporting and Data System (BI-RADS) category 3 or above were screened out. The patients' follow-up records were reviewed, and 60 images of breast cancer pathologically diagnosed in higher-level hospitals and 60 images of benign breast nodules pathologically confirmed in higher-level hospitals were selected. Using Python (random sampling without replacement), 50 cases of breast cancer and 50 cases of benign nodules were randomly selected and allocated to the experimental group. The remaining 10 cases of breast cancer and 10 cases of benign breast nodules formed the test group. The region of interest (ROI) was marked. Image texture features were extracted and screened. Artificial intelligence (AI) models were established with five modeling methods: Support Vector Machine (SVM), Random Forest (RF), Naïve Bayes (NB), Neural Network (NN) and eXtreme Gradient Boosting (XGB). The receiver operating characteristic (ROC) curve was used to evaluate the performance of the AI models, and the performance of the models was compared. Results In the experimental group, the area under the curve (AUC) values (95% CI) of the ROC curves for the five models (RF, SVM, NB, NN and XGB, in that order) were 0.806 (0.743-0.869), 0.835 (0.777-0.898), 0.859 (0.852-0.939), 0.843 (0.779-0.906) and 0.906 (0.871-0.942), respectively. There was no statistically significant difference between the NN and XGB models (P > 0.05). The NN and XGB models performed significantly better than the other three AI models (P < 0.05). In the test group, the AUC values (95% CI) of the ROC curves for the five models (RF, SVM, NB, NN and XGB, in that order) were 0.973 (0.912-1.000), 0.867 (0.689-1.000), 0.880 (0.726-1.000), 0.893 (0.751-1.000) and 0.960 (0.875-1.000), respectively. Pairwise comparison of the five AI models showed no statistically significant difference (P > 0.05). Conclusion All five AI models can assist ultrasound screening for breast cancer. The NN and XGB models performed best and can assist the ultrasound diagnosis of breast cancer.

[Key words] Ultrasound; Artificial intelligence; Breast cancer; Texture features; Digital medical treatment

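The Proposals of the Central Committee of the Communist Party of China on Formulating the 14th Five-Year Plan for National Economic and Social Development and the Long-Range Objectives Through the Year 2035 recommend improving the primary health care system, raising the quality of primary health care services, and accelerating digital development [1]. Applying digital technology to raise the standard of primary care has therefore become a research topic in primary health care. In July 2017, the State Council issued the New Generation Artificial Intelligence Development Plan (hereafter "the Plan") [2]; under the Plan, health care is a key area of China's AI strategy. Since January 1, 2019, under the Notice of the Beijing Women's Federation on Optimizing and Integrating the Two-Cancer Screening and Long-Term Physical Examination Programs in Beijing [3], women in the Beijing area have received free breast cancer screening.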

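Breast cancer is the most common malignant tumor among Chinese women and has become the cancer with the fastest-growing mortality in urban areas [4]. Survival is markedly higher for early-stage than for intermediate- and late-stage breast cancer, so early detection and early treatment are key to reducing mortality. Because it is inexpensive, portable, and radiation-free, breast ultrasound has become an important tool for early breast cancer screening. Studies report an accuracy of 79.8% to 94.35% for breast ultrasound [5-7], but this accuracy depends heavily on the operator's experience and skill. Skill levels among primary-care sonographers vary widely, so the accuracy of breast ultrasound in primary care is not high. In recent years, emerging AI techniques have improved the accuracy of many diagnostic methods and reduced inter-operator variability [8-9]. AI can mitigate operator-related variability and raise the accuracy of breast ultrasound performed by beginners within a short time.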

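1 Materials and Methods

1.1 General Data

Breast ultrasound images were collected from women who underwent physical examination at the Anzhen Community Health Service Center, Chaoyang District, Beijing, between March 2019 and March 2021, and from women screened at the center under the Beijing two-cancer screening program since its launch in 2004. Inclusion criterion: breast nodule image graded BI-RADS category 3 or above. Exclusion criterion: previous diagnosis of breast cancer. A total of 271 eligible images were identified. After review of the patients' follow-up records, 60 ultrasound images of lesions pathologically diagnosed as breast cancer at higher-level hospitals and 60 images of nodules pathologically confirmed as benign at higher-level hospitals were selected. Using Python (random sampling without replacement), 50 breast cancer cases and 50 benign nodule cases were drawn to form the experimental group; the remaining 10 breast cancer cases and 10 benign nodule cases formed the test group. All patients in the experimental group were women aged 31 to 86 years, with a mean age of (59.0±10.4) years. All patients in the test group were women aged 41 to 78 years, with a mean age of (61.0±8.9) years.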
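The random split described above can be illustrated with a minimal sketch. The article states only that Python random sampling without replacement was used; the case identifiers, the `split_cases` helper, and the seeds below are hypothetical and are not taken from the study.

```python
import random


def split_cases(case_ids, n_train, seed):
    """Draw n_train IDs without replacement; the remaining IDs form the test group."""
    rng = random.Random(seed)                # fixed seed so the split is reproducible
    train = rng.sample(case_ids, n_train)    # random.sample never repeats an element
    chosen = set(train)
    test = [c for c in case_ids if c not in chosen]
    return train, test


# Hypothetical identifiers for the 60 malignant and 60 benign images.
malignant = [f"M{i:02d}" for i in range(1, 61)]
benign = [f"B{i:02d}" for i in range(1, 61)]

train_m, test_m = split_cases(malignant, 50, seed=2021)
train_b, test_b = split_cases(benign, 50, seed=2022)

experimental_group = train_m + train_b   # 50 malignant + 50 benign
test_group = test_m + test_b             # 10 malignant + 10 benign
print(len(experimental_group), len(test_group))  # 100 20
```

Sampling each class separately keeps both groups balanced, which matches the 50/50 and 10/10 composition reported in the text.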

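1.2 Methods

The original images were processed with the ImageJ (Fiji) image-processing software, and the region of interest (ROI) was marked. Image texture features were extracted with Python. Python was also used to select informative texture features and to build the artificial intelligence (AI) models; the models were built with the platform software of the 3DQI laboratory at Harvard University, using five common modeling methods: support vector machine (SVM), random forest (RF), Naïve Bayes (NB), neural network (NN), and extreme gradient boosting (XGB). See Figure 1.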

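1.3 Outcome Measures

Important features were selected with the Boruta algorithm. Model performance was evaluated by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, together with sensitivity, specificity, and precision. Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), and precision = TP/(TP+FP), where TP is the number of true positives, FP false positives, TN true negatives, and FN false negatives.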
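As a worked illustration of the three formulas above, the following minimal sketch computes them from confusion-matrix counts. The counts are hypothetical, chosen only to make the arithmetic easy to follow, and are not results from the study.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity and precision from confusion-matrix counts."""
    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),   # TP / (TP + FN)
        "specificity": ratio(tn, tn + fp),   # TN / (TN + FP)
        "precision": ratio(tp, tp + fp),     # TP / (TP + FP)
    }


# Hypothetical counts for a 20-case test set (10 malignant, 10 benign).
m = screening_metrics(tp=8, fp=3, tn=7, fn=2)
print(m["sensitivity"], m["specificity"], round(m["precision"], 3))  # 0.8 0.7 0.727
```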

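1.4 Statistical Methods

Data were analyzed with MedCalc (V20.0.3). Measurement data are expressed as mean ± standard deviation (x̄ ± s) and were compared with the t test. The performance of the AI models was compared with the Mann-Whitney U rank-sum test. P < 0.05 was considered statistically significant.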
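The AUC and the Mann-Whitney U statistic are closely related: the AUC equals U divided by the number of malignant-benign pairs, that is, the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign case. The sketch below shows this relation only. It does not reproduce the MedCalc analysis used in the study, and the scores are hypothetical.

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as U / (n_pos * n_neg); a tied pair counts as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model outputs (predicted probability of malignancy).
malignant_scores = [0.9, 0.8, 0.7, 0.4]
benign_scores = [0.6, 0.3, 0.2, 0.4]

auc = auc_mann_whitney(malignant_scores, benign_scores)
print(auc)  # 0.90625 (14.5 winning pairs out of 16)
```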

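2 Results

2.1 Selection of Important Features

The important breast cancer texture features extracted from the experimental group were Height, MinFeret, Minor, Area, FeretAngle, and Perim. See Figure 2.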
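Several of the selected feature names resemble shape measurements of the kind reported by ImageJ for a marked ROI; this reading is an assumption, since the article does not define them. Under that assumption, the minimal sketch below computes simplified versions of Area, Height, and the maximum Feret diameter with its angle from a binary ROI mask. It measures distances between pixel centers and is not ImageJ's algorithm; the `shape_features` helper and the toy mask are hypothetical.

```python
import math
from itertools import combinations


def shape_features(mask):
    """Simplified shape descriptors of the non-zero pixels in a 2D binary mask."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("mask contains no ROI pixels")
    ys = [y for _, y in pts]
    feret, angle = 0.0, 0.0
    # Brute force over all pixel pairs; adequate for a small illustrative ROI.
    for (x1, y1), (x2, y2) in combinations(pts, 2):
        d = math.hypot(x2 - x1, y2 - y1)
        if d > feret:
            feret = d
            # Image rows grow downward, so flip dy to get a conventional angle.
            angle = math.degrees(math.atan2(-(y2 - y1), x2 - x1)) % 180
    return {
        "Area": len(pts),                    # pixel count
        "Height": max(ys) - min(ys) + 1,     # bounding-box height in pixels
        "Feret": feret,                      # longest center-to-center distance
        "FeretAngle": angle,                 # degrees in [0, 180)
    }


# Toy ROI: a 4 x 2 pixel rectangle.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
features = shape_features(mask)
print(features["Area"], features["Height"], round(features["Feret"], 3))  # 8 2 3.162
```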

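2.2 Model Performance and Pairwise Comparison in the Experimental Group

In the experimental group, the AUC values (95% CI) of the ROC curves for the RF, SVM, NB, NN, and XGB models were 0.806 (0.743-0.869), 0.838 (0.777-0.898), 0.843 (0.852-0.939), 0.895 (0.779-0.906), and 0.906 (0.871-0.942), respectively; see Table 1. In pairwise comparison of the five AI models, the AUC of the NN model was lower than that of the XGB model, but the difference was not statistically significant (P > 0.05). The AUC of the NN model was higher than those of the RF, SVM, and NB models, and the differences were statistically significant (P < 0.05). The AUC of the XGB model was also higher than those of the RF, SVM, and NB models, and the differences were statistically significant (P < 0.05); see Table 2.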

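2.3 Model Performance and Pairwise Comparison in the Test Group

In the test group, the AUC values (95% CI) of the ROC curves for the RF, SVM, NB, NN, and XGB models were 0.973 (0.912-1.000), 0.867 (0.689-1.000), 0.880 (0.751-1.000), 0.893 (0.726-1.000), and 0.960 (0.875-1.000), respectively; see Table 3. Ranked from highest to lowest AUC, the models were RF, XGB, NN, NB, and SVM. Pairwise comparison showed no statistically significant differences (P > 0.05); see Table 4.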

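3 Discussion

All features selected in this study were first-order texture features, indicating that morphological features remain important for AI models in distinguishing benign from malignant breast tumors. Morphological features are easily recognized by the human eye, which indirectly corroborates the finding that sonographers with long and rigorous training can judge the nature of a tumor by eye with an accuracy above 90% [12].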

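Current clinical studies of AI-assisted breast ultrasound usually apply only one AI algorithm or one company's AI-assisted diagnosis system [13-16]. Although all report improved diagnostic accuracy for breast ultrasound, comparisons between methods are lacking. This study built models with five AI methods in the hope of identifying those better suited to breast ultrasound. All five AI models had AUC values above 0.75 and could serve as screening methods in clinical practice. The NN and XGB models performed best, with AUC values above 0.85 and sensitivity, specificity, and precision all above 0.70, indicating that these two models perform well and can be applied clinically as diagnostic techniques.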

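The selection of the NN and XGB models from among the five is broadly consistent with the results of similar studies [17-19]. XGB has been reported to perform particularly well in small-sample studies, and the present study is consistent with that [20].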

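When the models were evaluated on the test group, all performed well, but the differences seen in the experimental group did not appear. There are two possible reasons. First, the test group was small and came from the same center as the experimental group, so the data were highly homogeneous, which led to good performance. Second, with so few test cases, differences between the methods could not be demonstrated.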
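The effect of a small test group on the precision of an AUC estimate can be illustrated with a bootstrap sketch. Everything here is assumed for illustration: the scores are synthetic, the percentile bootstrap is not the confidence-interval method used in the study, and the function names are hypothetical.

```python
import random


def auc(pos, neg):
    """Fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(pos, neg, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling each class separately."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        bp = [rng.choice(pos) for _ in pos]   # resample with replacement
        bn = [rng.choice(neg) for _ in neg]
        stats.append(auc(bp, bn))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic scores for 10 malignant and 10 benign cases (same size as the test group).
rng = random.Random(42)
pos = [rng.gauss(0.65, 0.15) for _ in range(10)]
neg = [rng.gauss(0.40, 0.15) for _ in range(10)]

point = auc(pos, neg)
low, high = bootstrap_auc_ci(pos, neg)
print(f"AUC = {point:.3f}, 95% bootstrap interval = ({low:.3f}, {high:.3f})")
```

With only 10 cases per class, such intervals tend to be wide, so the intervals of different models overlap heavily and real differences are hard to detect.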

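This study has the following limitations. First, the sample size was limited; expanding the dataset in future studies should mitigate problems caused by insufficient data, such as overfitting and data bias. Second, single-center studies have poor reproducibility; in future work we will add an image normalization module so that the method can be applied to images from more centers. Finally, the workflow has not yet been integrated into a single stand-alone platform and currently requires software from several vendors; in subsequent work we will also develop such a platform to make operation and use easier for clinicians.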

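[References]

[1] Proposals of the Central Committee of the Communist Party of China on Formulating the 14th Five-Year Plan for National Economic and Social Development and the Long-Range Objectives Through the Year 2035 [C]. Blue Book on the Reform and Development of Chinese Enterprises 2020, 2020: 371-386.

[2] New Generation Artificial Intelligence Development Plan [J]. Science & Technology Review, 2018, 36(17): 113.

[3] Beijing Municipal Commission of Health and Family Planning, Beijing Municipal Finance Bureau, Beijing Federation of Trade Unions, Beijing Women's Federation. Notice on optimizing and integrating the two-cancer screening and long-term physical examination programs in Beijing [J]. Gazette of the People's Government of Beijing Municipality, 2018(31): 68-77.

[4] Shang Muyan, Guo Shuai, Zhang Qiang, et al. Current status of breast cancer screening in China [J]. The Practical Journal of Cancer, 2020, 35(11): 1911-1914.

[5] Zhu Decang. Comparative analysis of B-mode ultrasound and molybdenum-target X-ray in the diagnosis of breast diseases [J]. Journal of Imaging Research and Medical Applications, 2020, 4(16): 48-49.

[6] Tunayimu Yikemu. Analysis of the ability of breast B-mode ultrasound to identify breast cancer [J]. Journal of Imaging Research and Medical Applications, 2020, 4(7): 223-224.

[7] Peng Yuanxian. Diagnostic value of B-mode ultrasound in space-occupying lesions of the breast [J]. Heilongjiang Medicine Journal, 2020, 33(1): 183-185.

[8] Xu Kewen, Xu Bo, Wu Ying, et al. A review of the application of machine learning in ultrasound images [J]. Computer Engineering and Applications, 2021, 57(4): 11-17.

[9] Ma Mengwei, Qin Genggeng, Xu Weimin, et al. Machine learning models based on the X-ray and ultrasound Breast Imaging Reporting and Data System for predicting molecular subtypes of breast cancer [J]. Chinese Journal of Medical Imaging Technology, 2020, 36(12): 1814-1819.

[10] Wang Huizhu, Yuan Wanru, Wang Xinxia, et al. Survey of residents in standardized training on the use of medical imaging AI to assist diagnosis of breast masses and their willingness to use it [J]. Journal of Modern Medicine & Health, 2021, 37(10): 1755-1757.

[11] Jeongmin Lee, Sanghee Kim, Bong Joo, et al. Evaluation of the effect of computer aided diagnosis system on breast ultrasound for inexperienced radiologists in describing and determining breast lesions [J]. Medical Ultrasonography, 2019, 21(3): 239-245.

[12] Zang Aihua, Jiang Ming, Meng Cong, et al. Application value of an artificial intelligence system in evaluating BI-RADS category 4 breast masses [J]. Chinese Journal of Medical Ultrasound (Electronic Edition), 2021, 18(8): 795-799.

[13] Lü Minghui, Zhou Shuai, Zhu Qiang. Research progress of deep learning-based computer-aided diagnosis systems for breast ultrasound [J]. Chinese Journal of Medical Imaging Technology, 2020, 36(11): 1722-1725.

[14] Yang Yi, Jiang Wei. Progress in the application of new ultrasound techniques in the diagnosis of benign and malignant breast lesions [J]. Journal of Cancer Control and Treatment, 2020, 33(11): 894-900.

[15] Zhao Tianyu, Miao Shu, Li Jingyu, et al. Comparative study of pattern classification methods for ultrasound image recognition of breast tumors [J]. Journal of Imaging Research and Medical Applications, 2021, 5(8): 56-57.

[16] Li Linhan. Research on few-shot classification and generation of breast ultrasound images based on graph neural networks [D]. Chengdu: Sichuan University, 2021: 1-63.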

[17] Zhou BY, Wang LF, Yin HH, et al. Decoding the molecular subtypes of breast cancer seen on multimodal ultrasound images using an assembled convolutional neural network model: A prospective and multicentre study [J]. EBioMedicine, 2021, 74: 103684.

[18] Hoyt K, Warram JM, Umphrey H, et al. Determination of Breast Cancer Response to Bevacizumab Therapy Using Contrast-Enhanced Ultrasound [J]. Ultrasound in Medicine, 2010, 29(4): 577-585.

[19] Wei Y, Su Z, Li W, et al. Partial dependence of breast tumor malignancy on ultrasound image features derived from boosted trees [J]. Journal of Electronic Imaging, 2010, 19(2): 023004.

[20] Moustafa AF, Cary TW, Sultan LR, et al. Color Doppler Ultrasound Improves Machine Learning Diagnosis of Breast Cancer [J]. Diagnostics (Basel), 2020, 10(9): 631.

(Received: 2021-12-10)