












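Abstract: 【Objective】 Soluble solids content (SSC) is a key indicator for evaluating kiwifruit quality. This study aimed to build an SSC prediction scheme for kiwifruit using hyperspectral technology, so that internal fruit quality can be assessed non-destructively and accurately. 【Methods】 Using Miliang No. 1 kiwifruit as the study material, the hyperspectral images were subjected to white-reference correction and region-of-interest extraction. The spectra were preprocessed with MSC, SG smoothing, SG-MSC and SG-SNV to remove noise, and the optimal method was determined with a PLSR model. The CARS, SPA and RF algorithms were then used separately to extract feature bands related to fruit SSC. PLSR, SVR, RFR and BPNN models were built to compare the coupling between the feature bands and measured SSC, the best model was selected, and the PSO algorithm was used to optimize its prediction accuracy in order to achieve generalizable prediction of internal fruit quality. 【Results】 MSC performed best in full-band regression; CARS effectively simplified the model and extracted key feature bands; the SVR model had the highest prediction accuracy, and after PSO optimization the coefficients of determination for the training and test sets were $R_c^2$ = 0.949 and $R_p^2$ = 0.913, with root mean square errors RMSEC = 0.341 2 and RMSEP = 0.364 9. 【Conclusion】 Compared with optimizing the algorithm at a single stage, the combined MSC + CARS + PSO-SVR model performed better in predicting the soluble solids content of kiwifruit. The results can provide technical support for fruit quality monitoring, grading and sorting.
Key words: Kiwifruit; Hyperspectral imaging technology; Soluble solids content; Machine learning; Quality prediction
CLC number: S663.4    Document code: A    Article ID: 1009-9980(2024)12-2606-15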
Prediction of soluble solids contents in kiwifruit based on both hyperspectral imaging technology and machine learning
LIU Zihan1, 2, LI Ming3#, ZHAO Zhiyao1, CHEN Qian2, LI Jiali2, YU Jiabin1*, QIAN Jianping2*
(1School of Computer and Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China; 2Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences/State Key Laboratory of Efficient Utilization of Arid and Semi-arid Arable Land in Northern China, Beijing 100081, China; 3Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences/National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Zhengzhou 450009, Henan, China)
Abstract: 【Objective】 In the context of predicting soluble solids contents (SSC) for Miliang No. 1 kiwifruit, SSC is a key quality indicator representing the concentration of soluble sugars, which are important for determining the sweetness and maturity of the fruit. Accurate and timely SSC assessment is crucial for both consumer satisfaction and market pricing. Traditional methods like refractometry and liquid chromatography, while accurate, are time-consuming, costly and destructive, making them unsuitable for large-scale or real-time monitoring. To address these challenges, this study aims to develop a non-destructive SSC prediction model using hyperspectral imaging technology, integrating multiple preprocessing methods, feature extraction algorithms and machine learning models. The goal is to enhance the robustness and generalization of SSC predictions by optimizing the entire prediction process, rather than focusing on individual steps like preprocessing or feature extraction, which has been the primary focus of many previous studies. 【Methods】 This study was conducted using 150 Miliang No. 1 kiwifruit samples, which were randomly divided into a training set of 120 samples and a test set of 30 samples. Hyperspectral images were captured using a Rikola portable hyperspectral imager, covering the spectral range from 500 nm to 900 nm with a wavelength interval of 2 nm, resulting in 194 spectral bands. The imaging was conducted in a controlled dark-box laboratory environment to ensure data consistency and minimize external interference. After the hyperspectral images were captured, SSC measurements were performed using an ATAGO PAL-BX/ACID 8 refractometer. Three SSC measurements were taken for each sample, and the arithmetic mean of the three values was used as the actual SSC value. To improve the quality of the spectral data, various preprocessing methods were applied. 
Four specific methods were employed to enhance data consistency and eliminate noise: multiplicative scatter correction (MSC), Savitzky-Golay smoothing (SG), SG combined with MSC (SG-MSC) and SG combined with standard normal variate (SG-SNV). The optimal preprocessing method was determined based on the performance of the partial least squares regression (PLSR) model, with MSC identified as the most effective method for reducing noise and correcting baseline drift. On this basis, feature extraction was performed using competitive adaptive reweighted sampling (CARS), successive projections algorithm (SPA) and random frog (RF) to identify key spectral bands most relevant to SSC. These extracted spectral bands were then used as inputs for four machine learning models: partial least squares regression (PLSR), support vector regression (SVR), random forest regression (RFR) and backpropagation neural network (BPNN). The coupling relationships between the spectral data and the actual SSC measurements were evaluated, and their predictive performances were compared. Based on the best-performing model, particle swarm optimization (PSO) was further introduced to fine-tune the model parameters, aiming to enhance both prediction accuracy and generalization ability. 【Results】 After applying the four preprocessing methods to the spectral data, the MSC method was found to be the most effective at eliminating noise and baseline drift, leading to a significant overlap in the spectral curves. The MSC-CARS-PLSR, MSC-SPA-PLSR and MSC-RF-PLSR models demonstrated improved performance compared to the full-band PLSR model. Specifically, the R2 value for these models increased by 0.01 to 0.092, while the RMSEC decreased by 0.038 3 to 0.134 1. The three feature extraction methods were particularly successful in reducing interfering variables and improving the predictive power of the models. 
Most of the spectral feature bands identified by feature extraction were concentrated in the 750 nm to 900 nm range, indicating that this range was the most sensitive interval for predicting the SSC of kiwifruit. Following feature extraction, the performance of the four machine learning models was evaluated, and the MSC-CARS-SVR model exhibited the best predictive performance. After PSO parameter optimization, the MSC-CARS-PSO-SVR model gave the best prediction, with coefficients of determination $R_c^2$ = 0.949 and $R_p^2$ = 0.913 and root mean square errors RMSEC = 0.341 2 and RMSEP = 0.364 9. These results indicated that the SVR model, especially when optimized using PSO, was highly effective at handling complex, high-dimensional and small-sample data, making it well suited to predicting SSC and other quality metrics in kiwifruit. The BPNN model gave the worst prediction, with the CARS-BPNN test set reaching only $R_p^2$ = 0.633 and RMSEP = 1.230 8. This suggests that the dataset used in this experiment may not suit neural network prediction models: its complexity or size may be insufficient to prevent overfitting, which in turn limits prediction performance and the accuracy of the results. 【Conclusion】 The results of this study demonstrate that the MSC-CARS-PSO-SVR model is highly effective at predicting the internal quality indicators of kiwifruit, particularly SSC. This model provides a scientific basis for non-destructive quality inspection of agricultural products. By combining data preprocessing, feature extraction and machine learning techniques with hyperspectral imaging, the study presents a rapid, non-destructive method for SSC detection in kiwifruit. 
The findings offer valuable technical support for intelligent fruit quality monitoring, grading and sorting systems, and have the potential to be applied across a wide range of fruit and agricultural products in related industries.
Key words: Kiwifruit; Hyperspectral imaging technology; Soluble solids content; Machine learning; Quality prediction
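Kiwifruit is popular with consumers because it is rich in vitamin C, dietary fibre and various minerals [1]. Soluble solids content (SSC), which mainly reflects the content of soluble sugars, is one of the key quality indicators of the sweetness and maturity of kiwifruit [2-3] and directly affects consumers' willingness to buy and the market price of the fruit [4]. Accurate and rapid quantitative determination of kiwifruit SSC is therefore important for monitoring internal fruit quality, optimizing cultivation management and improving competitiveness in market circulation.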
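At present, internal quality indicators of fruit are determined mainly with highly accurate destructive techniques, including refractometry and liquid chromatography, but in practice these are costly and destructive [5]. To meet the demand for rapid, non-destructive quality monitoring, techniques such as nuclear magnetic resonance, spectral analysis and electronic noses have been widely studied and applied to the detection of internal fruit quality [6]. Among them, hyperspectral imaging, an emerging optical detection method, acquires information on both the external and the internal quality of the target at the same time, i.e. two-dimensional spatial information and one-dimensional spectral information. The spatial information is used to extract external quality features (such as size and shape) directly, whereas coupling analysis of the spectral information with features such as specific components and their contents enables prediction and assessment of internal quality [7]. Researchers in China and abroad have therefore increasingly applied hyperspectral technology to efficient, non-destructive fruit quality monitoring and grading. Shao et al. [8] used visible and near-infrared (Vis-NIR) hyperspectral imaging to monitor the SSC of winter jujube at different maturity stages in order to determine its shelf life. Lin Jiaojiao et al. [9] used near-infrared hyperspectral imaging (NIR-HSI) to predict the SSC of different mango cultivars. Compared with conventional near-infrared techniques, hyperspectral technology offers wider spectral coverage and higher spectral resolution [10]; it can capture the fine spatial distribution of soluble solids inside the fruit and provide deeper information on subsurface composition, so that non-destructive measurement approaches the precision of traditional destructive measurement while offering the potential to analyse more complex components. Although hyperspectral technology provides a richer data basis for internal quality detection, preprocessing such as Savitzky-Golay (SG) smoothing and multiplicative scatter correction (MSC) is still needed to calibrate the spectra in advance and remove environmental noise, thereby improving the reliability of the raw spectral data [11]. To retain key information while reducing dimensionality, feature band extraction methods such as principal component analysis (PCA) and genetic algorithms (GA) [12] are used to balance model accuracy and prediction efficiency. Because different fruits and internal quality indicators have distinctive spectral responses, the coupling between a quality indicator and its sensitive feature bands must also be clarified in order to build generalizable fruit quality prediction models.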
For the application of spectral techniques to fruit quality prediction, machine learning models fall mainly into linear and non-linear models. Linear models such as partial least squares regression (PLSR) and ridge regression have been widely used in spectral data analysis and have the advantages of strong interpretability and high computational efficiency [13]. However, their generalization ability is weak when they face complex, high-dimensional spectral data, and they adapt poorly to cross-domain monitoring scenarios. Non-linear machine learning has shown good adaptability in handling spectral data: such models learn the latent coupling between feature bands and internal quality indicators autonomously and keep improving their predictions, which makes them suitable for analysing large datasets [14-15]. Combining spectral imaging with non-linear machine learning algorithms can therefore exploit the strengths of both to achieve real-time, accurate, non-destructive quality detection across batches. Li et al. [16] combined hyperspectral imaging with PLSR, support vector machine regression (SVR), back propagation neural network (BPNN) and convolutional neural network (CNN) methods for non-destructive detection of loquat SSC; the results showed that with small samples the non-linear (SVR) model was more accurate than the other models and could predict internal quality indicators rapidly and precisely. These studies highlight the superior predictive performance of non-linear models in spectral data analysis, particularly their stronger adaptability and generalization when handling complex data. Applying non-linear machine learning algorithms to non-destructive fruit quality detection therefore not only improves prediction accuracy but also offers a feasible solution for large-scale, real-time detection.
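Although many studies have made notable progress on a single stage of spectral data processing (data preprocessing, feature band extraction, etc.), systematic coordination and optimization of the whole prediction workflow remain insufficient, which limits model robustness and general applicability [17-18]. For SSC prediction in Miliang No. 1 kiwifruit, we therefore propose a system-level optimization strategy that analyses the spectral characteristics in depth and considers data preprocessing, feature band extraction and model construction together, with the aim of improving prediction accuracy and generalization. Several spectral preprocessing methods were compared and the best one was selected to improve data quality. Several feature band extraction methods were then used to extract the key spectral bands related to kiwifruit SSC. On this basis, PLSR, SVR, RFR and BPNN models were built, their coupling with measured SSC was evaluated, and their predictive performance was compared. Based on the best-performing model, particle swarm optimization (PSO) was introduced to optimize the model parameters and improve prediction accuracy and generalization. By analysing the predictive performance of combinations of methods across stages, we established an optimal combination scheme for kiwifruit SSC prediction based on hyperspectral imaging, providing a theoretical basis for industrialized, intelligent quality monitoring, grading and sorting of kiwifruit.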
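1 Materials and methods
1.1 Materials
The test material was Miliang No. 1 kiwifruit grown in the kiwifruit experimental orchard of the Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences, Zhengzhou, Henan Province. Fifteen vines with uniform growth were selected at random, and fruit were harvested according to the suitable-harvest-period indices [19] in NY/T 1392-2015 "Technical specification for harvesting, storage and transportation of kiwifruit". Ten well-developed fruit of uniform size were collected from each vine, giving 150 samples in total, which were packed in sampling boxes and transported to the laboratory for subsequent data acquisition.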
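1.2 Acquisition of hyperspectral images of kiwifruit
We built a dedicated spectral acquisition system for collecting hyperspectral images of kiwifruit, comprising an environment simulation module and a data acquisition module (Fig. 1). Two tungsten-halogen lamps (?00 W) were fixed obliquely above the dark box placed on the translation stage to simulate stable, uniform natural lighting. A Rikola portable hyperspectral imager (500-900 nm; Dezhong Tiandi, Beijing) was mounted lens-down 50 cm directly above the dark box and connected by a data cable to an external computer running the camera control software (Rikola Hyperspectral Imager, Rikola HSI), enabling real-time acquisition of hyperspectral images of the fruit. The experiment was conducted on 12 October 2023, during the kiwifruit ripening period. During acquisition the fruit were labelled with numbers and placed evenly on the translation stage together with a standard white reference panel (JY-WS1; Jingyi Optoelectronics, Guangzhou) for imaging. The wavelength interval was set to 2 nm, and hyperspectral images with 194 bands were collected.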
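1.3 Hyperspectral data processing
To improve the quality of the acquired spectral data, reduce redundancy and keep model prediction accurate and efficient, the raw hyperspectral images were processed in four steps: (1) white-reference correction, to establish an all-white baseline and ensure accurate reflectance; (2) region of interest (ROI) extraction, to define the valid image data; (3) spectral preprocessing, to further remove noise and reduce data error; and (4) feature band extraction, to reduce dimensionality and extract key information.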
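1.3.1    White-reference correction    When a hyperspectral imager acquires spectra, dark current and uneven light-source intensity cause the reflectance of the imaged object to fluctuate, which degrades overall image quality. The reflectance of the raw hyperspectral images therefore needs white-reference correction [20]. In ENVI (Environment for Visualizing Images; Research Systems Inc., Boulder, CO, USA), multiple pixels in the standard white panel region were selected, and their mean radiance was taken as the reference value and matched to the panel reflectance in the corresponding band in order to correct the reflectance of the remaining pixels in each band. The correction formula is:
$$R = \frac{R_W}{L_W} \times L \tag{1}$$
where $R$ is the reflectance of the pixel to be calculated, $R_W$ is the reflectance of the white panel, $L_W$ is the radiance of the white panel pixels, and $L$ is the radiance of the pixel to be calculated.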
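1.3.2    ROI data extraction    After image correction, the fruit region was selected in ENVI as the region of interest (ROI), and the mean reflectance of all pixels within the ROI was calculated for each band and taken as the reflectance of the corresponding fruit sample. After the valid information had been extracted from the hyperspectral images and saved, MATLAB R2022b was used for the subsequent spectral preprocessing, analysis and prediction.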
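The two steps above, white-reference correction by Eq. (1) and averaging over the ROI, can be illustrated with a minimal Python sketch. This is not the ENVI workflow used in the study: the function names, the toy radiance values and the default panel reflectance of 1.0 are assumptions made only for illustration (a real panel has its own certified reflectance per band).

```python
from statistics import fmean


def calibrate_pixel(radiance, white_radiance, white_reflectance=1.0):
    """Apply Eq. (1), R = (R_W / L_W) * L, band by band for one pixel.

    white_reflectance=1.0 is an illustrative assumption, not a value
    reported in the paper.
    """
    return [white_reflectance / lw * l for l, lw in zip(radiance, white_radiance)]


def roi_mean_spectrum(roi_pixels):
    """Average the reflectance of all ROI pixels in each band."""
    return [fmean(band) for band in zip(*roi_pixels)]


# Toy data with 3 bands: mean radiance of the white panel, and two ROI pixels.
white = [200.0, 400.0, 800.0]
roi_radiance = [[50.0, 100.0, 400.0], [150.0, 300.0, 400.0]]

roi_reflectance = [calibrate_pixel(p, white) for p in roi_radiance]
sample_spectrum = roi_mean_spectrum(roi_reflectance)  # one spectrum per fruit
```

Each fruit ends up represented by a single mean spectrum, which is the input to the preprocessing step.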
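1.3.3    Spectral preprocessing    To further reduce spectral errors caused by uncertain factors such as lighting, noise and baseline drift, and to improve data quality, the valid data need to be preprocessed [21]. Four common preprocessing methods were applied: multiplicative scatter correction (MSC), Savitzky-Golay smoothing (SG), SG combined with MSC (SG-MSC) and SG combined with standard normal variate transformation (SG-SNV). Each has its own characteristics. MSC reduces the influence of sample scattering on the spectra, which strengthens the correlation between spectral absorption and component content and improves the signal-to-noise ratio of the absorption spectrum [22]. SG smoothing fits the local spectral trend and removes noise that does not follow it, giving smoother and more accurate spectra [23]. When SG smoothing is combined with MSC or SNV, the former corrects spectral distortion caused by scattering and the latter compensates for deviations caused by particle size and surface scattering [24]. To obtain clearer and more accurate spectra, the four methods were compared systematically for kiwifruit SSC prediction, and a PLSR model was used to quantify their relative merits, so as to select the best preprocessing strategy and provide efficient, accurate data for subsequent modelling and prediction.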
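To make the three building blocks concrete, the sketch below gives textbook versions of MSC, SNV and SG smoothing in plain Python. It is an illustration under stated assumptions rather than the MATLAB code used in the study: MSC regresses each spectrum on the mean spectrum, SNV uses the sample standard deviation, and the SG filter is fixed to a 5-point quadratic window (the paper does not report its window or polynomial order).

```python
from statistics import fmean, stdev


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    ref = [fmean(col) for col in zip(*spectra)]
    ref_mean = fmean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = fmean(s)
        # Least-squares fit s = intercept + slope * ref
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in s])
    return corrected


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum on its own."""
    out = []
    for s in spectra:
        m, sd = fmean(s), stdev(s)
        out.append([(v - m) / sd for v in s])
    return out


def sg_smooth(spectrum):
    """5-point quadratic Savitzky-Golay smoothing; the two points at each end are left unchanged."""
    kernel = (-3, 12, 17, 12, -3)  # standard coefficients, normalised by 35
    out = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        out[i] = sum(c * spectrum[i + j - 2] for j, c in enumerate(kernel)) / 35
    return out


def sg_msc(spectra):
    """SG-MSC as described in the text: smooth first, then correct scatter."""
    return msc([sg_smooth(s) for s in spectra])
```

Spectra that differ only by a gain and an offset collapse onto the mean spectrum after `msc`, which matches the closely overlapping curves described for the MSC-processed data.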
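1.3.4    Feature band extraction    Because hyperspectral data contain complex, high-dimensional information, feature bands need to be extracted to remove linear correlation and instability in the raw data and to address excessive dimensionality [25]. Three algorithms, competitive adaptive reweighted sampling (CARS), the successive projections algorithm (SPA) and random frog (RF), were used to extract effective feature bands related to variation in kiwifruit SSC. CARS mimics a "biological evolution" process: it adaptively reweights and selects spectral bands and progressively eliminates redundant and unimportant ones, thereby improving model performance [26]. SPA selects the wavelength combination with the least redundancy and the greatest representativeness to address collinearity [27-28]. RF generates a sequence of models, determines the selection probability of each band variable and uses it to evaluate band importance [29], which allows effective variable screening in high-dimensional data.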
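Of the three selectors, SPA is the easiest to show compactly. The sketch below is a simplified version written for illustration only: it keeps the core projection step (each new band is the one with the largest component orthogonal to the bands already chosen) but omits the outer loop of the full algorithm, which tries several starting bands and band counts and keeps the subset with the lowest validation error. The function name, the mean-centring step and the toy matrix are assumptions, not details from the paper.

```python
def spa_select(X, n_select, start=0):
    """Pick n_select band indices with low collinearity (simplified SPA).

    X is a list of samples, each a list of band values; n_select must not
    exceed the number of bands. `start` is the index of the first band.
    """
    n = len(X)
    cols = []
    for band in zip(*X):
        m = sum(band) / n
        cols.append([v - m for v in band])  # mean-centre each band
    selected = [start]
    while len(selected) < n_select:
        ref = cols[selected[-1]]
        ref_sq = sum(r * r for r in ref)
        for j in range(len(cols)):
            if j in selected or ref_sq == 0.0:
                continue
            coef = sum(a * b for a, b in zip(cols[j], ref)) / ref_sq
            # Remove the component lying along the last selected band
            cols[j] = [a - coef * b for a, b in zip(cols[j], ref)]
        remaining = [j for j in range(len(cols)) if j not in selected]
        selected.append(max(remaining, key=lambda j: sum(a * a for a in cols[j])))
    return selected


# Toy matrix: band 1 is exactly twice band 0, band 2 carries new information.
demo = [[1, 2, 1], [2, 4, -1], [3, 6, 1], [4, 8, -1]]
picked = spa_select(demo, 2)
```

In the toy matrix the redundant band 1 is skipped in favour of band 2, which is the behaviour that lets SPA reduce collinearity among the selected wavelengths.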
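1.4 Acquisition of measured SSC data for kiwifruit
To obtain measured SSC values for validating the quality prediction models, the SSC of the sample fruit was measured with a PAL-BX/ACID 8 digital refractometer (0.0-90.0 °Brix; 0.1 °Brix; ATAGO). SSC was determined with reference to NY/T 2637-2014 "Determination of soluble solids content in fruit and vegetable products: refractometer method" [30], as follows: (1) the fruit were peeled to remove the influence of the peel on the determination of soluble solids in the flesh; (2) an appropriate amount of flesh was pressed to obtain clear juice, which was measured with the ATAGO digital refractometer. To ensure accuracy, three independent measurements were made for each sample and their mean was taken as the measured SSC of the fruit.
The original dataset contained 150 Miliang No. 1 fruit samples, which were randomly divided into a training set of 120 samples and a test set of 30 samples (a ratio of 4:1). The reference SSC values of the sample sets are given in Table 1, which lists the minimum, maximum, mean and standard deviation for the training and test sets.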
1.5 Construction of fruit quality prediction models
1.5.1    Machine-learning-based SSC prediction models for kiwifruit
By modelling the coupling between the SSC feature spectra of the fruit samples and the measured values, kiwifruit SSC can be predicted non-destructively in a generalizable way. Using random-number sorting, the 150 fruit samples were divided into a training set (120) and a test set (30), and four machine learning models, partial least squares regression (PLSR), support vector machine regression (SVR), random forest regression (RFR) and back propagation neural network (BPNN), were used to build the coupling models.
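A random split of this kind takes only a few lines. The sketch below assumes a fixed seed so that the split is reproducible; the seed value and the function name are illustrative and are not reported in the paper.

```python
import random


def split_samples(n_samples=150, n_train=120, seed=0):
    """Shuffle sample indices and split them into training and test sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # seed is an arbitrary illustrative choice
    return sorted(idx[:n_train]), sorted(idx[n_train:])


train_idx, test_idx = split_samples()
```

The same index lists can then be applied to both the spectra and the measured SSC values so that each fruit stays paired with its reference value.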
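To verify the effectiveness of different machine learning methods for kiwifruit SSC prediction, their actual predictive performance was evaluated by comparative analysis, based on each algorithm's capacity and suitability for high-dimensional spectral data, and the best prediction scheme was determined. The commonly used PLSR, SVR and RFR models are suited, respectively, to multicollinearity, non-linear regression and generalization on complex data. PLSR, which builds on principal component analysis and least squares regression, models spectral data with highly correlated variables by seeking the maximum covariance between the input variables (sample spectra) and the output variable (predicted SSC) [31]. SVR achieves stable regression in the high-dimensional spectral feature space by finding an optimal hyperplane and minimizing the distance of the spectral data points to it; it is robust to noise and outliers and captures the relationship between spectral data and the target variable with high accuracy [32]. RFR aggregates the predictions of many decision trees, which captures latent non-linear patterns between the spectra and kiwifruit SSC, handles complex associations between spectral data and the target variable, reduces the risk of overfitting and improves prediction stability and generalization [33].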
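To optimize model performance, a back propagation neural network (BPNN) was also introduced. It uses the back propagation algorithm to optimize the network weights and biases [34] and learns the complex mapping between kiwifruit hyperspectral data and SSC non-linearly; it is highly adaptable, predicts precisely and can capture subtle changes in the spectra. This study evaluated the four models systematically by comparing their performance in predicting kiwifruit SSC. To improve accuracy, the best-performing model was selected and its parameters were searched and optimized globally with particle swarm optimization (PSO), addressing the accuracy bottleneck that conventional algorithms struggle to overcome on high-dimensional spectral data. Combining common and emerging algorithms and introducing PSO improved the accuracy of kiwifruit SSC prediction and the robustness of the model, providing a more efficient and reliable combination scheme for non-destructive SSC detection in kiwifruit.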
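Particle swarm optimization (PSO) is a swarm-intelligence optimization algorithm proposed by Kennedy and Eberhart in 1995. It simulates the foraging behaviour of bird flocks and finds the optimum through the cooperative search of a group of particles in the solution space [35]. Each particle represents a candidate solution and has two attributes, velocity and position. A particle updates its position and velocity according to its own historical best solution (pbest) and the global best solution (gbest), approaching the optimum step by step. In the hyperspectral prediction model, PSO is used to optimize the model's parameter configuration: each particle represents a candidate combination of model parameters and keeps adjusting it according to feedback from the individual and global best solutions until the best configuration is found. This mechanism helps a machine learning prediction model to optimize its parameters quickly and improves its accuracy and generalization in kiwifruit SSC prediction.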
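The particle velocity is updated as:
$$v_i(t+1) = w\,v_i(t) + c_1 r_1\,[\mathrm{pbest}_i - x_i(t)] + c_2 r_2\,[\mathrm{gbest} - x_i(t)] \tag{2}$$
and the position is updated as:
$$x_i(t+1) = x_i(t) + v_i(t+1) \tag{3}$$
where $v_i$ is the particle velocity, $t$ is the iteration number, $x_i$ is the parameter value of the particle, $w$ is the inertia weight, $c_1$ and $c_2$ are the acceleration factors, $r_1$ and $r_2$ are random numbers, $\mathrm{pbest}_i$ is the best position found so far by particle $i$, and $\mathrm{gbest}$ is the global best position found by the whole swarm.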
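Equations (2) and (3) translate almost line for line into code. The sketch below is a generic minimizer in plain Python, not the authors' implementation: the swarm size, iteration count, values of w, c1 and c2, the search bounds and the clamping of positions to those bounds are all illustrative assumptions. In the study the objective would be the prediction error of an SVR trained with the candidate parameters; here a hypothetical quadratic surrogate stands in for it so that the example stays self-contained.

```python
import random


def pso(objective, bounds, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over box `bounds` with the updates of Eqs. (2) and (3)."""
    rng = random.Random(seed)
    dim = len(bounds)
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_val = [objective(p) for p in x]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Eq. (2): inertia + pull towards pbest + pull towards gbest
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                lo, hi = bounds[d]
                # Eq. (3), with the position clamped to the search box
                x[i][d] = min(max(x[i][d] + v[i][d], lo), hi)
            val = objective(x[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = x[i][:], val
    return gbest, gbest_val


def surrogate_error(params):
    """Hypothetical stand-in for SVR validation error, searched in log10 space."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best, best_val = pso(surrogate_error, bounds=[(-2.0, 3.0), (-4.0, 1.0)])
```

Searching the two SVR parameters on a logarithmic scale is a common design choice because useful values can span several orders of magnitude; the paper does not state which scale was used.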
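1.5.2    Evaluation of model accuracy    The coefficient of determination ($R^2$) and the root mean square error (RMSE) were used to evaluate model performance. The closer $R^2$ is to 1 and the smaller the RMSE, the stronger the predictive ability of the model (the coefficients of determination of the training and test sets are denoted $R_c^2$ and $R_p^2$, and the root mean square errors RMSEC and RMSEP). They are calculated as:
$$R^2 = 1 - \frac{\sum_{i=1}^{N}(\hat{y}_i - y_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2} \tag{4}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2} \tag{5}$$
where $N$ is the total number of fruit samples, $\hat{y}_i$ is the SSC of the $i$-th sample predicted by the model, $\bar{y}$ is the mean of the observed SSC values of all samples, and $y_i$ is the observed SSC of the $i$-th sample.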
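Both metrics are short enough to compute directly, as in the sketch below. It assumes the common definition of the coefficient of determination, one minus the ratio of the residual sum of squares to the total sum of squares; the four SSC values are invented purely to show the arithmetic.

```python
from math import sqrt
from statistics import fmean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = fmean(y_true)
    sse = sum((p - y) ** 2 for y, p in zip(y_true, y_pred))
    sst = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - sse / sst


def rmse(y_true, y_pred):
    """Root mean square error, Eq. (5)."""
    return sqrt(fmean((p - y) ** 2 for y, p in zip(y_true, y_pred)))


# Invented values (°Brix) for illustration only.
measured = [12.0, 13.0, 14.0, 15.0]
predicted = [12.5, 12.5, 14.5, 14.5]
r2 = r2_score(measured, predicted)
err = rmse(measured, predicted)
```

Applying the same two functions to the training set and to the test set gives the calibration and prediction versions of each metric.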
The experiment comprised three parts: raw data acquisition, spectral data processing and construction of the optimal SSC prediction model. The overall workflow is shown in Fig. 2.
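2 Results and analysis
2.1 Full-band modelling with different preprocessing methods
2.1.1    Spectral curve analysis    Spectra in the 500-900 nm range were analysed. First, the reflectance of the 150 Miliang No. 1 fruit samples was averaged and the mean reflectance curve was plotted (Fig. 3-A). In the 500-630 nm band, reflectance was relatively low, which is related to the chlorophyll absorption band and reflects light absorption by the C-H spectrally sensitive groups in chlorophyll; in the 500-610 nm range the spectrum changed slowly; in the 580-610 nm band reflectance decreased and then stabilized; in the 610-750 nm band reflectance rose sharply, indicating a change in the absorption properties of the pigments at the fruit surface. In the 750-900 nm range, reflectance remained high with slight fluctuations, corresponding to the water absorption peak, with decreased absorption by the O-H spectrally sensitive groups [36-38]. Raw spectra may contain interference such as electrical noise from the instrument; to ensure accuracy and reliability, the raw spectra were preprocessed [39] and the preprocessed spectra were plotted (Fig. 3-B to E).
In Fig. 3-A, the spectral curves near 651.43 nm were affected by noise and baseline drift, so some samples did not form a clear peak at this wavelength. As shown in Fig. 3-B, after MSC the curves overlapped closely and baseline drift was clearly alleviated, but noise remained and the peak at 651.43 nm was still indistinct for some samples. SG smoothing further improved the smoothness of the curves; it targets local noise and retained subtle reflectance differences (Fig. 3-C). However, SG alone may not solve the problem completely. Combining SG with MSC or SNV was more effective: it improved both the smoothness of the curves and the clarity of the characteristic peaks and eliminated the missing peak at 651.43 nm (Fig. 3-D, E). Although SG-MSC and SG-SNV alter the shape of the spectral profile, they reveal latent spectral peaks effectively. SG-SNV standardizes the spectra but also amplifies noise.
2.1.2    Full-band modelling    PLSR models were built on the spectra preprocessed with MSC, SG, SG-MSC and SG-SNV, and the best preprocessing method was determined from the test-set coefficient of determination $R_p^2$ and root mean square error RMSEP (Table 2). Comparison showed that MSC gave the best preprocessing result for the fruit spectra, with a test-set $R_p^2$ above 0.7, indicating that the correlation between MSC-processed reflectance and fruit SSC was markedly strengthened. The MSC-processed spectra were therefore used in the subsequent analysis.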
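2.2 Feature band extraction results
The acquired hyperspectral images contain 194 bands; modelling with all bands is time-consuming and prone to information redundancy. CARS, SPA and RF were therefore applied separately to the MSC-preprocessed spectra to extract feature bands; the distribution of the selected bands is shown in Fig. 4.
Table 3 lists the feature bands extracted by the three methods. The 28 feature points extracted by CARS were concentrated mainly in 700-850 nm but were unevenly distributed, with some located in 525-700 nm; the 30 feature points extracted by SPA were concentrated mainly in 725-878 nm; the 21 feature points extracted by RF were concentrated mainly in 600-700 nm and 725-825 nm. The feature bands selected by the three methods were concentrated mainly in 750-900 nm, chiefly because the spectra of kiwifruit differ markedly in this range and carry more SSC-related information.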
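2.3 Prediction results for combinations of feature band extraction and machine learning methods
To determine the most suitable combination for predicting kiwifruit SSC, the feature-band spectra selected by CARS, SPA and RF were used as inputs and predicted SSC as the output to build four prediction models: PLSR, SVR, RFR and BPNN. Table 4 shows the prediction results of the different model combinations.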
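Feeding the feature points obtained by CARS, SPA and RF into the PLSR model improved prediction relative to the full-spectrum PLSR model for all three extraction methods. Compared with the full-spectrum PLSR models built on the four preprocessed datasets, $R_c^2$ increased by 0.010-0.092 and RMSEC decreased by 0.038 3-0.134 1; $R_p^2$ increased by 0.031-0.100 and RMSEP decreased by 0.025 9-0.095 7. This shows that the three methods removed redundant information from the spectra, reduced the dimensionality of the data and improved model accuracy while retaining the spectral information important for retrieving SSC. These results also indicate that, whichever modelling method is used, selecting feature bands with CARS improves the model to some degree, reducing the number of variables while retaining more of the useful information. Fig. 5 shows scatter plots of predicted versus measured values for the training and test samples in the four prediction models that use CARS-extracted feature bands as input.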
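For the SVR models, most predicted points lay on the fitted line, indicating good predictive performance and stability. The best model was CARS-SVR, with $R_c^2$ = 0.930 and $R_p^2$ = 0.882 for the training and test sets, RMSEC = 0.388 7 and RMSEP = 0.526 0; the test-set coefficients of determination of the other SVR models were all above 0.80. This is mainly because SVR handles high-dimensional data effectively: a kernel function maps the input samples into a higher-dimensional feature space, which accomplishes the non-linear transformation. SVR therefore has a clear advantage for complex, high-dimensional, small-sample data.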
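By comparison, the RFR models built on the different feature variables had RMSE values of 0.396-0.662, with $R_c^2$ of 0.892-0.922 and $R_p^2$ of 0.821-0.855. CARS-RFR performed relatively well on the training set ($R_c^2$ = 0.922, RMSEC = 0.396 7) and the test set ($R_p^2$ = 0.855, RMSEP = 0.532 2), but its accuracy fell short of the SVR model. The main reason is that the sample size in this experiment was small, so RFR may have fitted noise in the training data, and patterns learned from that noise do not generalize to the test set. BPNN is computationally heavy, trains slowly and tends to fall into local minima, so its training is prone to overfitting or underfitting [40]; its performance was therefore inferior to that of SVR and RFR. CARS-BPNN nevertheless outperformed the other BPNN models; the $R^2$ of the three BPNN models ranged from 0.616 to 0.685 and the RMSE lay within 1.1-1.3. Although BPNN predicted relatively poorly, it retains some predictive ability, and its performance could be improved further by algorithm optimization or data augmentation.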
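Fig. 6 shows the optimal PSO-optimized prediction model for the soluble solids content of kiwifruit. PSO was used mainly to optimize the penalty coefficient c and the kernel parameter γ of the SVR model: c controls the model's tolerance of errors and thus affects overfitting and underfitting, whereas γ determines the non-linear mapping capacity of the kernel function. Through global search, PSO adjusted these two parameters dynamically, which improved prediction accuracy and helped to avoid local optima. Compared with the basic SVR model, the predictive performance of PSO-SVR improved markedly: $R_c^2$ and $R_p^2$ on the training and test sets increased by 0.019 and 0.031, and RMSEC and RMSEP decreased by 0.047 5 and 0.161 1. This is consistent with the study by Lin et al. [41] on PSO in support vector machine models, indicating that PSO optimizes parameters effectively through global search and avoids the local-optimum problem of conventional methods. Houssein et al. [42] further noted that PSO significantly improves the generalization of SVR models on complex, high-dimensional data, with better stability and accuracy especially on small datasets. PSO-SVR therefore performed particularly well on non-linear, high-dimensional data and provides a more reliable solution for kiwifruit SSC prediction.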
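The reported gains can be cross-checked against the metric values given elsewhere in the paper. The short sketch below takes the CARS-SVR values from Section 2.3 and the PSO-SVR values from the abstract and recomputes the differences; the dictionary keys are illustrative labels only.

```python
# Values as reported in the paper (CARS-SVR before and after PSO optimization).
baseline = {"R2c": 0.930, "R2p": 0.882, "RMSEC": 0.3887, "RMSEP": 0.5260}
optimized = {"R2c": 0.949, "R2p": 0.913, "RMSEC": 0.3412, "RMSEP": 0.3649}

# Positive change = increase, negative change = decrease.
change = {k: round(optimized[k] - baseline[k], 4) for k in baseline}

# Relative reduction of the test-set error.
rmsep_reduction = (baseline["RMSEP"] - optimized["RMSEP"]) / baseline["RMSEP"]
```

The recomputed differences agree with the stated changes of 0.019 and 0.031 in the coefficients of determination and 0.047 5 and 0.161 1 in the errors, and they show that most of the benefit of PSO appears on the test set rather than the training set.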
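3 Discussion
We examined the relationship between the soluble solids content (SSC) of kiwifruit and spectral reflectance. For hyperspectral data in the 500-900 nm range, several preprocessing and feature extraction methods were combined with four machine learning models. To further improve prediction accuracy and generalization, particle swarm optimization (PSO) was used to optimize the parameters of the best base model, and an optimal scheme for kiwifruit SSC prediction was established.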
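In this study, hyperspectral images of kiwifruit were taken in a strictly controlled laboratory dark-box environment, the raw images were white-reference corrected, and valid spectral data were extracted with the ROI function in ENVI. The valid spectra were then preprocessed with MSC, SG, SG-MSC and SG-SNV. Comparison of the preprocessing results showed that the MSC-processed curves were more compact than the raw curves, and that MSC removed spectral noise and strengthened the correlation among spectral features. Further comparison of the four methods with PLSR showed that MSC gave better modelling results than the other three, with $R_c^2$ = 0.695, RMSEC = 0.925 4, $R_p^2$ = 0.713 and RMSEP = 0.895 4. This indicates that after MSC the signal-to-noise ratio of the spectra increased appreciably, i.e. the proportion of signal (spectral features related to fruit SSC) relative to noise (random fluctuations caused by scattering and instrument noise) increased, in agreement with Liu Meichen et al. [43]. MSC removes noise and improves data consistency, so that the spectra of different samples are closer in shape and trend; this makes quality-related features easier to identify and extract and provides a reliable data basis for subsequent feature extraction and modelling.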
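During modelling, severe collinearity and abundant redundant information in hyperspectral data can lead to poor models, so feature wavelengths need to be extracted. After MSC preprocessing, CARS, SPA and RF yielded 28, 30 and 21 feature points, respectively. The band positions show that the spectrally sensitive interval for kiwifruit SSC lies mainly in 750-900 nm and is concentrated in 740-800 nm. This agrees with Li Hao et al. [44]; after wavelength selection with CARS, the number of input features fell markedly and the correlation among feature points increased. Comparing the PLSR models before and after feature extraction, $R^2$ increased and RMSE decreased markedly. All three extraction methods simplified the model structure and improved the accuracy and efficiency of the prediction model.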
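Four machine learning models, PLSR, SVR, RFR and BPNN, were built to predict the soluble solids content of kiwifruit, with spectral feature bands as input and predicted SSC as output. Comparing the $R^2$ and RMSE of the four models showed that SVR performed best, with a better fit and smaller deviation and error in its predictions. In addition, feature bands extracted by CARS gave the best models, improving prediction speed and accuracy with relatively few bands. MSC-CARS-SVR was therefore determined to be the best model, with $R_c^2$ = 0.930 and $R_p^2$ = 0.882 for the training and test sets and root mean square errors RMSEC = 0.388 7 and RMSEP = 0.526 0. After the SVR parameters were optimized with PSO, performance improved markedly: $R_c^2$ and $R_p^2$ rose by 0.019 and 0.031, and RMSEC and RMSEP fell by 0.047 5 and 0.161 1. Through global search and information exchange, PSO avoided local optima and improved the accuracy and generalization of SVR on small-sample, high-dimensional spectral data. Compared with Dong Jinlei et al. [45], who predicted kiwifruit SSC using hyperspectral technology with the SPA algorithm and a BPNN model, the training-set $R_c^2$ in this study was 0.019 higher and the RMSEC was 0.554 8 lower, indicating that a model combining the best strategy at each stage predicts kiwifruit SSC better. Liu Wenzheng et al. [46] likewise showed that SVR outperformed CNN and PLSR in predicting total phenol and tannin content in grapes, further confirming the advantage of SVR for non-linear relationships and noisy data.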
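By contrast, although BPNN has strong non-linear explanatory power, it gave the worst predictions in this study, with MSC-CARS-BPNN reaching a test-set $R_p^2$ of 0.633 and an RMSEP of 1.230 8. This may be because BPNN tends to over-learn details of the data during training, leading to overfitting and poor generalization on new data [47]. Although BPNN can mitigate under- and over-estimation to some extent, its performance is constrained by model structure, measurement period and data characteristics. The results of Luo Langqin et al. [48], who predicted the soluble protein content of walnut kernels using a BP neural network and SVR with near-infrared spectroscopy, differ from those of this study; the reason may lie in differences in data preprocessing and model selection, which indicates that feature extraction and model optimization have a key influence on predictive performance across studies.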
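At present, the modelling data in this study concern a single kiwifruit cultivar and a single quality indicator. As practical agricultural production increasingly requires prediction across maturity stages, cultivars and orchards, the applicability of the method needs further validation and optimization. Future work will extend the experiments and modelling, focusing on fruit of different maturity and on multiple quality indicators, in order to achieve more comprehensive and accurate prediction of internal fruit quality indicators and to promote the continued improvement of fruit quality detection and grading technology.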
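4 Conclusion
For rapid, non-destructive detection of kiwifruit SSC, we established an optimal combination scheme of data preprocessing, feature band extraction and machine learning prediction based on hyperspectral technology. The MSC-CARS-SVR model performed best. After the SVR parameters were optimized with particle swarm optimization (PSO), the test-set coefficient of determination $R_p^2$ was 0.913 and the root mean square error RMSEP was 0.364 9, indicating that the optimized SVR model markedly improves prediction accuracy and predicts the internal quality indicator of kiwifruit effectively. The study provides a scientific basis for non-destructive quality detection of agricultural products and a convenient, efficient technical means for refined, intelligent management of kiwifruit quality grading.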
References:
[1] LIU Xiaohong,ZHAO Lingling,MU Hongmei,TANG Meiling,CI Zhijuan,XIAO Huilin,SU Jiaming. Research progress on preservation technology for postharvest kiwifruit[J]. Storage and Process,2021,21(11):121-128.
[2] MA T,XIA Y,INAGAKI T,TSUCHIKAWA S. Non-destructive and fast method of mapping the distribution of the soluble solids content and pH in kiwifruit using object rotation near-infrared hyperspectral imaging approach[J]. Postharvest Biology and Technology,2021,174:111440.
[3] ESCRIBANO S,BIASI W V,LERUD R,SLAUGHTER D C,MITCHAM E J. Non-destructive prediction of soluble solids and dry matter content using NIR spectroscopy and its relationship with sensory quality in sweet cherries[J]. Postharvest Biology and Technology,2017,128:112-120.
[4] LI L,PENG Y K,YANG C,LI Y Y. Optical sensing system for detection of the internal and external quality attributes of apples[J]. Postharvest Biology and Technology,2020,162:111101.
[5] LI J L,SUN D W,CHENG J H. Recent advances in nondestructive analytical techniques for determining the total soluble solids in fruits:A review[J]. Comprehensive Reviews in Food Science and Food Safety,2016,15(5):897-911.
[6] SUN Jingtao,LUO Yijia,SHI Xuewei,MA Benxue,WANG Wenxia,DONG Juan. Research progress on non-destructive detection technology for grape quality[J]. Spectroscopy and Spectral Analysis,2020,40(9):2713-2720.
[7] TIAN P,MENG Q H,WU Z F,LIN J J,HUANG X,ZHU H,ZHOU X L,QIU Z Q,HUANG Y Q,LI Y. Detection of mango soluble solid content using hyperspectral imaging technology[J]. Infrared Physics & Technology,2023,129:104576.
[8] SHAO Y Y,JI S H,XUAN G T,WANG K L,XU L Q,SHAO J. Soluble solids content monitoring and shelf life analysis of winter jujube at different maturity stages by Vis-NIR hyperspectral imaging[J]. Postharvest Biology and Technology,2024,210:112773.
[9] LIN Jiaojiao,MENG Qinghua,WU Zhefeng,CHANG Hongjuan,NI Chunyu,QIU Zouquan,LI Huarong,HUANG Yuqing. Fruit soluble solids content non-destructive detection based on visible/near infrared hyperspectral imaging in mango[J]. Journal of Fruit Science,2024,41(1):122-132.
[10] BHARGAVA A,SACHDEVA A,SHARMA K,ALSHARIF M H,UTHANSAKUL P,UTHANSAKUL M. Hyperspectral imaging and its applications:A review[J]. Heliyon,2024,10(12):e33208.
[11] SUN Jiahao,ZHANG Wei,SHI Jianqin,LI Yankun. Selection and application of spectral data preprocessing strategy[J]. Acta Metrologica Sinica,2023,44(8):1284-1292.
[12] BAO Hao,ZHANG Yan. Research on spectral feature band selection model based on improved Harris hawk optimization algorithm[J]. Spectroscopy and Spectral Analysis,2024,44(1):148-157.
[13] PU Yuge. Research on online detection methods for apple moldy core disease and soluble solids based on visible/near-infrared spectroscopy[D]. Yangling:Northwest A&F University,2023.
[14] PRAJAPATI A,DEHAL A,KUMAR A R. Microplastics in soils and sediments:A review of characterization,quantitation,and ecological risk assessment[J]. Water,Air,& Soil Pollution,2024,235(3):189.
[15] ÖZGENÇ E. Advanced analytical techniques for assessing and detecting microplastic pollution in water and wastewater systems[J]. Environmental Quality Management,2024,34(1):22217.
[16] LI S Y,SONG Q M,LIU Y J,ZENG T H,LIU S Y,JIE D F,WEI X. Hyperspectral imaging-based detection of soluble solids content of loquat from a small sample[J]. Postharvest Biology and Technology,2023,204:112454.
[17] GAO Sheng,XU Jianhua. Hyperspectral imaging for prediction and distribution visualization of total acidity and hardness of red globe grapes[J]. Food Science,2023,44(2):327-336.
[18] HUO Yingqiu,ZHANG Chen,LI Yuhao,ZHI Wentao,ZHANG Jiong,LIU Jingling. Nondestructive detection for kiwifruit based on the hyperspectral technology and machine learning[J]. Journal of Chinese Agricultural Mechanization,2019,40(4):71-77.
[19] LI Yukuo,LIN Miaomiao,SONG Zhe,ZHAN Xu,LI Xiaohan,QI Xiujuan. Establishment of comprehensive evaluation system for fruit quality of Zhongmi No. 2 kiwifruit from different regions of China[J]. Journal of Fruit Science,2024,41(7):1368-1377.
[20] YANG Han,CHEN Qian,WANG Baogang,LI Wensheng,LI Wenzhi,WANG Bingce,QIAN Jianping. Feasibility of estimating the dry matter content of kiwifruits before being harvested using hyperspectral technology[J]. Transactions of the Chinese Society of Agricultural Engineering,2022,38(13):133-140.
[21] ZHENG Lina. Study on internal quality detection of kiwifruit based on hyperspectral technology[D]. Ya’an:Sichuan Agricultural University,2019.
[22] SHEN Bingbing,YAO Xingwei,WANG Huaiwen. Detection of pesticide residues in cauliflower based on hyperspectral technology[J]. Packaging Engineering,2022,43(19):173-179.
[23] WANG Di,FENG Weihua,GUO Junwei,WANG Rui,LIU Huimin,ZONG Guohao,LIU Shaofeng,WANG Yongsheng,ZHAO Le. Tobacco near infrared spectral model transfer based on Savitzky-Golay smooth interpolation[J]. Tobacco Science & Technology,2022,55(8):41-48.
[24] LIU Haoling,ZHANG Zhongxiong,CHEN Ang,PU Yuge,ZHAO Juan,HU Jin. Detection method for apple moldy cores based on spectral shape features[J]. Transactions of the Chinese Society of Agricultural Engineering,2023,39(1):162-170.
[25] LI Qingxu,WANG Qiaohua,MA Meihu,XIAO Shijie,SHI Hang. Non-destructive detection of male and female information of early duck embryos based on visible/near infrared spectroscopy and deep learning[J]. Spectroscopy and Spectral Analysis,2021,41(6):1800-1805.
[26] LIANG L,WEI L L,FANG G G,XU F,DENG Y J,SHEN K Z,TIAN Q W,WU T,ZHU B P. Prediction of holocellulose and lignin content of pulp wood feedstock using near infrared spectroscopy and variable selection[J]. Spectrochimica Acta Part A:Molecular and Biomolecular Spectroscopy,2020,225:117515.
[27] YUAN R R,LIU G S,HE J G,WAN G L,F(xiàn)AN N Y,LI Y,SUN Y R. Classification of Lingwu long jujube internal bruise over time based on visible near-infrared hyperspectral imaging combined with partial least squares-discriminant analysis[J]. Computers and Electronics in Agriculture,2021,182:106043.
[28] ZHU G Z,TIAN C N. Determining sugar content and firmness of ‘Fuji’ apples by using portable near-infrared spectrometer and diffuse transmittance spectroscopy[J]. Journal of Food Process Engineering,2018,41(6):e12810.
[29] 張楷鑫. 基于光譜成像技術(shù)的煤巖特征信息檢測(cè)與識(shí)別方法研究[D]. 西安:西安科技大學(xué),2021.
ZHANG Kaixin. Research on detection and recognition of coal and rock feature information based on spectral imaging technology[D]. Xi’an:Xi’an University of Science and Technology,2021.
[30] OLAREWAJU O O,BERTLING I,MAGWAZA L S. Non-destructive evaluation of avocado fruit maturity using near infrared spectroscopy and PLS regression models[J]. Scientia Horticulturae,2016,199:229-236.
[31] 許麗佳,陳銘,王玉超,陳曉燕,雷小龍. 高光譜成像的獼猴桃糖度無(wú)損檢測(cè)方法[J]. 光譜學(xué)與光譜分析,2021,41(7):2188-2195.
XU Lijia,CHEN Ming,WANG Yuchao,CHEN Xiaoyan,LEI Xiaolong. Study on Non-destructive detection method of kiwifruit sugar content based on hyperspectral imaging technology[J]. Spectroscopy and Spectral Analysis,2021,41(7):2188-2195.
[32] 汪曉慧. 基于高光譜熒光成像技術(shù)的水蜜桃品質(zhì)參數(shù)無(wú)損檢測(cè)研究[D]. 雅安:四川農(nóng)業(yè)大學(xué),2023.
WANG Xiaohui. Study on non-destructive detection of peach quality parameters based on hyperspectral fluorescence imaging technology[D]. Ya’an:Sichuan Agricultural University,2023.
[33] CHEN D S,ZHANG F,TAN M L,CHAN N W,SHI J C,LIU C J,WANG W W. Improved Na+ estimation from hyperspectral data of saline vegetation by machine learning[J]. Computers and Electronics in Agriculture,2022,196:106862.
[34] YE F. Particle swarm optimization-based automatic parameter selection for deep neural networks and its applications in large-scale and high-dimensional data[J]. PLoS One,2017,12(12):e0188746.
[35] GAD A G. Particle swarm optimization algorithm and its applications:A systematic review[J]. Archives of Computational Methods in Engineering,2022,29(5):2531-2561.
[36] WEI X,HE J C,YE D P,JIE D F. Navel orange maturity classification by multispectral indexes based on hyperspectral diffuse transmittance imaging[J]. Journal of Food Quality,2017,2017:1023498.
[37] LI X L,WEI Y Z,XU J,F(xiàn)ENG X P,WU F Y,ZHOU R Q,JIN J J,XU K W,YU X J,HE Y. SSC and pH for sweet assessment and maturity classification of harvested cherry fruit based on NIR hyperspectral imaging technology[J]. Postharvest Biology and Technology,2018,143:112-118.
[38] 鄭藝?yán)? 基于高光譜和太赫茲光譜的甘薯品質(zhì)檢測(cè)方法研究[D]. 南昌:華東交通大學(xué),2020.
ZHENG Yilei. Research on sweet potato quality detection method based on hyperspectral and terahertz spectroscopy[D]. Nanchang:East China Jiaotong University,2020.
[39] 宋相中,熊艷梅,張錄達(dá),閔順耕. 分子光譜波長(zhǎng)選擇值得注意的幾個(gè)問(wèn)題[J]. 光譜學(xué)與光譜分析,2016,36(增刊1):181-182.
SONG Xiangzhong,XIONG Yanmei,ZHANG Luda,MIN Shungeng. Several notable problems of wavelength selection in molecular spectroscopy area[J]. Spectroscopy and Spectral Analysis,2016,36(Suppl. 1):181-182.
[40] ZHU N,WANG K,ZHANG S L,ZHAO B,YANG J N,WANG S W. Application of artificial neural networks to predict multiple quality of dry-cured ham based on protein degradation[J]. Food Chemistry,2021,344:128586.
[41] LIN S W,YING K C,CHEN S C,LEE Z J. Particle swarm optimization for parameter determination and feature selection of support vector machines[J]. Expert Systems with Applications,2008,35(4):1817-1824.
[42] HOUSSEIN E H,GAD A G,HUSSAIN K,SUGANTHAN P N. Major advances in particle swarm optimization:Theory,analysis,and application[J]. Swarm and Evolutionary Computation,2021,63:100868.
[43] 劉美辰,薛河儒,劉江平,代榮榮,胡鵬偉,黃清,姜新華. 牛奶蛋白質(zhì)含量的SSA-SVM高光譜預(yù)測(cè)模型[J]. 光譜學(xué)與光譜分析,2022,42(5):1601-1606.
LIU Meichen,XUE Heru,LIU Jiangping,DAI Rongrong,HU Pengwei,HUANG Qing,JIANG Xinhua. Hyperspectral analysis of milk protein content using SVM optimized by sparrow search algorithm[J]. Spectroscopy and Spectral Analysis,2022,42(5):1601-1606.
[44] 李浩,于滈,曹永研,郝子源,楊瑋,李民贊. 利用CARS-CNN模型的土壤有機(jī)質(zhì)含量高光譜預(yù)測(cè)[J]. 光譜學(xué)與光譜分析,2024,44(8):2303-2309.
LI Hao,YU Hao,CAO Yongyan,HAO Ziyuan,YANG Wei,LI Minzan. Hyperspectral prediction of soil organic matter content using CARS-CNN modelling[J]. Spectroscopy and Spectral Analysis,2024,44(8):2303-2309.
[45] 董金磊,郭文川. 采后獼猴桃可溶性固形物含量的高光譜無(wú)損檢測(cè)[J]. 食品科學(xué),2015,36(16):101-106.
DONG Jinlei,GUO Wenchuan. Nondestructive detection of soluble solid content of postharvest kiwifruits based on hyperspectral imaging technology[J]. Food Science,2015,36(16):101-106.
[46] 劉文政,周雪健,平鳳嬌,蘇媛,鞠延侖,房玉林,楊繼紅. 基于可見-近紅外光譜的鮮食葡萄成熟品質(zhì)關(guān)鍵指標(biāo)檢測(cè)[J]. 農(nóng)業(yè)機(jī)械學(xué)報(bào),2024,55(2):372-383.
LIU Wenzheng,ZHOU Xuejian,PING Fengjiao,SU Yuan,JU Yanlun,F(xiàn)ANG Yulin,YANG Jihong. Detection of key indicators of ripening quality in table grapes based on visible-near-infrared spectroscopy[J]. Transactions of the Chinese Society for Agricultural Machinery,2024,55(2):372-383.
[47] 王麗愛,馬昌,周旭東,訾妍,朱新開,郭文善. 基于隨機(jī)森林回歸算法的小麥葉片SPAD值遙感估算[J]. 農(nóng)業(yè)機(jī)械學(xué)報(bào),2015,46(1):259-265.
WANG Liai,MA Chang,ZHOU Xudong,ZI Yan,ZHU Xinkai,GUO Wenshan. Estimation of wheat leaf SPAD value using RF algorithmic model and remote sensing data[J]. Transactions of the Chinese Society for Agricultural Machinery,2015,46(1):259-265.
[48] 羅浪琴,王濤,劉國(guó)慶,趙文革,張銳,于軍,陸斌,陳天財(cái). 基于近紅外光譜法建立核桃仁可溶性蛋白質(zhì)含量檢測(cè)模型[J]. 果樹學(xué)報(bào),2023,40(8):1750-1761.
LUO Langqin,WANG Tao,LIU Guoqing,ZHAO Wenge,ZHANG Rui,YU Jun,LU Bin,CHEN Tiancai. A model for soluble protein content detection of walnuts based on near infrared spectroscopy[J]. Journal of Fruit Science,2023,40(8):1750-1761.