Shaokang Hou, Yaoru Liu, Qiang Yang
State Key Laboratory of Hydroscience and Engineering, Tsinghua University, Beijing, 100084, China
Keywords: Tunnel boring machine (TBM) operation data; Rock mass classification; Stacking ensemble learning; Sample imbalance; Synthetic minority oversampling technique (SMOTE)
ABSTRACT: Real-time prediction of the rock mass class in front of the tunnel face is essential for the adaptive adjustment of tunnel boring machines (TBMs). During the TBM tunnelling process, a large volume of operation data is generated, reflecting the interaction between the TBM system and the surrounding rock, and these data can be used to evaluate the rock mass quality. This study proposes a stacking ensemble classifier for the real-time prediction of rock mass classification using TBM operation data. Based on the Songhua River water conveyance project, a total of 7538 TBM tunnelling cycles and the corresponding rock mass classes are obtained after data preprocessing. Then, through a tree-based feature selection method, 10 key TBM operation parameters are selected, and the mean values of the 10 selected features in the stable phase, after removing outliers, are calculated as the inputs of the classifiers. The preprocessed data are randomly divided into a training set (90%) and a test set (10%) using simple random sampling. Besides the stacking ensemble classifier, seven individual classifiers are established for comparison: support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF), gradient boosting decision tree (GBDT), decision tree (DT), logistic regression (LR) and multilayer perceptron (MLP), where the hyper-parameters of each classifier are optimised using the grid search method. The prediction results show that the stacking ensemble classifier outperforms the individual classifiers and exhibits stronger learning and generalisation ability for small and imbalanced samples. Additionally, a relatively balanced training set is obtained by the synthetic minority oversampling technique (SMOTE), and the influence of sample imbalance on the prediction performance is discussed.
Tunnel boring machines (TBMs) are widely used in the construction of underground engineering. Compared with the drill and blast method, TBMs have the advantages of fast construction speed and minor environmental disturbance, making them suitable for constructing long-distance tunnels (Zheng et al., 2016; Liu et al., 2020a). However, TBMs are sensitive to geological conditions, and the uncertainty of rock mass and adverse geological conditions are the main risks in TBM excavation (Hamidi et al., 2010; Hasanpour et al., 2017; Zhou et al., 2021a). Therefore, evaluation of rock mass quality is of great significance to the safety and efficiency of tunnel construction. On the one hand, at the design stage, the TBM type and support form are selected according to the rock mass classification obtained from geological prospecting. On the other hand, during construction, the TBM parameters are adjusted adaptively according to the rock mass classes (Gong et al., 2016). Before tunnel construction, various geological prospecting methods can roughly describe the geological and surrounding rock conditions of the construction site (Li et al., 2017). However, because the space between the cutterhead and the tunnel face is narrow, it is challenging to acquire the surrounding rock parameters through traditional exploration and in situ testing methods (Liu et al., 2020b). Consequently, the limited rock parameters are not sufficient for the adjustment and optimisation of TBM operation parameters. Therefore, it is crucial to put forward a method that can accurately predict the rock mass classification in front of the tunnel face in real time.
For rock mass classification, scholars have proposed many representative theoretical methods. For example, Bieniawski (1973) proposed the rock mass rating (RMR) system in 1973 after investigating more than 300 tunnels. RMR scores the rock mass quality by mainly considering the uniaxial compressive strength (UCS) of the rock, rock quality designation (RQD), joint spacing, joint condition (JC), groundwater state and a correction coefficient to determine the total score, and divides the rock mass quality into five grades. The Q system for rock mass quality assessment proposed by Barton et al. (1974) is also an early rock mass classification method; it considers the integrity of the rock mass, groundwater condition, in situ stress and joint characteristics, and uses six parameters to determine a rock mass quality index reflecting the stability of the surrounding rock. Furthermore, by considering the influence of structure and discontinuity surface conditions on the mechanical properties of rock mass based on the Hoek-Brown criterion, Hoek (1994) proposed the geological strength index (GSI) to classify rock mass quality. In 2002, Barton (2002) revised the Q system and explained the correspondence between the new Q system and the RMR system. Besides the above methods, the methods mainly used in China include the basic quality (BQ) method and the hydropower classification (HC) method (GB50487-2008, 2008; GB/T50218-2014, 2014). Different rock mass classification methods have been widely used in tunnel, mining and other underground engineering. However, the traditional theoretical rock mass classification methods are usually applied at the preconstruction stage, and the related indices are difficult to obtain during the tunnel construction process (Huang et al., 2013). Additionally, for most rock mass classification methods, the mapping relationship between the indices and rock mass classes is unclear, and the randomness of the index distribution is hardly considered (Zheng et al., 2020).
In addition to theoretical classification methods, many researchers have introduced artificial intelligence methods for evaluating rock mass quality in recent years. These methods also explore the relationship between the factors that may affect TBM performance and the operational parameters of the TBM (Zhou et al., 2021a; Chen et al., 2021), minimising the subjectivity and inaccuracy of manual evaluation. Gholami et al. (2013) used the index parameters of the RMR system as the inputs of machine learning models to predict the RMR of the tunnel surrounding rock, and showed that machine learning models give more reliable predictions than empirical correlations. Salimi et al. (2017) established the correlation between the field penetration index (FPI) and rock mass quality parameters, e.g. UCS, RQD and JC, using a regression tree model. Santos et al. (2021) used factor analysis to extract three common factors from the indices of the RMR system and, based on this, established an artificial neural network (ANN) classifier to predict the rock mass classification. Zheng et al. (2020) established a classifier based on a least-squares support vector machine (LSSVM) optimised by a bacterial foraging optimisation algorithm (BFOA), and used geological prediction and rock strength rebound results as the inputs of the classifier to predict the rock mass classes. Zhao et al. (2019) proposed a data-driven framework to predict the thickness of the geological types for an urban subway based on the values of seven physical-mechanical indices. Jalalifar et al. (2014) established two rock mass classification models, based on a fuzzy inference system and on multi-variable regression analysis, to predict the RMR accurately, and the fuzzy model showed better prediction accuracy than the regression model.
Currently, the traditional classification methods are widely used, and research on machine learning models based on the parameters of traditional classification methods or on advance geological forecast parameters has also made good progress (Alimoradi et al., 2008; Shi et al., 2014). However, the parameters of the theoretical rock mass classification methods need to be obtained through field and laboratory tests, and cannot easily be collected in real time during TBM tunnelling (Huang et al., 2013). Therefore, real-time and fast prediction of rock mass classes cannot be achieved through the above measured parameters. In actual engineering practice, some advance geological forecast techniques can predict the rock mass conditions in front of the tunnel face. However, advance geological forecasting needs additional time and equipment, which increases the cost of the project. Furthermore, a TBM is a large piece of equipment that occupies most of the space near the tunnel face, so it is challenging to install advance geological forecasting equipment (Li et al., 2020). A TBM can be seen as a large-scale rock testing machine, and the tunnelling and rock breaking process of a TBM is essentially a process of rock-TBM interaction (Yang et al., 2016). Therefore, in the TBM tunnelling process, the change of machine operation parameters results from the interaction between the TBM system and the surrounding rocks (Zhang et al., 2019). Many studies have shown that TBM operation parameters can be used to reflect the rock mass conditions (Yagiz, 2006; Hassanpour et al., 2011; Salimi et al., 2018; Liu et al., 2020c). Additionally, during TBM tunnelling, a large volume of mechanical information can be automatically collected by various sensors (Jung et al., 2019). Therefore, it is feasible to predict the rock mass classification in front of the tunnel face in real time using TBM operation parameters as the inputs of machine learning models.
In most existing research, individual classifiers are used to predict rock mass classification. However, the amount of valuable data in engineering fields is relatively small, and the proportions of different rock mass classes usually differ considerably in practice. Therefore, rock mass class prediction is a problem of small and imbalanced samples. For this kind of problem, individual classifiers tend to overfit the majority class samples, and their prediction performance is often poor for minority class samples (Ganganwar, 2012; Sainin et al., 2017). Ensemble learning is a powerful technique that integrates multiple individual classifiers to form a robust classifier. Many studies show that ensemble learning models have strong generalisation ability and better performance on imbalanced datasets (Salunkhe and Mali, 2016; Feng et al., 2020). In addition to adopting ensemble learning strategies, another way to overcome the sample imbalance problem is to use oversampling or undersampling algorithms to change the sample proportions of the different classes (Brun et al., 2018). Undersampling improves the sample balance by removing some majority class samples (Fan et al., 2017), while oversampling generates additional minority class samples (Viloria et al., 2020). When the total number of samples is small, the oversampling method is preferable, and the commonly used methods are the synthetic minority oversampling technique (SMOTE) (Chawla et al., 2002) and its improved algorithms (Panda, 2017).
In this study, the stacking technique of ensemble learning is introduced. Taking support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF) and gradient boosting decision tree (GBDT) as the base classifiers and GBDT as the meta-classifier, a stacking ensemble classifier is proposed for the real-time prediction of rock mass classification during the TBM tunnelling process. A database is established based on the Songhua River diversion tunnel project in China, including 802 d of TBM operation data and the corresponding rock mass classification information. Through data preprocessing and feature selection, a total of 7538 sample sets are obtained, and 10 crucial features are selected as the input features of the classifiers. Besides the stacking ensemble classifier, seven individual classifiers (i.e. SVM, KNN, RF, GBDT, decision tree (DT), multilayer perceptron (MLP) and logistic regression (LR)) are established, and the hyper-parameters of each classifier are optimised by the grid search method. Then, based on the randomly divided training set (90%) and test set (10%), the prediction performance of the different classifiers is evaluated, and the advantages of the stacking ensemble classifier over the individual classifiers are analysed. Additionally, the influence of sample imbalance on the prediction performance is discussed.
Ensemble learning classifiers stand in contrast to individual classifiers. By integrating multiple homogeneous or heterogeneous ‘weak’ classifiers, the generalisation ability and robustness of an individual learner are effectively improved (Sun et al., 2020). Many studies have shown that an ensemble learning classifier has better prediction performance than a single classifier, and ensemble learning has been widely used in different problem scenarios (Díez-Pastor et al., 2015; Sun and Trevor, 2018). Based on different integration strategies, ensemble learning can be divided into three algorithms: bagging, boosting and stacking (Polikar, 2012). Bagging usually considers homogeneous weak learners trained independently and combined by a specific deterministic averaging process (Breiman, 1996). Boosting also considers homogeneous weak learners; it trains these weak learners sequentially in a highly adaptive way and combines them based on specific deterministic strategies (Friedman, 2001). Unlike bagging and boosting, stacking considers heterogeneous weak learners and combines multiple classification models via a meta-learner (Wolpert, 1992; Kardani et al., 2020). Fig. 1 shows the principle of the stacking ensemble classification model. The stacking ensemble learning framework comprises two levels of classifiers: the base classifiers (level I) and the meta-classifier (level II). Firstly, the original dataset is used to train the multiple base classifiers. In the training process, in order to reduce the risk of over-fitting, the k-fold cross-validation (CV) method (Kohavi, 1995) is generally used to train each base classifier. Then, the outputs of the base classifiers constitute a new dataset, and the meta-classifier is fitted on the new dataset to obtain the final prediction results. The specific steps of the stacking algorithm are as follows:
(1) The original dataset is randomly divided into the original training set D and the original test set T.
(2) Each base classifier is trained based on the k-fold CV method. The original training set D is randomly divided into k equal parts (D1, D2, …, Dk). Each part is used in turn as the test set, with the remaining k−1 parts as the training set. Here, k is set to 5, which means the above process is repeated 5 times. The combination of the prediction results of the base classifiers is taken as the new training set D* of the meta-classifier.
(3) Each base classifier is used to predict the original test set T, and the predicted results are averaged to form the new test set T* of the meta-classifier.
(4) The new training set D* and the new test set T* are used to train and test the meta-classifier, which outputs the final prediction results.
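The four steps above can be sketched with scikit-learn's StackingClassifier, which implements this scheme directly (level-I base classifiers trained with internal k-fold CV, level-II meta-classifier fitted on their out-of-fold predictions). The synthetic data and hyper-parameter values below are placeholders, not the tuned values from the study.

```python
# Minimal stacking sketch: SVM, KNN, RF and GBDT as base classifiers,
# GBDT as meta-classifier, 5-fold CV to build the meta-level training set.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed TBM feature set (4 rock classes).
X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)

base = [
    ("svm", make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gbdt", GradientBoostingClassifier(random_state=0)),
]
# cv=5 reproduces the 5-fold CV used to construct D* for the meta-classifier.
stack = StackingClassifier(estimators=base,
                           final_estimator=GradientBoostingClassifier(random_state=0),
                           cv=5)
stack.fit(X_tr, y_tr)
print(stack.score(X_te, y_te))
```

Scaling is applied inside the SVM and KNN pipelines only, since the tree-based learners are insensitive to feature scale.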
For stacking ensemble learning, selecting appropriate base classifiers and meta-classifier is the key to ensuring the prediction performance. In order to compare the prediction performance and generalisation ability of the stacking model, several commonly used classification models are selected, including SVM, DT, KNN, RF and GBDT. Owing to their mature theory and high efficiency, KNN and SVM are widely used and perform well in many fields (Liao and Vemuri, 2002; Durgesh and Lekha, 2010). RF and GBDT are tree-based algorithms based on bagging and boosting, respectively. RF can be trained in parallel, which significantly improves computational efficiency, and the outputs of the RF model are determined by majority voting of all DTs (Breiman, 2001). In comparison, the DTs of GBDT are generated serially, and the output of GBDT is the (possibly weighted) sum of the prediction results of all DTs (Friedman, 2001). From the perspective of bias and variance, RF mainly reduces the error variance, whereas GBDT can reduce both bias and variance. Thus, a good combination of the two algorithms can ensure the effectiveness of the results. Therefore, SVM, KNN, RF and GBDT are used as the base classifiers in this study, and GBDT is used as the meta-classifier to correct the bias of the multiple classification algorithms with respect to the training set.
2.2.1. Support vector machine (SVM)
SVM is a machine learning method based on statistical theory, and it is often used to deal with classification problems (Vapnik, 2000). For linear binary classification, assuming that the training set is (x_i, y_i) (i = 1, 2, …, n; y ∈ {−1, 1}), the basic idea of SVM is to construct a separating hyperplane w^T x + b = 0 so that the samples of the two classes are separated, where b is the bias of the separating hyperplane. The support vectors are the sample points closest to the separating hyperplane. The optimal separating hyperplane maximises the distance between the support vectors of the two classes and the hyperplane (Srivastava and Bhambhu, 2010). Solving for the optimal hyperplane is a constrained optimisation problem. Using the duality of Lagrange multipliers, it is transformed into the following optimisation problem:
$$\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j\left(x_i^{\mathrm{T}}x_j\right)\qquad \mathrm{s.t.}\ \sum_{i=1}^{n}\alpha_i y_i=0,\ \ 0\le\alpha_i\le C\ (i=1,2,\ldots,n)$$
where α_i and α_j are the Lagrange coefficients, and C is the penalty coefficient.
The final optimal classification function is as follows:
$$f(x)=\operatorname{sgn}\left(\sum_{i=1}^{n}\alpha_i y_i\left(x_i^{\mathrm{T}}x\right)+b\right)$$
For linearly inseparable problems, SVM introduces a kernel function that maps the data samples from a low-dimensional space to a high-dimensional space through the inner product function, making them linearly separable in the high-dimensional space (Liu and Hou, 2019). In this study, the radial basis function (RBF) is used as the kernel function. For multi-class problems, SVM achieves the classification goal by combining several binary classifiers.
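As an illustration (not the study's tuned model), an RBF-kernel SVM for a multi-class problem can be fitted with scikit-learn's SVC, which combines binary classifiers internally exactly as described above; C is the penalty coefficient from the dual problem.

```python
# RBF-kernel SVM on a toy multi-class dataset; SVC handles the
# multi-class case by combining binary classifiers (one-vs-one).
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # C and gamma are illustrative
clf.fit(X, y)
print(clf.predict(X[:3]))
```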
2.2.2. K-nearest neighbor (KNN)
KNN is a classic and straightforward machine learning classification algorithm (Altman, 1992). Assume that x is the vector to be classified. The basic principle of KNN is to first find the k vectors that are most similar to x in the sample space, then count the most frequent class among these k vectors, and assign x to this class. The similarity of two vectors is usually measured by their Euclidean distance, which can be calculated as follows:
$$D_E=\sqrt{\sum_{k=1}^{d}\left(x_{ik}-x_{jk}\right)^{2}}$$
where D_E is the Euclidean distance, x_i and x_j are the sample vectors, and d is the dimension of the samples.
The KNN algorithm mainly depends on the limited adjacent samples rather than on discriminating the class domain. Therefore, the KNN algorithm is more suitable for sample sets whose class domains overlap considerably (Imandoust and Bolandraftar, 2013).
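The principle described above can be sketched in a few lines of NumPy; the toy data and k = 3 are illustrative only.

```python
# Minimal KNN: find the k training vectors closest (Euclidean distance)
# to the query vector and take a majority vote over their classes.
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    d = np.sqrt(((X_train - x) ** 2).sum(axis=1))  # Euclidean distances D_E
    nearest = np.argsort(d)[:k]                    # indices of the k nearest
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.1])))  # → 0
```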
2.2.3. Random forest (RF)
The RF algorithm is a powerful supervised ensemble learning algorithm proposed by Breiman (2001). RF can be regarded as an improved bagging method developed from DT theory (Zhou et al., 2017). The idea of RF is to use the bootstrap resampling method to draw multiple samples from the original samples and construct a DT for each bootstrap sample. In an RF, each DT is randomly generated, and different DTs are independent of each other. For a classification problem, the final classification results are determined by the majority vote of all DTs.
In the RF algorithm, the generation of DTs involves node-splitting algorithms, including ID3, C4.5 and CART (Myles et al., 2004). In this study, the CART algorithm is used to construct the RF. CART uses the Gini index to measure the importance of feature attributes and realise node splitting. Suppose that sample set D contains T classes and n features (X1, X2, …, Xn); then the Gini index is as follows:
$$\mathrm{Gini}(D)=1-\sum_{t=1}^{T}\left(\frac{|C_t|}{|D|}\right)^{2}$$
where C_t is the subset of samples belonging to class t in sample set D. After a split at node k, the sample set D is divided into m parts (D1, D2, …, Dm) based on feature X_j (j = 1, 2, …, n). The Gini index GI_k can be expressed as follows:
$$GI_k=\sum_{i=1}^{m}\frac{|D_i|}{|D|}\,\mathrm{Gini}(D_i)$$
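A minimal sketch of the two Gini computations above (node impurity, and the size-weighted split index over the child subsets), using toy class labels rather than the study's data:

```python
# Gini(D): impurity of a node; gini_split: size-weighted impurity after a split.
def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_split(subsets):
    n = sum(len(s) for s in subsets)
    return sum(len(s) / n * gini(s) for s in subsets)

D = ["II", "II", "III", "III", "III", "IV"]       # toy node labels
print(gini(D))                                     # parent-node impurity
print(gini_split([["II", "II"], ["III", "III", "III", "IV"]]))
```

CART chooses, among the candidate splits, the one minimising this weighted index.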
2.2.4. Gradient boosting decision tree (GBDT)
GBDT is an iterative DT-based algorithm built on the boosting strategy (Friedman, 2001, 2002). With its strong generalisation ability, GBDT is widely used to solve classification and regression problems. CART-based DTs are usually used to construct GBDT, and the DTs are built iteratively (Wang et al., 2016). In each iteration, a new DT is generated, and the residuals of the previous DT are used to train the current DT. Also, in each iteration, the gradient descent method is used to increase the learning weight on the incorrectly predicted samples, so that the error of the model with respect to the objective function is smaller than in the previous iteration (Kuhn and Johnson, 2013). The convergence condition of GBDT is that the model satisfies the preset classification error or reaches the upper limit of the number of DTs. Finally, the trained DT classifiers are integrated into a robust classifier.
For an imbalanced sample set, there are usually two data processing methods: oversampling and undersampling. The main idea of oversampling is to increase the number of minority class samples, while that of undersampling is to remove part of the majority class samples. Through oversampling or undersampling, the numbers of samples of the different classes can become relatively balanced (Elrahman and Abraham, 2013). However, unlike professional fields such as natural language processing, where valuable massive data can easily be obtained, many problems in underground engineering fields have relatively little valuable data. In this study, the number of valid TBM tunnelling data is relatively small, so it is not appropriate to remove majority class samples by undersampling. Additionally, the difference in sample numbers between the classes is relatively significant. Therefore, the SMOTE algorithm (Chawla et al., 2002) is used to process the original imbalanced training set.
The SMOTE algorithm is an oversampling technique for synthesising minority class samples and can be regarded as an improved strategy of the random oversampling algorithm. Because random oversampling simply copies samples to increase the number of minority class samples, it easily produces over-fitting and reduces the generalisation ability of the classifier. To overcome this, the basic principle of the SMOTE algorithm is to analyse the minority class samples and synthesise new samples based on them to add to the dataset. Fig. 2 shows the schematic diagram of SMOTE oversampling. The specific steps of SMOTE oversampling are as follows:
(1) Consider a minority class sample in the feature space (such as the blue ball in Fig. 2). For the minority class sample x_i, the Euclidean distances between x_i and all other minority class samples are calculated to obtain its k nearest neighbors. Generally, k is taken as 5.
(2) Through the analysis of the imbalanced samples, the sampling rate N is determined. For each minority class sample x_i, several samples are randomly selected from its k nearest neighbors; assume that one of the selected nearest neighbor samples is x̂_i.
(3) For the nearest neighbor sample x̂_i and the minority class sample x_i, a new sample x_new is synthesised at a random point on the line connecting them. The calculation formula is as follows:
$$x_{\mathrm{new}}=x_i+\mathrm{rand}(0,1)\times\left(\hat{x}_i-x_i\right)$$
where rand(0,1) represents a random number between 0 and 1.
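Steps (1)-(3) can be sketched in NumPy as below. This only illustrates the interpolation formula; in practice a library implementation (e.g. imbalanced-learn's SMOTE) would be used, and the minority-class points here are synthetic.

```python
# One SMOTE synthesis step: pick a random neighbor of x_i among its
# k nearest minority-class samples, then interpolate x_new on the segment.
import numpy as np

rng = np.random.default_rng(0)

def smote_sample(minority, i, k=2):
    d = np.linalg.norm(minority - minority[i], axis=1)  # Euclidean distances
    neighbors = np.argsort(d)[1:k + 1]                  # k nearest, excluding x_i
    x_hat = minority[rng.choice(neighbors)]             # random neighbor
    return minority[i] + rng.random() * (x_hat - minority[i])

minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
x_new = smote_sample(minority, i=0)
print(x_new)  # lies on the segment between x_0 and one of its neighbors
```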
In order to evaluate the prediction performance of the classifiers, different evaluation metrics have been put forward or used for evaluating machine learning models (Luque et al., 2019; Zhou et al., 2019). Considering the imbalance of the samples, this study selects six evaluation metrics, i.e. accuracy (ACC), precision (PRC), recall (REC), F1-score (F1), Cohen’s kappa coefficient (Kappa) and area under the receiver operating characteristic (ROC) curve (AUC), to evaluate the prediction performance and select the best classifier for rock mass classification.

Fig. 3. Schematic diagram of the binary confusion matrix (taking the prediction of class II as an example).
ACC represents the proportion of correctly predicted samples to the total number of predicted samples, and is the most widely used metric for evaluating the prediction performance of a classifier. However, for imbalanced classification tasks, ACC is incapable of reflecting the performance of classifiers. REC represents the proportion of correctly predicted samples of a certain class to all actual samples of that class, while PRC represents the proportion of correctly predicted samples of a certain class to all samples predicted as that class. There is thus a certain trade-off between PRC and REC, which reflect the discrimination ability of the model for positive and negative samples, respectively. F1 is the composite metric of REC and PRC, which eliminates the one-sidedness of these two indices to a certain extent. These metrics can be calculated from the confusion matrix, and the calculation formulae are as follows:
$$ACC=\frac{TP+TN}{TP+TN+FP+FN},\qquad REC=\frac{TP}{TP+FN},\qquad PRC=\frac{TP}{TP+FP},\qquad F1=\frac{2\times PRC\times REC}{PRC+REC}$$

Fig. 2. Schematic diagram of SMOTE oversampling.
where TP is the true positive, which represents the number of samples that are actually of the positive class and correctly predicted as the positive class by the classifier; FN is the false negative, representing the number of samples that are actually of the positive class but incorrectly predicted as the negative class; FP is the false positive, representing the number of samples that are actually of the negative class but incorrectly predicted as the positive class; and TN is the true negative, representing the number of samples that are actually of the negative class and correctly predicted as the negative class. Prediction of rock mass classification is a four-class classification problem, and it can be regarded as four binary classification problems. To better understand the four symbols TP, FN, FP and TN, the schematic diagram of the binary confusion matrix (taking the prediction of class II as an example) is shown in Fig. 3.
The above evaluation metrics (i.e. ACC, REC, PRC and F1) are suitable for binary classification problems. The rock mass classification can be regarded as the combination of four binary classification problems; therefore, the four evaluation metrics can be used to evaluate the prediction performance for each class. For the overall prediction performance of each classifier, ACC_Total can be calculated as the proportion of correctly predicted samples to the total samples:
$$ACC\_Total=\frac{n_{\mathrm{correct}}}{n_{\mathrm{total}}}$$
where n_correct is the number of correctly classified samples, and n_total is the total number of samples.
The other three metrics (i.e. REC_Total, PRC_Total and F1_Total) can be calculated by the weighted macro-average across classes as follows:
$$REC\_Total=\sum_{i}w_i\,REC_i,\qquad PRC\_Total=\sum_{i}w_i\,PRC_i,\qquad F1\_Total=\sum_{i}w_i\,F1_i$$
where REC_i is the recall of class i, PRC_i is the precision of class i, F1_i is the F1-score of class i, and w_i is the proportion of the samples of class i to the total samples.
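A short sketch of these per-class metrics and their weighted totals, computed directly from the formulae above on toy labels (pure Python; the label values are illustrative):

```python
# Per-class REC/PRC/F1 from the confusion-matrix counts, then the
# class-proportion-weighted totals.
def per_class_metrics(y_true, y_pred, cls):
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    rec = tp / (tp + fn) if tp + fn else 0.0
    prc = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prc * rec / (prc + rec) if prc + rec else 0.0
    return rec, prc, f1

y_true = ["II", "III", "III", "III", "IV", "IV"]
y_pred = ["II", "III", "III", "IV", "IV", "III"]

classes = sorted(set(y_true))
w = {c: y_true.count(c) / len(y_true) for c in classes}   # weights w_i
rec_total = sum(w[c] * per_class_metrics(y_true, y_pred, c)[0] for c in classes)
acc_total = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(acc_total, rec_total)
```

Note that the weighted-recall total equals ACC_Total by construction, which is a useful sanity check on an implementation.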
Cohen’s kappa coefficient (Kappa) is a robust metric that measures the proportion of correctly classified units after the probability of chance agreement has been removed (Cohen, 1960); it takes into account the probability that a sample is classified correctly by chance (Dong et al., 2013; Zhou et al., 2015, 2016). Compared with the ACC metric, the Kappa metric considers the sample imbalance to a certain extent. The Kappa coefficient can be calculated as
$$Kappa=\frac{P_0-P_e}{1-P_e}$$
where P_0 is the sum of the numbers of correctly classified samples in each class divided by the total number of samples, i.e. the overall classification accuracy ACC_Total; and P_e is the expected proportion of samples correctly classified by chance. Assuming that the numbers of real samples in each class are a_1, a_2, …, a_u, the numbers of predicted samples of each class are b_1, b_2, …, b_u, the total number of classes is u, and the total number of samples is n, P_e can be calculated as
$$P_e=\frac{a_1 b_1+a_2 b_2+\cdots+a_u b_u}{n^2}$$
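The Kappa computation from P_0 and P_e can be sketched as follows (toy labels; the per-class counts a_t and b_t are obtained by counting the true and predicted labels):

```python
# Cohen's kappa from the definitions above: P0 is the overall accuracy,
# Pe is the chance-agreement proportion computed from per-class counts.
def cohen_kappa(y_true, y_pred):
    n = len(y_true)
    classes = set(y_true) | set(y_pred)
    p0 = sum(t == p for t, p in zip(y_true, y_pred)) / n
    pe = sum(y_true.count(c) * y_pred.count(c) for c in classes) / n ** 2
    return (p0 - pe) / (1 - pe)

y_true = ["II", "III", "III", "III", "IV", "IV"]
y_pred = ["II", "III", "III", "IV", "IV", "III"]
print(cohen_kappa(y_true, y_pred))
```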
Table 1 shows the relative strength of agreement corresponding to the Kappa statistic (Landis and Koch, 1977). Kappa < 0.4 indicates poor agreement, while Kappa ≥ 0.4 indicates reasonable agreement.

Table 1. Relative strength of agreement corresponding to the Kappa value (Landis and Koch, 1977).
The AUC from the ROC curve is also a metric that can be used to evaluate the prediction accuracy of classifiers (Bradley, 1997). The ROC curve plots the true positive rate (TPR, i.e. recall) against the false positive rate (FPR = FP/(FP + TN)). The values of AUC vary from 0.5 to 1, indicating the discrimination accuracy, which can be divided into five degrees (Bradley, 1997; Zhou et al., 2019): no discrimination (0.5-0.6), poor discrimination (0.6-0.7), fair discrimination (0.7-0.8), good discrimination (0.8-0.9), and excellent discrimination (0.9-1). The ROC curve and the AUC value are usually used for evaluating binary classifiers. For multi-class classifiers, the micro-average ROC curve and the macro-average AUC value are used as the evaluation metrics for the prediction performance. The micro-average ROC curve and its corresponding AUC value are obtained by stacking the results of all groups together, thus converting the multi-class classification into a binary classification. The macro-average ROC curve and its corresponding AUC value are obtained by averaging the results of all groups (one vs. rest), with linear interpolation used between points of the ROC curve (Wei et al., 2018). Compared with the micro-average AUC, the macro-average AUC is more influenced by the minority class samples.
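Micro- vs. macro-averaged AUC can be computed with scikit-learn as sketched below; the class-membership probability matrix is a toy example, not output from the study's classifiers.

```python
# Micro-average: stack all one-vs-rest results into one binary problem.
# Macro-average: compute AUC per class (one vs. rest), then average.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

y_true = np.array([0, 0, 1, 1, 2, 2])
y_score = np.array([[0.7, 0.2, 0.1],   # toy predicted probabilities
                    [0.5, 0.3, 0.2],
                    [0.2, 0.6, 0.2],
                    [0.3, 0.4, 0.3],
                    [0.1, 0.2, 0.7],
                    [0.2, 0.2, 0.6]])
y_bin = label_binarize(y_true, classes=[0, 1, 2])
auc_micro = roc_auc_score(y_bin, y_score, average="micro")
auc_macro = roc_auc_score(y_bin, y_score, average="macro")
print(auc_micro, auc_macro)
```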
In this study, taking the No. 4 bid section of the Songhua River water conveyance project in China as the research object, a TBM operation database is established. Fig. 4 shows the location of the study area of the Songhua River water conveyance project. The construction section is located between the Chalu River and the Yinma River, and the total length of the tunnel is 22,955 m. During the construction process, the length excavated by TBM is about 20,198 m, accounting for about 88%, and the rest of the section is constructed using the drill and blast method. The mileage of the study area is from K71+855 to K48+900, the elevation ranges from 264 m to 484 m, and the buried depth ranges from 85 m to 260 m. The designed shape of the diversion tunnel section is circular. An open TBM with an excavation diameter of 8.03 m is used to excavate the tunnel. The main technical parameters of the open TBM are listed in Table 2.
During the whole construction process of the TBM section, the operation data of the TBM are collected once per second. From July 2015 to February 2018, a total of 802 d of TBM operation data were recorded. About 86,400 pieces of TBM operation data were collected every day, and 4.08 billion pieces of data were finally obtained to form the database. The actual performance of the TBM equipment under different strata and operating conditions is recorded completely. In the database, each piece of data contains 191 TBM machine parameters, time stamp information and the corresponding mileage. Fig. 5 shows the TBM systems and the distribution of the acquisition parameters. Fig. 6 shows the variation of four key TBM operation parameters in one day. The TBM takes a tunnelling cycle as its working unit, which can be defined as the process from TBM start-up to shut-down. In a whole TBM tunnelling cycle, the operation parameters increase from zero to a stable value for continuous excavation and then decrease to zero. During this period, the TBM advances a certain distance forward, and the footage of each tunnelling cycle is about 1.8 m. It can be seen from Fig. 6 that there were 29 TBM tunnelling cycles on February 1, 2016.

Fig. 4. Location of the study area of the Songhua River water conveyance project.
In addition, according to the construction mileage, the lithology and rock mass classification along the tunnel are also recorded, as shown in Fig. 7. The construction site mainly includes two types of lithology, i.e. granite and limestone, accounting for 41.62% and 58.38%, respectively. The mileage of the lithology boundary is K58+454. Based on the HC method (GB50487-2008, 2008), the rock mass is classified into five classes, i.e. I, II, III, IV and V, as shown in Table 3. In the HC method, the cumulative score T is used as the primary criterion for dividing the rock mass classes. The method comprehensively considers the ratings of rock strength, rock mass intactness degree, discontinuity conditions, groundwater condition and the attitude of the main discontinuity plane. Meanwhile, the strength-to-stress ratio S is also calculated, as below, to account for the effect of the stress state on the surrounding rock:

Table 2 Main technical parameters of the open TBM.

Table 3 Descriptions of rock mass classification as per GB50487-2008 (2008).
S = Rc Kv / σm
where Rc is the UCS of intact saturated rock, Kv is the intactness index of the rock mass, and σm is the maximum principal stress of the surrounding rock.
In the study area, the proportions of rock mass classes II to V are 8.13% (419), 66.74% (5555), 20.03% (1439) and 5.1% (125), respectively.
The established database contains a large number of useless data. According to the variation law of the TBM operation parameters, the raw data can be processed by constructing the state discriminant function (SDF) to remove the useless data and obtain the whole TBM tunnelling cycles (Wang et al., 2018). The SDF is written as

Through the above treatment, a total of 7538 TBM tunnelling cycles without useless data were obtained. Fig. 8 shows a complete TBM tunnelling cycle and the selection of valid data for classifiers. There is a strong correlation between the TBM machine parameters and the rock mass quality. The TBM tunnelling cycle can be divided into the rising phase and the stable phase, as shown in Fig. 8. The data of the stable phase can better reflect the rock mass quality of the construction area. Data analysis shows that the duration of the rising phase is usually short (less than 5 min), and the operation parameters near the end of a TBM tunnelling cycle may be unstable. Therefore, the data of the first 400 s and the last 300 s of each TBM tunnelling cycle are removed, and the remaining operation data are valid to be selected for classifiers. However, the data in the selected area may contain some outliers. In this section, the boxplot method based on the quartiles and the interquartile range is used to eliminate the outliers (Carter et al., 2009). Suppose Ds is a data point in the selected area, then the criterion for judging outliers is as follows:
IQR = Q3 − Q1
Lupper = Q3 + 1.5 IQR
Llower = Q1 − 1.5 IQR
and Ds is judged as an outlier if Ds > Lupper or Ds < Llower,
where Q3 is the upper quartile, Q1 is the lower quartile, IQR is the interquartile range, Lupper is the upper limit of non-outliers, and Llower is the lower limit of non-outliers.
Finally, the mean values of the remaining operation data without outliers are calculated as the inputs of the classifiers to predict the rock mass classification.
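The outlier-removal and averaging step above can be sketched in pure Python. Here `statistics.quantiles` stands in for the quartile computation (the exact quartile convention used in the paper is not stated, so this is an assumption), and the 1.5·IQR fences follow the standard boxplot rule cited in the text:

```python
# Sketch: drop boxplot outliers from one feature's stable-phase data,
# then return the mean that serves as a classifier input.
import statistics

def mean_without_outliers(data):
    q1, _, q3 = statistics.quantiles(data, n=4)   # Q1, median, Q3
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    kept = [d for d in data if lower <= d <= upper]
    return sum(kept) / len(kept)

sample = [10.1, 10.3, 9.9, 10.2, 10.0, 55.0]   # 55.0 is an outlier
print(round(mean_without_outliers(sample), 2))  # 10.1
```

In the paper's pipeline this is applied per feature and per tunnelling cycle, producing one 10-dimensional input vector per cycle.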
The selection of appropriate input features has an essential impact on the prediction performance of the model, and many feature selection methods have been proposed (Kumar and Minz, 2014). In this section, the Gini index in RF is used to carry out the feature selection.
The value of the Gini index is inversely proportional to the quality of a node split. Therefore, the importance of features can be ranked by calculating the mean decrease Gini (Shang et al., 2007). The variable importance measure (VIM) of a feature Xj on node k is as follows:
VIMjk = GIk − GIl − GIr
where GIl and GIr are the Gini indices of the new left and right nodes after the node split, respectively.
Then, the importance of feature Xj on the i-th DT can be expressed as
VIMij = Σ(k∈K) VIMjk
where K is the node collection of the i-th DT. Suppose that there are Nc trees in the RF, then the importance of feature Xj can be obtained as
VIMj = Σ(i=1 to Nc) VIMij
The normalised importance score of feature Xj in the RF is finally obtained as
VIMj(norm) = VIMj / Σ(i=1 to M) VIMi, where M is the total number of features.

Fig. 5. TBM systems and the distribution of the acquisition parameters.

Fig. 6. Variation of four key TBM operation parameters in a day, where v is the advance rate, RS is the cutterhead rotational speed, F is the total thrust, and Tc is the cutterhead torque.

Fig. 7. Statistics of (a) lithology and (b) rock mass classification in the study area (unit: %).
The finally calculated feature importance is a relative value, and the sum of the VIM values of all features is equal to 1.
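As a concrete illustration of the mean-decrease-Gini quantities above, the following pure-Python sketch computes the Gini index of a toy node and the importance contribution of one split. The labels and the split are invented for illustration; a real RF sums such terms over all nodes and trees, then normalises so the importances sum to 1.

```python
# Gini index of a node and the importance term GI_k - GI_l - GI_r
# contributed by one split, with toy class labels.
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

parent = ["III"] * 6 + ["IV"] * 4          # mixed node
left, right = ["III"] * 6, ["IV"] * 4      # a perfect split
vim = gini(parent) - gini(left) - gini(right)
print(round(vim, 2))  # 0.48: the full parent impurity is removed
```

A split that fails to separate the classes leaves the child impurities close to the parent's, giving a VIM contribution near zero, which is why the ranking favours features that produce pure children.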
On the other hand, if two features are highly correlated, they have similar trends and may carry similar information. The existence of such features will degrade the performance of some classifiers. Therefore, after sorting the features by the variable importance measures of RF, the highly correlated features are eliminated based on the Pearson correlation coefficient, which can be calculated as Eq. (29). In this section, if the Pearson correlation coefficient between two features is greater than 0.9, the two features are considered to be highly correlated, and only one of them is kept.
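The redundancy filter just described can be sketched as follows. The Pearson formula used here is the standard one (the paper's Eq. (29)); the greedy keep-the-more-important-feature rule and the feature names are illustrative assumptions about how the filter is applied.

```python
# Greedy correlation filter: walk the features in decreasing importance
# order and keep a feature only if |r| <= 0.9 against everything kept.
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated(features, threshold=0.9):
    """features: list of (name, column) ordered by decreasing importance."""
    kept = []
    for name, col in features:
        if all(abs(pearson(col, kcol)) <= threshold for _, kcol in kept):
            kept.append((name, col))
    return [name for name, _ in kept]

f1 = [1.0, 2.0, 3.0, 4.0]
f2 = [2.1, 4.0, 6.2, 8.1]      # nearly 2*f1 -> highly correlated with f1
f3 = [5.0, 1.0, 4.0, 2.0]      # weakly related to f1
print(drop_correlated([("Pr", f1), ("TPI", f2), ("n", f3)]))  # ['Pr', 'n']
```

Because the list is pre-sorted by VIM, the dropped member of each correlated pair is always the less important one, matching the selection logic described in the text.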


Based on the above data processing method, feature selection is conducted on the 191 TBM operation parameters, and 10 features are finally selected as the inputs of classifiers, including cutterhead rotational speed (n), pitch angle of gripper shoes (Pags), gear sealing pressure (Gsp), pressure of gripper shoes (Pgs), output frequency of the main drive motor (Ofdm), internal pump pressure (Ipp), penetration rate (Pr), control pump pressure (Cpp), torque penetration index (TPI), and roll position of gripper shoes (Rpgs). Fig. 9 shows the normalised importance of the 10 selected features. The statistics of each selected feature are shown in Table 4, including the minimum value (Min), maximum value (Max), mean value (Mean) and standard deviation (Std).

Table 4 Statistics of the selected features.

Fig. 8. A complete TBM tunnelling cycle and the selection of valid data for classifiers.

Fig. 9. Normalised importance of the 10 selected features.
The physical meanings of the 10 selected features and their relevance to the rock mass quality are as follows (Liu et al., 2021; Jing et al., 2019): n and Pr reflect the rock-breaking efficiency of the TBM. Gsp reflects the change of the control valve and flowmeter of the TBM lubrication system caused by the variation of rock mass quality. Pgs, Pags and Rpgs reflect the state of the reaction force on the TBM when advancing under different rock mass qualities. Cpp and Ipp reflect the flow and pressure outputs of the TBM thrust hydraulic system, respectively. TPI is the cutterhead torque required per unit penetration rate, reflecting the rock mass boreability. Ofdm is a parameter of the TBM variable frequency drive motor, influencing the cutterhead rotational speed through the reduction ratio relationship.
In this section, we first establish seven individual classifiers, including SVM, KNN, RF, GBDT, DT, LR and MLP. Then, by using SVM, KNN, RF and GBDT as the base classifiers and GBDT as the meta-classifier, the stacking ensemble classifier is established. The seven individual classifiers are used for comparison with the stacking ensemble classifier. Fig. 10 shows the flowchart of the rock mass prediction for each classifier. All classifiers are implemented by the TensorFlow package in PyCharm using the Python language. Moreover, the training and testing of all classifiers were processed by a computer with an Intel(R) Core(TM) i7-7700K CPU @ 4.20 GHz in a Windows environment.
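The core mechanics of stacking can be sketched independently of any library: out-of-fold predictions of the base classifiers become the meta-classifier's training features, so the meta-level never sees predictions made on a base model's own training data. The two toy base learners below (a majority-class learner and a 1-nearest-neighbour learner) are illustrative stand-ins for the SVM/KNN/RF/GBDT base models of the paper, and the striped folds are an assumption for simplicity.

```python
# Minimal out-of-fold (OOF) meta-feature construction for stacking.

def fit_majority(X, y):
    label = max(set(y), key=y.count)
    return lambda X_new: [label] * len(X_new)

def fit_1nn(X, y):
    def predict(X_new):
        out = []
        for q in X_new:
            i = min(range(len(X)),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(X[j], q)))
            out.append(y[i])
        return out
    return predict

def oof_meta_features(X, y, base_fitters, k=5):
    """K-fold out-of-fold predictions, one column per base learner."""
    n = len(X)
    meta = [[None] * len(base_fitters) for _ in range(n)]
    folds = [list(range(f, n, k)) for f in range(k)]   # simple striped folds
    for fold in folds:
        train = [i for i in range(n) if i not in fold]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        for c, fitter in enumerate(base_fitters):
            model = fitter(Xtr, ytr)
            for i, p in zip(fold, model([X[i] for i in fold])):
                meta[i][c] = p
    return meta

X = [[0.0], [0.1], [0.2], [0.3], [0.05], [0.15], [0.25], [1.0], [1.1], [1.2]]
y = ["III"] * 7 + ["IV"] * 3
meta = oof_meta_features(X, y, [fit_majority, fit_1nn], k=5)
print(meta[0])   # OOF base-model predictions for sample 0
```

The meta-classifier (GBDT in the paper) would then be trained on `meta` against `y`, while the base models are refit on the full training set for test-time prediction.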
According to Section 3, 7538 TBM tunnelling cycles are obtained, and the ten selected features are used as the inputs of the classifier. The rock mass class is used as the output of the classifier. In order to eliminate the influence of the data magnitude and dimension differences, it is necessary to carry out data normalisation for each feature before model training. In this section, the Z-score normalisation method is used to normalise the input features, making the mean value and the standard deviation of each feature 0 and 1, respectively. The calculation formula is as follows:
xZ = (x − μ)/σ
where x is an input parameter, xZ is the input parameter after normalisation, μ is the mean value of the input samples, and σ is the standard deviation of the input samples.
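The Z-score transform defined above is a one-liner per feature column; after the transform each feature has mean 0 and standard deviation 1 (the population standard deviation is assumed here, matching σ computed over the input samples):

```python
# Z-score normalisation of one feature column, per the formula above.
import statistics

def z_score(column):
    mu = statistics.fmean(column)
    sigma = statistics.pstdev(column)
    return [(x - mu) / sigma for x in column]

col = [2.0, 4.0, 6.0, 8.0]
z = z_score(col)
print([round(v, 3) for v in z])  # [-1.342, -0.447, 0.447, 1.342]
```

Note that μ and σ must be computed on the training set only and reused for the test set, so that no test-set statistics leak into training.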
Since the inputs and outputs of a machine learning model should be numerical data, an encoding process is needed for the labelled data. In this section, the one-hot encoding method (Potdar et al., 2017) is adopted, and the rock mass classes II, III, IV and V are encoded as (1,0,0,0), (0,1,0,0), (0,0,1,0) and (0,0,0,1), respectively, as listed in Table 5. Then, the preprocessed dataset is divided into a training set and a test set using simple random sampling. There are 6784 samples in the training set and 754 samples in the test set, accounting for 90% and 10%, respectively.
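The one-hot scheme of Table 5 amounts to mapping each class label to a 4-element indicator vector in the fixed order II, III, IV, V:

```python
# One-hot encoding of the four rock mass classes, matching Table 5.
CLASSES = ["II", "III", "IV", "V"]

def one_hot(label):
    return [1 if c == label else 0 for c in CLASSES]

print(one_hot("III"))  # [0, 1, 0, 0]
```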

Table 5 One-hot encoding results of each rock mass class.
After the above data preparation, the stacking ensemble and individual classifiers are established. The training set is used to construct the 10-fold CV dataset, and the hyper-parameter optimisation is carried out based on the 10-fold CV dataset. Finally, the test set is used to test each classifier, and each classifier is evaluated based on the evaluation metrics in Section 2.4.
Different machine learning models have different hyper-parameters, which should be set before model training. Hyper-parameters are essential factors affecting the performance of machine learning models (Feurer and Hutter, 2019). The hyper-parameters of the different models that we mainly consider are as follows:
(1) For the SVM classifier, the key hyper-parameters are the penalty coefficient C and the RBF kernel coefficient g. Hyper-parameter C reflects the tolerance of the SVM model to errors, and hyper-parameter g determines the distribution of the data mapped to the new feature space.
(2) For the KNN classifier, the key hyper-parameters are n_neighbours and weights. Hyper-parameter n_neighbours is the number of neighbouring points considered when determining the sample classification, and it can be set from 1 to 15. Hyper-parameter weights is the distance-based voting weight of the neighbouring points, and it can be set to distance or uniform, considering the weight or not, respectively.
(3) For the DT classifier, the key hyper-parameters are criterion, min_samples_split and min_samples_leaf. Hyper-parameter criterion is the feature selection criterion of the decision tree. Hyper-parameter min_samples_split is the minimum number of samples required to split an internal node, and hyper-parameter min_samples_leaf is the minimum number of samples required in a terminal node for a split to be valid.
(4) For the RF classifier, the key hyper-parameters are min_samples_split, min_samples_leaf, n_estimators, max_depth and max_features. Among them, hyper-parameters min_samples_split and min_samples_leaf have the same meanings as for the DT classifier. Hyper-parameter n_estimators is the number of DTs, hyper-parameter max_depth is the maximum depth of each DT, and hyper-parameter max_features is the number of features randomly selected for each DT.

Fig. 10. Flowchart of the rock mass prediction for each classifier.
(5) For the GBDT classifier, the key hyper-parameters are learning_rate, n_estimators, max_depth and max_features. Hyper-parameter learning_rate is the weight reduction coefficient of each weak learner, and the other three hyper-parameters are the same as for the RF classifier.
(6) For the LR classifier, the key hyper-parameters are max_iter, c and solver. Hyper-parameter max_iter is the maximum number of iterations, hyper-parameter c is the reciprocal of the regularisation coefficient, and hyper-parameter solver determines the optimisation method of the loss function.
(7) For the MLP classifier, the key hyper-parameters are learning_rate, max_iter and activation. The meanings of the first two parameters are the same as mentioned above. Hyper-parameter activation is the activation function of the neurons.
There are several commonly used methods for hyper-parameter tuning, including the grid search method (Wistuba et al., 2015), metaheuristic algorithms (e.g. particle swarm optimisation (PSO), grey wolf optimisation (GWO), whale optimisation algorithm (WOA), moth flame optimisation (MFO), and multi-verse optimisation (MVO)) (Zhou et al., 2021a, b), the hold-out method, the random search method, and the leave-one-out method (Kardani et al., 2020). In this section, the hyper-parameter optimisation is conducted for each classifier with the 10-fold CV accuracy as the evaluation index, and the optimal hyper-parameters of each classifier are tuned by the grid search method. Table 6 shows the optimisation results of the hyper-parameters. The hyper-parameters of the stacking ensemble classifier are set based on the optimisation results of the corresponding individual classifiers. The optimal hyper-parameters are used to set each classifier before model training. In addition to the optimised hyper-parameters, the other initialisation hyper-parameters of each classifier are set to the default values of each classifier function in the Scikit-learn library.
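The grid search itself is just an exhaustive scan of the Cartesian product of candidate values, keeping the combination with the best cross-validation score. The skeleton below illustrates this; `cv_score` is a stand-in for a real 10-fold CV evaluation of a classifier, and the candidate grid merely mimics the SVM hyper-parameters C and g from the text.

```python
# Grid search skeleton: enumerate all hyper-parameter combinations and
# keep the one with the highest (CV) score.
from itertools import product

def grid_search(grid, cv_score):
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = cv_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

grid = {"C": [0.1, 1, 10, 100], "g": [0.01, 0.1, 1]}
# Toy objective peaking at C=10, g=0.1 (a real run would train the model):
toy = lambda p: -abs(p["C"] - 10) - abs(p["g"] - 0.1)
print(grid_search(grid, toy))  # ({'C': 10, 'g': 0.1}, 0.0)
```

Because the cost grows as the product of the candidate counts, the grid is usually kept coarse; Scikit-learn's `GridSearchCV` wraps exactly this loop together with the CV splitting.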
In this section, 90% of the TBM operation data and the corresponding rock mass classes are used as the training set to train the established eight classifiers. The remaining 10% of the data are used as the test set to test the trained classifiers. In order to ensure comparability among the classifiers, all the classifiers are established based on the same training and test sets, and the training and testing process of each classifier is repeated 10 times to determine the model performance. The values of the different evaluation metrics are obtained by calculating the mean values of the 10 repeated tests. Table 7 lists the computation time of the different classifiers under the optimal hyper-parameters. It can be seen that the training times of the GBDT and stacking ensemble classifiers are relatively long, up to 56.367 s and 68.295 s, respectively. The training time of the other six classifiers is less than 3 s. However, the prediction time of each classifier on the test set is less than 0.3 s, so the prediction based on the trained classifiers can be considered 'real-time'.

Table 6 Optimisation results of hyper-parameters.

Table 7 Computation time of different classifiers under optimal hyper-parameters.
Fig. 11 shows the prediction results of the different classifiers on the test set. As can be seen from the figure, the misclassification ratio of samples belonging to classes II and V is high, while that of samples belonging to the other two classes is relatively low. The reason for this phenomenon is that the sample set is imbalanced: in the total samples, the training set and the test set, the samples belonging to classes II and V each account for less than 10%. Additionally, it can be easily seen that the proposed stacking ensemble classifier has the best prediction performance on rock mass classification among all classifiers, with the fewest misclassified samples.

Fig. 11. Prediction results of different classifiers on the test set: (a) SVM, (b) KNN, (c) RF, (d) GBDT, (e) DT, (f) LR, (g) MLP, and (h) Stacking.
In order to quantitatively analyse the prediction results and evaluate the performance of the different classifiers, the evaluation metrics proposed in Section 2.4 are used to evaluate each classifier's performance, and the comparison between the stacking ensemble classifier and the individual classifiers is also analysed. Table 8 lists the evaluation metrics of each classifier. Fig. 12 shows the confusion matrix based on the REC of each classifier. It can be seen from Table 8 and Fig. 12 that:
(1) The stacking ensemble classifier has the best prediction performance with the highest evaluation metrics; the values of ACC_Total, Kappa, PRC_Total, REC_Total and F1_Total are 93.1%, 0.823, 0.93, 0.931 and 0.928, respectively. The prediction performance of the stacking ensemble classifier on the different rock mass classes is also the best compared with the individual classifiers. Taking REC as an example, the REC values of the stacking ensemble classifier for classes II, III, IV and V are 70% (30/754), 98.4% (557/754), 81.7% (153/754) and 57.1% (14/754), respectively. The REC of the stacking ensemble classifier is higher than that of the seven individual classifiers. Especially for class V, the REC values of the seven individual classifiers are all less than 50%, whereas the prediction performance of the stacking ensemble classifier for class V is greatly improved, with REC up to 57.1%. Also, the relative relationship of the other two evaluation metrics (i.e. PRC and F1) among the different classifiers is similar to that of REC. The above analysis shows that the proposed stacking ensemble classifier has a powerful generalisation ability.
(2) Among the individual classifiers, GBDT and RF show relatively better performance and generalisation ability than the others. In fact, GBDT and RF also belong to ensemble learning classifiers, which combine multiple DTs in different ways. In this study, however, GBDT and RF are used as the base classifiers of the stacking ensemble classifier, and thus they are regarded as individual classifiers.
(3) The prediction performances of SVM, KNN, DT and MLP are relatively poor, with more misclassified samples than the GBDT, RF and stacking ensemble classifiers. Taking the prediction of class III as an example, the REC values of the GBDT, RF and stacking ensemble classifiers are 0.982, 0.975 and 0.984, respectively, whereas the REC values of SVM, KNN, DT and MLP are as low as 0.923-0.948, with more samples belonging to class III misclassified as class IV. Among these four classifiers (i.e. SVM, KNN, DT and MLP), the performance of the first three is relatively good, with ACC_Total of 87.1%-89%, while the performance of the MLP classifier is relatively poor, with ACC_Total of 81.3%.
(4) The LR classifier has the worst prediction performance on rock mass classification, and its evaluation metrics are the lowest among all the established classifiers. Additionally, it can be seen from Fig. 12f that the LR classifier cannot predict classes II and V. The samples belonging to class II are all misclassified as classes III (83.3%) and IV (16.7%), and the samples belonging to class V are all misclassified as classes III (14.3%) and IV (85.7%). The above analysis shows that the generalisation ability of the LR classifier is inferior.
(5) As for the Kappa metric, the Kappa value of the stacking ensemble classifier is 0.823, which means the strength of agreement is almost perfect according to Table 1. The strength of agreement of the LR classifier is inferior, with a Kappa value of 0.38, and that of the MLP classifier is moderate, with a Kappa value of 0.502. In contrast, the strengths of agreement of the other five classifiers are all good, with Kappa values of 0.61-0.8. Additionally, it can be seen from Table 8 that there is a positive correlation between ACC and Kappa, and the value of Kappa is smaller than that of ACC. The ACC and Kappa of the stacking ensemble classifier are both greater than those of the seven individual classifiers, showing that the stacking technique can effectively improve the model performance.
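The relation between Kappa and ACC noted above follows from Cohen's kappa, kappa = (p_o − p_e)/(1 − p_e), where p_o is the observed accuracy and p_e the chance agreement implied by the row/column marginals of the confusion matrix (assumed here to match the definition in the paper's Section 2.4). A small sketch with an illustrative 2x2 matrix:

```python
# Cohen's kappa from a confusion matrix (rows: actual, cols: predicted).
def cohen_kappa(cm):
    n = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(len(cm))) / n
    row = [sum(r) for r in cm]
    col = [sum(cm[i][j] for i in range(len(cm))) for j in range(len(cm))]
    p_e = sum(r * c for r, c in zip(row, col)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

cm = [[45, 5],
      [10, 40]]          # toy matrix: ACC = 0.85
print(round(cohen_kappa(cm), 3))  # 0.7
```

Because p_e > 0 whenever the marginals are non-degenerate, kappa is always below the raw accuracy, which is exactly the ACC-versus-Kappa pattern observed in Table 8.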
Fig. 13 shows the error histograms of the different classifiers. The error values of classification problems are discrete. Since the rock mass classes have four levels (II, III, IV and V) in our study, the error of the established classifiers is within the range {-3, -2, -1, 0, 1, 2, 3}. The value of the error represents the level difference between the actual and predicted classes: error = 0 means that the predicted class is the same as the actual class, a positive error means that the level of the predicted class is higher than that of the actual class, and a negative error means that the level of the predicted class is lower than that of the actual class. As can be seen from Fig. 13, the absolute error of each classifier is less than 3, and the error values of most misclassified samples are -1 and 1. For the RF, GBDT and stacking ensemble classifiers, there is only one sample with an absolute error of 2, and the errors of the rest of the misclassified samples are -1 and 1. However, the frequency of |error| = 1 for the stacking ensemble classifier is less than that of the RF and GBDT classifiers. For the other individual classifiers, the frequency of samples with |error| = 2 is higher. It can be seen that the proposed stacking ensemble classifier is more reasonable in the classification of rock mass, in that the misclassified samples are generally predicted as the adjacent classes.
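Tabulating such an error histogram is straightforward once the classes are mapped to integer levels (II=2 through V=5, an assumed but natural mapping); the labels below are toy values:

```python
# Count the signed level difference between predicted and actual classes.
from collections import Counter

LEVEL = {"II": 2, "III": 3, "IV": 4, "V": 5}

def error_histogram(actual, predicted):
    return Counter(LEVEL[p] - LEVEL[a] for a, p in zip(actual, predicted))

actual    = ["III", "III", "IV", "II", "V", "III"]
predicted = ["III", "IV",  "IV", "III", "IV", "III"]
print(dict(error_histogram(actual, predicted)))  # {0: 3, 1: 2, -1: 1}
```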
The ROC curves are also used to evaluate the prediction performance in rock mass classification. Fig. 14 shows the ROC curves and the corresponding AUC values of the different classifiers, and Table 9 lists the micro-average and macro-average AUC values. It can be seen that: (1) The micro-average and macro-average AUC values of the stacking ensemble classifier are 0.989 and 0.98, respectively, showing the best prediction performance among all the classifiers. It is followed by the GBDT classifier (micro-average AUC = 0.985 and macro-average AUC = 0.969) and the RF classifier (micro-average AUC = 0.982 and macro-average AUC = 0.968). The SVM classifier (micro-average AUC = 0.97 and macro-average AUC = 0.911) and the KNN classifier (micro-average AUC = 0.960 and macro-average AUC = 0.903) can also be considered good classifiers. The prediction performances of the DT classifier (micro-average AUC = 0.949 and macro-average AUC = 0.892) and the MLP classifier (micro-average AUC = 0.938 and macro-average AUC = 0.853) are relatively poor. The LR classifier has the worst accuracy, with micro-average AUC = 0.929 and macro-average AUC = 0.812. (2) Because the macro-average AUC is more influenced by the minority class samples than the micro-average AUC, the difference between these two metrics can reflect the learning ability of a classifier on imbalanced data. As can be seen from Table 9, the difference value of the stacking ensemble classifier is the smallest, which shows a better learning ability and performance improvement compared with the individual classifiers.

Fig. 12. Confusion matrix based on the REC of each classifier: (a) SVM, (b) KNN, (c) RF, (d) GBDT, (e) DT, (f) LR, (g) MLP, and (h) Stacking.
Over-fitting is a common problem in the training process, which means that the prediction performance on the training set is much higher than that on the test set (Cawley and Talbot, 2010). It also indicates that the generalisation ability of the classifier is poor. Fig. 15 shows the relationship between the prediction accuracy of each classifier on the training and test sets. It can be seen that all the data points fall above the line x = y, which represents that the prediction accuracy on the training set is higher than that on the test set. Generally, if the difference between the prediction accuracies on the training and test sets is too large (i.e. the data point in Fig. 15 is far from the line x = y), over-fitting may occur in the training process. At present, there is no clear rule about how large a difference in prediction accuracy between the training and test sets constitutes over-fitting. For most of the established classifiers, the difference between the prediction accuracies on the training and test sets is relatively small. The data point of the KNN classifier falls above the line x = 0.9y, while the data points of the other seven classifiers all fall below it. More specifically, Table 10 lists the statistics of the prediction accuracy on the training and test sets. In the first place, the difference in prediction accuracy between the training and test sets for the KNN classifier is 11.6%, while those of the other classifiers are all less than 10%. In the second place, the ratio of the prediction accuracy on the test set to that on the training set for the KNN classifier is 0.88, while the ratios of the other classifiers are all greater than 0.9. Additionally, for the LR classifier, the prediction accuracy on the training set is close to that on the test set, and its data point in Fig. 15 is close to the line x = y. However, the prediction accuracy of the LR classifier is low, and it cannot identify classes II and V. Therefore, the fitting effect of the LR classifier on the training set is also poor. The above analysis shows that, except for the KNN and LR classifiers, the fitting effects of the other six classifiers can be considered good.

Fig. 13. Error histograms of different classifiers: (a) SVM, (b) KNN, (c) RF, (d) GBDT, (e) DT, (f) LR, (g) MLP, and (h) Stacking.
In general, through the analysis of various aspects, it can be concluded that the proposed stacking ensemble classifier has good prediction performance and strong generalisation ability for rock mass classification. Therefore, it can be used for the real-time and accurate prediction of rock mass classes, which can help guide the adaptive adjustment of the TBM in the tunnelling process.
In this section, the influence of the sample imbalance on the prediction performance is discussed. According to the above analysis, there are significant differences in the numbers of samples of the different rock mass classes, as shown in Fig. 16. The overall ratio of the training set to the test set is 6784/754 (i.e. 9/1), and the proportion of each class in the training and test sets is also about 9/1. In the training set, the number of samples varies greatly among the rock mass classes: the numbers of samples of classes II, III, IV and V are 377, 5004, 1293 and 111, respectively. This may lead to a better fitting effect for the samples of classes III and IV (or even over-fitting to a certain extent) and a worse fitting effect for the samples of classes II and V. Fig. 17 shows the relationship between the REC value of each classifier and the number of training samples. It can be seen that:
(1) The REC values of the different classifiers on the test set positively correlate with the number of training samples: the more samples in a certain class, the higher the REC value on the test set.
(2) With the increasing number of samples, the REC differences among the different classifiers gradually decrease. When the number of samples is less than 500 (i.e. classes V and II), the REC differences among the different classifiers are relatively significant. When the number of samples is between 1000 and 1500 (i.e. class IV), the REC differences decrease to a certain extent. When the number of samples increases to about 5000 (i.e. class III), the REC differences among the different classifiers become relatively small.
(3) For the different rock mass classes, the prediction performance of the stacking ensemble classifier is the best, which shows that the ensemble learning model has more robust learning and generalisation abilities than individual classifiers for small and imbalanced samples. Additionally, the relevant analysis in Section 4.4 also shows that the imbalance of samples impacts the classifier's prediction performance.
In this section, the SMOTE is used to process the imbalanced samples, increasing the numbers of samples of classes II and V to 1000 while keeping the numbers of samples of the other classes unchanged. After oversampling the samples of classes II and V, a relatively balanced training set is obtained. The numbers of samples of classes II, III, IV and V in the relatively balanced training set are 1000, 4998, 1286 and 1000, respectively. The test set remains the same as before. Table 11 lists the statistics of the original imbalanced training set, the relatively balanced training set and the test set. It can be seen that the sample proportions of the different rock mass classes become more balanced after the SMOTE processing. In particular, the sample proportions of classes II and V are increased to 12.07% from 5.73% and 1.64%, respectively.
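SMOTE generates each synthetic minority sample by interpolating between an existing minority sample x and one of its k nearest minority neighbours: x_new = x + rand·(x_neighbour − x). The pure-Python sketch below is deliberately simplified (k=2, a fixed seed, brute-force neighbour search); a real application would use a library implementation such as imbalanced-learn's `SMOTE`.

```python
# Simplified SMOTE for one minority class.
import random

def smote(samples, n_new, k=2, seed=0):
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(samples)
        others = sorted((s for s in samples if s is not x),
                        key=lambda s: sum((a - b) ** 2 for a, b in zip(s, x)))
        nb = rng.choice(others[:k])          # one of the k nearest neighbours
        gap = rng.random()                   # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic

minority = [[1.0, 2.0], [1.2, 2.1], [0.9, 1.8], [1.1, 2.3]]
new = smote(minority, n_new=3)
print(len(new), len(new[0]))   # 3 synthetic 2-D samples
```

Because each synthetic point lies on the segment between two real minority samples, SMOTE densifies the minority region rather than duplicating points, which is why it improves the fit for classes II and V without simply memorising them.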
In this section, all the established classifiers are trained and tested based on the dataset shown in Table 11. Table 12 presents the prediction accuracy of the different classifiers on the test set with the imbalanced training set and the relatively balanced training set. Table 13 presents the differences in prediction accuracy between the training and test sets of the different classifiers with the imbalanced and relatively balanced training sets. After the SMOTE oversampling, the prediction accuracy of each classifier is improved to a certain extent. Furthermore, the difference in prediction accuracy between the training and test sets is decreased. The results show that a more balanced training set is beneficial for the learning process of the classifiers. Generally, a classifier with a relatively balanced training set tends to have a better fitting effect and prediction performance.

Table 9 Micro-average and macro-average AUC values of different classifiers.

Table 10 Statistics of the prediction accuracy on training and test sets.

Table 11 Statistics of the original imbalanced training set, relatively balanced training set and test set.

Table 12 Prediction accuracy of different classifiers on the test set with the imbalanced training set and the relatively balanced training set.

Table 13 Difference in prediction accuracy between training and test sets of different classifiers with the imbalanced training set and the relatively balanced training set.

Fig. 14. ROC curves and corresponding AUC values of different classifiers: (a) SVM, (b) KNN, (c) RF, (d) GBDT, (e) DT, (f) LR, (g) MLP, and (h) Stacking.

Fig. 15. Corresponding relationship between the prediction accuracy of each classifier on the training and test sets.

Fig. 16. Numbers of different rock mass classes in the training set and test set.

Fig. 17. Relationship between the REC value of each classifier and the number of training samples.

Fig. 18. Prediction results of the stacking ensemble classifier on the test set based on learning of the relatively balanced training set.
Taking the stacking ensemble classifier as an example, the influence of sample imbalance on the prediction performance for the different rock mass classes is analysed. Fig. 18 shows the prediction results of the stacking ensemble classifier on the test set after learning the relatively balanced training set. It can be seen that, although one sample belonging to class IV is incorrectly classified as class V, the prediction effects for the minority class samples (i.e. classes II and V) are improved, and more samples of these two classes are correctly classified than before. More specifically, Fig. 19 compares the evaluation metrics of the stacking ensemble classifier with the imbalanced training set and the relatively balanced training set. After the SMOTE oversampling of the training set, the REC value of class IV is slightly decreased from 0.817 to 0.81, and the PRC value of class IV remains unchanged. However, the values of REC, PRC and F1 for the rock mass classes in the other cases are improved to some extent. In particular, the values of REC and F1 for the minority class samples are increased significantly: the REC values of classes II and V are increased from 0.7 to 0.933 and from 0.571 to 0.714, respectively, and the F1 values of classes II and V are increased from 0.778 to 0.918 and from 0.727 to 0.833, respectively. The above analysis shows that the relatively balanced training set can effectively improve the prediction performance of the stacking ensemble classifier. To sum up, a more balanced training set is more favourable for machine learning models.
In our study, the stacking technique of ensemble learning is utilised to establish the prediction model for rock mass classification, and the analysis results show that the stacking ensemble classifier has stronger robustness and generalisation ability than the individual classifiers. Moreover, through the machine learning algorithms, the mapping relationship between the rock mass quality and the critical operation parameters of the TBM is established, which promotes the development of real-time prediction of rock mass classification during the TBM tunnelling process. Therefore, the methods of this study can be used in cases with similar construction conditions and TBM machine parameters. However, there are several limitations in our study, which can be summarised as follows:
(1) The inherent uncertainties of the geological conditions, such as the joint/discontinuity properties, ground characteristics and localised stress states, are not considered in the proposed models.
(2) The influence of cutterhead wear is not considered in our models. The wear of the cutterhead will affect the state of rock breaking to a certain extent, which may influence the values of the TBM operation parameters under different rock mass classes.
(3) The machine learning models are established based on the assumption that the training and prediction samples are independent and identically distributed. However, different projects have differences in geological conditions, TBM type, construction design requirements, etc., which will limit the applicability of the proposed classifiers to actual projects.
This paper presents the real-time prediction of rock mass class based on the stacking ensemble classifier and TBM operation big data. The stacking ensemble classifier is constructed using SVM, KNN, GBDT and RF as the base classifiers and GBDT as the meta-classifier. Through data processing, 7538 TBM tunnelling cycles are obtained, and the mean values of the selected data without outliers in the stable phase are calculated as the inputs of the classifiers. Based on the tree-based feature selection and the removal of highly correlated features, 10 crucial features are selected to predict the rock mass classes. The dataset is divided into training and test sets at a ratio of 9/1 using simple random sampling. Eight classifiers are established, including the SVM, KNN, GBDT, RF, DT, MLP, LR and stacking ensemble classifiers. The grid search method is used to select the optimised hyper-parameters for each classifier. All the classifiers are trained on the training set, and the prediction performance of each classifier is tested on the test set. The comparison between the stacking ensemble classifier and the individual classifiers is analysed. Moreover, the influence of sample imbalance in the training set is briefly discussed. The specific conclusions are drawn as follows:
(1) Compared with the individual classifiers, the proposed stacking ensemble classifier has better prediction performance, with values of ACC_Total, Kappa, PRC_Total, REC_Total, F1_Total, micro-average AUC and macro-average AUC equal to 93.1%, 0.823, 0.93, 0.931, 0.928, 0.989 and 0.98, respectively. Also, the absolute errors of the samples for the stacking ensemble classifier are less than 2: most samples have an absolute error of 0, and only a few have an absolute error of 1. Furthermore, except for the KNN and LR classifiers, the fitting effect of the other six classifiers can be considered good.
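The overall metrics listed above can be reproduced with standard scikit-learn functions. The following is a minimal sketch; the label vectors are dummy placeholders (the paper's actual test-set predictions are not reproduced here), and the probability scores are one-hot encodings of the predictions purely to keep the example self-contained, where `predict_proba` outputs would normally be used for the AUC.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.preprocessing import label_binarize

classes = [2, 3, 4, 5]                      # rock mass classes II-V as integers
y_true = np.array([2, 3, 3, 4, 4, 4, 5, 5])  # dummy ground-truth labels
y_pred = np.array([2, 3, 4, 4, 4, 4, 5, 4])  # dummy classifier predictions

# Per-class scores would come from classifier.predict_proba(X_test);
# here the predictions are one-hot encoded as a stand-in.
y_score = label_binarize(y_pred, classes=classes).astype(float)
y_true_bin = label_binarize(y_true, classes=classes)

print("ACC_Total :", accuracy_score(y_true, y_pred))
print("Kappa     :", cohen_kappa_score(y_true, y_pred))
print("PRC_Total :", precision_score(y_true, y_pred, average="weighted",
                                     zero_division=0))
print("REC_Total :", recall_score(y_true, y_pred, average="weighted"))
print("F1_Total  :", f1_score(y_true, y_pred, average="weighted"))
print("micro AUC :", roc_auc_score(y_true_bin, y_score, average="micro"))
```

The weighted averages correspond to totals over all classes weighted by class support; micro- and macro-averaged AUC are obtained by binarising the multi-class labels in a one-vs-rest fashion.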

Fig. 19. Evaluation metrics comparison of the stacking ensemble classifier with the imbalanced training set and the relatively balanced training set: (a) PRC, (b) REC, and (c) F1.
(2) The stacking technique of ensemble learning can effectively improve the prediction performance of the base classifiers on rock mass classification. Especially for the minority classes, the prediction effects of the stacking ensemble classifier are significantly better than those of the individual classifiers. This shows that the ensemble learning model has more powerful learning and generalisation abilities than individual classifiers for small and imbalanced samples.
(3) The REC values of the different classifiers on the test set correlate positively with the number of training samples: the more samples in a certain class, the higher the REC value on the test set. As the number of samples increases, the REC differences among the classifiers gradually decrease.
(4) After SMOTE oversampling of the minority class samples, the overall prediction effect of each classifier is improved to a certain extent, and the difference in prediction accuracy between the training and test sets for each classifier is decreased. Taking the stacking ensemble classifier as an example, after SMOTE oversampling of the training set, although the REC value of class IV slightly decreases from 0.817 to 0.81, the REC and F1 values for the minority class samples increase significantly. The results show that a classifier with a relatively balanced training set tends to have a better fitting effect and prediction performance.
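SMOTE balances the training set by interpolating each minority sample toward one of its nearest minority-class neighbours. The paper uses the standard technique (available, e.g., as `imblearn.over_sampling.SMOTE` in imbalanced-learn); the following from-scratch sketch only illustrates the mechanism, and the function name, sample sizes and neighbour count are illustrative.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=3, seed=None):
    """Generate n_new synthetic samples by interpolating randomly chosen
    minority samples toward one of their k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                # exclude self-distance
    nn = np.argsort(d, axis=1)[:, :k]          # k nearest neighbours per sample
    new = np.empty((n_new, X_min.shape[1]))
    for j in range(n_new):
        i = rng.integers(len(X_min))           # pick a minority sample
        nb = X_min[rng.choice(nn[i])]          # pick one of its neighbours
        lam = rng.random()                     # interpolation factor in [0, 1)
        new[j] = X_min[i] + lam * (nb - X_min[i])
    return new

# Toy minority class of 5 samples with 10 features (as many features as the
# selected TBM operation parameters), oversampled with 15 synthetic points.
rng = np.random.default_rng(0)
X_min = rng.normal(size=(5, 10))
X_syn = smote_like_oversample(X_min, n_new=15, k=3, seed=0)
print(X_syn.shape)   # (15, 10)
```

Because each synthetic point lies on a segment between two real minority samples, SMOTE densifies the minority region of feature space rather than merely duplicating samples, which is why it tends to reduce overfitting compared with simple random oversampling.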
To sum up, the proposed stacking ensemble classifier can be well applied to the real-time prediction of rock mass classification. However, sample imbalance remains a problem, limiting the prediction effects on the minority class samples. In addition, the influence of geological condition changes and the wear of TBM cutters are not fully considered. These unsolved problems and model transfer learning methods will be further studied in our future work.
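The pipeline summarised above (four base classifiers, a GBDT meta-classifier, a 9:1 random split and grid search over hyper-parameters) can be sketched with scikit-learn as follows. This is an illustrative skeleton, not the paper's tuned configuration: the synthetic data, class weights and the small hyper-parameter grid are placeholders standing in for the 7538 TBM tunnelling cycles and the grids actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the 10 selected TBM operation features
# and four rock mass classes.
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=4, weights=[0.1, 0.2, 0.3, 0.4],
                           random_state=0)
# 9:1 split by simple random sampling, as in the paper.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0)

# SVM, KNN, GBDT and RF as base classifiers; GBDT as the meta-classifier.
base_learners = [
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True))),
    ("knn", KNeighborsClassifier()),
    ("gbdt", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
]
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=GradientBoostingClassifier(random_state=0),
    cv=5)  # out-of-fold base predictions feed the meta-classifier

# Grid search over a deliberately tiny illustrative grid.
grid = GridSearchCV(stack,
                    param_grid={"final_estimator__n_estimators": [30, 60]},
                    cv=3, n_jobs=-1)
grid.fit(X_train, y_train)
print(f"test accuracy: {grid.score(X_test, y_test):.3f}")
```

`StackingClassifier` trains the base learners on cross-validated folds so the meta-classifier learns from out-of-fold predictions, which is what limits the information leakage that would otherwise make the ensemble overfit its own base models.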
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This study was funded by the National Natural Science Foundation of China (Grant No. 41941019) and the State Key Laboratory of Hydroscience and Engineering (Grant No. 2019-KY-03). Additionally, we sincerely thank the National Program on Key Basic Research Project of China (973 Program) (Grant No. 2015CB058100), China Railway Engineering Equipment Group Corporation, and the Survey and Design Institute of Water Conservancy of Jilin Province, China, for their data support.
Journal of Rock Mechanics and Geotechnical Engineering, 2022, Issue 1