A performance-based hybrid deep learning model for predicting TBM advance rate using Attention-ResNet-LSTM


Sihao Yu a, Zixin Zhang a,b,*, Shuifeng Wang a,b,**, Xin Huang a,b, Qinghua Lei a,c,d

a Department of Geotechnical Engineering,College of Civil Engineering,Tongji University,Shanghai,China

b Key Laboratory of Geotechnical and Underground Engineering,Ministry of Education,Tongji University,Shanghai,China

c Department of Earth Sciences,ETH Zürich,Zürich,Switzerland

d Department of Earth Sciences,Uppsala University,Uppsala,Sweden

Keywords: Tunnel boring machine (TBM), Advance rate, Deep learning, Attention-ResNet-LSTM, Evolutionary polynomial regression

ABSTRACT: The technology of tunnel boring machines (TBMs) has been widely applied for underground construction worldwide; however, how to keep the TBM tunneling process safe and efficient remains a major concern. The advance rate is a key parameter of TBM operation and reflects the TBM-ground interaction, for which a reliable prediction helps optimize TBM performance. Here, we develop a hybrid neural network model, called Attention-ResNet-LSTM, for accurate prediction of the TBM advance rate. A database including geological properties and TBM operational parameters from the Yangtze River Natural Gas Pipeline Project is used to train and test this deep learning model. The evolutionary polynomial regression method is adopted to aid the selection of input parameters. The results of numerical experiments show that our Attention-ResNet-LSTM model outperforms other commonly-used intelligent models with a lower root mean square error and a lower mean absolute percentage error. Further, parametric analyses are conducted to explore the effects of the sequence length of historical data and the model architecture on the prediction accuracy. A correlation analysis between the input and output parameters is also implemented to provide guidance for adjusting relevant TBM operational parameters. The performance of our hybrid intelligent model is demonstrated in a case study of TBM tunneling through a complex ground with variable strata. Finally, data collected from the Baimang River Tunnel Project in Shenzhen, China are used to further test the generalization of our model. The results indicate that, compared to the conventional ResNet-LSTM model, our model has a better predictive capability for scenarios with unknown datasets due to its self-adaptive characteristic.

1. Introduction

With the increased exploration and utilization of subterranean space across the world, the technology of tunnel boring machines (TBMs) has been widely applied in a wide range of underground construction projects due to its high mechanization level, great operational efficiency and low environmental impact (Yagiz, 2017; Zhang et al., 2017, 2020a, b, 2022a; Gao et al., 2019; Liu et al., 2019). However, the performance of a TBM tends to fluctuate during the tunneling process because of the complicated TBM-ground interactions, such that the TBM driver has to constantly adjust the operational parameters based on empirical knowledge, which may lead to large uncertainties and unpredictable risks (Rostami, 2016; Cardu et al., 2021). Thus, in order to improve the safety and efficiency of TBM construction, it is essential to accurately and promptly predict the TBM performance (Yang et al., 2022; Zhang et al., 2022b). In general, the TBM performance is expressed by two key parameters, i.e. the advance rate (AR) and the penetration rate, where the latter is the former divided by the rotation speed of the cutterhead. The AR, which reflects TBM-ground interactions, often dominates the total construction time and cost; hence it is of central importance to predict it robustly (Elbaz et al., 2020; Wang et al., 2020; Fu and Zhang, 2021).

In previous studies, many empirical (Yagiz, 2008; Gong and Zhao, 2009; Delisio et al., 2013; Fatemi et al., 2018) and theoretical (Farmer and Glossop, 1980; Sanio, 1985; Hughes, 1986; Rostami and Ozdemir, 1993; Rostami, 1997) models have been developed to predict the TBM performance. Empirical models establish TBM-ground relationships through regression analysis of the available data from historical engineering projects. However, the applicability of such relationships to other site conditions and specific engineering projects is often plagued by large uncertainties. Theoretical models compute the TBM thrust by assuming that the interaction forces between the disc cutters and the excavation face obey a simplified linear or Gaussian distribution (Rostami and Ozdemir, 1993; Rostami, 2013; Labra et al., 2017). Such theoretical formulations are usually calibrated by cutting tests on a single stratum but can hardly consider the intricate TBM-ground interaction in mixed strata, limiting their applicability to real-world tunneling projects.

The recent rapid development in data collection and transmission technology enables tunnel engineers to gain access to a large amount of raw data generated from TBM operations, which presents great opportunities to apply advanced machine learning techniques for TBM performance prediction (Sheil et al., 2020). For example, Salimi et al. (2016) used two artificial intelligence-based methods, i.e. an adaptive neuro-fuzzy inference system and a support vector regression, to predict the AR of a TBM. Armaghani et al. (2018) estimated the TBM performance based on a gene expression programming model using the data from the Pahang-Selangor raw water transfer tunnel in Malaysia. Koopialipoor et al. (2019) developed a model based on deep neural networks for predicting the TBM penetration rate, which achieved a higher accuracy compared with conventional predictive models. Zhou et al. (2021) employed a hybrid model of extreme gradient boosting with Bayesian optimization to predict the AR of a TBM in hard rock based on a comprehensive compilation of 1286 datasets. Sun (2022) proposed a shield tunneling parameter matching model based on the support vector machine method and an improved particle swarm algorithm to provide guidance for selecting optimal tunneling parameters. Wang et al. (2022) developed an online platform using extreme gradient boosting for estimating the penetration rate of TBM tunneling with good accuracy. Mahmoodzadeh et al. (2022) used a hybrid long short-term memory (LSTM) model enhanced by grey wolf optimization for predicting the TBM penetration rate based on 1125 datasets. Fu et al. (2023) designed a deep learning model combining a graph convolutional network and LSTM to predict the TBM vertical and horizontal deviations. Yu et al. (2023) established an attention mechanism-based dual-path ResNet (residual network) prediction model for an accurate estimation of the TBM utilization factor, defined as the ratio of the AR to the tunneling speed. Song et al. (2023) developed a new hybrid intelligent model named the stacking framework for predicting the TBM performance with the aid of the whale optimization algorithm, which outperformed several other machine learning models and showed a stronger generalization capability. Wang et al. (2023a) established an ensemble model combining the XGBoost algorithm and a semi-theoretical model to predict the TBM penetration rate. Wang et al. (2023b) proposed a data-driven multi-step TBM attitude prediction model called the convolutional gated-recurrent-united neural network, which can stably achieve a high accuracy within 21 steps. Among all these machine learning algorithms, deep learning methods have been increasingly used due to their exceptional capabilities of extracting multi-dimensional and nonlinear features without a priori assumptions. An overview of previous studies using deep learning algorithms for TBM performance prediction is presented in Table 1. It is noted that most of these studies predict the advance rate or penetration rate using a conventional feed-forward neural network combined with optimization algorithms, such as the particle swarm optimization method, to overcome convergence problems during the training process. Among these models, LSTM (Hochreiter and Schmidhuber, 1997) is usually employed to extract the long-term information embedded in the time-varying and long-lasting raw data recorded during TBM tunneling to improve the accuracy of TBM performance prediction (Fu and Zhang, 2021; Wang et al., 2021; Guo et al., 2022).

Table 1. Summary of previous work using deep learning algorithms for TBM performance prediction.

Table 2. The searching range of hyper-parameters.

Table 3. The optimal combination of hyper-parameters.

Existing deep learning algorithms, despite their wide application to TBM performance prediction, still face some limitations. First, the prediction accuracy tends to drop when these models are applied to complex ground conditions or to datasets with significant noise. Furthermore, the model performance may be plagued by the so-called model degradation problem, i.e. the accuracy gets saturated and then degrades as a neural network deepens (He et al., 2016; Qin et al., 2021; Shi et al., 2021). To overcome these shortcomings, this paper presents a novel hybrid deep learning model. In our model, LSTM is used to fully utilize the past information of TBM tunneling, such that time-dependent characteristics can be extracted to enhance the model performance. In addition, ResNet (He et al., 2016) is incorporated into our model to extract spatial characteristics associated with the nonlinear and complex tunneling environment; meanwhile, the residual connection in ResNet can help address the degradation problem when more layers are added. Finally, to improve the generalization of our model, the attention mechanism is introduced to adaptively generate self-modifying weights for variable geological conditions. The attention mechanism is one of the hot topics in artificial intelligence research and has been applied to many different problems, such as machine translation (Choi et al., 2018), action recognition (Tian et al., 2019), and text classification (Liu and Guo, 2019). Recently, the attention mechanism has also been integrated into neural network models for TBM performance prediction because it can guide the model to focus on the most informative components of the task even if the environment alters, as usually occurs during TBM tunneling. For example, Pan et al. (2022) proposed an attention-based graph convolutional network for the prediction of penetration rate and energy consumption, which was superior to other machine learning algorithms; Chen et al. (2022) used a temporal pattern attention detection structure together with a temporal pattern attention mechanism module to predict TBM tunneling parameters, exhibiting a better performance than the baseline Transformer model. However, the generalization of such attention mechanism-based models remains unclear, especially when dealing with a completely unseen dataset.

To comprehensively capture the complicated nonlinear and temporally-varying TBM characteristics during the long-term tunneling process, we propose a performance-based hybrid model for real-time prediction of the TBM advance rate. In the model, we use ResNet and LSTM to process the spatial and temporal features of raw data, and obtain adaptive weights for different inputs based on the attention mechanism. Additionally, the evolutionary polynomial regression (EPR) algorithm is adopted to determine the optimal combination of input parameters. The effectiveness of our model is verified based on the data of the Yangtze River Natural Gas Pipeline Project. Afterwards, the influences of the sequence length and model architecture on the prediction accuracy are analyzed. Furthermore, we investigate the input-output correlations and analyze the application scope of our Attention-ResNet-LSTM model through a case study containing variable strata. Finally, we test the model based on another construction project, i.e. the Baimang River Tunnel Project in Shenzhen, China, to evaluate the model's generalization.

2. Methodology

2.1. Convolutional neural network (CNN) and residual network (ResNet)

A CNN, which has a strong capability in feature extraction, is composed of convolution layers and pooling layers. A local convolution operation on the input data is performed by the convolution layers, and a feature dimension reduction is then realized by the pooling layers (Fig. 1). The resulting transformations, also called feature maps, can reveal features that are decisive for the problem at hand (Xue and Li, 2018; Kattenborn et al., 2021). A ResNet is a special CNN incorporating the technique of residual learning to solve the degradation problem of conventional deep CNNs (He et al., 2016). The basic unit or building block of a ResNet, as shown in Fig. 2, can be defined as

Fig. 1. Schematic diagram of a convolutional neural network (CNN).

Fig. 2. The building block of a residual network (ResNet) (after He et al., 2016).

H(x) = F(x) + x (1)
where x and H(x) are the input and output vectors, respectively, and F(x) represents the residual mapping to be learned. Instead of building relationships directly between H(x) and x, a ResNet searches for a residual function F(x), which is realized through identity shortcut connections. In the case where the shallow part of a ResNet has been properly trained to fully extract the characteristics of the object, the residual mapping F(x) of the deep part of the network will be set to zero during the training process, which avoids the degradation problem in a deep neural network. Therefore, a ResNet is used to extract features from inputs in our model.
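To make the residual unit concrete, the following is a minimal PyTorch sketch of a 1D basic block implementing H(x) = F(x) + x; the channel count, the batch normalization layers and the default kernel size are illustrative assumptions rather than the exact configuration used in our model.

```python
# A minimal 1D residual basic block sketch, following Eq. (1): H(x) = F(x) + x.
import torch
import torch.nn as nn

class BasicBlock1d(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        padding = kernel_size // 2  # keep the sequence length unchanged
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, padding=padding)
        self.bn1 = nn.BatchNorm1d(channels)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, padding=padding)
        self.bn2 = nn.BatchNorm1d(channels)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x                                # identity shortcut connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))             # F(x), the residual mapping
        return self.relu(out + residual)            # H(x) = F(x) + x
```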

2.2. LSTM

LSTM is a special recurrent neural network (RNN) that has been commonly used in the area of natural language processing for tasks such as semantic analysis and machine translation. A traditional RNN structure is shown in Fig. 3. The value of each unit in the hidden layer at the current time step is determined jointly by the input at the current time step and the hidden unit at the previous time step, such that it can store historical information. However, the gradient of an RNN is prone to vanish or explode during the backpropagation process when the input sequence is too long. LSTM successfully solves this issue by incorporating memory cells, in which the concept of a gate is used to process and save essential temporal information over extended time intervals (Hochreiter and Schmidhuber, 1997). Fig. 4 illustrates a typical structure of LSTM cells. The memory cell consists of three gates, i.e. the forget gate, input gate and output gate. The forget gate regulates what information is to be discarded, while the input gate and output gate determine what is to be preserved in the current memory cell and what is to be exported at each time step. The formulae for the three gates and the state updates are as follows:

Fig. 3. Schematic diagram of a conventional recurrent neural network (RNN).

Fig. 4. A typical cell structure of LSTM.

Forget gate:
f_t = sigmoid(W_f · [h_{t-1}, x_t] + b_f) (2)
Input gate:
i_t = sigmoid(W_i · [h_{t-1}, x_t] + b_i) (3)
Output gate:
o_t = sigmoid(W_o · [h_{t-1}, x_t] + b_o) (4)
The cell state and hidden state are then updated as
C̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c) (5)
C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t (6)
h_t = o_t ⊙ tanh(C_t) (7)
where W_f, W_i, W_c and W_o are the weight matrices; b_f, b_i, b_c and b_o denote the biases; sigmoid(x) = 1/(1 + e^(-x)); tanh(x) = (e^(2x) - 1)/(e^(2x) + 1); the operation ⊙ stands for the Hadamard product; and C̃_t and C_t respectively represent the candidate cell state and the updated cell state at time t. These parameters remain unchanged in each LSTM cell, so that cells at different time steps share the same set of parameters. In our model, we use LSTM to further extract temporal information from multi-dimensional features.
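For illustration, the gate equations above can be written as a single-step function. This is a didactic sketch only; in practice, torch.nn.LSTM fuses these operations, and the tensor shapes below are assumptions.

```python
# One LSTM time step implementing Eqs. (2)-(7); each W_* acts on [h_{t-1}, x_t].
import torch

def lstm_cell_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    z = torch.cat([h_prev, x_t], dim=-1)        # concatenation [h_{t-1}, x_t]
    f_t = torch.sigmoid(z @ W_f.T + b_f)        # forget gate, Eq. (2)
    i_t = torch.sigmoid(z @ W_i.T + b_i)        # input gate, Eq. (3)
    o_t = torch.sigmoid(z @ W_o.T + b_o)        # output gate, Eq. (4)
    c_tilde = torch.tanh(z @ W_c.T + b_c)       # candidate cell state, Eq. (5)
    c_t = f_t * c_prev + i_t * c_tilde          # Hadamard products, Eq. (6)
    h_t = o_t * torch.tanh(c_t)                 # new hidden state, Eq. (7)
    return h_t, c_t
```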

2.3. Attention mechanism

The attention mechanism is inspired by the biological systems of humans, which tend to quickly concentrate on the most distinctive and meaningful parts of a problem (Niu et al., 2021; Pan and Zhang, 2022). Conventional neural networks usually end up with fixed weight matrices after training, and the matrices remain unchanged even when the network receives completely different inputs. Thus, conventional neural network models have difficulty in adapting themselves to fast-changing environments (Irie et al., 2022). By incorporating the attention mechanism, a neural network can generate self-modifying weights and thus focus on the most relevant components of the input. Various attention mechanism methods have been proposed to date, and we utilize the channel attention and temporal attention mechanisms in the current paper.

2.3.1. Channel attention

As stated above, a ResNet can extract informative characteristics from multi-dimensional inputs through convolution operations, which is a process of fusing spatial and channel-wise information. It has been demonstrated that the performance of a network can be improved by explicitly modeling the interdependencies among the channels of feature maps. As a result, we add the channel attention mechanism into the ResNet to adaptively recalibrate channel-wise features by assigning a weight parameter to each channel of the feature maps. These weight parameters can be trained through the backpropagation process like the other network parameters. To achieve such operations, the squeeze-and-excitation block (Hu et al., 2018) is used. During the squeeze process, global average pooling is utilized to generate channel-wise information. In the following excitation process, we use two fully-connected (FC) layers to capture channel dependencies while limiting the complexity, where one FC layer with a reduction ratio reduces the dimension of the inputs and the other FC layer returns to the original channel dimension. The final output of this block is obtained by applying a channel-wise multiplication between the channel weights and the feature maps generated from the ResNet (Hu et al., 2018).
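A minimal sketch of such a squeeze-and-excitation block for 1D feature maps is given below, following the squeeze (global average pooling) and excitation (two FC layers) steps described above; the reduction ratio of 4 is an illustrative assumption.

```python
# Squeeze-and-excitation channel attention sketch (after Hu et al., 2018).
import torch
import torch.nn as nn

class ChannelAttention1d(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # FC layer with reduction ratio
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),  # restore the channel dimension
            nn.Sigmoid(),                                # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, length) feature maps from the ResNet
        w = x.mean(dim=-1)            # squeeze: global average pooling over the sequence
        w = self.fc(w)                # excitation: model channel interdependencies
        return x * w.unsqueeze(-1)    # channel-wise multiplication (recalibration)
```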

2.3.2. Temporal attention

In a classic time series forecasting task using LSTM, one usually takes the final hidden state directly as the output. However, the prediction accuracy decreases as the length of the input series increases (Bahdanau et al., 2015). To solve this problem, we add a temporal attention layer to the last LSTM layer so that all the hidden states can be considered. Since the hidden state at the last time step contains the previous information, it is set as the standard state. We then compare all the hidden states with it and calculate their corresponding scores. Hidden states with higher scores are assigned larger weights. Thus, the network can selectively emphasize the informative features of different inputs and adapt well to both long-term and short-term input sequences. The computational process of the temporal attention layer is as follows:

score(h_t, h_T) = v^T tanh(W[h_t; h_T]) (8)
a_t = exp[score(h_t, h_T)] / Σ_{k=1}^{T} exp[score(h_k, h_T)] (9)
h_o = Σ_{t=1}^{T} a_t h_t (10)
where h_t is the hidden state at the tth time step, h_T is the final hidden state at the last time step T, a_t represents the attention weight corresponding to the hidden state at the tth time step, and v^T and W are the learnable matrix parameters. Here, the score function is calculated referring to Luong et al. (2015). h_o in Eq. (10) is the final output of the temporal attention layer.
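The following sketch implements such a temporal attention layer over all hidden states. The concatenation-style score function is an assumption based on the cited formulation of Luong et al. (2015), with v and W as the learnable parameters.

```python
# Temporal attention sketch over all LSTM hidden states, Eqs. (8)-(10).
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.W = nn.Linear(2 * hidden_size, hidden_size, bias=False)
        self.v = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, T, hidden) -- all hidden states; h[:, -1] is the standard state h_T
        h_T = h[:, -1:, :].expand_as(h)
        scores = self.v(torch.tanh(self.W(torch.cat([h, h_T], dim=-1))))  # Eq. (8)
        a = torch.softmax(scores, dim=1)   # attention weights a_t, Eq. (9)
        h_o = (a * h).sum(dim=1)           # weighted sum of hidden states, Eq. (10)
        return h_o                         # concatenated with h_T further downstream
```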

2.4. EPR

EPR is a hybrid regression method combining conventional numerical regression with genetic programming symbolic regression techniques, which can be used to describe correlations between multiple input and output variables (Giustolisi and Savic, 2006). In our model, EPR is used as a feature selection algorithm that can be summarized in two steps. First, the genetic algorithm is applied to search for the symbolic expression of the polynomial, and the transformed variables can be expressed as

z_j = Π_{i=1}^{k} x_i^{ES(j,i)} (j = 1, 2, …, m) (11)
where x_i represents the ith input variable; k is the number of input variables; ES is the m × k exponential matrix obtained from the genetic algorithm; z_j denotes the jth transformed variable; and m is the number of transformed variables, which can be determined manually in advance. It should be noted that other optimization algorithms, such as particle swarm optimization, can also be used instead of the genetic algorithm.

In the second step, the regression coefficient for each term of the polynomial is estimated by performing least-squares linear regression. The final EPR expression is derived as

y = a_0 + Σ_{j=1}^{m} a_j z_j (12)
where y is the prediction result, a_j denotes the coefficient of the jth transformed variable, and a_0 is an optional bias term.
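A minimal sketch of the second step is shown below, assuming the exponent matrix ES has already been evolved by the genetic algorithm (Eq. (11)); the coefficients a_j and the bias a_0 are then recovered by least squares as in Eq. (12).

```python
# EPR-style fitting sketch: build transformed variables, then least-squares regression.
import numpy as np

def epr_fit_predict(X: np.ndarray, ES: np.ndarray, y: np.ndarray):
    """X: (n, k) inputs; ES: (m, k) exponent matrix; y: (n,) targets."""
    # Transformed variables z_j = prod_i x_i ** ES[j, i], Eq. (11)
    Z = np.stack([np.prod(X ** ES[j], axis=1) for j in range(ES.shape[0])], axis=1)
    A = np.hstack([np.ones((len(y), 1)), Z])           # column of ones for the bias a_0
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)     # least-squares estimate of a_j
    return A @ coeffs, coeffs                          # predictions per Eq. (12)
```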

2.5. Evaluation parameters

To assess the performance of the prediction model, the mean absolute percentage error (MAPE) and the root mean square error (RMSE) are adopted as the evaluation parameters. MAPE is scale-independent, so the scale of the data need not be taken into consideration; however, when the true value is close to 0, MAPE is likely to approach a meaningless infinity. RMSE is dependent on the scale of data and directly reflects the prediction error. Thus, the combination of these two parameters can comprehensively quantify the model performance. Lower values of MAPE and RMSE indicate a higher prediction accuracy, i.e. a smaller discrepancy between predicted and measured values. The two evaluation parameters are calculated as follows:

MAPE = (1/n) Σ_{i=1}^{n} |(y_i − ŷ_i)/y_i| × 100% (13)
RMSE = sqrt[(1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²] (14)
where n is the total number of samples, and y_i and ŷ_i are the measured and predicted values, respectively. Moreover, the coefficient of determination R² is also introduced for a more comprehensive assessment of the models, and its calculation formula is as follows:
R² = 1 − [Σ_{i=1}^{n} (y_i − ŷ_i)²] / [Σ_{i=1}^{n} (y_i − ȳ)²] (15)

where ȳ is the mean of the measured samples. It is worth noting that we did not utilize this metric during the model optimization process; it only serves as an additional evaluation indicator for comparing different prediction models.
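For clarity, the three metrics of Eqs. (13)-(15) can be computed as in the following short sketch.

```python
# Evaluation metrics sketch corresponding to Eqs. (13)-(15).
import numpy as np

def mape(y_true, y_pred):
    # Scale-independent, but undefined when y_true approaches 0
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # depends on the sample mean
    return 1.0 - ss_res / ss_tot
```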

2.6. A hybrid intelligent model

We develop a novel Attention-ResNet-LSTM deep learning model (Fig. 5) by innovatively combining the different tools described in Sections 2.1-2.5. The model consists of a ResNet module, an LSTM module and two attention modules (viz. channel attention and temporal attention). The ResNet module contains convolution layers with a kernel size of 3, a stride of 1 and a padding of 1. The LSTM module contains LSTM layers and an FC layer used to adjust the dimension. In addition, a channel attention module and a temporal attention module are embedded into the ResNet and the LSTM, respectively. First, we use the ResNet to extract nonlinear features from the input data. Second, the generated feature maps are fed into the channel attention module to adaptively capture channel dependencies; here, we apply a channel-wise multiplication between the feature maps and the obtained channel weights. Afterwards, the weighted features are sent to the LSTM layer to obtain long-term and time-varying dependencies. The temporal attention layer is added here, through which h_o is obtained. Finally, we concatenate h_o with the hidden state at the last time step (h_T in Fig. 5) and then send it to an FC layer as the final output. The numbers of ResNet blocks and LSTM layers (m and n in Fig. 5, respectively) are hyper-parameters that need to be determined through further experiments.

Fig. 5. Schematic diagram of the Attention-ResNet-LSTM model architecture. GAP and FC in the figure represent the global average pooling layer and the fully-connected layer, respectively. Conv1d is a one-dimensional convolution layer whose kernel is convolved over a single dimension. The symbol ⊗ denotes the channel-wise multiplication.

3. Database

3.1. Data source

The data used in this study were collected from the Yangtze River Natural Gas Pipeline Project. The pipeline passes beneath the Yangtze River between Nantong and Changshu of Jiangsu Province, China, with a crossing length of about 10.23 km, which is the key part of the entire project. A slurry pressure balance shield TBM was adopted to guarantee the face stability during tunneling across the Yangtze River under high water and earth pressures. The external and internal diameters of the tunnel are 7.6 m and 6.8 m, respectively. The segment thickness is 0.4 m and the width is 1.5 m. Fig. 6 presents the geological profile of the study area. The TBM operating parameters were recorded in real time during the construction at a frequency of 1 Hz, such that in total 17.6 million records were documented.

Fig. 6. Geological profile of the Yangtze River Natural Gas Pipeline Project.

3.2. Data preprocessing

The original dataset is extremely large and contains numerous invalid records and noise that need to be excluded. In addition, the parameters differ greatly in magnitude, so the data cannot be directly used for model training. Therefore, preprocessing of the raw data is necessary before importing them into the model (Xiao et al., 2022).

3.2.1. Data extraction

Tunneling parameters are automatically recorded every second (i.e. at a frequency of 1 Hz) through the data collection and transmission system. However, it is computationally expensive and unnecessary to use the entire raw dataset for training the prediction model. As a compromise, the raw data are first resampled at a time interval of 1 min (a comparative analysis indicates that the resampled data are sufficient to capture the key features of the raw data). In addition, a large number of empty records are generated during downtime, which are also stored in the original datasets. To build a clean database, we first need to delete these empty records. A detection algorithm for empty data proposed by Zhang et al. (2020b) is used in our study, by computing

F_i = TH_i × TOR_i × AR_i × RS_i (16)
where TH_i, TOR_i, AR_i and RS_i denote the thrust force, torque, advance rate and rotation speed of the cutterhead at the ith time step, respectively. If the value of F_i equals 0 (i.e. at least one of the four parameters is zero), we consider that the TBM is not in the working state, and the corresponding data are discarded from the original datasets.
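A minimal sketch of this filter is given below, assuming the resampled records are held in a pandas DataFrame with hypothetical column names TH, TOR, AR and RS.

```python
# Empty-data filter sketch based on Eq. (16): drop records where F_i = 0.
import pandas as pd

def drop_idle_records(df: pd.DataFrame) -> pd.DataFrame:
    # F_i is zero whenever any of the four parameters is zero, i.e. the TBM is idle
    F = df["TH"] * df["TOR"] * df["AR"] * df["RS"]
    return df[F != 0].reset_index(drop=True)
```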

3.2.2. Outlier detection

Not all the data obtained during TBM tunneling are reliable. For example, some anomalous values may be generated due to external interference and/or sensor malfunction. In our study, the Mahalanobis distance is used as the criterion for detecting outliers (Mahalanobis, 1936). Compared with the Euclidean distance, the Mahalanobis distance accounts for correlations between variables, making it more suitable for outlier identification in multivariate time series analysis. For a multivariate sequence x = (x_1, x_2, x_3, …, x_n)^T with a mean vector μ = (μ_1, μ_2, μ_3, …, μ_n)^T and a covariance matrix S, the Mahalanobis distance D_M is calculated as
D_M = sqrt[(x − μ)^T S^(−1) (x − μ)] (17)

It can be seen from Eq. (17) that, when all the variables are uncorrelated, the covariance matrix S becomes an identity matrix and the Mahalanobis distance reduces to the Euclidean distance. The 0.9-quantile of D_M is set as the threshold for outlier discrimination (Zhang et al., 2020a), i.e. any data with a Mahalanobis distance larger than this critical value are treated as outliers and removed from the dataset. Fig. 7 shows an example of outliers (marked by red circles) detected in the TBM advance rate data.
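The outlier removal step can be sketched as follows, with the 0.9-quantile of D_M used as the threshold as described above; the implementation details are illustrative.

```python
# Mahalanobis-distance outlier removal sketch, Eq. (17) with a 0.9-quantile cutoff.
import numpy as np

def remove_outliers(X: np.ndarray, quantile: float = 0.9) -> np.ndarray:
    """X: (n_samples, n_features) multivariate records."""
    mu = X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))      # inverse covariance matrix
    diff = X - mu
    d_m = np.sqrt(np.einsum("ij,jk,ik->i", diff, S_inv, diff))  # per-sample D_M
    return X[d_m <= np.quantile(d_m, quantile)]         # keep data below the threshold
```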

Fig. 7. Example of outliers detected in the TBM advance rate data.

3.2.3. Data scaling and normalization

The data then need to be mapped to the range of 0-1 to eliminate the effects of data scale and to accelerate the model convergence. For a parameter x, the normalized value is calculated as x_norm = (x − x_min)/(x_max − x_min), where x_max and x_min denote the maximum and minimum values of the variable x, respectively. The outputs of neural networks trained with normalized data are then transformed back to the initial vector space.
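A small sketch of this scaling step is given below. Fitting the bounds on the training set only and reusing them for the inverse transformation is an assumption reflecting common practice, not a detail stated above.

```python
# Min-max scaling sketch with an inverse transform back to physical units.
import numpy as np

class MinMaxScaler:
    def fit(self, x: np.ndarray):
        # Bounds are taken from the training data only (assumed practice)
        self.x_min, self.x_max = x.min(axis=0), x.max(axis=0)
        return self

    def transform(self, x: np.ndarray) -> np.ndarray:
        return (x - self.x_min) / (self.x_max - self.x_min)

    def inverse_transform(self, x_norm: np.ndarray) -> np.ndarray:
        # Map normalized network outputs back to the initial vector space
        return x_norm * (self.x_max - self.x_min) + self.x_min
```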

3.3. Selection of input parameters

The preprocessed datasets still contain more than 100 operating parameters. If we considered all of them as input parameters to predict the AR, both the complexity and the calculation time of the model would increase dramatically. On the other hand, an insufficient number of inputs would make it difficult for the prediction model to fully learn the interrelationships between inputs and outputs, leading to a low prediction accuracy. Therefore, it is important to select proper input parameters before training a deep learning model. A combination of features having a strong correlation with the predicted label should be selected from the dataset as inputs to the neural network, which enables the network to have a good predictive performance while reducing the complexity as much as possible. Different methods have been developed and used for selecting input parameters. For example, Li et al. (2022) used the Pearson correlation coefficient to quantify the linear correlation between two variables, and then eliminated those with a correlation coefficient close to zero. Zhang et al. (2020a) used the grey relational grade (Deng, 1982) to measure the degree of relevance between variables according to the trend of their development, and features with higher grades were selected as inputs. However, these methods can only measure the degree of correlation between one variable and another, i.e. a one-to-one mapping. A deep learning network, however, is a many-to-one mapping model. Thus, it is essential to explore the correlations between the label and different plausible combinations of input parameters before the feature selection.

In this paper, EPR (see Section 2.4) is used to select input parameters due to its capability of quantifying correlations between multiple inputs and the output. The geological data, represented by the modulus of compressibility (Es) and the characteristic value of bearing capacity (BC), are added into the database. The former reflects the deformation of the ground subjected to loads, and the latter is a strength parameter that reflects the strength characteristics and bearing capacity of the strata. There are multiple ways to determine BC, one of the most reliable and frequently used being plate loading tests: BC is defined as the pressure corresponding to the deformation specified in the linear section of the soil pressure-deformation curve obtained by the loading test. AR is a representative parameter of TBM performance that cannot be directly adjusted by TBM operators. Instead, it is highly dependent on the TBM operational parameters, which can be manually adjusted in the TBM control system (Liu et al., 2021). In our model, the rotation speed of the cutterhead (RS), thrust (TH), torque (TOR) and slurry pressure in the working chamber (SPW) are chosen as possible input features according to Wang et al. (2020) and Zhang et al. (2020b). In addition to these geological and operational parameters, the wear extent of TBM cutters may also have some effect on the tunneling performance, because the specific energy for breaking rocks may increase if worn cutters are used (Liu et al., 2017; Ren et al., 2018; Zhao et al., 2019; Karami et al., 2021). However, replacing cutters in time after a routine period of excavation can largely eliminate such an impact; here, we do not consider the effect of cutter wear on AR. The calculated results of different combinations using EPR are shown in Fig. 8. It can be seen that the combination of RS, TOR, TH, BC and Es achieves the lowest value of the objective function within the fewest generations, meaning that it gives the minimum error with the least amount of time. Meanwhile, these are parameters that can easily be adjusted by a TBM driver during construction. The combination of TOR and TH also shows a strong correlation with AR. In terms of single variables, TH is the most significant factor affecting the value of AR, which is consistent with the engineering experience that TH is generally the most dominant factor for AR. Based on the EPR results, we finally choose RS, TOR, TH, BC, Es and AR itself as input parameters of the proposed model. It should be noted that the combination of input parameters may be slightly altered if data from a totally different project are used. However, massive tunneling experience has revealed that the dominant factors affecting the TBM performance are fixed, i.e. thrust, torque, cutterhead rotation speed and geological conditions (Fu and Zhang, 2021; Liu et al., 2021; Pan et al., 2022). Thus, the above-determined combination for the Yangtze River Natural Gas Pipeline Project is considered transferable to other TBM projects.

Fig. 8. Results of the EPR analysis.

4. Experiments and results

4.1. Dataset segmentation

First, we extract time series from the datasets as inputs for the prediction model. We use a sliding window with a stride of 1 to segment the datasets, and the length of the sliding window is set at 20, which can be further optimized as a hyper-parameter. The prediction target is AR at the tth time step, which can be expressed as

AR_t = f(x_{t−20}, x_{t−19}, …, x_{t−1}) (18)
where the function f represents the mapping relationship fitted by the intelligent model, and x_t denotes the vector of input parameters at the tth time step. It should be noted that historical values of AR are also used as inputs to help the model fully exploit the historical information and increase the prediction accuracy. Then, the datasets are divided into training, validation and test sets with a ratio of 8:1:1 (see Appendix A regarding the selection of this ratio). The training set is used for model training, the validation set is used for optimization of the hyper-parameters, and the test set is used to evaluate the model performance.
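The segmentation and the 8:1:1 split can be sketched as follows, assuming (for illustration) that AR is stored in the last column of the preprocessed array.

```python
# Sliding-window segmentation sketch (window length 20, stride 1) and 8:1:1 split.
import numpy as np

def make_windows(data: np.ndarray, seq_len: int = 20):
    """data: (n, features) with AR assumed to be the last column; returns X and targets y."""
    X = np.stack([data[i:i + seq_len] for i in range(len(data) - seq_len)])
    y = data[seq_len:, -1]            # AR at the tth step, one step after each window
    return X, y

# Illustrative usage with random stand-in data (6 features: RS, TOR, TH, BC, Es, AR)
X, y = make_windows(np.random.rand(1000, 6))
n = len(X)
X_train, X_val, X_test = np.split(X, [int(0.8 * n), int(0.9 * n)])  # 8:1:1 ratio
```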

4.2. Model establishment and training

All experiments were conducted using the PyTorch library on a PC with an AMD Ryzen 5800X CPU (4.6 GHz), 16 GB RAM and an NVIDIA GeForce RTX 3080 graphics card. The model was trained in a fully-supervised manner with the error backpropagation algorithm. The early stopping strategy was adopted during the training process to prevent overfitting. The random search method was used to determine the optimal hyper-parameter combination of the model, and the number of random searches was set at 100. In order to directly verify the performance of our Attention-ResNet-LSTM model, ResNet-LSTM, LSTM, gated recurrent unit (GRU) and RNN models were also created for comparison. It should be noted that the Adam optimization algorithm was used in all models. The searching range and optimal combination of hyper-parameters for each neural network are listed in Tables 2 and 3, respectively.
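As an illustration of the early stopping strategy mentioned above, the following sketch restores the best model state once the validation loss stops improving; the patience and epoch limits, as well as the train/validate callables, are hypothetical placeholders.

```python
# Early-stopping training loop sketch; callables are placeholders for the real routines.
import copy

def train_with_early_stopping(model, train_one_epoch, validate,
                              patience: int = 10, max_epochs: int = 500):
    best_loss, best_state, stale = float("inf"), None, 0
    for _ in range(max_epochs):
        train_one_epoch(model)
        val_loss = validate(model)
        if val_loss < best_loss:          # validation loss improved: reset the counter
            best_loss, stale = val_loss, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            stale += 1                    # no improvement this epoch
            if stale >= patience:         # stop once the loss has stalled long enough
                break
    model.load_state_dict(best_state)     # restore the best checkpoint
    return model
```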

4.3. Results

The prediction results of each neural network on the test set are shown in Fig. 9, with their performances presented in Table 4. In Fig. 9, the left line graphs compare the measured and predicted values as a function of the predicted point, the middle histograms display the discrepancy in the counts of the measured and predicted values at an interval of 1 mm/min, and the right scatter plots show the prediction accuracy compared with the perfect prediction line, i.e. the solid line with a slope of 1. It can be seen that the Attention-ResNet-LSTM model accurately captures the variation of AR with the lowest RMSE and MAPE (1.31 mm/min and 1.52%, respectively), and the predicted values are basically in line with the measured ones (Fig. 9a). When the AR fluctuates locally, the model can still predict the extremum with high accuracy. In addition, the values predicted by the Attention-ResNet-LSTM model have the distribution best matched with the measured ones. The RMSE of the ResNet-LSTM model, reaching 1.38 mm/min, is slightly larger than that of the Attention-ResNet-LSTM model. The ResNet-LSTM model also achieves a good prediction accuracy, as shown by the scatter of the predicted and measured AR around the line with a slope of 1. The histogram, however, shows that the ResNet-LSTM model predicts significantly more values around 55 mm/min than exist in the real data (Fig. 9b). The prediction performances of the LSTM and GRU models are worse than those of the above two models, with some of the predicted values significantly deviating from the measured AR values (Fig. 9c and d). The line charts show that the LSTM and GRU models are only sensitive to local peaks, and they perform poorly for non-peak values. Among all these prediction models, the RNN model has the lowest accuracy (Fig. 9e), with the predicted values significantly larger than the actual AR values. The RMSE and MAPE of the RNN model are 2.82 mm/min and 4.69%, respectively, which are considerably larger than those of the other models. It should be noted that all the prediction models perform poorly for AR values below 40 mm/min. This section of data may be associated with clogging or anomalous operations of the TBM and is thus hard to predict precisely.

Fig. 9. Predicted TBM advance rate using the test set by different prediction models: (a) Attention-ResNet-LSTM, (b) ResNet-LSTM, (c) LSTM, (d) GRU, and (e) RNN.

Table 4. Performance assessment of different prediction models for predicting AR on the test set.

Additionally, the coefficient of determination R² is also presented in Table 5. In general, the analysis using R² is compatible with that using RMSE and MAPE, which served as the evaluation parameters in the training procedure. However, the R² values for all models appear to be relatively low, especially for the RNN model, which may be explained by two reasons. First, the value of R² is related to the mean value of the samples, and thus the dispersion of the data has a significant effect on this indicator; in our dataset, most of the AR data lie in the range of 50-65 mm/min, which results in a low R². Second, there exist some anomalous data generated by sudden starts/stoppages or improper operations, and the prediction models cannot provide perfectly matched values at these points, as mentioned above; the offset therein leads to a low R². All the algorithms selected for comparison are commonly used in time-series forecasting (Cai et al., 2019; Abbasimehr et al., 2020; Qin et al., 2022) and are considered able to produce relevant results. We would like to emphasize that lower values of RMSE and MAPE and a higher R² generally indicate a more accurate prediction for the same dataset. In summary, the Attention-ResNet-LSTM and ResNet-LSTM models perform better than the other models on the test set. A further comparison of the two models will be presented in Section 5.

The training time of each epoch in Table 4 suggests that higher accuracy requires more calculation time, with the Attention-ResNet-LSTM model consuming 48.12 s per epoch, nearly 12 times that of the RNN. It is noteworthy that, given limited computing resources, one may sometimes use the LSTM or GRU models for a quick estimation rather than the Attention-ResNet-LSTM model for precise prediction.

In order to further investigate the characteristics of the error distribution of the Attention-ResNet-LSTM model, a violin plot and a histogram are drawn in Fig. 10. It can be seen from the violin plot that the prediction errors are relatively high when the AR is less than 40 mm/min or between 40 mm/min and 50 mm/min. Especially when the AR is below 40 mm/min, the mean prediction error is close to 16 mm/min, which also explains the relatively low R² values. In contrast, the errors generally range from 0 to 2 mm/min when the actual AR value is larger than 50 mm/min. Fig. 10b shows that the majority of the measured AR data are in the range of 50-65 mm/min, and only a small fraction of them is under 40 mm/min. That is to say, the error distribution on the test set is highly correlated with the distribution of the training data. Thus, the prediction accuracy of the deep learning model is expected to improve further if more data below 40 mm/min become available in the training dataset.

5. Discussion

5.1. Model application

We have demonstrated the good performance of the proposed prediction model in Section 4.3. However, when the model is applied to a real project, it should provide prediction results over a longer term (say, the next 10-60 min) rather than only for the next minute. We use a recursive method to achieve such a multi-step prediction. Assuming that the predicted AR at the tth time step has been obtained according to Eq. (18), the output can be imported into the model as input data to predict the AR at the (t+1)th time step. We can achieve the long-term prediction of the TBM performance by repeating this process. To determine the maximum time window up to which our model can predict, we conduct a long-term forecast experiment using 10 different sections of the data in the test set. The variation of RMSE as a function of the prediction time period, at an interval of 5 min, is plotted in Fig. 11. The vertical bars and the square markers represent the 95% confidence intervals and the mean values, respectively, and the red dashed line links the mean values of all the cases. It can be seen that the RMSE generally increases with the prediction time period in the range from 5 min to 60 min. When the prediction period is shorter than about 25 min, the error increases at a relatively slow rate and maintains a small magnitude. However, when the prediction period reaches 25-35 min (the shaded area in Fig. 11), the RMSE increases significantly, indicating a rapid decrease of the prediction accuracy. The lengths of the vertical bars also increase beyond 25 min, which is a sign of instability of the prediction model. The reason might be that the prediction error accumulates during the recursive prediction, finally leading to an unreliable prediction after many time steps. To summarize, the proposed model can be used to predict the long-term TBM performance for up to about 25 min with relatively high accuracy, and TBM drivers in practice can adjust relevant operational parameters (e.g. thrust and torque) within this time window to ensure a high tunneling efficiency.
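A sketch of this recursive scheme is given below; holding the operational inputs at their latest recorded values is an illustrative assumption, and `model` and the AR column index are placeholders.

```python
# Recursive multi-step forecast sketch: each prediction is written back into the window.
import torch

@torch.no_grad()
def recursive_forecast(model, window: torch.Tensor, n_steps: int, ar_col: int = -1):
    """window: (1, seq_len, features); returns a list of predicted AR values."""
    preds = []
    for _ in range(n_steps):
        ar_next = model(window)                  # predict AR at the next time step
        preds.append(ar_next.item())
        new_row = window[:, -1:, :].clone()      # carry the latest inputs forward
        new_row[..., ar_col] = ar_next           # overwrite AR with the prediction
        window = torch.cat([window[:, 1:, :], new_row], dim=1)  # slide the window
    return preds
```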

Fig. 11. Variation of the RMSE with the prediction time period for 10 sections of data in the long-term forecast experiment. The vertical bars represent the 95% confidence intervals.

5.2. Effect of sequence length

In Section 4.1, we set the value of the sequence length at 20, i.e. a 20-min data sequence was considered as the input to the neural network. In this section, we conduct a parametric study on the sequence length, as it is an important parameter affecting the prediction result. A short historical sequence cannot provide enough information for the model, leading to a decrease in the prediction accuracy. On the other hand, the complexity of the input features would increase dramatically if the sequence duration were too long, making it difficult for the model to extract valid characteristics. Considering x_t as a time series variable, the partial autocorrelation function analysis (Ghimire et al., 2019) is usually adopted to determine the degree of correlation between x_t and x_{t-k}. The partial autocorrelation coefficient calculated for the measured AR sequence in the database is shown in Fig. 12. Data points falling outside the two blue lines have statistically significant correlations with the current value x_t. The partial autocorrelation coefficient decays with increasing lag time, declining to around 0 at about a 30-min lag and then remaining basically unchanged. It can be inferred that historical values delayed by more than 30 min have little effect on the instantaneous AR; thus, we set the sequence length to 10, 15, 20, 25 and 30 min separately to explore its influence on the prediction accuracy.
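The partial autocorrelation check can be reproduced with statsmodels, as in the following sketch; the data file name is hypothetical, and the 95% significance bound is the standard large-sample approximation.

```python
# Partial autocorrelation analysis sketch for the per-minute AR series.
import numpy as np
from statsmodels.tsa.stattools import pacf

ar_series = np.loadtxt("ar_per_minute.txt")   # hypothetical file with the AR sequence
coeffs = pacf(ar_series, nlags=40)            # partial autocorrelation up to a 40-min lag
ci = 1.96 / np.sqrt(len(ar_series))           # approximate 95% significance bound
significant_lags = [k for k, c in enumerate(coeffs) if abs(c) > ci]
print(significant_lags)                       # lags with statistically significant correlation
```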

Fig. 12. Plot of the partial autocorrelation function of the AR series per minute.

As shown in Fig. 13, the training time of each epoch increases with the sequence length. The required calculation time is 37.45 s when 10-min historical sequences are considered, whereas it reaches 55.43 s for 30-min sequences. There is no monotonic relationship between the prediction accuracy and the sequence length. The model with a sequence length of 20 min has the lowest RMSE on the test set, at 1.308 mm/min, while the error increases rapidly as the sequence length decreases, probably because the model cannot obtain sufficient information from previous data. Thus, it is appropriate to set the sequence length at 20 min to balance the prediction accuracy and calculation cost.

Fig. 13. Impact of the sequence length on the model performance and computational cost.

5.3. Effect of model structure

The numbers of layers of ResNet and LSTM might have certain impacts on the model performance. In order to obtain an optimal model structure, we investigate the influences of these two key structural parameters in this section.

5.3.1. Number of ResNet layers

The relationship between the number of ResNet layers (basic blocks) and the model prediction accuracy is presented in Fig. 14. As the number of ResNet layers increases, the training time per epoch rises linearly, from 48.12 s to 87.66 s. The prediction error of the model decreases slightly and then rises markedly with the number of ResNet layers. With 3 layers, the model has the highest prediction accuracy, with an RMSE of 1.305 mm/min, which is only slightly lower than that of the model with 2 ResNet layers (1.308 mm/min). We thus set the number of ResNet layers to 2 in our model to achieve a high prediction accuracy while keeping a low computational cost.

Fig. 14. Impact of the number of ResNet layers on the model performance and computational cost.

5.3.2. Number of LSTM layers

A similar pattern is observed in Fig. 15, where the calculation time increases linearly with the number of LSTM layers, from 48.12 s for a single layer to 73.86 s for 4 layers. The number of LSTM layers has no significant effect on the model accuracy, with the RMSE varying within a small range from 1.308 mm/min to 1.344 mm/min. It can therefore be seen that stacked LSTM layers occupy more calculation resources while having little impact on the model performance. The results of this parametric study show that a single LSTM layer is sufficient for the proposed model.

Fig. 15. Impact of the number of LSTM layers on the model performance and computational cost.

5.4. Correlation analysis between input and output parameters

TBM operators cannot directly change the AR value during construction; instead, they can only adjust AR indirectly by altering different TBM operational parameters, e.g. RS, TOR and TH. Thus, it is essential to investigate the correlations between the input parameters and the output AR in order to provide guidance for TBM tunneling operations. Otherwise, it would be hard for the operator to adjust AR finely even if an accurately predicted value were obtained by our model. According to the findings of Wang et al. (2020) and Fu and Zhang (2021), the input at the (t-1)th time step has the greatest impact on the predicted value of AR at the tth step, which is also confirmed in Fig. 12. Therefore, we vary the input parameter under study from its minimum value to its maximum value at the (t-1)th time step while all the other parameters remain unchanged. The relationships between the input and output parameters are presented in Fig. 16. As shown in Fig. 16a, AR experiences a slight decrease at first and then maintains an upward trend with increasing RS. Different patterns are observed for TOR and TH (see Fig. 16b and c): their increases are both accompanied by a decrease in AR. When TOR exceeds 1500 kN·m, the decreasing trend of AR becomes slower. Meanwhile, the slope of the AR-TH curve reduces at the beginning and then increases after TH reaches ~3700 kN. These observations are consistent with engineering experience, i.e. large values of TOR or TH usually indicate that the strata are hard to excavate, such that the AR of the TBM decreases accordingly.
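This one-at-a-time sweep can be sketched as follows; `model`, the feature column index and the sweep bounds are placeholders for illustration.

```python
# One-at-a-time sensitivity sweep sketch: vary one input at the (t-1)th step only.
import numpy as np
import torch

@torch.no_grad()
def sweep_input(model, window: torch.Tensor, col: int, lo: float, hi: float, n: int = 50):
    """window: (1, seq_len, features); returns predicted AR over the sweep of feature col."""
    preds = []
    for v in np.linspace(lo, hi, n):
        w = window.clone()
        w[0, -1, col] = float(v)     # alter only the (t-1)th time step of one feature
        preds.append(model(w).item())
    return preds                     # AR response curve, as in Fig. 16
```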

Fig. 16. Relationships between the input parameters (a) RS, (b) TOR and (c) TH, and the output parameter AR.

5.5. Model performance for variable strata

During long-distance TBM tunneling, the ground conditions might vary significantly, resulting in strong fluctuations in the AR. As shown in Fig. 17, the stratum changes suddenly at around the 2140th predicted point, varying from stiff clay to soft clay, where the measured AR decreases dramatically. In order to verify the model performance on the two types of strata, we use the proposed Attention-ResNet-LSTM and ResNet-LSTM models to predict the AR, with the corresponding error indicators presented in Table 5. Our attention-based model has lower RMSE and MAPE values for both stiff and soft clays compared to the baseline model. Meanwhile, the error indicators for stiff clay are slightly lower than those for soft clay. For example, the RMSE of the proposed model is 2.31 mm/min for stiff clay, whereas it is 2.96 mm/min for soft clay. The reason might be that data collected during tunneling in stiff clay make up the majority of the training set. In terms of the overall error, the RMSE and MAPE of the former model are 2.52 mm/min and 3.48%, respectively, lower than those of the latter (2.79 mm/min and 4.39%). As shown in the inset of Fig. 17, our model performs excellently on both stiff and soft clays, and it can even precisely capture the decreasing trend of AR at the transition point. The performance of ResNet-LSTM, however, is relatively poor near the 2140th point. It can be inferred that our model is capable of adjusting its weights properly according to the ever-changing input characteristics, so that it has a better adaptability to varying geological conditions.

Fig. 17. Prediction results of the Attention-ResNet-LSTM and ResNet-LSTM models when the ground condition changes from stiff clay to soft clay.

5.6. Generalization capability

To examine the generalization capability of our Attention-ResNet-LSTM model, its prediction performance is further explored on unknown datasets. New data were collected from the Baimang River Tunnel Project in Shenzhen, China. The tunnel is 3366 m in length, includes 2244 lining rings, and was constructed by an earth pressure balance shield TBM (Xu et al., 2022). The external and internal diameters of the lining segments are 6.7 m and 6 m, respectively. The tunnel mainly crosses residual soils and slightly to highly weathered granite, as shown in Fig. 18. It can be noted that the geological condition of the Baimang River Tunnel Project is completely different from that of the Yangtze River Natural Gas Pipeline Project.

Fig. 18. Geological profile of the Baimang River Tunnel Project.

In order to test the model robustness comprehensively, we perform a series of experiments with a total of 0, 100, 1000 or 10,000 data records from the Baimang River Tunnel Project added into the original training set. The model is retrained on the new training set whenever the number of newly added data records is not zero. The test set comprises data from the Baimang River Tunnel Project, and the performance of the prediction model is examined on this dataset. It should be emphasized that adding 0 data records from the Baimang River Tunnel Project means that only data from the Yangtze River Natural Gas Pipeline Project are used to predict the TBM performance in the Baimang River Tunnel Project. Here, the ResNet-LSTM is used as the baseline model. The test results are listed in Table 6. It can be observed that the prediction accuracy gradually increases as more data from the Baimang River Tunnel Project are included in the training set, since the intelligent model can learn more information about the strata conditions and construction processes of the Baimang River Tunnel Project. Compared to the Attention-ResNet-LSTM, the performance of the ResNet-LSTM is unstable and more sensitive to the number of new data, with the RMSE varying from 30.08 mm/min to 6.13 mm/min. In contrast, the Attention-ResNet-LSTM maintains a relatively low prediction error regardless of the amount of new data, indicating a high generalization capability. If the training set only consists of data from the Yangtze River Natural Gas Pipeline Project, the RMSE of the Attention-ResNet-LSTM is 10.93 mm/min on the new test set, much lower than that of the ResNet-LSTM. The error continually decreases as more data are added into the training set, and the RMSE and MAPE become 5.68 mm/min and 17.35%, respectively, when an extra 10,000 data records are included in the training set. To summarize, when applied to a completely different project without additional data included in the training set, the Attention-ResNet-LSTM performs significantly better than the ResNet-LSTM. Compared to the error indicators listed in Table 4, both RMSE and MAPE values increase due to dramatic changes in the geological formations. The proposed Attention-ResNet-LSTM model is likely to provide better prediction results in such situations, because it can adapt itself to different inputs through the introduced attention mechanism. To achieve the same performance with other prediction models, e.g. the ResNet-LSTM, approximately 1000 training data records from the new project need to be added for training. In the future, we will prepare a large database covering a variety of geological conditions to make the model well-trained and more generalized. The proposed Attention-ResNet-LSTM would greatly reduce the required amount of data, as it can achieve a satisfactory performance with limited training data.

Table 6. Performance of our prediction model when applied to the Baimang River Tunnel Project.

6. Conclusions

In this paper, a new hybrid intelligent model, Attention-ResNet-LSTM, was proposed for real-time TBM advance rate prediction. Considering the complicated characteristics of TBM tunneling, we incorporated attention mechanisms to obtain adaptive weights for varying inputs. This deep learning model also contains a ResNet module and an LSTM module, such that nonlinear spatial and temporal information can be well extracted. Data from the Yangtze River Natural Gas Pipeline Project were utilized to examine the model performance; they were preprocessed via data extraction, outlier detection and data normalization before being used for the experiments. The EPR algorithm was adopted to select the optimal combination of input parameters. The effects of sequence length and model structure as well as input-output correlations were investigated. A case study containing variable strata was conducted to test the capability of our model for handling complex ground conditions. Finally, we investigated the generalization capability of our model using another independent database from the Baimang River Tunnel Project in Shenzhen, China. The following conclusions are drawn:

(1) Through the EPR algorithm, the combination of the rotation speed of the cutterhead (RS), thrust (TH), torque (TOR), modulus of compressibility (Es) and characteristic value of bearing capacity (BC) was determined as the optimal combination of inputs. Among these parameters, TH and TOR exhibited strong correlations with AR. Moreover, the predicted AR had different dependencies on the operational parameters: AR decreases slightly with RS when RS is lower than 0.7 r/min, beyond which AR increases with RS; TOR and TH are both negatively correlated with AR, consistent with previous engineering experience.

(2) The results of the partial autocorrelation function analysis showed that 20-min historical data sequences achieve a satisfactory balance between prediction accuracy and calculation cost. Similar parametric studies were conducted to explore the effect of the model architecture, i.e. the numbers of ResNet layers and LSTM layers, on the model performance. The results suggested that two ResNet layers and one LSTM layer yield a good model architecture. The values predicted by our model fitted well with the actual AR data. We also showed that our model is superior to other intelligent models such as ResNet-LSTM, LSTM, GRU and RNN. The RMSE and MAPE of our model were 1.31 mm/min and 1.52%, respectively, both lower than those of the other models.

(3) Our Attention-ResNet-LSTM model outperformed the ResNet-LSTM model when predicting the AR of a TBM through variable strata. We demonstrated this in a case study involving both stiff and soft clays, where the RMSE of the Attention-ResNet-LSTM model was lower than that of the ResNet-LSTM model (2.52 mm/min in comparison with 2.79 mm/min). The generalization of our model is also better than that of the ResNet-LSTM. When applied to a different database, the Attention-ResNet-LSTM was shown to be robust and achieved a higher prediction accuracy regardless of the amount of new data in the training set. The ResNet-LSTM, however, might generate unacceptable errors when transferred to a completely new project with no data available beforehand. Thus, our model is able to self-adapt according to engineering and geological conditions instead of being case-specific, owing to the incorporation of attention mechanisms. In the future, with more data from different tunneling projects included, the Attention-ResNet-LSTM model is expected to give even better performances for new projects. Ultimately, with more precise input-output relationships established, TBM drivers would be able to adjust operational parameters to achieve safe and efficient tunneling.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The research was supported by the National Natural Science Foundation of China (Grant No. 52008307) and the Shanghai Science and Technology Innovation Program (Grant No. 19DZ1201004). The third author would like to acknowledge the funding by the China Postdoctoral Science Foundation (Grant No. 2023M732670).

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.jrmge.2023.06.010.
