Bijn Afrsibin,Mosleh Eftekhri
a Department of Mining Engineering,Science and Research Branch,Islamic Azad University,Tehran,Iran
b Department of Mining Engineering,Faculty of Engineering,Tarbiat Modares University,Tehran,Iran
Keywords:Mode I fracture Toughness Critical stress intensity factor Linear multiple regression(LMR)Gene expression programming(GEP)
A B S T R A C T Prediction of mode I fracture toughness(KIC)of rock is of significant importance in rock engineering analyses.In this study,linear multiple regression(LMR)and gene expression programming(GEP)methods were used to provide a reliable relationship to determine mode I fracture toughness of rock.The presented model was developed based on 60 datasets taken from the previous literature.To predict fracture parameters,three mechanical parameters of rock mass including uniaxial compressive strength(UCS),Brazilian tensile strength(BTS),and elastic modulus(E)have been selected as the input parameters.A cluster of data was collected and divided into two random groups of training and testing datasets.Then,different statistical linear and artificial intelligence based nonlinear analyses were conducted on the training data to provide a reliable prediction model of KIC.These two predictive methods were then evaluated based on the testing data.To evaluate the efficiency of the proposed models for predicting the mode I fracture toughness of rock,various statistical indices including coefficient of determination(R2),root mean square error(RMSE),and mean absolute error(MAE)were utilized herein.In the case of testing datasets,the values of R2,RMSE,and MAE for the GEP model were 0.87,0.188,and 0.156,respectively,while they were 0.74,0.473,and 0.223,respectively,for the LMR model.The results indicated that the selected GEP model delivered superior performance with a higher R2 value and lower errors.
Fracture toughness,known as the critical stress intensity factor(SIF),represents the resistance of materials against crack development.It is one of the most essential parameters in fracture mechanics of solid materials(Talukdar et al.,2018;Mu?oz-Ibá?ez et al.,2020).Fracture toughness is an intrinsic characteristic of materials,which is defined as the energy required for creating a new surface in materials(Alkilicgil,2010).Three types of loading fractures(Anderson,2017)are illustrated generally in Fig.1,including mode I(opening/tensile mode),mode II(in-plane shear mode),and mode III(out-of-plane shear mode).The mode I is a pure tensile opening state that occurs more often since the tensile strength of rocks is usually lower than their shear strength and compressive strength.In mode I,when the SIF in the fracture tip exceeds the fracture toughness(i.e.KI≥KIC),the fracture initiates and propagates untilKIis less thanKIC(Guo et al.,1993;Backers,2004).

Fig.1.Three modes of loading that can be applied to fractures(Anderson,2017).
Accurate measurement of mode I fracture toughness(KIC)is of great importance in rock mechanics applications such as slope stability analysis,tunnel excavation,blasting and rock fragmentation,and hydraulic fracturing(Whittaker et al.,1992;Roy et al.,2018;Feng et al.,2020).According to the literature,the mode I rock fracture toughness could be considered as(i)A parameter to categorize rock materials;(ii)A fragmentation index;and(iii)The material property for stability analysis and modeling(Franklin et al.,1988).Scholars have suggested various methods.For example,Whittaker et al.(1992)proposed four categories of methods,including laboratory,numerical,analytical,and estimation methods,to determine the SIF.Laboratory methods adopt various approaches so that the fracture toughness is determined from the loading of the notched specimen.The notch could be made straight or in the V-shape(Anderson,2017).The American Society for Testing and Materials(ASTM E399-90,1997)introduced the first standard laboratory method to determine the fracture toughness of metallic materials.According to the ASTM suggestion,these methods were developed for core-based rock specimens as well.The rock samples in fracture toughness tests are core-based and can be divided into three groups according to their geometries including(i)cylindrical configurations,(ii)disc configurations,and(iii)half-disc configurations.Previous literature of the first group(i.e.cylindrical specimens)has used the straight edge cracked round bar bend method.In this regard,the International Society for Rock Mechanics(ISRM)introduced two methods of chevron bend and short rod(Ouchterlony,1988).These two methods were not well received due to their limitations.The main limitation of these two methods was that the setting-up of the equipment and the preparation of the specimens were complex.The methods of the second group(i.e.disc specimens)are the Brazilian disc method(Guo et al.,1993),the cracked straight through Brazilian disc(Atkinson et al.,1982),the flattened Brazilian disc(Wang and Xing,1999),the double edge cracked Brazilian disc(Chen et al.,2001),the hollow center cracked disc(Amrollahi et al.,2011),the holed-cracked flattened Brazilian disc(Tang et al.,1996),the holed-flattened Brazilian disc(Yang et al.,1997),the modified ring(Thiercelin and Roegiers,1988),and the ISRM-suggested cracked chevron notched Brazilian disc(Xu and Fowell,1994).The third group of methods(i.e.half-disc configuration)involves the ISRM proposed the semi-circular bend specimen(Zhou et al.,2012)and also the cracked chevron notched semi-circular bend(Kuruppu,1997).These methods can be used for measuring both static and dynamic rock fracture toughnesses(Zhou et al.,2012;Kuruppu et al.,2014;Wang et al.,2021).Although a wide variety of direct experimental methods have been developed to measure the mode I fracture toughness,direct measurement of fracture toughness is comparatively more difficult than some other standard laboratory rock properties such as compressive strength and tensile strength(Chen et al.,1997;Kahraman and Altindag,2004;Ke et al.,2008).In addition,after the experiments,most of the direct methods need calculation of the dimensionless SIF using analytical,numerical,semi-analytical,or semi-numerical methods(Chen and Zhang,2004).Due to the difficulty in apparatus arrangements and sample preparation as well as relatively time-consuming activities for measuring fracture toughness,many researchers have attempted to propose indirect methods based on the relationship between fracture toughness and other rock properties,e.g.density,porosity,uniaxial compressive strength(UCS),Brazilian tensile strength(BTS),and elastic modulus(E).In this case,the fracture toughness value can be estimated from these relations in ordinary rock mechanics laboratories,which are not equipped for determining fracture toughness(Kahraman and Altindag,2004).
Many empirical relationships between fracture toughness and physical and/or mechanical properties of rocks have been proposed,in addition to linear and nonlinear relationships.Some researchers(e.g.Bearman,1991;Brown and Reddish,1997;Chen et al.,1997;Chang et al.,2002;Wang et al.,2007;Roy et al.,2017,2018;Liu et al.,2020)have made some achievements on the correlation between fracture toughness and physical properties such as density,porosity,and P-wave and S-wave velocities.Some studies have also been performed to investigate and develop the correlations between the mechanical properties of rocks and mode I fracture toughness.Gunsallus and Kulhawy(1984)proposed three linear relationships between the basic rock mechanics tests including UCS,BTS,point load,and the short rod rock fracture toughness tests,respectively.Ghose and Biswas(1992)conducted UCS,BTS,National Coal Board(NCB)cone indenter index,and point load tests of hard rock(fine-and medium-grained sandstones).They used the chevron bend specimen method,and the single edge notched round bar method to determine mode I fracture toughness values.By conducting double torsion and UCS tests,they also developed significant linear correlations between UCS and fracture toughness.A number of studies(e.g.Gunsallus and Kulhawy,1984;Whittaker et al.,1992;Zhang,2002;Wang et al.,2007;Deng et al.,2012;Zhou et al.,2012;Vavro and Souˇcek,2013)investigated the linear correlations between fracture toughness and tensile strength.Zhang et al.(1998)introduced a nonlinear empirical relationship between UCS and mode I fracture toughness of marble,gabbro,and granite rocks.Ouchterlony(1991)suggested a nonlinear relationship between mode I rock fracture toughness and elastic modulus.Some nonlinear relationships between mode I fracture toughness values and brittleness index were also developed(Kahraman and Altindag,2004;Saeidi et al.,2012).Unfortunately,most of the studies were performed based on the limited rock types,and the prediction equations were proposed by considering only one effective physical or mechanical parameter.A combination of the most effective mechanical parameters is found to have a close relationship with fracture toughness(Gunsallus and Kulhawy,1984;Huang and Wang,1985;Wang et al.,2007;Vavro and Souˇcek,2013 Nejati and Moosavi,2017).
Over the past two decades,artificial intelligence(AI)approaches have become a new paradigm in scientific studies and engineering applications(Liu et al.,2021;Wang et al.,2021).Some of the previous studies used soft computing methods,including artificial neural networks,fuzzy inference system,adaptive neuro-fuzzy inference system,and random regression forest,to develop relations for predicting mode I fracture toughness based on a variety of geomechanical rock properties(Roy et al.,2018;Wang et al.,2021).It should be noted that the employed soft computing methods of the previous studies could not provide a specific prediction equation.Gene expression programming(GEP)is a tool that can be employed as a practical technique in solving complex problems because of its accuracy.The GEP can anticipate output by establishing a link between input parameters.This is the advantage of GEP compared to other soft computing techniques such as artificial neural networks(Faradonbeh et al.,2016a,b).
The main aim of this paper is to predict the mode I fracture toughness based on the basic rock mechanics parameters.The statistical linear multiple regression(LMR)and GEP were adopted to predict mode I fracture toughness of rock based on available rock mechanics data sets.It is worth noting that these methods can measure the mode I rock fracture toughness easily by avoiding the limitations of the conventional laboratory rock fracture toughness tests(i.e.setting up the equipment and complexity of specimen preparation)(Chen et al.,1997;Jin et al.,2011).
The framework of this paper is arranged as follows:Section 2 describes the theory and the process of modeling in GEP;Section 3 explains the main details of the used datasets;Section 4 presents the prediction of the mode I fracture toughness in both used methods.In section 5,the results using different methods are presented,and the performances of different models are compared,and finally,the conclusions of this paper are presented in section 6.

Fig.2.A simple chromosomes structure:(a)K-expression;and(b)ET(Faradonbeh et al.,2016b).
A genetic algorithm is a branch of evolutionary algorithms.Two modified versions,including genetic programming(GP)and GEP,have been introduced(Ferreira,2001;Armaghani et al.,2018).GEP developed by Ferreira(2001)is among the circulating algorithms.GEP combines the advantages of both its predecessor and removes their limitations.The most important feature of this method is utilizing chromosomes that can express and create strings in the form of functions and mathematical relations.GEP is a complete genotype/phenotype system which contains the simplicity of genetic algorithm and the abilities of genetic programming(Muzzammil et al.,2015;Faradonbeh et al.,2016b).In GEP,genes are subsequently expressed as phenotypes in the form of expression tree(ET)(Fig.2).Similar to other AI methods,the process begins with the random generation of an initial population consisting of individual fixed-length chromosomes.Chromosomes can contain one or more genes.Each individual chromosome in the initial population is then expressed,and the fitness can be evaluated by utilizing one of the fitness function equations.The most fitted chromosomes have more chances of selection for passage to the next generation.After selection,these are reproduced with some modifications performed by the genetic operators(Ferreira,2001).Using the two main elements of ET and chromosome,GEP compensates for the limitations of the other two methods and is capable of programming nonlinear and dynamic processes.In the GEP,the solutions are coded in the form of binary(0 and 1)fixedlength linear strings(similar to genetic algorithm)that comply with the Karva programming language(Karva is a symbolic language for chromosome introduction).These coded strings are chromosomes and can be expressed as ETs(Ferreira,2001).Among all the evolutionary algorithms,only GP and GEP can develop mathematical relationships for the independent variable of the problem.Hence,these models have gained more attention from researchers over the past decade,and are being widely used in various fields.
The step-by-step process of solving a problem using GEP comprises five main stages:
(1)Selection of the terminal set(i.e.the independent variables of the problem and the variables of the system status)should be based on the conditions of the problem.Some of the most conventional fitness functions include root mean square error(RMSE),mean absolute error(MAE),and root relative square error(RSE);
(2)Selection of the set of functions includes mathematical operators,nonlinear functions,and Boolean operators.In the GEP,a random combination of input variables and arithmetic operators is considered as the terminal set and functions sets,respectively;
(3)Selection of the model accuracy measurement index is based on the ability of the model to solve a specific problem;
(4)Selection of control components,including determination of numerical component values and qualitative variables used to control the execution of the program.The numbers of training data,testing data,chromosomes,genes,the head size,and selection of the linking function that is adjustable using the four options of addition,subtraction,multiplication,and division,are determined at this stage;and
(5)Determination of the program stopping criteria(i.e.the maximum iteration or the favorite fitness value).Fig.3 indicates a schematic view of the GEP flowchart used in this study.
In multi-genic chromosomes,each gene expresses a sub-ET including a head that can contain any terminal function or symbol and a tail that can only contain terminal symbols.These parts are regions where genetic operators are applied so that new and improved solutions are created.The GEP modeling procedure begins with the random generation of chromosomes for nominal numbers complying with the Karva language.Then,these symbolic chromosomes need to be defined as trees of various forms and sizes.Afterward,their fitness is evaluated using the fitness functions(Armaghani et al.,2018).Next,if the stopping criteria(e.g.maximum iterations or the desired fitness value)are not achieved,the best chromosomes from the first generation are chosen using the roulette-wheel method and are copied for the next generation(Faradonbeh et al.,2016b).Afterward,the major genetic operators like mutation,translocation(insertion sequence elements,root insertion sequence elements,and gene translocation),and recombination(single point,double point,and gene recombination)are applied on the chromosome based on the ratios specified by the GEP designer.These GEP operators will be described in Section 4.2.As a result,the remaining chromosomes are changed into new chromosomes.This process continues until the stopping conditions(maximum number of generations or achieving the suitable solution)are reached(Ferreira,2001;Keshavarz and Mehramiri,2015;Faradonbeh et al.,2016a;Khandelwal et al.,2016,2017;Monjezi et al.,2016;Armaghani et al.,2018).

Fig.3.The flowchart of GEP.
In this paper,a total of 60 datasets(see Table 1)were collected from various published studies.These datasets were available for various rock specimens in terms of mode I rock fracture toughness as well as the independent parameters of UCS,BTS,and elastic modulus(E).The available datasets were employed to build several models using a variety of techniques aiming to select the most accurate model.Accordingly,the parameters of BTS,UCS,and elastic modulus(E)were considered as the model inputs.These parameters were selected due to their significant influence on mode I fracture toughness of rock and the fact that it was easy to measure them.
The present study used the LMR method to compose a linear multiple relationship for predicting mode I rock fracture toughness.Multiple regression analysis is used when the value of a specific variable can be predicted based on the values of other variables.Eq.(1)indicates the general LMR model relationship for(n)independent variables.

whereYrepresents the dependent variable,X1toXnindicate independent variables,β1to βnrepresent regression coefficients,and ε is the random constant.
In this study,statistical analysis software package SPSS(version.24)was used to perform the analyses(SPSS Inc.,2018).For the linear regression analysis,the stepwise method was applied,and the predicted independent variables were introduced one by one into the equation and removed from the regression when they did not play a substantial role in it.Furthermore,a significance level(Sig.or P-value)of 0.05 was considered.The term significance level is used to refer to a pre-chosen probability.The significance level is a statistical cutoff term denoted by a P-value defined as the probability of rejecting the null hypothesis testing when it is true(Kutner et al.,2005).In other words,the significance level for a given hypothesis test is a value for which a P-value less than or equal to is considered statistically significant.The P-values range from 0 to 1.The smaller the P-value,the stronger the evidence that the null hypothesis is rejected.In this regard,if the P-value is smaller than the significance level(i.e.P-value≤0.05),there is a relation between the correlated parameters(Yagiz,2011).The univariate,bivariate,and trivariate equations were made up of independent parameters(UCS,BTS,andE)and dependent(KIC)parameters.In the first step,the univariate relationships between input parameters as well as the relationship between each of them and the dependent parameter were evaluated.Fig.4 demonstrates the determination coefficient(R2)of input parameters with the dependent parameter.Besides,Table 2 presents the univariate correlation(R)between dependent and independent parameters.The correlation coefficient(R)is used as the main index to illustrate the influence of independent parameters on one another or the dependent variable.Table 2 indicates that there is a strong correlation between variables.Accordingly,the highest correlation(0.88)was obtained between the parameters of UCS and BTS.In the second and third steps of the regression analysis,bivariate and trivariate multiple regressions between input variables andKICwere evaluated.Table 3 indicates the results of the initial analysis from LMR models.

Table 1The collected database to conduct the analyses in the present study.
In Table 3,the highest correlation and the lowest standard error are obtained for Model 4(i.e.the parameters of UCS,BTS,and E).Given the multiple regression method and its assumptions,the presence of independence among descriptive variables is essential.Multicollinearity refers to as a condition in which there are linear relationships between several variables and they could be written as a linear combination of one another(SPSS Inc.,2018).
The coefficients obtained through a model will not be valid in the case of multicollinearity since the influence of each descriptive variable on the response variable in the model includes the influences of other variables as well.Therefore,the variance of the estimation of regression coefficient would increase,and the model would practically have a large error.Thus,a slight change in the data used in the model would result in a significant change in the regression coefficients(SPSS Inc.,2018).The present study has used the criteria of tolerance and variance inflation factor(VIF)to detect multicollinearity between independent variables of the regression model.Generally,the lower the VIF and the higher tolerance,the lower the probability of collinearity between independent variables.Tolerance values lower than 0.1 or VIF values higher than 10 indicates collinearity between independent variables.In such cases,the computational relationship between these two indices would be as follows(Shrestha,2020):

whereRj2is the determination coefficient of the regression model on thejth random variable as the response variable with other random variables as independent variables.
Table 4 indicates the values obtained for the VIF and tolerance of independent parameters from multicollinearity analysis.The analysis results indicated no significant collinearity between input parameters.On the other hand,the significance level obtained from the models presented in Table 3 indicates that Model 4 is acceptable with a certainty level of 95%and none of the variables are removed in the last stage.The details regarding the influences of variables on the selected model have been demonstrated in Table 4.In this table,the constant value(C)and the unstandardized coefficient(B)can be determined as per the prediction equation.The T-value(T-test)is defined as the coefficient(B)divided by the standard error.The Tvalue is also used for each independent variable to explain how a certain independent variable significantly affects the dependent variable(Kahraman and Altindag,2004;Guo and Hamada,2013;Enayatollahi et al.,2014).On the other hand,the importance of each variable in the equation is determined based on the standard coefficient(β).This standard coefficient indicates the amount of change in the dependent variable(based on units of standard deviation)per one standard deviation of change in the independent variable.The higher the values of β in an LMR model,the more influential the variables are in determination of the dependent variable(Sari,2020).Based on this,it could be concluded from Table 4 that the UCS with β=0.44 followed by theEwith β=0.31 has the highest influence on rock fracture toughness.

Table 2Correlation matrix for the variables.
Eventually,Eq.(3)could be used to predict mode I rock fracture toughness(KIC)considering beta regression coefficients(constant regression coefficient and regression coefficient of each variable)that are specified as the unstandardized coefficients.

GeneXpro Tools software 5.0 was used for GEP modeling.The selection of the initial populations(i.e.input patterns)is of great importance in this software.Accordingly,a predicting model based on GEP was developed to predict the toughness of rock specimens.It is noted that the dataset used in GEP modeling was identical to the data used in regression analysis.One of the main objectives of the present study is to provide a reliable relationship for prediction of the rock fracture toughness.For this,the collected dataset are divided into two parts(training set and testing set).Based on several investigators(Nelson and Illingworth,1991;Faradonbeh et al.,2016b;and Armaghani et al.,2018),70% of the whole dataset(41 out of 60 datasets)were used for training and the remaining dataset(19 datasets)were employed for testing the accuracy of the models.Therefore,it is expected to obtain a relationship asKIC=f(UCS,BTS,E)for rock fracture toughness using GEP method.Accordingly,seven combinations of the independent(UCS,BTS,andE)parameters and dependent parameter(KIC)could be investigated.In other words,seven GEP models were generated based on one-to-one relationships and the combination of input parameters with mode I rock fracture toughness and were evaluated to determine the best relationship.The seven combinations include univariate,bivariate,and trivariate equations,and the best model among seven combinations of the mechanical parameters was chosen.These models are introduced in Table 5.

Table 3Summaries of estimations of LMR models.

Table 4Influence of variables on the selected model.

Table 5The seven GEP models used in this context.
In the present study,the GEP model estimation has been considered in the following stages:
(1)The fitness function is determined in the first step.The present study used the RMSE function as the fitness function.Besides,MAE andR2were selected as other criteria for model evaluation.These criteria can be calculated using Eqs.(4)-(6).

wherexipreandximesrepresent predicted values and measured values,respectively;indicates the mean of measured data;andnshows the number of data in the dataset(Armaghani et al.,2018).
(2)The second stage includes assigning terminals and functions to generate chromosomes.Terminals are the independent variables,including various influence parameters in prediction of the rock fracture toughness(i.e.E,BTS,and UCS).The selection of function sets is not completely specified and different function sets have been employed in various studies(Faradonbeh et al.,2016a,b;and Armaghani et al.,2018).Hence,the suitable function sets must be selected based on the nature of the problem using trial-and-error method.A total of 11 operators,including mathematical and trigonometric operators(i.e.addition(+),subtraction(-),multiplication(×),and division(/),EXP,^2,^(1/3),ln,sin,cos,arctan),have been used in the present study aiming to achieve the highestR2and the lowest error.
(3)The third stage is selecting chromosomes structure in the GEP.This stage includes selection of the head size and the number of genes.Ferreira(2001)suggested trial-and-error method as the best way to determine suitable parameter values in GEP model architecture.Therefore,the values of the GEP parameters will increase stage by stage,and model accuracy will be evaluated after each stage.Finally,the most suitable model will be considered based on the selected parameters.According to the method suggested by Ferreira(2001),a head size of 7 was determined.Besides,Ferreira et al.(2001)stated that the best population is made up of 30-50 chromosomes.In this regard,the number of genes determining sub-ETs in each chromosome was considered as four through trial-and-error method.In addition,30 chromosomes were assigned in each GEP model.
(4)The linking function is determined at the fourth stage.The linking function determines the relationship between sub-ETs.The four main join functions in GEP modeling include addition(+),subtraction(-),multiplication(×),and division(/)functions.The four functions mentioned were all used at the first step in the present study.Afterward,the subtraction function was selected as the most suitable function and used for linking sub-ETs in the GEP model.
(5)The fifth stage consists of selection of the genetic operators and their rates.At this stage,a combination of all optimizing operators,three types of transposition,and three types of the combination were used.Table 6 indicates the parameters used and their rates in the GEP model from GeneXpro Tools 5.0 software to predict mode I rock fracture toughness briefly.

Table 6Input parameters of the GEP model.
The operators applied to the chromosomes are mutation,transposition,and recombination.The mutation is the most efficient operator with innate modification power.This operator can occur anywhere in the chromosome,considering the length of the chromosome that must be fixed(Ferreira,2001).In the head part,any symbol can be replaced with another terminal or function,while in the tail part,mutation can only convert one terminal to another.Typically,a mutation rate ranging from 0.01 to 0.1 equivalent to two-point mutations per chromosome is used(Ferreira,2006).The new individuals are then subjected to the same process of modification,and the process continues until the maximum number of generations is reached or the required accuracy is achieved.The inversion operator is applied only to the head region of the genes.This operator randomly selects a chromosome,the gene within the chromosome is modified,and the start and termination points of the sequence are inverted(Ferreira,2006).Ferreira(2001)proposed a value of 0.1 for the inversion rate.In the GEP,there are three types of transposable elements:
(1)Short fragments with a function or terminal in the first position that transpose to the head of genes,except the root(insertion sequence(IS)elements);
(2)Short fragments with a function in the first position that transposes to the root of genes(root insertion sequence(RIS)elements);
(3)Entire genes that transpose to the beginning of chromosomes.Recommended transposition values in the literature are considered between 0.01 and 0.1(Ferreira,2001,2002).The third genetic operator which is applied to the initial individuals of chromosomes is defined as recombination(or crossover).In the GEP,there are three different types of recombination,i.e.one-point recombination,two-point recombination,and gene recombination(Ferreira,2006).The suggested value is 0.7 for the sum of these three kinds ofrecombination operators(Ferreira,2001,2002).More explanations regarding the various types of recombination can be found in Ferreira(2006).In Table 6,each of the models is first evaluated based on a specific population and separate fitness functions.Then,each model evolves based on selected genetic operators.

Table 7Results from GEP models evaluation indices.
Table 7 indicates the values obtained forRMSE,MAEandR2in both training and testing stages for univariate,bivariate,and trivariate models.Eventually,the model with the lowestRMSEandMAEas well as the highestR2will be selected as the most reliable model.In Table 7,it reveals that Model 7 is the most suitable model among the models generated by GEP.This model has four genes and 30 chromosomes with respective head and tail sizes of 7 and 22.Hence,the gene length,which is equal to the sum of head and tail sizes,is 29.Among the 36,770 generations,the best fitness was determined to be 0.86 and 0.87 for the training and testing datasets,respectively.However,Model 5 also provides close results to that of Model 7 in terms of the training dataset.Meanwhile,theMAEandRMSEvalues in this model are higher compared to Model 7.Besides,the determination coefficient of this model for the testing dataset is also lower than Model 7.Fig.5 demonstrates the sub-ETs of each gene as well as the best model(Model 7)Karva expression.

Table 8Results obtained from the proposed models.
According to the sub-ETs demonstrated in Fig.5,the mathematical equation of each gene(sub-ETs)is extracted as Eqs.7-10.The final predictor(KIC)of GEP model is achieved:


Fig.5.ET of each gene related to best model.

The values obtained from the evaluation criteria using the two methods of LMR and GEP are presented in Table 8.
In the GEP,the UCS,BTS,and elastic modulus(E)were involved in modeling the rock fracture toughness.Due to the lack of collinearity between the inputs,there was no interaction between the input parameters.As the main criterion in data division,it tried to enter the lowest and highest data values for both training and testing datasets.This helped the efficiency and reliability of the model.As shown in Table 8,the GEP model has higherR2values for both train and test datasets compared to the LMR model.Besides,the calculated values ofMAEandRMSEin the GEP model are lower than that of LMR model.It is evident that the GEP model has higher accuracy and provides more reliable values forKIC.To demonstrate the results from the GEP and LMR models,the values predicted by each model and the corresponding measured rock fracture toughness are presented in Figs.6 and 7.

Fig.6.Mode I fracture toughness measured and predicted values using LMR:(a)Training and(b)Testing(black dashed lines denote y=x).

Fig.7.Mode I fracture toughness measured and predicted values using GEP:(a)Training and(b)Testing(black dashed lines denote y=x).

Fig.8.Comparison of mode I rock fracture toughness measured and predicted values using GEP and LMR methods.
Figs.6 and 7 express that the GEP model provides higher accuracy for training and testing datasets of mode I fracture toughness in comparison with the LMR model.Based on these figures,the best predictive model for mode I fracture toughness(KIC)is the GEP withR2values of 0.86 for training and 0.87 for testing data,compared to the LMR model withR2values of 0.69 and 0.74 for training and testing data,respectively.Moreover,the performances(i.e.MAE,andRMSE)of the LMR and GEP models for training and testing datasets are listed in Table 8.Regarding testing datasets,theRMSEandMAEvalues are 0.188 and 0.156,indicating a higher degree of accuracy using the GEP model;whereas these values were obtained as 0.473 and 0.223,respectively,for the LMR method.Fig.8 displays the predictedKICusing both LMR and GEP models and their corresponding measured values.In this figure,it is clear that the GEP model is a good fit to the measured values.
In the present study,60 laboratory data datasets of mode I rock fracture toughness from previous studies were collected and investigated.Accordingly,the two methods,i.e.GEP and LMR,were selected forKICprediction.In this study,laboratory datasets were split into two groups of training and testing data.Seven models were built based on independent datasets using the LMR method.These models were developed based on the relationship between independent parameters(i.e.E,BTS,andUCS)and the dependent parameter ofKIC.Moreover,several GEP models were built to obtain a reliable relationship.In this regard,70% of the laboratory data were allocated for training.In the next step,30% of the data was used to test the built model.Comparison of the results of using LMR and GEP for predicting mode I rock fracture toughness confirmed the advantage of the GEP method against the LMR statistical method.Analyses indicated that simple basic available rock properties are enough to develop reliable prediction models.Thus,two linear and nonlinear formulas are proposed to predict the mode I fracture toughness in association with three basic mechanical properties of rocks.The results showed that there is no multicollinearity between these basic parameters,thus the proposed empirical formulas present a high level of reliability.
It is found that the nonlinear GEP based equation provides a more accurate prediction of fracture toughness than the linear one.This study confirmed that,instead of time-consuming and complex circumstances measurement of mode I fracture toughness in the laboratory,soft computing can be utilized to predict such properties faster and more economically.However,development of soft computing models requires a large amount of input datasets.According to the equation provided by the GEP model,this model yielded more accurate results(the determination coefficientR2equals to 0.87,andRMSEandMAEequal to 0.188 and 0.156,respectively).Meanwhile,values obtained from the LMR method indicated lower conformity with actual measured values(the determination coefficient ofR2equals to 0.74,andRMSEandMAEare 0.473 and 0.223,respectively).Hence,it can be concluded that GEP can adequately explain the complexity and nonlinear behavior of mode I fracture toughness;furthermore,it can also simplify complex models into a simple equation.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Journal of Rock Mechanics and Geotechnical Engineering2022年5期