Weipeng Lu, Xuefeng Yan*
Key Laboratory of Advanced Control and Optimization for Chemical Processes of Ministry of Education, East China University of Science and Technology, Shanghai 200237, China
Keywords: Linear discriminant analysis; Process monitoring; Self-organizing map; Feature extraction; Continuous stirred tank reactor process
ABSTRACT: Visual process monitoring is important in complex chemical processes. To achieve high separation among the operation states of industrial data, we propose a new criterion for feature extraction called balanced multiple weighted linear discriminant analysis (BMWLDA). We then combine BMWLDA with the self-organizing map (SOM) for visual monitoring of industrial operation processes. BMWLDA extracts discriminative feature vectors from the original industrial data and maximally separates industrial operation states in the space spanned by these vectors. When the discriminative feature vectors are used as the input to SOM, the trained SOM can differentiate industrial operation states clearly, which improves the performance of visual monitoring. A continuous stirred tank reactor is used to verify that the class separation performance of BMWLDA is more effective than that of traditional linear discriminant analysis, the approximate pairwise accuracy criterion, max-min distance analysis, the maximum margin criterion, and local Fisher discriminant analysis. In addition, the method that combines BMWLDA with SOM can effectively perform visual process monitoring in real time.
Process monitoring is of great significance for ensuring process safety and improving product quality. Multivariate statistical methods, such as principal component analysis (PCA) [1], partial least squares [2], and independent component analysis [3], have been widely adopted in process monitoring. In addition, widely used deep neural networks, such as the stacked auto-encoder [4,5], deep belief network [6], convolutional neural network [7], and long short-term memory [8], have also been introduced for process monitoring. However, these methods do not provide visualization of the industrial process, so the process still looks like a black box, which hinders process understanding for operators. Visual process monitoring applies a visualization method to map industrial process data onto a two-dimensional map and then performs monitoring on that map. Visualization provides powerful insight and process understanding in the industrial context and accelerates fault diagnosis [9]. The self-organizing map (SOM) is widely used for data visualization because it retains the topological relations and density distribution of a data set when projecting high-dimensional data into two-dimensional space. Other common visualization methods include PCA and manifold learning methods such as isometric feature mapping (ISOMAP) [10], Laplacian eigenmaps (LE) [11], and t-distributed stochastic neighbor embedding (t-SNE) [12]. SOM has the following advantages over PCA and the manifold learning methods for visualization. First, SOM is a nonlinear method based on a neural network, whereas PCA is a linear dimensionality reduction method that cannot cope with nonlinear data. Second, the output of SOM is two-dimensional, which is specially designed for data visualization, whereas the manifold learning methods and PCA have their own essential dimensions for dimensionality reduction; they may lose information and may not correctly reflect the structure of the high-dimensional data when used for visualization. Finally, SOM does not have the out-of-sample problem, i.e., the lack of an explicit mapping function after visualizing the training data, which prevents a method from visualizing new samples. Some manifold learning methods, such as ISOMAP, LE, and t-SNE, have the out-of-sample problem, so they are not suitable for classification tasks. Therefore, many studies have applied SOM to process monitoring [13–16].
The continuous development of modern science and technology has increased the equipment requirements of industrial systems. Massive numbers of variables are measured through sensors, which makes high-value information difficult to observe directly. If these high-dimensional data are sent directly to the SOM for clustering, the training complexity of the network increases, and the training results do not reflect the true industrial operation states, resulting in poor visual process monitoring. The common practice for dealing with this issue is to extract important features from the original data. Feng and Xu [17] used PCA to extract important features from the original data and then used SOM to visualize these features for fault diagnosis. Chen and Yan [18,19] combined SOM with canonical correlation analysis (CCA) and linear discriminant analysis (LDA) to improve the performance of SOM for process monitoring. Similarly, Song et al. [20] combined SOM with canonical variate analysis (CVA) for process monitoring. Among these methods, the method combining SOM with LDA (LDA-SOM) separates different classes relatively well and has relatively good performance for process monitoring. The objective of LDA is to ensure that the different classes are as far apart as possible while guaranteeing that the scatter of each class is as small as possible [21–24]. Therefore, high-dimensional data from different classes can be separated as much as possible on the output of LDA-SOM. For visual process monitoring, the distances among classes should be as large as possible, and the samples from the same class should be as concentrated as possible; large between-class distances and small within-class distances improve the performance of visual process monitoring. However, one problem occurs in the application of LDA: the traditional LDA criterion overemphasizes the contribution of the classes far from the center in the between-class scatter measure, so neighboring classes usually overlap. Several works have been proposed to solve this problem with corresponding algorithms. Loog et al. [25] proposed the approximate pairwise accuracy criterion (APAC), which expresses the between-class scatter matrix (BCSM) as the sum of the covariance matrices of the means of each class pair and assigns each class pair a weight determined by the Mahalanobis distance between the pair. The authors of [26] also introduced a weighting strategy based on class saliency information into LDA. Bian et al. [27] proposed a new criterion for feature extraction called max-min distance analysis (MMDA), which maximizes the minimum pairwise distance over all class pairs. Li et al. [28] developed the maximum margin criterion (MMC) for feature extraction by changing the criterion function of conventional LDA to tr(Sb - Sw), where Sb is the BCSM and Sw is the within-class scatter matrix (WCSM). MMC ensures that different classes are as far apart as possible while guaranteeing that the scatter of each class is as small as possible. Local Fisher discriminant analysis (LFDA) [29–31] is an extended form of LDA and is also a commonly used method for feature extraction.
In this paper, a novel criterion for discriminative feature extraction called balanced multiple weighted linear discriminant analysis (BMWLDA) is proposed to solve the problem of LDA over-optimizing classes that are already well separated. Its basic ideas are as follows. BMWLDA maximizes the distances between neighboring classes while minimizing the scatter of neighboring classes. Thus, the classification result is in a balanced state, that is, different classes are completely separated. For the between-class scatter measure, BMWLDA removes the class pairs with large pairwise distances and then assigns a weight function that is negatively correlated with the pairwise distance to each remaining class pair. For the within-class scatter measure, BMWLDA removes classes far from all other classes and then assigns a weight function that is positively correlated with the scatter of the class to each remaining class. Therefore, BMWLDA can produce a relatively uniform distribution of class centers and approximately maintain class separation.
Furthermore, we combine BMWLDA with SOM (BMWLDA-SOM) for visual monitoring of industrial operation processes. BMWLDA is applied to extract the discriminative feature vectors from the original industrial data and maximally separate industrial operation states in the space spanned by these vectors. When the discriminative feature vectors are used as the input to SOM, the trained SOM can differentiate industrial operation states clearly, which greatly improves the performance of visual monitoring of these states. We also combine MMDA, APAC, LDA, LFDA, and MMC with SOM to obtain MMDA-SOM, APAC-SOM, LDA-SOM, LFDA-SOM, and MMC-SOM. Simulation studies on a continuous stirred tank reactor (CSTR) show that BMWLDA-SOM is more capable of separating industrial operation states than LDA-SOM, APAC-SOM, MMC-SOM, MMDA-SOM, and LFDA-SOM; therefore, the feature extraction performance of BMWLDA is better than that of LDA, APAC, MMC, MMDA, and LFDA. The CSTR is also used to verify that BMWLDA-SOM can effectively perform visual process monitoring in real time.
The main contributions of the paper are summarized as follows:
(1) BMWLDA is proposed for discriminative feature extraction. BMWLDA not only solves the problem of LDA over-optimizing classes that are already well separated, but also improves the separation among classes.
(2) The method that combines BMWLDA with SOM can effectively perform visual process monitoring in real time.
SOM is an unsupervised self-organizing algorithm that can automatically cluster input patterns [32,33]. It retains the topological relations of the data and projects samples with similar attributes to nearby locations on a two-dimensional plane. The architecture of SOM includes one input layer and one output layer. The neurons between the two layers are fully connected, and lateral suppression exists between the output neurons.
Assume there are q k-dimensional input samples; the ith sample is xi = [xi1, xi2, ..., xik]T, where i = 1, 2, ..., q. The weight vector corresponding to the jth output neuron is wj = [wj1, wj2, ..., wjk]T, and there are m output neurons. The Kohonen algorithm is a representative learning method for the SOM network. Its main steps are as follows [34].
(1) Assign random numbers to the weight vectors of the output neurons. Set the iteration counter t = 0, and let the total number of iterations be T. Determine the value of the learning rate η(0), where 0 < η(0) < 1.
(2) Select one input pattern xi = [xi1(t), xi2(t), ..., xik(t)]T and feed it into the input layer.
(3) Calculate the Euclidean distance dj between the jth output neuron and xi as follows:

d_j = \| x_i - w_j \| = \sqrt{\sum_{l=1}^{k} (x_{il} - w_{jl})^2}, \quad j = 1, 2, \ldots, m
(4) Find the neuron b with the smallest Euclidean distance dj, called the best-matching unit (BMU):

d_b = \min_{j} d_j
(5) Update the weights of the BMU and its neighboring neurons according to the following formula:

w_j(t+1) = w_j(t) + \eta(t)\, h_{b,j}(t)\, [x_i(t) - w_j(t)]
where η(t) is the learning rate, formulated as

\eta(t) = \eta(0) \exp(-t/\lambda)
where λ is a time constant. The neighborhood function h_{b,j}(t) is formulated as

h_{b,j}(t) = \exp\!\left( -\frac{d_{bj}^{2}}{2\,\alpha(t)^{2}} \right)
where d_{bj} is the distance between the jth output neuron and the BMU on the output grid. α(t) is the neighborhood width, formulated as

\alpha(t) = \alpha(0) \exp(-t/\lambda)
(6) Select the next input pattern and feed it into the input layer, then return to step (3). When all input patterns have been learned, go to step (7).
(7) Let t = t + 1 and return to step (2). When t = T, the training process is complete.
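As an illustration of steps (1)–(7), a compact NumPy sketch of the Kohonen algorithm is given below. The grid shape, exponential decay schedules, and Gaussian neighborhood are common choices and are assumptions here, not the authors' exact settings.

```python
import numpy as np

def train_som(X, rows=10, cols=10, T=100, eta0=0.5, lam=50.0, sigma0=3.0, seed=0):
    """Minimal Kohonen SOM sketch following steps (1)-(7).
    X: (q, k) input samples. Returns the weight grid of shape (rows*cols, k)."""
    rng = np.random.default_rng(seed)
    q, k = X.shape
    m = rows * cols
    W = rng.random((m, k))                       # step (1): random initial weights
    # fixed 2-D grid coordinates of the m output neurons
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(T):                           # outer loop, steps (2)-(7)
        eta = eta0 * np.exp(-t / lam)            # learning rate eta(t)
        sigma = sigma0 * np.exp(-t / lam)        # neighborhood width alpha(t)
        for x in X:                              # one pass over all input patterns
            d = np.linalg.norm(W - x, axis=1)    # step (3): Euclidean distances d_j
            b = int(np.argmin(d))                # step (4): best-matching unit
            # step (5): Gaussian neighborhood h_{b,j}(t) on the output grid
            g = np.exp(-np.sum((grid - grid[b]) ** 2, axis=1) / (2.0 * sigma ** 2))
            W += eta * g[:, None] * (x - W)      # move BMU and neighbors toward x
    return W
```

With two well-separated clusters as input, the trained grid assigns them distinct best-matching units, which is the property the visual monitoring relies on.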
Suppose the input data contain c classes, the number of input samples is N, and each sample contains m variables. Class i has n_i samples, and x_j^i represents the jth sample of class i.
The BCSM of LDA is

S_b = \sum_{i=1}^{c} n_i (z_i - z)(z_i - z)^T

where z_i is the mean of class i, and z represents the mean of all N samples. Correspondingly, the WCSM is

S_w = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (x_j^i - z_i)(x_j^i - z_i)^T
The target of LDA is to solve the following formula:

\max_{v} \frac{v^T S_b v}{v^T S_w v} \quad (9)
Eq. (9) can be solved using Eq. (10):

S_b v_q = \lambda_q S_w v_q \quad (10)
where λ_q is the eigenvalue corresponding to the eigenvector v_q. From the characteristic Eq. (10), the LDA projection matrix W_FDA can be obtained; it consists of the eigenvectors corresponding to the top c-1 eigenvalues from the eigenvalue decomposition of (S_w)^{-1} S_b. That is, W_FDA = [v_1, v_2, ..., v_{c-1}].
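For concreteness, the generalized eigenproblem of Eq. (10) can be solved with a symmetric eigendecomposition. The sketch below is illustrative, not the authors' code; the small ridge added to S_w is a numerical-stability assumption.

```python
import numpy as np
from scipy.linalg import eigh

def lda_projection(X, y, n_components=None):
    """Classical LDA sketch: build Sb and Sw, solve Sb v = lambda Sw v,
    and keep the eigenvectors of the top c-1 eigenvalues."""
    classes = np.unique(y)
    c = len(classes)
    k = X.shape[1]
    z = X.mean(axis=0)                            # global mean
    Sb = np.zeros((k, k))
    Sw = np.zeros((k, k))
    for ci in classes:
        Xi = X[y == ci]
        zi = Xi.mean(axis=0)
        Sb += len(Xi) * np.outer(zi - z, zi - z)  # between-class scatter
        Sw += (Xi - zi).T @ (Xi - zi)             # within-class scatter
    if n_components is None:
        n_components = c - 1
    # generalized symmetric eigenproblem; eigh returns eigenvalues ascending
    vals, vecs = eigh(Sb, Sw + 1e-8 * np.eye(k))  # small ridge for stability
    order = np.argsort(vals)[::-1]
    return vecs[:, order[:n_components]]          # projection matrix W_FDA
```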
One drawback of LDA is that the traditional LDA criterion overemphasizes the contribution of the classes far from the center in the between-class scatter measure in multi-class problems, thereby causing neighboring classes to overlap. Therefore, we propose a novel criterion, BMWLDA, to overcome this drawback.
3.2.1. New BCSM
The new BCSM uses the form proposed by Loog et al. [25], namely the sum of the covariance matrices of the means of each class pair:

S_b = \sum_{i=1}^{c-1} \sum_{j=i+1}^{c} p_i p_j (z_i - z_j)(z_i - z_j)^T \quad (11)

where p_i = n_i / N is the prior probability of class i.
The class pairs with large pairwise distances are already fully separated, so we should focus more on the class pairs with small pairwise distances. We define a critical distance as the average of the pairwise distances of all class pairs. When the pairwise distance of a class pair is larger than the critical distance, the pair is considered to have a large pairwise distance; otherwise, it is considered to have a small pairwise distance. With c classes, c(c-1)/2 class pairs are available. The critical distance is expressed as follows:

d_{cri} = \frac{2}{c(c-1)} \sum_{i=1}^{c-1} \sum_{j=i+1}^{c} d_{ij}
where d_ij is the pairwise distance between classes i and j. BMWLDA removes the class pairs with large pairwise distances from the between-class scatter measure so as to completely separate the class pairs with small pairwise distances. Eq. (11) can then be rewritten as

S_b = \sum_{i<j,\; d_{ij} \le d_{cri}} p_i p_j (z_i - z_j)(z_i - z_j)^T
Some pairwise distances of the remaining class pairs are relatively large, whereas others are relatively small, and the class pairs with small pairwise distances may overlap after feature extraction. An intuitive idea is to assign a weight function w(d_ij) that is negatively correlated with the pairwise distance to each remaining class pair. The final BCSM is formulated as

S_b^{new} = \sum_{i<j,\; d_{ij} \le d_{cri}} w(d_{ij})\, p_i p_j (z_i - z_j)(z_i - z_j)^T
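The critical-distance filtering can be sketched as follows. The text specifies only that the weight is negatively correlated with the pairwise distance, so w(d) = 1/d² below is an illustrative assumption, not the authors' exact function; the prior-weighted pair form follows the Loog-style BCSM above.

```python
import numpy as np

def bmwlda_bcsm(means, priors):
    """Sketch of the new between-class scatter: drop class pairs whose
    pairwise distance exceeds the critical distance d_cri (the mean
    pairwise distance over all pairs), then weight the remaining pairs.
    w(d) = 1/d**2 is an illustrative negatively correlated weight."""
    c, k = means.shape
    pairs = [(i, j) for i in range(c) for j in range(i + 1, c)]
    dists = {p: np.linalg.norm(means[p[0]] - means[p[1]]) for p in pairs}
    d_cri = sum(dists.values()) / len(pairs)      # critical distance
    Sb = np.zeros((k, k))
    for (i, j), d in dists.items():
        if d > d_cri:                             # pair already well separated
            continue
        diff = means[i] - means[j]
        Sb += priors[i] * priors[j] * (1.0 / d**2) * np.outer(diff, diff)
    return Sb, d_cri
```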
3.2.2. New WCSM
An edge class is a class far from all other classes. The scatter of such a class can be ignored because it generally does not overlap with other classes. In this work, a class whose distance from every other class is larger than d_cri is considered an edge class. BMWLDA removes edge classes from the within-class scatter measure. The WCSM is rewritten as

S_w = \sum_{i \notin E} \sum_{j=1}^{n_i} (x_j^i - z_i)(x_j^i - z_i)^T

where E denotes the set of edge classes.
Some classes have relatively large scatter, whereas others have relatively small scatter. The scatter of class i is expressed as

s_i = \mathrm{tr}\!\left( \sum_{j=1}^{n_i} (x_j^i - z_i)(x_j^i - z_i)^T \right)
A class with small scatter remains closely clustered after feature extraction, so we should pay more attention to the classes with large scatter. BMWLDA assigns a weight function u(s_i) that is positively correlated with the scatter of the class to each remaining class in the within-class scatter measure. The final WCSM is expressed as

S_w^{new} = \sum_{i \notin E} u(s_i) \sum_{j=1}^{n_i} (x_j^i - z_i)(x_j^i - z_i)^T
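The edge-class removal and scatter weighting can be sketched similarly. Using the scatter trace itself as the positively correlated weight u(s_i) is an illustrative assumption, as is the per-class normalization by sample count.

```python
import numpy as np

def bmwlda_wcsm(X, y, d_cri):
    """Sketch of the new within-class scatter: drop edge classes (farther
    than d_cri from every other class mean), then weight each remaining
    class by a function that grows with its scatter."""
    classes = np.unique(y)
    means = np.array([X[y == ci].mean(axis=0) for ci in classes])
    k = X.shape[1]
    Sw = np.zeros((k, k))
    for idx, ci in enumerate(classes):
        others = np.delete(means, idx, axis=0)
        if np.linalg.norm(others - means[idx], axis=1).min() > d_cri:
            continue                              # edge class: ignore its scatter
        Xi = X[y == ci]
        Si = (Xi - means[idx]).T @ (Xi - means[idx]) / len(Xi)
        Sw += np.trace(Si) * Si                   # weight grows with scatter
    return Sw
```

In the test below, the distant third class is treated as an edge class, so only the two nearby, tightly clustered classes contribute.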
The BMWLDA criterion has the same form as the LDA criterion. The objective of BMWLDA is to solve the following formula:

\max_{v}\; v^T S_b^{new} v \quad \text{s.t.} \quad v^T S_w^{new} v = 1

which is equivalent to maximizing the ratio v^T S_b^{new} v / (v^T S_w^{new} v).
According to the Lagrange multiplier method,

L(v, \alpha) = v^T S_b^{new} v - \alpha \left( v^T S_w^{new} v - 1 \right)

where α is the Lagrange multiplier.
Setting ∂L/∂v = 0 gives

S_b^{new} v = \alpha S_w^{new} v
Therefore, the optimal projection vectors are the generalized eigenvectors of the eigenvalue problem

S_b^{new} v = \alpha S_w^{new} v

and the BMWLDA projection matrix V_BMWLDA consists of the eigenvectors corresponding to the largest eigenvalues.
BMWLDA-SOM includes offline modeling and online monitoring for visual process monitoring. In offline modeling, BMWLDA extracts the discriminative features from the training data and maximally separates industrial operation states in the space spanned by these discriminative feature vectors. Then, the discriminative features are used to train SOM. In online monitoring, new features are extracted from the data collected from the industrial process by using the projection matrix V_BMWLDA obtained in offline modeling. The steps of visual process monitoring are listed as follows.
Offline modeling:
(1) Collect the historical data and standardize them as X_train.
(2) Apply BMWLDA to X_train to obtain the projection matrix V_BMWLDA.
(3) Project X_train onto V_BMWLDA to obtain the training features T_train = X_train V_BMWLDA.
(4) Train SOM with T_train following the algorithm steps in Section 2. After training, different classes are divided into their respective areas, and the trained BMWLDA-SOM can be used for process monitoring.
Online monitoring:
(1) Obtain the data and normalize them as the new data X_new.
(2) Project X_new onto V_BMWLDA to obtain the new features T_new = X_new V_BMWLDA.
(3) Send T_new to the trained SOM. According to the area where the mapped point falls, the operation state of the industrial process can be monitored.
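A minimal sketch of these steps, assuming the SOM's output cells have been labeled by a majority vote of the training features (the helper names `fit_state_map` and `monitor` are our own, not the authors'):

```python
import numpy as np

def fit_state_map(T_train, labels, W_som):
    """Offline step (4) bookkeeping: label each output neuron by the
    majority class of the training features mapped to it."""
    votes = {}
    for t, lab in zip(T_train, labels):
        b = int(np.argmin(np.linalg.norm(W_som - t, axis=1)))  # BMU of feature t
        votes.setdefault(b, []).append(lab)
    return {b: max(set(v), key=v.count) for b, v in votes.items()}

def monitor(x_new, V, mu, sigma, W_som, state_map):
    """Online steps (1)-(3): standardize with training statistics, project
    with the BMWLDA matrix V, find the BMU, and read off its state label."""
    t_new = ((x_new - mu) / sigma) @ V            # standardize + project
    b = int(np.argmin(np.linalg.norm(W_som - t_new, axis=1)))
    return state_map.get(b, "unknown")
```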
Experimental data are generated by simulating a non-isothermal process, the CSTR [35–37]. The simple irreversible exothermic reaction A → B takes place in the reactor. Some of the heat must be removed by the coolant flowing through the jacket to control the temperature of the reactor and the composition of the reaction product. The structure of the CSTR is shown in Fig. 1, and Table 1 gives the physical meaning of each variable in Fig. 1.

Table 1. Variable description.
The following assumptions must be made when creating the CSTR model.
(1) The total volume of material in the reactor remains constant before and after the reaction.
(2) The materials in the reactor are evenly mixed.
(3) The parameter values remain constant throughout the reaction.
Under the above-mentioned assumptions,a mathematical model based on the law of conservation of energy is established as follows [35].
\frac{dh}{dt} = \frac{Q_F - Q}{S}

\frac{dC_A}{dt} = \frac{Q_F (C_{AF} - C_A)}{S h} - k_0 \exp\!\left(-\frac{E}{RT}\right) C_A

\frac{dT}{dt} = \frac{Q_F (T_F - T)}{S h} - \frac{\Delta H}{\rho C_p}\, k_0 \exp\!\left(-\frac{E}{RT}\right) C_A - \frac{U A_c (T - T_C)}{\rho C_p S h}

\frac{dT_C}{dt} = \frac{Q_C (T_{CF} - T_C)}{V_C} + \frac{U A_c (T - T_C)}{\rho C_{pC} V_C}
Fig. 1. Structure of the CSTR.
The parameters of the CSTR model are as follows: k0 is the reaction rate constant, E/R is the activation energy coefficient, S is the bottom area of the reactor, ΔH is the reaction heat, ρ is the reactant density, UAc is the heat transfer coefficient, Cp is the reactant specific heat, CpC is the specific heat of the coolant, and VC is the volume of coolant. In the CSTR process, the 10 variables CA, T, TC, h, QF, Q, CAF, QC, TF, and TCF are selected as monitoring variables. The steady operation conditions are listed in Table 2. Table 3 describes eight faults, including step and slow-drift faults. The sampling interval is 0.01 min, and 600 samples are collected. Each fault is introduced from the 301st sample.

Table 2. Steady operation conditions of the CSTR.

Table 3. Description of the eight fault types.
First, we use the training data to explore the ability of SOM to separate different operation states. Then, we use the test data to test its fault diagnosis ability. For each state generated from the CSTR process, 300 training samples and 300 test samples are selected.
5.2.1. Research on the normal state and fault 1
The SOM network is trained using the training data of the normal state and fault 1. The training result is shown in Fig. 2(a). On the output map, "N" indicates the label of a training sample of the normal state, "F1" indicates the label of a training sample of fault 1, and the number in parentheses indicates the number of training samples mapped into the corresponding grid cell. In addition, different colors represent different kinds of data: the larger the colored shape in a cell, the more samples are mapped into it. For example, "F1(5)" indicates that five training samples of fault 1 are mapped into the cell. Red represents the normal state, and yellow represents fault 1. Fig. 2(a) shows that the two kinds of data are divided into two areas with clear boundaries, so the SOM network has the capability of fault diagnosis when it is completely trained. Then, the fault diagnosis performance of the trained SOM is tested. The test result is shown in Fig. 2(b). Both kinds of test data are mapped onto the correct areas, and no sample is assigned to the wrong class. The accuracy of fault diagnosis is 100%.
5.2.2. Research on the normal state and faults 1 and 2
The normal state and faults 1 and 2 are selected to further explore the fault diagnosis capability of SOM. Fig. 3 shows that fault 1 is divided into its own area and can be completely separated from the other states, whereas the normal state and fault 2 are mixed together. Therefore, we use BMWLDA to extract the discriminative features from the original data to fully separate the different states, and then SOM is applied to visualize these features. This method is expected to separate the different states on the output plane, so BMWLDA-SOM is studied next for fault diagnosis.
5.3.1. Research on the normal state and faults 1 and 2
The normal state and faults 1 and 2 are again selected to evaluate BMWLDA-SOM. BMWLDA is used to extract the discriminative features, and then these features are sent to train SOM. The training result is shown in Fig. 4(a). The three kinds of data are clustered into three different areas with clear boundaries. The test result is shown in Fig. 4(b). No sample is mapped into the wrong class, and the accuracy of fault diagnosis is 100%. In BMWLDA-SOM, BMWLDA extracts discriminative features that improve the separation between classes. Therefore, the fault diagnosis capability of BMWLDA-SOM is better than that of SOM.
5.3.2. Research on the normal state and faults 1, 2, 3, 4, 5, 6, 7, and 8
The normal state and faults 1, 2, 3, 4, 5, 6, 7, and 8 are selected to further explore the fault diagnosis capability of BMWLDA-SOM. The training result is shown in Fig. 5(a). The nine classes are divided into nine different areas. The test result is shown in Fig. 5(b). Three samples of the normal state are mapped into fault 1, so its accuracy is 99%; the accuracy of the other classes is 100%. Although the number of fault classes has increased, BMWLDA-SOM still achieves high fault diagnosis accuracy. The experiment shows that the fault diagnosis performance of BMWLDA-SOM is effective.
APAC, MMDA, MMC, and LFDA are improved feature extraction methods based on LDA. Since SOM can project data into a low-dimensional space while maintaining the topological structure and density distribution of the input data, SOM is used to visualize the class separation performance of the different feature extraction methods. Similar to BMWLDA-SOM, we combine LDA, APAC, MMC, MMDA, and LFDA with SOM to obtain LDA-SOM, APAC-SOM, MMC-SOM, MMDA-SOM, and LFDA-SOM. The normal state and faults 1, 2, 3, 4, 5, 6, 7, and 8 are selected to evaluate the six methods. The results are shown in Fig. 6(a)–(f).

Fig. 2. Results for the normal state and fault 1 based on SOM.

Fig. 3. Training result for the normal state and faults 1 and 2 based on SOM.
Here, an overlap rate is used to evaluate the visual results of the different methods. The overlap rate of each class is the number of the class's samples that overlap with samples of other classes divided by the total number of samples of the class. The average overlap rate is the mean of the overlap rates of all classes. The overlap rate of each method is listed in Table 4.
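The overlap rate just defined can be computed directly from each sample's best-matching-unit (BMU) index and class label; a sample "overlaps" when its grid cell also receives samples from another class. The function name below is our own.

```python
def overlap_rates(bmus, labels):
    """Overlap rate per class: fraction of the class's samples whose SOM
    grid cell also receives samples of another class; also returns the
    average overlap rate over all classes."""
    cell_classes = {}
    for b, lab in zip(bmus, labels):
        cell_classes.setdefault(b, set()).add(lab)   # classes seen in each cell
    rates = {}
    for lab in set(labels):
        idx = [i for i, l in enumerate(labels) if l == lab]
        overlapped = sum(1 for i in idx if len(cell_classes[bmus[i]]) > 1)
        rates[lab] = overlapped / len(idx)
    return rates, sum(rates.values()) / len(rates)
```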

Table 4. Comparison of overlap rates of different methods with that of BMWLDA-SOM.
Faults 2 and 6 are quite different from the normal state, and they are edge classes. The traditional LDA criterion overemphasizes the contribution of the classes far from the center in the between-class scatter measure, which easily leads to overlapping of neighboring classes. When LDA is used for feature extraction, the normal state and faults 3, 4, and 8 are mixed together. APAC gives each class pair a weighting function in the between-class scatter measure; in APAC, if the Mahalanobis distance of a class pair is large, its weight is small. When APAC is used for feature extraction, fault 3 and the normal state overlap significantly, but the other states are nearly separated without overlapping. MMDA maximizes the minimum pairwise distance over all class pairs. When MMDA is used for feature extraction, the normal state and faults 3 and 8 have a large overlap, whereas the other states are nearly completely separated. MMC changes the objective function of traditional LDA to tr(Sb - Sw). Its training result is much better than that of LDA; only the normal state and fault 3 have a large overlap, whereas the other states are nearly completely separated. When LFDA is used for feature extraction, the training result is also much better than that of LDA; only the normal state and fault 1 have a large overlap, whereas the other states are nearly completely separated.
BMWLDA maximizes the distances between neighboring classes while minimizing the scatter of neighboring classes. Thus, the classification result is in a balanced state, that is, different classes are fully separated. In the between-class scatter measure, BMWLDA removes the class pairs with large pairwise distances and then assigns a weight function that is negatively correlated with the pairwise distance to each remaining class pair. In the within-class scatter measure, BMWLDA removes classes far from all other classes and then assigns a weight function that is positively correlated with the scatter of the class to each remaining class. When BMWLDA is used for feature extraction, the nine classes are completely separated, and only a few samples overlap. The experiment shows that the class separation performance of BMWLDA is better than that of LDA, APAC, MMDA, MMC, and LFDA in multi-class problems. Therefore, BMWLDA-SOM is more suitable for fault diagnosis than LDA-SOM, APAC-SOM, MMDA-SOM, MMC-SOM, and LFDA-SOM.

Fig. 4. Results for the normal state and faults 1 and 2 based on BMWLDA-SOM.

Fig. 5. Results of BMWLDA-SOM for fault diagnosis.
According to the experiments above, BMWLDA-SOM has the capacity to separate the operation states of the industrial process with relatively clear boundaries. This excellent visual result can greatly improve the accuracy and intuitiveness of process monitoring. When industrial data are continuously fed into the trained BMWLDA-SOM, their mapped points are connected into a trajectory, and we can monitor the operation state of the industrial process according to the direction of the trajectory. We select the normal state and faults 2 and 3 for simulation. The entire process lasts 3 min. The data in the first minute are fault-free; fault 2 is introduced after 1 min, and fault 3 is introduced after 2 min. The trajectory of monitoring is shown in Fig. 7. Fig. 7(a) shows that, in the first minute, the trajectory stays in the area of the normal state. As time proceeds, the trajectory moves to the areas of faults 2 and 3 [see Fig. 7(b)]. According to this trajectory, we can intuitively and effectively monitor the industrial operation states. Furthermore, Table 5 lists the fault detection rate (FDR) of each class for the different methods. The FDR of each class is the number of correctly detected samples in the class divided by the total number of samples in the class; in particular, the FDR of the normal state refers to its classification accuracy. PCA and kernel PCA (KPCA) are traditional methods; they use Hotelling's T² statistic with a 99% confidence level for process monitoring. The FDRs of most classes are higher with BMWLDA-SOM than with the other methods. The bold numbers in each row of Table 5 indicate the highest FDR for that class. The average FDR of BMWLDA-SOM is higher than those of the other methods, and the average FDRs of PCA and KPCA are the lowest.
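The FDR defined above is a per-class accuracy and can be computed as follows (the function name is our own):

```python
def fault_detection_rates(true_labels, pred_labels):
    """FDR of each class = correctly classified samples of the class
    divided by its total samples; for the normal state this is simply
    its classification accuracy."""
    rates = {}
    for lab in set(true_labels):
        idx = [i for i, t in enumerate(true_labels) if t == lab]
        correct = sum(1 for i in idx if pred_labels[i] == lab)
        rates[lab] = correct / len(idx)
    return rates
```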

Table 5. Comparison of FDRs of different methods with that of BMWLDA-SOM.
The traditional LDA criterion overemphasizes the contribution of the classes far from the center in the between-class scatter measure, thereby causing neighboring classes to overlap. In this paper, BMWLDA is proposed to overcome this shortcoming of traditional LDA. Furthermore, BMWLDA-SOM is proposed for visual monitoring of industrial operation processes. The CSTR is used to verify that the class separation performance of BMWLDA is better than that of LDA, APAC, MMC, MMDA, and LFDA. The CSTR is also used to illustrate that BMWLDA-SOM can effectively perform visual process monitoring in real time. However, there are complex nonlinear relationships among the variables in industrial processes. BMWLDA is essentially a linear method and cannot extract nonlinear information, so it cannot separate some complex faults well. Future work will extend BMWLDA to a nonlinear version to improve the performance of process monitoring.

Fig. 6. Visual results of different methods.

Fig. 7. Trajectory of monitoring based on BMWLDA-SOM.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
The authors are grateful for the support of the National Key Research and Development Program of China (2020YFA0908303) and the National Natural Science Foundation of China (21878081).
Chinese Journal of Chemical Engineering, August 2021 issue.