Kangling Liu, Xin Jin, Zhengshun Fei, Jun Liang*
1 State Key Lab of Industrial Control Technology, Institute of Industrial Control Technology, Zhejiang University, Hangzhou 310027, China
2 School of Automation and Electrical Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China
Keywords: Adaptive partitioning; Fault detection; Fault isolation; Principal component analysis
ABSTRACT In chemical processes, a large number of measured and manipulated variables are highly correlated. Principal component analysis (PCA) is widely applied as a dimension reduction technique for capturing the strong correlation underlying the process measurements. However, PCA-based fault detection results are difficult to interpret physically and provide little support for fault isolation. Some approaches incorporating process knowledge have been developed, but such knowledge is often scarce and incomplete in practice. Therefore, this work proposes an adaptive partitioning PCA algorithm based entirely on operation data. The process feature space is partitioned into several sub-feature spaces. The constructed sub-block models not only reflect the local behavior of process changes, that is, capture the intrinsic local information underlying them, but also improve fault detection and isolation by combining local fault detection results and reducing the smearing effect. The method is demonstrated on the Tennessee Eastman (TE) process, and the results show that it performs much better in fault detection and isolation than the conventional PCA method.
Fault detection and isolation have become an active research area in recent years, driven by growing demands for high performance, efficiency, reliability and process safety. As a data-driven methodology for extracting and interpreting process information from massive data, PCA is a widely applied dimension reduction technique that captures the strong correlation underlying the process measurements under normal operating conditions and expresses it as a statistical model [1–3]. A process fault is detected if the measured process behavior violates the control limits defined by the model. Typically, the principal component scores and the squared prediction error (SPE) are used for fault detection, giving the T² statistic and the SPE statistic, respectively. Contribution plots are commonly adopted to identify and isolate the faulty variables that drive the process out of its normal operating region, by determining the contribution of each variable to the fault detection statistics [4]. The contribution plot approach assumes that the faulty variables have the largest contributions to the fault detection index. Although it needs no prior fault information, it can lead to obscure diagnosis results because the faulty variables increase the contributions of variables not affected by the fault, which is called the "smearing" effect [5].
PCA has some limitations for fault detection and isolation. One major limitation is poor physical interpretability. In a PCA model, each extracted principal component is a linear combination of almost all observed variables, which makes each principal component difficult to interpret physically and, in turn, provides ambiguous information for fault isolation. Various approaches have been proposed to address this drawback [6–9]. Sparse PCA [6] and shrinking PCA [7] formulate optimization problems that trade off the variance of the principal components against the number of nonzero elements in the loadings. Thus, sparse loading vectors with only a few nonzero elements are obtained, which are easier to interpret. Multi-block methods [8,9] have been proposed to improve both fault detection performance and fault interpretation, but they need process knowledge for block partitioning, which is not always available, especially in complex chemical processes. To overcome this, some improved multi-block PCA approaches based entirely on process data have been proposed [10–12].
Another major limitation of PCA-based fault detection and isolation is that the PCA model is time-invariant. Chemical processes are often slowly varying and subject to normal disturbances, so fault detection and isolation based on a fixed model may produce wrong results and fail to raise a fault warning. Huang et al. [13] proposed mixture discriminant monitoring, which integrates supervised learning and statistical process control charting techniques and uses both normal and faulty historical data in process modeling. Lee et al. [14] developed adaptive multi-block PCA, adaptive consensus PCA, and adaptive multiscale PCA algorithms that update the model structure to deal with changing processes. Zhao and Sun [15] presented relative PCA and multiple-time-region-based fault reconstruction modeling algorithms for fault subspace extraction and online fault diagnosis. However, these approaches are ineffective when historical fault information is scarce or incomplete, which is common in complex chemical processes.
To address the above issues, this work proposes an adaptive partitioning PCA (APPCA) algorithm. The advantages of the APPCA method are as follows. (1) Decomposing the collected process data matrix into several blocks makes it easier to model and interpret. (2) The block partitioning step is carried out entirely on process data, using both historical normal data and online operation data, which is of particular interest when process knowledge is deficient. (3) A series of sub-block PCA models is constructed adaptively according to the operating process. This not only reflects the local behavior of process changes but also enhances fault detection and isolation by combining individual fault detection results and reducing the smearing effect. In this paper, the APPCA algorithm is presented and applied to fault detection and isolation schemes. A case study of the TE process illustrates its application.
Consider two process measurement spaces as the analysis subjects, X_n (N_n × J) and X_f (N_f × J), each consisting of the same number of variables J and possibly different numbers of samples N_n and N_f. X_n is the normal data matrix and X_f is the faulty data matrix collected from one fault case; subscript n denotes the normal case and subscript f the fault case. A global PCA model can be built on the normal data matrix X_n (N_n × J) as follows.
$$X_n = T_n P_n^{\mathrm{T}} + E_n = \hat{X}_n + E_n$$
where T_n (N_n × R) and P_n (J × R) are the principal component (PC) scores and loadings, respectively, matrices X̂_n and E_n denote the modeled variation of X_n and the residuals (un-modeled variation of X_n), respectively, and R is the number of retained PCs. The retained PCs should represent most of the normal process variability. The cumulative percent variance (CPV) criterion is a simple, effective and popular approach for selecting the number of PCs. In this paper, the CPV approach with 85% of the normal variability is employed, since it reasonably balances parsimony and comprehensiveness of the retained PCs.
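The CPV rule can be illustrated with a short sketch. The function name `select_num_pcs` and the example eigenvalues are hypothetical; only the 85% target comes from the text. The eigenvalues are assumed to come from an eigendecomposition of the covariance matrix of the scaled normal data, which is not shown here.

```python
def select_num_pcs(eigenvalues, cpv_target=0.85):
    """Return the smallest R whose leading eigenvalues reach the CPV target."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for r, value in enumerate(ordered, start=1):
        cumulative += value
        # CPV(R) = (sum of the R largest eigenvalues) / (sum of all eigenvalues)
        if cumulative / total >= cpv_target:
            return r
    return len(ordered)


# Hypothetical eigenvalues of a covariance matrix with five variables.
example_eigenvalues = [5.0, 3.0, 1.0, 0.5, 0.5]
num_pcs = select_num_pcs(example_eigenvalues)
```

With these illustrative eigenvalues, two PCs explain 80% of the variance and three explain 90%, so three PCs are retained.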
For the faulty data matrix X_f (N_f × J), the scores T_f and residuals E_f can be obtained as
$$T_f = X_f P_n, \qquad E_f = X_f - T_f P_n^{\mathrm{T}}$$
The residual space typically describes the material and energy balances, which are very sensitive to process faults [7]. When the variable residuals diverge significantly, a fault associated with a breakdown of the correlation among variables is present. Since an interpretable residual space can provide useful information to facilitate the detection and isolation of nascent faults, the variable partitioning method is performed on the PCA residuals of the operation data. First, the Pearson correlation coefficient matrix R_f of the variable residuals E_f is calculated; its element in row i and column j is the correlation coefficient between variables i and j, given by [16]
$$R_f(i,j) = \frac{(E_{f,i} - \bar{e}_{f,i})^{\mathrm{T}} (E_{f,j} - \bar{e}_{f,j})}{\lVert E_{f,i} - \bar{e}_{f,i} \rVert \, \lVert E_{f,j} - \bar{e}_{f,j} \rVert}$$
where E_{f,i} and E_{f,j} denote the residuals of the i-th and j-th variables, respectively, ē_{f,i} is the mean value of E_{f,i} and ē_{f,j} is the mean value of E_{f,j}. A t-test is then applied to R_f, resulting in the significance matrix S_f. Each value s_f in S_f is the probability of obtaining such a correlation when the true correlation is zero, and ranges from 0 to 1. A smaller s_f means a more significant correlation between the two variables: if s_f is sufficiently small, say less than 0.05 (95% confidence level), the correlation between the two variables is significant. Lastly, based on S_f, the variables are partitioned into sub-blocks using the complete linkage algorithm [9], and variables within a predefined positive threshold of each other are grouped into one sub-block.
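The two computational pieces of this partitioning step can be sketched as follows. The helper names `pearson_matrix` and `complete_linkage_partition` and both toy inputs are hypothetical. Converting R_f into the significance matrix S_f requires a t-distribution and is assumed to be done by a statistics library, so the second helper starts from a given S_f and uses its entries directly as pairwise distances, which is one possible reading of the text.

```python
import math


def pearson_matrix(columns):
    """Correlation matrix of J residual series (each a list of equal length)."""
    def corr(a, b):
        mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
        cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
        var_a = sum((u - mean_a) ** 2 for u in a)
        var_b = sum((v - mean_b) ** 2 for v in b)
        return cov / math.sqrt(var_a * var_b)
    return [[corr(a, b) for b in columns] for a in columns]


def complete_linkage_partition(dist, threshold):
    """Merge clusters while the largest pairwise distance stays <= threshold."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the maximum pair distance.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy residual series for three variables (illustrative values only).
residual_columns = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
correlations = pearson_matrix(residual_columns)

# Toy significance matrix for four variables (illustrative values only).
significance = [[0.00, 0.01, 0.90, 0.80],
                [0.01, 0.00, 0.70, 0.60],
                [0.90, 0.70, 0.00, 0.02],
                [0.80, 0.60, 0.02, 0.00]]
blocks = complete_linkage_partition(significance, threshold=0.05)
```

In the toy significance matrix, variables 0 and 1 and variables 2 and 3 are significantly correlated with each other but not across pairs, so two sub-blocks result.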
The number of sub-blocks, or equivalently the predefined positive threshold, greatly influences the performance of fault detection and isolation. Ideally, each sub-block should be well connected and any two sub-blocks should be well separated. Furthermore, the number of sub-blocks should be small enough to remain manageable, yet large enough to preserve the advantage of APPCA. Therefore, the following optimization problem is constructed to determine the number of sub-blocks b; its solution b_f yields the sub-blocks C_i (i = 1, 2, …, b_f):

where T denotes the positive threshold within which all variables are grouped, so b is determined by T; b_max denotes the maximum allowable number of sub-blocks (b ≤ b_max), chosen through extensive trials; and λ > 0 tunes the compromise between the number of sub-blocks and the maximum distance T_i between two variables in the i-th sub-block. In this work, b_max = 10 and λ = 1. f(T_i) is a piecewise linear function of T_i as follows:

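where T_max is a threshold value, which defaults to 0.05 in this work. Thus, the normal data matrix X_n (N_n × J) is partitioned into b_f sub-blocks X_{n,i} (N_n × J_i) (i = 1, 2, …, b_f), where X_{n,i} is the normal data matrix of the J_i variables in the i-th sub-block C_i and J_1 + J_2 + … + J_{b_f} = J. The b_f sub-block PCA models based on X_{n,i} (N_n × J_i) (i = 1, 2, …, b_f) are
$$X_{n,i} = T_{n,i} P_{n,i}^{\mathrm{T}} + E_{n,i} = \hat{X}_{n,i} + E_{n,i}, \quad i = 1, 2, \ldots, b_f$$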
where T_{n,i} (N_n × R_i) and P_{n,i} (J_i × R_i) are the PC scores and loadings of sub-block C_i, respectively, matrices X̂_{n,i} and E_{n,i} denote the modeled variation of X_{n,i} and the residuals, and R_i is the number of retained PCs of sub-block C_i. CPV with 85% is again used to select the number of PCs.
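As a small illustration of how the sub-block matrices X_{n,i} are formed from a variable partition, the sketch below splits the columns of a data matrix. The function name `split_into_sub_blocks` and the example values are hypothetical; the check simply enforces J_1 + J_2 + … + J_b = J with every variable assigned exactly once.

```python
def split_into_sub_blocks(data, partition):
    """data: N rows of J values; partition: one list of column indices per sub-block."""
    num_vars = len(data[0])
    assigned = sorted(j for block in partition for j in block)
    if assigned != list(range(num_vars)):
        raise ValueError("each variable must belong to exactly one sub-block")
    # One N x J_i matrix per sub-block, keeping the original sample order.
    return [[[row[j] for j in block] for row in data] for block in partition]


# Two samples of four scaled variables (illustrative values only).
normal_data = [[0.1, 1.2, -0.3, 0.8],
               [0.4, -0.7, 0.9, -1.1]]
sub_blocks = split_into_sub_blocks(normal_data, [[0, 2], [1, 3]])
```

A separate PCA model would then be fitted to each returned matrix, with its own number of retained PCs.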
PCA partitions the measurement space into two orthogonal subspaces: the principal component subspace and the residual subspace. Each measurement is characterized by its score distance in the principal component subspace and its model error in the residual subspace. With the APPCA method, fault detection is carried out as follows.
Consider the data set X_noc ∈ R^{N_c×M} collected under normal operating conditions (NOC) and new measurements X_mea ∈ R^{N_m×M}. A series of models can be built, including one global PCA model and b sub-block PCA models. The process variables are partitioned into b sub-blocks, and X_mea can be rearranged as
$$X_{\mathrm{mea}} = [X_1, X_2, \ldots, X_b]$$
where each sub-block X_i ∈ R^{N_m×M_i} (i = 1, 2, …, b) has M_i variables, and M_1 + M_2 + … + M_b = M. Let P_i (M_i × K_i) and Λ_i (K_i × K_i) denote the loadings and the diagonal matrix of the first K_i eigenvalues of the i-th sub-block model, and P_g (M_g × K_g) and Λ_g (K_g × K_g) the loadings and the diagonal matrix of the first K_g eigenvalues of the global model, where K_i and K_g are the numbers of retained PCs of the i-th sub-block model and of the global model, respectively. Then, for each sample measurement vector x ∈ R^{1×M} in X_mea and each sub-block vector x_i ∈ R^{1×M_i} (i = 1, 2, …, b), b + 1 pairs of T² and SPE statistics are calculated and compared with their corresponding confidence limits:
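A minimal sketch of the two statistics for a single model (global or sub-block) is given below, using the standard definitions T² = t Λ⁻¹ tᵀ with t = xP and SPE = ‖x − tPᵀ‖². The function name `t2_and_spe`, the toy model and the control limits are hypothetical; in practice the limits would be derived from the normal data at the chosen confidence level.

```python
def t2_and_spe(x, loadings, eigenvalues):
    """x: M scaled measurements; loadings: M x K nested list; eigenvalues: K values."""
    num_vars, num_pcs = len(loadings), len(eigenvalues)
    # Scores t = x P.
    scores = [sum(x[m] * loadings[m][k] for m in range(num_vars))
              for k in range(num_pcs)]
    # T^2 weights each squared score by the inverse of its eigenvalue.
    t2 = sum(scores[k] ** 2 / eigenvalues[k] for k in range(num_pcs))
    # Residual = x - t P^T; SPE is its squared length.
    residual = [x[m] - sum(scores[k] * loadings[m][k] for k in range(num_pcs))
                for m in range(num_vars)]
    spe = sum(r * r for r in residual)
    return t2, spe


# Toy model: two variables, one retained PC along the first variable.
toy_loadings = [[1.0], [0.0]]
toy_eigenvalues = [4.0]
t2_value, spe_value = t2_and_spe([2.0, 3.0], toy_loadings, toy_eigenvalues)

# Hypothetical control limits, for illustration only.
t2_limit, spe_limit = 6.6, 4.0
alarm = t2_value > t2_limit or spe_value > spe_limit
```

Repeating this for the global model and for each of the b sub-block models, each with its own loadings, eigenvalues and limits, gives the b + 1 pairs of statistics.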


Variables associated with a fault are assumed to exhibit large contributions. After a fault is detected, the contribution of each variable to the fault detection statistic can therefore be used for fault isolation: the faulty variables with large contributions are removed until the fault detection statistic is brought back within its control limit. Various formulas have been proposed for computing contributions, such as complete decomposition, partial decomposition and reconstruction-based contribution. In this work, the reconstruction-based contribution (RBC) proposed by Alcala and Qin [19] is adopted for fault isolation.
Assume that a fault occurs and the faulty variables can be reconstructed as x_{fj} = Θ_j f_j, where f_j denotes the fault amplitude along the fault direction Θ_j. Θ_j does not have to be a column of an identity matrix; it can be a matrix describing a multi-dimensional fault or a multiple-sensor fault. Without loss of generality, the fault detection indices of the reconstructed measurement vector x ∈ R^{1×M} and of each sub-block x_i ∈ R^{1×M_i} (i = 1, 2, …, b) are as follows.

where M_g and M_i denote the matrices of the fault detection statistic (such as the SPE or T² statistic) based on the global PCA model and on the i-th sub-block model, respectively. In this work, in order to take both the SPE and T² statistics into consideration, M_g and M_i are represented as

For the global model, the fault amplitude f_j is found by minimizing φ_g with respect to f_j; for the i-th sub-block model, it is found by minimizing φ_i with respect to f_j. The RBCs of each reconstructed variable based on the global and sub-block models can then be expressed as


Note that the larger the RBC of variable j, the smaller the corresponding reconstructed index. Based on the above analysis, the proposed fault isolation method proceeds as follows. For a measurement vector and each built model, first calculate the RBC of each reconstructed variable. Then perform the following steps in sequence: select the variable with the maximum RBC, identify it as a faulty variable, and reconstruct the measurement vector by eliminating the information of the selected faulty variables. Repeat these steps until the reconstructed index, without the information of the selected faulty variables, is under the control limit. Assuming that b + 1 models, including b sub-block models and one global model, are built and a fault has been detected in Section 3, the algorithm for fault isolation is summarized as follows.
Step 1. Consider a measurement vector z ∈ R^{1×m} with m sensor variables, and assume initially that all m variables are non-faulty. For each sub-block vector z_i ∈ R^{1×m_i} with m_i variables (i = 1, 2, …, b), m_1 + m_2 + … + m_b = m, set the number of faulty variables n_{f,i} = 0 and the faulty variable set x_{f,i} = ∅. Start with the first sub-block model, i.e., set i = 1.
Step 2. For the m_i − n_{f,i} non-faulty variables in the i-th sub-block model, based on the sub-block data z_i, calculate the RBC of each reconstructed variable in the sub-block using Eq. (13). Then evaluate the reconstructed index for each reconstructed variable using Eq. (15).
Step 3. Select the non-faulty variable j ∉ x_{f,i} with the maximum RBC and insert it into x_{f,i}, so that n_{f,i} = n_{f,i} + 1. Record the n_{f,i}-th selected maximum RBC and the corresponding reconstructed index.
Step 4. Set z_i = z_i − Θ_j f_j, where f_j denotes the fault amplitude along the sensor direction Θ_j, which is the j-th column of the m_i × m_i identity matrix.
Step 5. If the recorded reconstructed index is still over the control limit of the i-th sub-block model, go back to Step 2. Otherwise, set i = i + 1; if i ≤ b, go to Step 2.
Step 6. For the global model, based on z, set the number of faulty variables n_f = n_{f,1} + n_{f,2} + … + n_{f,b} and the faulty variable set x_f = x_{f,1} ∪ x_{f,2} ∪ … ∪ x_{f,b}. Based on the calculated RBCs of the selected faulty variables in x_f, design the reconstruction direction Θ_j ∈ R^{1×m}: each element l (l = 1, 2, …, m) of Θ_j is set according to RBC(l), the RBC of the selected l-th faulty variable. With the designed direction Θ_j ∈ R^{1×m}, calculate the RBC using Eq. (12) and evaluate the reconstructed index using Eq. (14).
Step 7. If the reconstructed index is still over the control limit of the global model, set z = z − Θ_j f_j and proceed to the next step. Otherwise, the algorithm is terminated.
Step 8. For the m − n_f non-faulty variables, based on z, calculate the RBC of each reconstructed variable using Eq. (13). Evaluate the reconstructed index for each reconstructed variable using Eq. (15). Select the non-faulty variable j ∉ x_f with the maximum RBC and insert it into x_f, so that n_f = n_f + 1. Record the n_f-th selected maximum RBC and the corresponding reconstructed index. Set z = z − Θ_j f_j. If the recorded reconstructed index is over the control limit of the global model, repeat Step 8. Otherwise, terminate the algorithm.
Eqs. (12)–(15) guarantee that the reconstructed index decreases monotonically during the iterations, so the algorithm converges. Note that for new measurements Z ∈ R^{n×m}, with n samples and m variables, each RBC of a reconstructed variable is calculated for every measurement vector z_i ∈ R^{1×m} (i = 1, 2, …, n). For simplicity, in each iteration the non-faulty variable with the maximum RBC averaged over the n measurement vectors is selected and inserted into the faulty variable set. When a fault occurs, the correlation structure between unaffected and affected variables changes and their correlation coefficients become small, so they are partitioned into different sub-blocks by the complete linkage algorithm. Therefore, when the APPCA algorithm performs fault isolation for each sub-block model, the contributions are confined to the selected faulty variables of each sub-block and the fault magnitude does not smear over the non-faulty variables.
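The iterative loop of Steps 2 to 5 for one sub-block can be sketched under simplifying assumptions. The sketch uses the SPE index alone, with M = I − PPᵀ, rather than the combined index of this work, and only single-sensor directions, for which the standard RBC reduces to (Mz)_j² / M_jj and the fault amplitude to (Mz)_j / M_jj. The function names, the toy model and the control limit are hypothetical.

```python
def residual_projector(loadings):
    """M = I - P P^T for an m x k loading matrix given as nested lists."""
    m, k = len(loadings), len(loadings[0])
    return [[(1.0 if i == j else 0.0)
             - sum(loadings[i][c] * loadings[j][c] for c in range(k))
             for j in range(m)] for i in range(m)]


def isolate_faulty_variables(z, loadings, spe_limit):
    """Remove the variable with the largest RBC until the SPE index is under its limit."""
    M = residual_projector(loadings)
    z = list(z)
    m = len(z)
    faulty = []
    while len(faulty) < m:
        Mz = [sum(M[i][j] * z[j] for j in range(m)) for i in range(m)]
        index = sum(z[i] * Mz[i] for i in range(m))  # SPE = z M z^T
        if index <= spe_limit:
            break
        candidates = [j for j in range(m)
                      if j not in faulty and M[j][j] > 1e-12]
        if not candidates:
            break
        # RBC of each remaining single-sensor direction.
        rbc = {j: Mz[j] ** 2 / M[j][j] for j in candidates}
        j_star = max(rbc, key=rbc.get)
        # Subtract the estimated fault amplitude along the selected sensor.
        z[j_star] -= Mz[j_star] / M[j_star][j_star]
        faulty.append(j_star)
    return faulty, z


# Toy sub-block: three variables, one PC along the direction (1, 1, 1) / sqrt(3).
a = 3 ** -0.5
toy_loadings = [[a], [a], [a]]
# A sample on the PC direction, (1, 1, 1), with a bias of 3 added to variable 1.
faulty_vars, reconstructed = isolate_faulty_variables([1.0, 4.0, 1.0], toy_loadings, 0.5)
```

Each pass lowers the index by the selected RBC, which is non-negative, so the loop cannot increase the index; this mirrors the monotonic decrease stated above without reproducing Eqs. (12)–(15).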
The proposed method is illustrated on the well-known TE chemical process [20]. The process consists of five major operations: reactor, product condenser, vapor–liquid separator, recycle compressor and product stripper, as shown in Fig. 1. The TE process has 41 measured variables (22 continuous process measurements and 19 composition measurements) and 12 manipulated variables. The plant-wide control structure recommended by Lyman and Georgakis [21] is used in this study.
In this work, 33 variables are selected for fault detection and isolation, as listed in Table 1, including 22 continuous measurements and 11 manipulated variables. RCW is the abbreviation of reactor cooling water, CCW denotes condenser cooling water, and SCW stands for separator cooling water. The agitation speed is fixed in the process. The 19 composition measurements are excluded since their sampling interval differs from that of the other 34 process measurements. Twenty-two simulation datasets are collected, covering 21 fault modes and 1 normal mode, and each dataset contains 960 samples. In the 21 fault modes, the process is operated in normal mode at the beginning and the fault is introduced from the 160th sample. The types of the 21 faults are given in Table 2.

Table 1 Selected variables for fault detection and isolation in the TE process

Fig. 1. Diagram of TE process.

Table 2 Process disturbances
Before applying the global PCA and the proposed APPCA method, we assume that the normal dataset is auto-scaled to zero mean and unit variance and that the 21 faulty datasets are normalized with the normal-data statistics. With the APPCA method, a series of models is built for each faulty dataset, and the number of sub-block models is determined by combining the correlation coefficients with the complete linkage algorithm. The variable partitioning result for each fault is listed in Table 3. In each model, the number of PCs is selected to capture over 85% of the data variation. To test the APPCA method, the fault missing detection rate is chosen as the fault detection measure. The results for the 21 faults are shown in Table 4, together with those of the global PCA for comparison. The minimum missing detection rate achieved for each faulty dataset is highlighted in bold. Note that the fault missing detection rates of the APPCA method are lower for Faults 3, 9 and 15, but they remain high, all over 0.87. These three faults are difficult for both PCA and APPCA to detect because they have little effect on the variation and mean of the overall process. Compared with the global PCA, the APPCA method exhibits the lowest fault missing detection rate in most fault modes and performs much better for Faults 5, 10, 16 and 19. The fault detection results of two typical faults, Faults 5 and 10, are discussed later. The detection delay is also shown in Table 4, and the minimum detection delay for each faulty dataset is highlighted in bold. Compared with the global PCA, the APPCA method attains the minimum detection delay in all 21 fault modes.
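The two performance measures used in Table 4 can be computed as in the sketch below. The function names and the toy statistic series are hypothetical, and `fault_start` is assumed to be the zero-based index of the first faulty sample; how the b + 1 model results are combined into a single APPCA rate is not specified here and is therefore left out.

```python
def missing_detection_rate(statistic, limit, fault_start):
    """Fraction of faulty samples whose statistic stays within the control limit."""
    faulty = statistic[fault_start:]
    missed = sum(1 for value in faulty if value <= limit)
    return missed / len(faulty)


def detection_delay(statistic, limit, fault_start):
    """Samples elapsed between fault onset and the first violation, or None."""
    for offset, value in enumerate(statistic[fault_start:]):
        if value > limit:
            return offset
    return None


# Toy monitoring statistic: the fault is assumed to start at index 2.
toy_statistic = [0.1, 0.2, 0.3, 2.0, 0.4, 3.0]
rate = missing_detection_rate(toy_statistic, limit=1.0, fault_start=2)
delay = detection_delay(toy_statistic, limit=1.0, fault_start=2)
```

In the toy series, two of the four faulty samples stay below the limit and the first violation occurs one sample after onset.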

Table 4 Fault missing detection rates/delayed time of detecting faults

Table 3 Variable partitioning for 21 faults
Fault 5 is a step change in the CCW inlet temperature. Fault detection results with the global PCA and APPCA methods are shown in Fig. 2. The global PCA detects the fault only at the beginning, between samples 160 and 340. After that, the fault is hardly detected by the T² and SPE statistics, because the control loops designed to improve process performance compensate for the step disturbance of the CCW inlet temperature and bring the process variables back to their desired values. The control structure reduces the influence of the step fault, and the process attains a new steady state. Since Fault 5 still exists and continues to affect the process to some degree, the process cannot completely return to its previous normal mode. One might therefore mistakenly conclude that the fault has been corrected by the control actions after sample 340. The same situation occurs in four sub-blocks (1–4), as shown in Fig. 2(b)–(e). However, sub-block 5 detects the fault with the SPE statistic throughout the faulty operation, as shown in Fig. 2(f). The missing detection rates of the T² and SPE statistics with the global PCA are 0.7575 and 0.7513, respectively, while those with the APPCA are 0.6863 and 0. The SPE statistic of APPCA thus continues to inform the operators that the fault remains in the process, and sub-block 5 takes most of the responsibility for fault detection. To further illustrate the role of sub-block 5, the variables in this sub-block are shown in Fig. 3. Variable 33 (CCW Flow) is the faulty variable, which shifts to a new value after sample 160. This indicates that the CCW inlet temperature is still abnormal and the CCW flow is continuously affected. From the above analysis, the APPCA method is superior to the global PCA for detecting Fault 5 and provides more accurate information for operators.

Fig. 2. Fault detection results for Fault 5. (a) Global PCA; (b–f) PCA sub-blocks 1–5; solid line: monitoring statistics; dashed line: 99% control limit.

Fig. 3. Impacts of Fault 5 on Variable 17 (Stripper Underflow) and Variable 33 (CCW Flow).
Since sub-blocks 1, 2, 3 and 4 take most of the responsibility for detecting the fault between samples 160 and 340, fault isolation is carried out in each of these sub-blocks to identify the most responsible variable. The reconstructed indices are shown in Fig. 4. Variable 20 (Compressor Work), Variable 11 (Product Separator Temperature), Variable 18 (Stripper Temperature), and Variable 4 (A and C Feed) have the smallest average reconstructed indices in their respective sub-blocks, so they are concluded to take the most responsibility. Therefore, these process variables are the root causes of Fault 5.

Fig. 4. Reconstructed indices for fault isolation of Fault 5. (a–d) Sub-blocks 1–4.

Fig. 5. Fault detection results for Fault 10. (a) Global PCA; (b–f) sub-blocks 1–5; solid line: monitoring statistics; dashed line: 99% control limit.
Next, Fault 10, a random variation in the C feed temperature (stream 4), is considered. Fault detection results with both the global PCA and APPCA methods are shown in Fig. 5 and Table 4. Fig. 5(a) shows that the global PCA detects the fault with a high fault missing detection rate (0.6963 for the T² statistic and 0.6937 for the SPE statistic), so most faulty samples are mistaken for fault-free samples, and it fails to give an early warning of the fault. In contrast, the APPCA method detects the fault with a lower fault missing detection rate (0.5112 for the T² statistic and 0.1963 for the SPE statistic). In particular, as shown in Fig. 5(e), the sub-block 4 model detects Fault 10 with high sensitivity (the fault missing detection rates for the T² and SPE statistics are 0.8150 and 0.2313, respectively). The SPE statistic has a lower fault missing detection rate than the T² statistic, indicating that Fault 10 has a significant impact on the correlation structure of the sub-block 4 model. In the fault detection results of the other four sub-blocks (1, 2, 3, 5), shown in Fig. 5(b), (c), (d), (f), only the T² statistics of sub-blocks 1 and 2 detect a few of the faulty samples, and the fault is barely detected by the others. This reveals that Fault 10 has only a local influence and that the process fails to compensate for it. Fig. 6 shows the impacts of Fault 10 on the variables in sub-block 4, which carry most of the fault information. To find the cause of Fault 10, the reconstructed indices for fault isolation in sub-block 4 are calculated and shown in Fig. 7(d). Variable 18 (Stripper Temperature) is identified as the faulty variable, because the average reconstructed index in this sub-block is under the control limit after removing it. The reconstructed indices in sub-blocks 1, 2 and 3 are illustrated in Fig. 7(a)–(c), in which Variable 20 (Compressor Work), Variable 27 (Compressor Recycle Valve) and Variable 4 (A and C Feed), having the smallest average reconstructed indices in their respective sub-blocks, take the most responsibility for the fault in the three sub-blocks. Fig. 8 shows the fault isolation results based on the models of sub-blocks 1–4 using the APPCA method; the reconstructed index of each sub-block is under the control limit after removing its faulty variable. Fig. 9(a) illustrates the global PCA based reconstructed indices before and after removing the first fault direction. Fig. 9(b) shows the normalized variable contributions to the first fault direction, which is constructed from the faulty variables isolated in sub-blocks 1–4 using the APPCA method. The global PCA reconstructed index is under the control limit after removing the first fault direction, and faulty Variable 18, with the largest contribution, dominates the first fault direction. Based on the above analysis, the effect of Fault 10 on the process can be explained as follows. The random variation of the C feed temperature in stream 4 first affects the stripper temperature (Variable 18) when the stream flows through the stripper, and then affects the compressor work (Variable 20), which is controlled by the stripper temperature and pressure. In turn, the compressor recycle valve (Variable 27) is adjusted to compensate for the change in compressor work. In addition, since the stripper temperature reflects the product composition, the A and C feed (Variable 4) is also influenced. These process variables can therefore be determined as the root causes of Fault 10.

Fig. 6. Impacts of Fault 10 on Variable 18 (Stripper Temperature), Variable 19 (Stripper Steam Flow) and Variable 31 (Stripper Steam Valve).

Fig. 7. Reconstructed indices for fault isolation of Fault 10. (a–d) Sub-blocks 1–4.

Fig. 8. APPCA based detection and isolation results for Fault 10 and comparison of reconstructed indices before and after removing a faulty variable. (a) Variable 20 in Sub-block 1; (b) Variable 27 in Sub-block 2; (c) Variable 4 in Sub-block 3; (d) Variable 18 in Sub-block 4.

Fig. 9. Global PCA based detection and isolation results for Fault 10. (a) Comparison of reconstructed indices before and after removing the first fault direction (1st FD); (b) normalized variable contributions to the 1st FD.
Consequently, based on the above analysis of the TE process, the proposed APPCA method is shown to perform better in fault detection and isolation than the global PCA.
In this work, an adaptive partitioning PCA algorithm is proposed to enhance fault detection and isolation. The approach first obtains the residuals of online operation data by performing PCA on all variables. Based on these residuals, the variables are partitioned into several blocks using the complete linkage algorithm. A PCA model is then built for each block from the historical normal data of its variables, so that a series of sub-block models is constructed. The block partitioning step is carried out adaptively on operation data and independently of process knowledge. Therefore, the sub-block models are not fixed but change with the operating condition, which makes the fault detection results more interpretable and improves the performance of both fault detection and isolation. The case study of the TE process illustrates that the method is feasible and effective for both fault detection and isolation.
Chinese Journal of Chemical Engineering, 2015, Issue 6