Yanlin He, Yuan Xu, Zhiqiang Geng, Qunxiong Zhu*
College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, China
Keywords: Soft sensor; Auto-associative hierarchical neural network; Purified terephthalic acid solvent system; Matter-element
ABSTRACT To explore the problems of monitoring chemical processes with large numbers of input parameters, a method based on the Auto-associative Hierarchical Neural Network (AHNN) is proposed. AHNN focuses on dealing with high-dimensional datasets. An AHNN consists of two parts: groups of subnets based on well-trained Auto-associative Neural Networks (AANNs) and a main net. The subnets play an important role in the performance of the AHNN. A simple but effective method of designing the subnets is developed in this paper, in which the subnets are designed according to the classification of the data attributes. To obtain the classification, an effective method called Extension Data Attributes Classification (EDAC) is adopted. A soft sensor using AHNN based on EDAC (EDAC-AHNN) is introduced. As a case study, the production data of a Purified Terephthalic Acid (PTA) solvent system are selected to examine the proposed model. The results of the EDAC-AHNN model are compared with the experimental data extracted from the literature, which shows the efficiency of the proposed model.
An accurate model characterizing the nonlinear behavior of chemical processes is very important for effective control and monitoring of processes. These targets may be achievable through a reliable process model formulated classically on the basis of mass and energy balances [1] for chemical processes. However, when this method is adopted, some unknown parameters have to be fitted by employing nonlinear optimization methods [2]. To avoid the difficulties of traditional mathematical models, neural networks [3,4] are utilized as an alternative technology.
The use of artificial neural networks (ANNs) has proved successful in function approximation owing to their ability to map highly nonlinear behaviors [5]. Layered forward artificial neural networks (FANNs) with a simple structure [6,7] are conventionally adopted in process modeling. The ability of FANNs to approximate any nonlinear relationship between a set of inputs and outputs has been illustrated [8-10]. However, FANNs deal poorly with high-dimensional data. When dealing with high-dimensional datasets, neural networks with simple structures need more neurons to obtain reasonable results; as a result, more time is needed to train the networks. Moreover, neural networks with simple structures suffer from disadvantages such as more easily getting stuck in local optima and poor generalization ability. To overcome these weaknesses, dimension reduction needs to be carried out. To reduce the dimension, Principal Component Analysis (PCA) or PCA-based methods [11] are adopted by most researchers. As a multivariate statistical technique, PCA can visualize the original high-dimensional data in a lower-dimensional picture from the most informative viewpoint. However, with more and more complex data generated from chemical processes, researchers find it harder to discover knowledge directly from the mass of data, even in a low-dimensional space, by using statistical techniques [12]. Therefore, a more suitable dimension reduction strategy is introduced in this paper.
The Auto-associative Hierarchical Neural Network (AHNN) is a topology strategy that focuses on dealing with high-dimensional input datasets. The AHNN generally consists of two parts: groups of subnets and a main net (Fig. 1). The subnets are based on well-trained Auto-associative Neural Networks (AANNs), which were proposed by Namphol et al. [13] for image compression. An AANN is trained using gradient-based learning algorithms. In an AANN, the numbers of nodes in the input layer and the output layer are the same, and the activation functions are sigmoid functions. This structure has been proved by Baldi and Hornik [14,15] to reach the specific minimum error more easily. It has also been reported that the AANN has the advantage of filtering out redundant information and compressing the information with fewer hidden nodes in the subnets, which can enhance the generalization ability of neural network models [16].

Fig. 1. Structure of the Auto-associative Hierarchical Neural Network (AHNN).
In order to make the model effective, the subnets need to be established rationally. In the paper of Aly and Atiya [17], a method called WRAPPER was used to try the possible combinations of attributes for the subnets in an Ensemble Neural Network (ENN). Rokach and Maimon [18,19] established a decision tree according to an attribute decomposition that listed almost all the possible ensembles of attributes. Zheng et al. [20] established the subnets based on an attribute decomposition of the input data using the K-means method [21]. Although the performance of the AHNN can be enhanced, the above methods are difficult to implement and the validity of the attribute classification is not guaranteed. To solve this problem, a simple but effective method for building the subnets is introduced. In this method, an algorithm named Extension Data Attributes Classification (EDAC), based on Extension theory [22,23], is adopted. EDAC is used to classify the input data attributes, and the subnets are then designed according to the classification of the data attributes.
The aim of this research is to introduce a detailed procedure for establishing an EDAC-AHNN soft sensor for chemical processes with many input parameters. The EDAC method is used to classify the input data attributes, and the subnets of the EDAC-AHNN are then established with the classification results. As a case study, an EDAC-AHNN model for the PTA system is developed. The results of the proposed model are compared with results drawn from the literature and show that the model designed in this work has an acceptable generalization capability and a higher accuracy.
The Extension Data Attributes Classification (EDAC) method is based on Extension theory [22,23]. The matter-element model in Extenics is a good tool for describing abstract knowledge more clearly and has been successfully applied in production operation [24]. The EDAC algorithm is modified from the Extension Neural Network (ENN) proposed by Wang [25-27]. This method consumes less time and has a higher classification accuracy. In this paper, the whole EDAC algorithm is presented in the form of matter-element models, which makes it easier to understand than the original method. It should be emphasized that the algorithm operates on the attributes of the samples rather than on the samples themselves. The general steps of the algorithm are given as follows:
Step 1 Suppose that there is a set of samples with a total number of n, and every sample owns K-dimensional input attributes; the vector S_i is used to describe the i-th sample:
S_i = (a_i1, a_i2, …, a_iK),  i = 1, 2, …, n    (1)
By transposing the sample arrays, the attribute vectors can be obtained:

A_j = (a_1j, a_2j, …, a_nj)^T,  j = 1, 2, …, K    (2)

Obviously, the attribute vector set is the result of transposing the sample vector set. Different from the previous method, in our method only data attribute classification is performed, which is used to build the subnets automatically. Although it does not need expert or experience knowledge, it takes the overall properties of all samples into account.
Step 2 According to the dimension of the attribute vector, the matter-element [22,23] model of the input data is built, and the maximum and minimum values of every attribute are found. Take sample S in Step 1 as an example.

where a_j,max = max{a_ij} and a_j,min = min{a_ij}, with the maximum and minimum taken over i = 1, 2, …, n. Then the normalization of the data should be carried out. It can be done by changing the matter-element R_Aj through a deletion transformation and a scaling transformation.

where T_1 stands for the deletion transformation, T_2 stands for the scaling transformation and NA_j stands for the j-th normalized input data.
Step 3 Describe the cluster centers by using the multidimensional matter-element and record the center changes during data processing.

where C_j stands for the j-th cluster center, c_nj stands for the value of the n-th feature in the j-th cluster, V_nj = ⟨v_nj^L, v_nj^U⟩ stands for the classical field of c_nj, v_nj^L stands for its lower limit and v_nj^U stands for its upper limit.
Step 4 According to prior knowledge, select the distance threshold parameter λ. The parameter λ is used to measure the distance between the cluster center and the desired boundary. It is a user-defined parameter that must be judiciously determined from an engineering knowledge of the system requirements. Then v_nj^L = c_nj - λ and v_nj^U = c_nj + λ. k stands for the number of clusters and M_k stands for the number of samples in the k-th class. Read the first sample (j = 1), create the first class and initialize the variables k = 1 and M_k = 1; the first class center is then obtained by copying and actively transforming R_NA1. That is:

Then, calculate the extension distance ED_l between R_Aj and the l-th of the existing k class centers:
ED_l = Σ_{i=1}^{n} [ ( |a_ij - c_il| - (v_il^U - v_il^L)/2 ) / ( |v_il^U - v_il^L| / 2 ) + 1 ],  l = 1, 2, …, k    (8)
Step 5 Next, set j = j + 1 and use Eq. (8) to calculate the extension distance between the next sample and the h-th cluster center of the existing k clusters. Then find the smallest of the k extension distances.
ED_p = min{ED_1, ED_2, …, ED_k}
Step 6 During the data processing, if ED_p > 1, the j-th sample does not belong to any existing class, so a new cluster needs to be created. Let k = k + 1 and M_k = 1; the new class center can then be obtained by copying and actively transforming.
Otherwise, if ED_p < 1, the j-th sample belongs to the m-th class, so M_m = M_m + 1 and the replacement transformation is carried out to update the m-th class center.
C_m^new = C_m^old + (A_j - C_m^old) / M_m    (9)
Meanwhile, re-determine whether the existing centers change. If the x-th sample changes from the o-th (old) class to the m-th class, then M_m = M_m + 1 and M_o = M_o - 1. Use Eq. (9) to update the m-th class center and amend Eq. (9) correspondingly to update the o-th class center.
Step 7 Set j = j + 1 and repeat the above steps until all the input samples have been compared with the existing clusters.
Step 8 If the clustering process has converged, go to Step 9; otherwise, return to Step 3.
Based on the above descriptions, the algorithm flowchart is shown in Fig. 2.

Fig. 2. Flowchart of the classification algorithm of extension data attributes.
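The clustering loop of Steps 4-8 can be sketched in Python. This is a minimal illustration, not the authors' implementation: the extension distance is taken here as the average normalized distance |a - c|/λ per dimension (so ED > 1 means the attribute falls outside the classical field ⟨c - λ, c + λ⟩ on average), and the replacement transformation is realized as recomputing a class center as the mean of its members. All names and parameters are illustrative.

```python
import numpy as np

def edac(attributes, lam, max_sweeps=100):
    """Sketch of Extension Data Attributes Classification (EDAC).

    attributes : (K, n) array, one row per attribute vector
                 (one data attribute observed over n samples).
    lam        : threshold; the classical field of a center c is <c-lam, c+lam>.
    Returns one cluster label per attribute row.
    """
    attributes = np.asarray(attributes, dtype=float)
    centers, members = [], []
    labels = [-1] * len(attributes)

    for _ in range(max_sweeps):              # Steps 7-8: sweep until converged
        moved = False
        for j, a in enumerate(attributes):
            if not centers:                  # first attribute founds the first class
                centers.append(a.copy()); members.append({j}); labels[j] = 0
                continue
            # extension distance to each center, averaged over the n dimensions
            dists = [np.mean(np.abs(a - c) / lam) for c in centers]
            m = int(np.argmin(dists))
            if dists[m] > 1.0:               # Step 6: outside every classical field
                centers.append(a.copy()); members.append(set()); m = len(centers) - 1
            if labels[j] != m:               # the attribute changes class
                moved = True
                if labels[j] >= 0:
                    members[labels[j]].discard(j)
                labels[j] = m
            members[m].add(j)
            # replacement transformation: center <- mean of current members
            centers[m] = attributes[sorted(members[m])].mean(axis=0)
        if not moved:
            break
    return labels
```

For instance, with two tight groups of attribute vectors and λ = 0.15, the loop assigns each group its own label and converges after two sweeps.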
In this study, the EDAC method is applied to establish the subnets of the AHNN, and then a more suitable structure of the main net with good generalization ability is selected in order to model the PTA system.
Consider typical plant data as shown in matrix X; the multivariate data can be organized in W variables and Z samples:
X = [x_ij]_{Z×W},  i = 1, 2, …, Z;  j = 1, 2, …, W    (10)
In the matrix X, the variables numbered from z_1 to z_(W-1) are input variables and the variable numbered z_W is the output variable. To obtain the data attribute matrix, the output variable in matrix X should be deleted first, and the data attribute matrix X* is then obtained by transposing the matrix X:

X* = [x_ji]_{(W-1)×Z}    (11)

The multivariate data can be organized in Z variables and (W - 1) samples in matrix X*, and each row in the matrix X* stands for an attribute vector, which is the same vector as shown in Eq. (2). In the following steps, the classification of the data attributes can be completed according to the EDAC algorithm.
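As a small worked example (with hypothetical numbers), deleting the output column and transposing can be written as:

```python
import numpy as np

# Hypothetical plant data: Z = 4 samples, W = 3 variables,
# where the last column z_W is the output variable.
X = np.array([[1.0, 10.0, 0.5],
              [2.0, 20.0, 0.6],
              [3.0, 30.0, 0.7],
              [4.0, 40.0, 0.8]])

X_star = X[:, :-1].T   # drop the output column, then transpose

# X_star now has (W - 1) = 2 rows (attribute vectors, cf. Eq. (2))
# and Z = 4 columns (samples).
```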
For the AHNN, the most important procedure is how to establish reasonable subnets. In the AHNN, the subnets are based on AANNs, which have the same number of nodes in the input layer and the output layer, while the number of nodes in the hidden layer is smaller than the number of input nodes. The auto-associative networks are used to reduce the dimensionality of the input vector. An AANN is trained in the same way as a Back Propagation (BP) network. When the training step is finished, the output layer is removed and the connection weights between the input layer and the hidden layer are kept fixed. The input and hidden layers of this dimensionality-reduction network are then attached to a standard back propagation network (the main net), where the outputs of the hidden nodes in the subnets are regarded as the "input" of the main net. The subnets thus have the functions of reducing dimensions and filtering.
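A compact sketch of how such a subnet can be built and detached, using a plain NumPy autoencoder trained by gradient descent (illustrative hyper-parameters, no bias terms, and not the authors' exact training setup):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_aann(X, n_hidden, lr=0.5, epochs=3000):
    """Train a toy auto-associative net: the inputs are also the targets,
    so the hidden layer learns a compressed representation."""
    n_in = X.shape[1]
    W1 = rng.normal(0.0, 0.3, (n_in, n_hidden))   # input -> hidden
    W2 = rng.normal(0.0, 0.3, (n_hidden, n_in))   # hidden -> output
    for _ in range(epochs):
        H = sigmoid(X @ W1)                       # hidden activations
        Y = sigmoid(H @ W2)                       # reconstruction of X
        dY = (Y - X) * Y * (1 - Y)                # sigmoid output delta
        dH = (dY @ W2.T) * H * (1 - H)            # back-propagated delta
        W2 -= lr * H.T @ dY / len(X)
        W1 -= lr * X.T @ dH / len(X)
    return W1, W2

def encoder(W1):
    """Detach the output layer: keep only input -> hidden with frozen W1."""
    return lambda X: sigmoid(X @ W1)
```

After training, only `encoder(W1)` is attached to the main net; the frozen hidden outputs are the compressed, filtered version of the subnet's inputs.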
For the AHNN, the subnets are designed with the EDAC method. Through attribute classification, some attribute categories are obtained. Each category composes one subnet; that is to say, the number of subnets of the AHNN equals the number of classes. The main net consists of one hidden layer and one output layer. To obtain better generalization ability, the number of nodes in the hidden layer of the main net is decided by the relation below:

where I is the number of nodes in the input layer of the main net. This equation is extracted from paper [28] and is adopted to decide a more suitable hidden layer node number.
Based on the above, the EDAC-AHNN algorithm generally consists of six steps.
Step 1 Sample normalization. Randomly divide the samples: two-thirds of the samples are used for training and the remainder are used for testing.
Step 2 Data attribute classification. Use the extension data attribute classification method described in Section 2.1 to obtain the attribute classification results.
Step 3 Subnet establishment. Establish the subnets corresponding to the data attribute classification results from the previous step. The number of subnets equals the number of attribute classes. The number of nodes in the input layer of each subnet depends on the dimension of the corresponding group of attributes, and the hidden layer design depends on the actual application. Usually the number of hidden nodes is less than the number of input nodes, so the subnets can be treated as the compression or filtering part of the whole network.
Step 4 Main net establishment. To obtain better generalization ability, the number of nodes in the hidden layer is decided in a more reasonable way with Eq. (12).
Step 5 Training. Train the network based on the BP algorithm. The EDAC-AHNN is based on well-trained AANNs, which are used as subnets after excluding their output layers. As a result, the weights between the input layer and the hidden layer of the EDAC-AHNN subnets are kept unchanged, and the outputs of the subnets are the inputs of the main net.
Step 6 Generalization. Use the testing data samples to evaluate the generalization of the trained neural network.
Based on the above analyses, the flowchart of establishing the EDAC-AHNN is shown in Fig. 3.

Fig. 3. Illustration of the procedure to establish an EDAC-AHNN model.
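Steps 3-5 above can be illustrated by how a main-net input vector is assembled from frozen subnet encoders and pass-through attributes. This is a hypothetical sketch: the weight matrices are random stand-ins for trained AANN weights, and the category layout mirrors the PTA case described later (0-based indices).

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
rng = np.random.default_rng(0)

def main_net_input(sample, categories, encoders):
    """Concatenate subnet outputs into the 'input' of the main net.

    sample     : 1-D array of normalized attributes.
    categories : list of attribute-index lists from EDAC (0-based).
    encoders   : {category index: frozen input->hidden weight matrix};
                 categories without an encoder pass through directly.
    """
    parts = []
    for k, idx in enumerate(categories):
        x = sample[idx]
        if k in encoders:
            parts.append(sigmoid(x @ encoders[k]))  # frozen AANN hidden layer
        else:
            parts.append(x)                          # small category: direct input
    return np.concatenate(parts)

# Attribute categories reported for the PTA case, converted to 0-based indices
categories = [[0, 3, 8, 16], [1, 5, 6], [2, 4], [7, 9],
              [10, 11, 12, 13, 14, 15]]
# Random stand-ins for frozen weights with the reported topologies 4-2, 3-2, 6-4
encoders = {0: rng.normal(size=(4, 2)),
            1: rng.normal(size=(3, 2)),
            4: rng.normal(size=(6, 4))}

x_main = main_net_input(rng.uniform(size=17), categories, encoders)
```

With this layout, the 17 raw attributes are compressed to a 12-dimensional main-net input (2 + 2 + 2 + 2 + 4).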
Four steps are carried out to design the EDAC-AHNN for the PTA system, as follows.
The PTA solvent system [29] is composed of a solvent dehydration tower and an N-butyl acetate (NBA) recovery unit. The flow chart is shown in Fig. 4. The PTA solvent system is an important part of the PTA production process, mainly used to purify the acetic acid from the PTA oxidation section. In the actual running of the PTA solvent system there are many variables, which are related through complex relationships. Through analyses, 17 factors that effectively affect the acetate consumption are selected as the input variables: feed composition (acetic acid content), feed quantity, water reflux, NBA main reflux, NBA side reflux, steam flow, produced quantity of the top tower, feed temperature, reflux temperature, temperature of the top tower, temperature point above the 35th tray, temperature point between the 35th tray and the 40th tray, temperature point between the 44th tray and the 50th tray, tray temperature near the sensitive plate, tray temperature near the sensitive plate, controllable temperature point between the 53rd tray and the 58th tray, and reflux tank level. The attributes of the input variables are labeled from 1 to 17; that is to say, the feature numbers of the PTA dataset run from 1 to 17. The output variable of the process model is the conductivity of the top tower, which reflects changes in the acetic acid content of the top tower.

Fig. 4. PTA process flow chart.
Through collecting the on-site data, 260 instances of PTA data are obtained. The data consist of 17 input attributes and one output attribute. The EDAC-AHNN model is used while the PTA system is running in a stable phase. When the system runs in a stable phase, the variables all change only slightly, which allows the data to be collected every hour. The processed data are divided into 174 groups of training samples (66.7% of the data) and 86 groups of testing samples (33.3% of the data), and all data are normalized into the 0-1 range to avoid the scaling effect of parameter values.
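The normalization and split described above can be sketched as follows; the data here are random stand-ins for the 260 × 18 PTA dataset (17 inputs plus one output).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the 260 x 18 PTA dataset (17 inputs + 1 output)
data = rng.uniform(5.0, 50.0, size=(260, 18))

# Min-max normalization into the 0-1 range, column by column
lo, hi = data.min(axis=0), data.max(axis=0)
norm = (data - lo) / (hi - lo)

# Random two-thirds / one-third split: 174 training and 86 testing rows
perm = rng.permutation(len(norm))
train, test = norm[perm[:174]], norm[perm[174:]]
```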
In the EDAC method, a large threshold leads to a large distance between the cluster center and the desired boundary; a cluster may then hold many elements, which results in only a few classes. On the contrary, a small threshold leads to a short distance between the cluster center and the desired boundary; a cluster can then hold only a few elements, which results in many classes. Both too few and too many classes are bad results that should be avoided. The subnets are designed according to the classification of the data attributes, so a proper value of the threshold should be selected. Through trial and error, the threshold is taken as 0.15. A reasonable classification result of the PTA data attributes is acquired, as shown in Table 1.

Table 1 Classification results of the PTA dataset with the EDAC method
The subnets are built according to the data attribute categories. The number of subnets is equal to the number of attribute categories. In each individual subnet, the number of neurons in the input layer depends on the number of attributes in the corresponding category; the parameters in each category represent the inputs of the corresponding subnet. Note that if the number of attributes in a category is less than 3, there is no need to establish an AANN-based subnet; under this circumstance, the attributes in that category are directly used as the "input" of the main net. During the training of an AANN, the compression ratio (the ratio of the number of nodes in the input layer to that in the hidden layer) is usually set between 2 and 8. The optimal number of nodes in the hidden layer is determined according to the performance of the AANN in terms of the average relative error. After the AANN is well trained, the weights between the input layer and the hidden layer are kept unchanged. By detaching the output layer, the remaining part can be used as a subnet. The AANN-based subnets designed with EDAC are shown in Fig. 5.

Fig. 5. Illustration of AANN-based subnets designed with EDAC.
The subnets are then established with the five individual attribute categories: {1, 4, 9, 17}, {2, 6, 7}, {3, 5}, {8, 10} and {11, 12, 13, 14, 15, 16}. There are 5 categories and thus the EDAC-AHNN is composed of 5 subnets. When training an AANN, the number of neurons in the hidden layer is increased gradually, and the optimal number is determined according to the performance of the AANN in terms of the average relative error. In Category 1 there are 4 attributes, so there are 4 neurons in the input layer of Subnet 1; the optimal number of neurons in the hidden layer is 2, so the topology of Subnet 1 is 4-2. In Category 2 there are 3 attributes, so 3 neurons are needed in the input layer of Subnet 2; the optimal number of hidden neurons is 2, so the topology of Subnet 2 is 3-2. There are 2 (less than 3) attributes in each of Category 3 and Category 4, so these parameters are directly input into the main net. In Category 5 there are 6 attributes, so there are 6 neurons in the input layer of Subnet 5; the optimal number of hidden neurons is 4, so the topology of Subnet 5 is 6-4.
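The mapping from categories to subnet topologies can be expressed as a small helper. The hidden-node counts are passed in because, per the text, they are tuned by the average relative error of each trained AANN; the values below just mirror the reported choices.

```python
def subnet_topology(categories, hidden):
    """Map EDAC attribute categories to subnet layer sizes.

    categories : list of attribute lists (as labeled in the paper).
    hidden     : {category index: chosen hidden-node count}.
    A category with fewer than 3 attributes gets no AANN subnet
    (hidden size None) and feeds the main net directly.
    """
    topo = []
    for k, attrs in enumerate(categories):
        if len(attrs) < 3:
            topo.append((len(attrs), None))    # fed directly into the main net
        else:
            topo.append((len(attrs), hidden[k]))
    return topo

categories = [[1, 4, 9, 17], [2, 6, 7], [3, 5], [8, 10],
              [11, 12, 13, 14, 15, 16]]
topo = subnet_topology(categories, {0: 2, 1: 2, 4: 4})
```

The resulting topologies are 4-2, 3-2, two direct 2-attribute pass-throughs, and 6-4, giving 12 main-net inputs in total.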
The network of the EDAC-AHNN contains four layers, as shown in Fig. 6. The training dataset consists of 17 dimensions. Through the classification and the auto-associative subnets, the input attributes are reduced to 12 dimensions. According to Eq. (12), the number of nodes in the hidden layer of the main net is 7. For the EDAC-AHNN, the subnets and afterwards the main net are trained starting from random initial weights. The initial weights are chosen in the range from -0.6 to 0.6 in accordance with the recommendations of Al-Shayji [30]. The network performance is verified by testing several different initial conditions.

Fig. 6. EDAC-AHNN structure for modeling PTA.
The type of transfer function is another important factor in the design of the main net. To select the most suitable transfer function for PTA, different kinds of activation functions are examined, including the linear, sigmoid and hyperbolic tangent functions. In the end, the sigmoid function gives the best performance in terms of relative errors compared with the other activation functions in the learning process. Thus, it is chosen as the activation function, as given below:

f(x) = 1 / (1 + e^(-x))    (13)
For the PTA dataset, during the experiment the learning rate and the momentum parameter of the proposed model are set to 0.3 and 0.7, respectively. The error goal of the proposed model is 0.001 and the training is limited to a maximum of 1000 epochs.
The relative error is used as the criterion of accuracy, described as:

e_i = |x_i - x_e,i| / x_e,i ,   e = (1/N) Σ_{i=1}^{N} e_i    (14)

In Eq. (14), x_e,i is excluded as a denominator when it equals zero; N is the total number of samples, x_i is the value simulated by the neural network model, x_e,i is the corresponding real value, and e_i is defined as the single relative error.
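Under this reading (real values in the denominator, zero denominators skipped), the criterion can be computed as, for example:

```python
def average_relative_error(actual, predicted):
    """Average relative error |predicted - actual| / |actual|,
    skipping points where the actual value is zero (it is never
    used as a denominator, as noted in the text)."""
    errs = [abs(p - a) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    return sum(errs) / len(errs)
```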
Regression analysis is another widely used criterion, which provides a measure of the correlation strength. The Pearson product-moment correlation coefficient is the best-known formula to calculate this index, which is shown as follows:

r = Σ_{i=1}^{N} (x_i - x̄)(y_i - ȳ) / √( Σ_{i=1}^{N} (x_i - x̄)² · Σ_{i=1}^{N} (y_i - ȳ)² )    (15)
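The Pearson coefficient is a standard formula; a direct transcription is:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))   # numerator
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```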
The real values of the PTA process and the predicted values of the EDAC-AHNN model during the training and generalization phases, respectively, are obtained from Sections 3.1 and 3.4.
For the EDAC-AHNN model, the distribution of the training data is shown in Fig. 7. When the training stops, all weights are frozen before the network undergoes the generalization phase. A relatively high degree of correlation between the actual and predicted values of the training data is observed, as shown in Fig. 8: a coefficient of determination (R²) of 0.8974 is obtained for the training data. When the network is well trained, generalization of the network with the testing dataset is carried out. During the generalization phase, the outputs of the data are not presented to the network. The distribution of the testing data is shown in Fig. 9. A high degree of correlation (R² = 0.8793) between the actual and predicted values of the testing data is observed, as shown in Fig. 10.

Fig. 8. Regression analyses between the actual values and the predicted values of EDAC-AHNN (training data).
Regression analyses between the predicted outputs and the actual values are carried out. As seen from Figs. 8 and 10, the correlation coefficient obtained from the EDAC-AHNN is 0.8974 for the training dataset and 0.8793 for the testing dataset.
The BP model has been successfully applied to simulate nonlinear systems, and Zheng's model [20] gives a successful example of using AHNN to monitor the PTA system. In order to show the effectiveness of the EDAC-AHNN model, its results are compared with those of the BP model and Zheng's model [20]. Table 2 shows that the average relative error of the EDAC-AHNN for the training set is 0.2964%. The average relative error for the testing set is 0.3126%, which is smaller than those of the DAD-AHNN model of Zheng [20] and the BP models. This indicates that the proposed model has better generalization ability.

Fig. 7. Distribution of the PTA dataset with the EDAC-AHNN model during the training phase.

Fig. 9. Distribution of the PTA dataset with the EDAC-AHNN model during the generalization phase.

Fig. 10. Regression analyses between the actual values and the predicted values of EDAC-AHNN (testing data).

Table 2 Comparison between the EDAC-AHNN model and Zheng's models
Moreover, the residuals for the training and testing datasets with the EDAC-AHNN model are plotted in Fig. 11. The values on the y axis in Fig. 11 stand for the difference between the actual values and the predicted values of the EDAC-AHNN. It is clear that the absolute values of the residuals are all less than 1, which is much smaller than the actual values. As expected, all the residuals are distributed near the zero line in a random manner with no clustering or trending. Therefore, it can be assumed that the errors are distributed normally and that the model can be used for its intended purpose with reasonable accuracy.
An accurate soft sensor for complex chemical processes is very important for many purposes. In this article, a method using the Auto-associative Hierarchical Neural Network (AHNN) is introduced. The subnets of the AHNN are designed from the perspective of the data attributes: the Extension Data Attributes Classification (EDAC) method is used to classify the input data attributes, and the subnets of the AHNN are established with the classification results. Finally, an EDAC-AHNN model is established as a soft sensor for the Purified Terephthalic Acid solvent system. Because the method used in our work consumes less time, the EDAC-AHNN model can be rebuilt quickly when the conditions of the chemical process change.
The results of the training and testing datasets are determined and compared in terms of the relative error and regression analyses. The EDAC-AHNN achieves smaller relative errors and rather high correlation coefficients, which confirms the accuracy and generalization ability of the proposed model.
In summary, this study demonstrates the capability of artificial intelligence to model the complex and highly nonlinear relations of complex chemical processes. In addition, the proposed method is also suitable for other complex chemical processes or systems. It can provide a guide for developing models for product monitoring and model-based control strategies in chemical plants with many input parameters.

Fig. 11. Distribution of residuals for training and testing data with the EDAC-AHNN model.
Chinese Journal of Chemical Engineering, 2015, No. 1