
Facial Expression Recognition with High Response-Based Local Directional Pattern (HR-LDP) Network

2024-03-13 13:21:02 Sherly Alphonse and Harshit Verma
Computers, Materials & Continua, 2024, Issue 2

Sherly Alphonse and Harshit Verma

School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India

ABSTRACT Although much research has been done on recognizing facial expressions, there is still a need to increase the accuracy of facial expression recognition, particularly under uncontrolled conditions. Local Directional Patterns (LDP), which have good characteristics for emotion detection, have yielded encouraging results. In the proposed work, an end-to-end learnable High Response-based Local Directional Pattern (HR-LDP) network for facial emotion recognition is implemented by employing fixed convolutional filters. By combining learnable convolutional layers with fixed-parameter HR-LDP layers made up of eight Kirsch filters and differentiable simulated gate functions, this network considerably reduces the number of network parameters. The parameter cost of our fully connected layers is up to 64 times lower than that of currently used deep learning-based detection algorithms. On six well-known databases, namely JAFFE, CK+, MMI, SFEW, OULU-CASIA and MUG, the recognition rates for seven-class facial expression recognition are 99.36%, 99.2%, 97.8%, 60.4%, 91.1% and 90.1%, respectively. The results demonstrate the advantage of the proposed work over cutting-edge techniques.

KEYWORDS Emotion; classification; CNN; network; HR-LDP

1 Introduction

Human Computer Interaction (HCI) primarily consists of the study of interface design, with its applications concentrating on user-computer interaction. Since computers are used in almost every area of daily life, HCI applications are found in every industry, including social science, psychology, science, computer and industrial engineering, and many more. Facial Expression Recognition (FER) is a crucial field of research in pattern recognition, computer vision and artificial intelligence, with profound implications for applications such as human-computer interaction, emotion-aware computing, and affective computing. Automatic emotion recognition from facial expressions has been used in healthcare, social networks, and human-machine interaction, among other domains. To improve computer prediction, researchers in this discipline are working on methods to decode, analyze, and extract characteristics from facial expressions. The remarkable success of this technology has led to the use of numerous deep-learning architectures to boost performance [1].

Emotions have a natural influence on human behavior and are important in shaping communication and behavior patterns. Accurately analyzing and interpreting the emotional content of facial expressions is essential for a deeper understanding of human behavior. Although a person needs little to no effort to recognize faces and decipher facial emotions, computer systems still struggle to identify facial expressions accurately. Analyzing a person’s facial features and determining their emotional state are considered very difficult tasks. The main obstacles are the irregularities of the human face and variations in elements such as direction, lighting, shadows, and facial posture. Research has also indicated that different individuals can identify distinct emotional states within an identical facial expression. FER involves many further hurdles, including the need for diverse training data and pictures featuring a range of ethnicities, genders, and nationalities, among others.

Deep learning methods have been researched as a stream of techniques to achieve resilience and provide the required scalability on new forms of data [2]. To recognize facial expressions under uncontrolled conditions, it is necessary to obtain a classification model that is both sensitive to minute differences in the appearance of facial emotions and resilient to larger variations. A variety of pre-trained deep neural networks can be used for recognizing facial expressions. These networks have a great number of learnable parameters, yet they were trained and used on quite varied applications. These aspects make it challenging to train neural networks accurately for facial emotion recognition. To overcome this, in this study a large neural network trained on extensive facial emotion recognition datasets is chosen and later used to train a small neural network. The small network has fewer parameters to learn than the large network. The suggested network is then built using its convolutional layers, and the complete structure is trained with facial expression images.

The main objective is the construction of neural network models that accept input images in the right format and produce an output that can be mapped to an emotion class. After the model is built, testing and troubleshooting have to be done to maximize accuracy, and the model is analyzed with various metrics to cross-examine its efficiency and correctness. Another major aim is to eliminate problems present in the datasets, such as cross-oriented images, wrong facial position, and alignment issues. This has to be addressed because disoriented images lead to bad predictions due to parallax error and wrong orientation. The next issue is edge detection, which is performed to enhance the facial features and boost the parts where emotion is displayed, such as the mouth, eyebrows, eyes, and even the nose. In the proposed work, the alignment problems are rectified using the face detection and alignment method “Chehra”. The proposed High Response-based Local Directional Pattern (HR-LDP) based classification method also uses the Kirsch filter, which eliminates noise in images and accurately captures the sharp edges that represent the structure of the face. The major contributions of the proposed work are as follows:

• A novel HR-LDP network-based classification is proposed in this work, with a module that eliminates noise using the high responses obtained from Kirsch filters, reducing computation while increasing accuracy.

• The proposed work suggests a novel learnable HR-LDP network that reduces the number of learnable parameters compared to existing works.

• Compared to existing deep learning-based detection algorithms, the parameter cost of our fully connected layers is up to 64 times lower, while the network outperforms state-of-the-art techniques.

The paper is structured as follows: The state-of-the-art techniques for facial emotion recognition are reviewed in Section 2. Then, in Section 3, the suggested learnable HR-LDP network is presented. The specifics of the experimental setting are provided, and the detection results are then presented and analyzed in Section 4. Finally, this paper is concluded in Section 5 with guidelines for future research.

2 Related Works

This section presents a detailed survey of the existing works. Table 1 gives a summary of the latest works in the literature. In [3], the authors used the publicly available Cohn Kanade (CK+) dataset. They passed it through four different Convolutional Neural Networks (CNN) that implement transfer learning: VGG-19, ResNet-50, MobileNet and Inception V3. After image pre-processing and feature extraction, they passed the data through the four networks and compared their performance. Reference [4] suggested a novel technique called Facial Emotion Recognition using Convolutional neural networks (FERC). FERC is a two-part CNN: one part removes the background of the image and the other classifies the face into one of five emotions. They tested the algorithm with the CK, Caltech, CMU and NIST datasets. In [5], the authors used a deep CNN with two layers, each followed by dropout. The output of each layer is passed through an activation function and then to a pooling layer, and the same is repeated in the next layer. The final dense layer has five units, one for each emotion.

Table 1: Summary of literature survey with the algorithms

In [6], the authors developed a FER system and verified it on eight different pretrained deep CNN models with the Karolinska Directed Emotional Faces (KDEF) and Japanese Female Facial Expression (JAFFE) datasets. With 10-fold cross-validation, the best model uses DenseNet-161. CNN algorithms [7] are used by several works in the literature that have shown superior performance. Among them, the authors in [12] proposed a CNN-based single classifier that achieved high performance and also performed the necessary pre-processing. The model has two convolution layers, two sub-sampling layers and an output layer. They also used a max-pooling and flattening layer, with SoftMax as the final activation function, and obtained an accuracy of 97.6%. Reference [15] also performed the necessary pre-processing by taking the mean shape and mapping the dataset according to its closeness to the mean shape. Notably, the authors in [16,17] conducted a comprehensive review focused on CNNs for FER. Their study explored various CNN architectures and methodologies, showcasing their effectiveness in capturing spatial hierarchies within facial images. The studies in [18] and [19] have significantly transformed FER. These works highlight the proficiency of CNNs in capturing spatial hierarchies and achieving impressive performance, along with the critical contributions of data augmentation and feature extraction in improving FER accuracy and robustness.

Despite the remarkable strides made in FER, the field continues to grapple with substantial challenges and limitations that warrant thorough exploration. While FER algorithms [20–24] have shown proficiency in identifying basic emotions, the recognition of nuanced and subtle facial expressions remains an ongoing research frontier. The intricate interplay of various facial muscles and features, especially in complex emotional states, poses a significant challenge for current models. Within a neural network, different combinations of layers can accomplish a task with high accuracy. This work proposes a novel HR-LDP network-based classification that attains good accuracy on six datasets while learning a smaller number of parameters. The proposed work is explained in Section 3.

3 The Proposed Work

The architecture of the suggested work is shown in Fig. 1. The network is made up of three modules: the convolutional layers, a fully connected HR-LDP layer for HR-LDP computation, and another fully connected layer tied to a loss function. The convolutional layers are applied to the input image to create feature maps associated with the expression. The convolutional and HR-LDP layers together extract simulated HR-LDP features, and the loss function module is used to train the parameters of the network. A classification layer at the end predicts the emotion using a classification algorithm such as SVM. The main elements of the suggested neural network are described in this section.

3.1 Convolution Layer

The faces are detected in the sample images from the dataset using the ‘Chehra’ [20] face detection and alignment tool, which also solves the alignment problems. The convolutional feature maps for the original images are created by forward-propagating the unprocessed pixels through the initial module. More precisely, the initial module consists of two convolution layers, one pooling layer, and one Rectified Linear Unit (ReLU) layer. Before the ReLU layer, a batch normalization (BN) layer is used, which also reduces the effect of filter parameter initialization. This is depicted as

$$BN(I) = \gamma \, \frac{I - \mu}{\sqrt{\sigma^{2} + \epsilon}} + \beta \tag{1}$$

where I is the BN layer’s input. The mean and variance of I are μ and σ², respectively. Here, γ and β are the scale and shift factors, respectively, while a constant ε is added to the variance for numerical stability.
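The batch normalization step followed by ReLU can be illustrated with a minimal sketch in plain Python. It assumes a flat list of activations and scalar scale and shift factors; the function name `batch_norm_relu` and the default values are illustrative and are not taken from the paper.

```python
import math

def batch_norm_relu(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a flat list of activations, then apply ReLU.

    gamma/beta are the scale and shift factors; eps is the small
    constant added to the variance for numerical stability.
    """
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    normalized = [gamma * (v - mu) / math.sqrt(var + eps) + beta for v in values]
    # ReLU keeps positive values and clips negatives to zero.
    return [max(0.0, v) for v in normalized]

activations = [2.0, 4.0, 6.0, 8.0]
out = batch_norm_relu(activations)
```

With the default scale and shift, activations below the mean are clipped to zero by the ReLU, while those above the mean are kept in normalized form.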

Figure 1:The architecture of the proposed facial expression recognition network

3.2 HR-LDP Layer

The HR-LDP layer performs convolution using Kirsch masks [24] and extracts only the high responses related to shape and texture information. These responses are normalized using the sigmoid function, and the histograms are then extracted using gate functions, as described in the following subsections.

3.2.1 Convolution Using Kirsch Filter Masks

The Kirsch masks in Fig. 2 are applied to the output of the convolution layer, and max pooling is applied to the eight responses obtained.

Figure 2:Kirsch mask

Here, σ_argmax is obtained by max pooling. A MaxPool2D layer is created with pool_size=2 and strides=2. Applying the MaxPool2D layer to the matrix yields the max-pooled output in tensor form: the layer iteratively computes the maximum of each 2×2 pool, moving with a stride of 2. The values are then normalized using the sigmoid function and passed to the gate functions for histogram formation.
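The Kirsch filtering, high-response selection, 2×2 max pooling and sigmoid normalization described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the masks are the standard eight Kirsch compass masks (three 5s and five −3s on the border, 0 in the center), "high response" is taken to mean the per-pixel maximum over the eight responses, only the valid region is filtered, and the toy 6×6 image and all function names are hypothetical.

```python
import math

# Border positions of a 3x3 mask, listed clockwise from the top-left corner.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

def kirsch_masks():
    """Build the eight Kirsch compass masks by rotating the border values."""
    base = [5, 5, 5, -3, -3, -3, -3, -3]
    masks = []
    for shift in range(8):
        mask = [[0] * 3 for _ in range(3)]
        for idx, (r, c) in enumerate(RING):
            mask[r][c] = base[(idx - shift) % 8]
        masks.append(mask)
    return masks

def high_response(image):
    """Per-pixel maximum of the eight Kirsch responses (valid region only)."""
    masks = kirsch_masks()
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - 2):
        row = []
        for c in range(cols - 2):
            responses = [
                sum(m[i][j] * image[r + i][c + j] for i in range(3) for j in range(3))
                for m in masks
            ]
            row.append(max(responses))
        out.append(row)
    return out

def max_pool_2x2(fm):
    """2x2 max pooling with stride 2 (pool_size=2, strides=2)."""
    return [
        [max(fm[r][c], fm[r][c + 1], fm[r + 1][c], fm[r + 1][c + 1])
         for c in range(0, len(fm[0]) - 1, 2)]
        for r in range(0, len(fm) - 1, 2)
    ]

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

# Toy 6x6 image with a vertical edge: left half 0, right half 1.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
high = high_response(image)      # 4x4 map of strongest edge responses
pooled = max_pool_2x2(high)      # 2x2 map after pooling
normalized = [[sigmoid(v) for v in row] for row in pooled]
```

In this toy case the strongest responses appear in the windows that straddle the edge, and flat regions give a response of zero because the coefficients of each Kirsch mask sum to zero.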

3.2.2 Histogram Calculation

A histogram shows the probability distribution of a quantity over different bins. Different appearance-based feature extraction techniques have been developed, which process the image using either hand-crafted or learnable filters and then use a histogram to calculate statistical data. A CNN can be thought of as a collection of learnable filters, with feature maps generated at the output of the convolutional layers. The feature maps are first flattened and then passed to a fully connected layer. A simple method for constructing the histograms of feature maps involves applying specific shifted step activation functions to the obtained feature maps and then aggregating each result as a histogram bin. However, gradient-based learning is not possible with a step function, since its derivative is infinite at its edges and zero everywhere else. Instead, a gate function is used, which determines the variable’s histogram in the range [0,1].

where n denotes the number of histogram bins. The gradient of Eq. (3) in the backpropagation stage is taken to be 2n during 0

Here, the i-th bin of the histogram H is designated as H_i. FM is the current feature map used to calculate the histogram, E is the number of feature map (FM) elements, m is the number of histogram bins, and f is the gate activation function mentioned in Eq. (4). In the suggested CNN, this computation is executed with average pooling operators. Histogram calculation with the gate function expects the input variable to fall between 0 and 1. However, this assumption might not hold for feature maps. Consequently, the input of the gate activation function needs to be normalized to [0,1] before it is used for histogram calculation. The sigmoid function can be used for this. Nevertheless, the sigmoid saturates at very large or very small values. To solve this issue, batch normalization is applied before the sigmoid function.

Figure 3:The feature map and gate function

Figure 4:Gate function

As in Fig. 1, the feature values for the histogram computation layer are first constrained using batch normalization to prevent sigmoid saturation. The values are then normalized to [0,1] using a sigmoid activation function. The output of the sigmoid function is then shifted n times. Finally, n gate activation functions are applied, and the n-bin histograms are calculated via average pooling. The computed histograms provide feature-map-specific statistical data for the input image. Convolutional neural networks can employ this feature map histogram computation without any issues for the learning process. The generated histograms are then fed into the fully connected layer of the proposed network, which is explained in the following section:
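The forward pass of this histogram layer can be illustrated with a short sketch in plain Python. It assumes a hard rectangular gate of width 1/n for each bin (a simplification: the surrogate gradient used in backpropagation is not modelled), and the function names and the toy feature map are illustrative only.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function, mapping values into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def gate_histogram(feature_map, n_bins=10):
    """n-bin histogram of a feature map via shifted gates and average pooling."""
    values = [sigmoid(v) for row in feature_map for v in row]
    hist = []
    for i in range(n_bins):
        lo, hi = i / n_bins, (i + 1) / n_bins
        # Gate i fires (1.0) when the value falls in [lo, hi); the last gate
        # also accepts exactly 1.0 so that every value lands in one bin.
        gate = [
            1.0 if (lo <= v < hi or (i == n_bins - 1 and v == 1.0)) else 0.0
            for v in values
        ]
        hist.append(sum(gate) / len(gate))  # average pooling over the map
    return hist

feature_map = [[-2.0, -0.5, 0.0], [0.5, 2.0, 0.0]]
hist = gate_histogram(feature_map)
```

Because each normalized value activates exactly one gate, the bins sum to one, so the output can be read as a probability distribution over the feature map values.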

3.2.3 Loss Function

The widely used SoftMax loss function is utilized, as in Eq. (6), to quantify the classification error after the HR-LDP features are extracted. The SoftMax loss function can optimize the likelihood of the correct class during the training stage and fine-tune the network parameters based on Back Propagation (BP). Here, i is the training sample index and n is the number of training samples. Y = [Y_1, Y_2, Y_3, ..., Y_n] is the label set and Y_i = [y_i1, y_i2, y_i3, ..., y_iv] is the prediction vector of the i-th training sample. The predicted value is denoted by y_iv, and the number of classes is indicated by v. To combine the data on facial movement during testing, the HR-LDP features are extracted from a video sequence, averaged, and converted into a feature vector. The averaged features are then classified using a Support Vector Machine (SVM) classifier. Algorithm 1 describes the basic flow of the classification module.
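The SoftMax loss can be sketched in plain Python as the mean negative log-likelihood of the correct class. This is the standard formulation and is assumed here to correspond to the loss used in the paper; the function names and the integer class labels are illustrative.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities (shifted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_loss(batch_logits, labels):
    """Mean negative log-likelihood of the correct class over n samples."""
    losses = []
    for logits, label in zip(batch_logits, labels):
        probs = softmax(logits)
        losses.append(-math.log(probs[label]))
    return sum(losses) / len(losses)

# Two training samples, seven expression classes.
batch_logits = [
    [2.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],   # confident, correct class 0
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # uniform, no information
]
labels = [0, 3]
loss = softmax_loss(batch_logits, labels)
```

A uniform prediction over seven classes contributes log(7) to the loss, and the contribution shrinks as the probability assigned to the correct class grows.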

The given testing sample is then classified, and the results are given in the next section.

4 Results and Discussion

The suggested approach uses Matlab 2018a for its experiments.

4.1 Datasets

The research makes use of six datasets: JAFFE [26], Cohn Kanade (CK+) [27], Oulu-CASIA NIR&VIS facial expression database (OULU-CASIA) [28,29], Man Machine Interface (MMI) [30], Multimedia Understanding Group (MUG) [31] and Static Facial Expressions in the Wild (SFEW) [32,33].

4.2 Experimental Analysis

High computational complexity is a significant limitation of state-of-the-art descriptors like Gabor. The accuracy of every other feature descriptor in the literature is far lower, especially under unrestricted conditions. HR-LDP is therefore incorporated into the proposed model, which achieves high accuracy at low complexity. Tables 2–7 demonstrate the effectiveness of the suggested strategy by listing both the number of samples that were correctly identified and the number that were misclassified. When the JAFFE images are classified with the suggested method, the neutral and sad expressions are confused with other expressions, as in the confusion matrix given in Table 2. As seen in Table 3, the classification accuracy on the CK+ dataset is significantly lower for the anger and neutral emotions. Expressions such as neutral, happiness, and surprise are mixed up with other emotions in the MUG dataset, as shown in Table 4.

The SFEW dataset poses significant challenges because it was collected under unrestricted conditions. Its fundamental issues are that the samples of the various classes are imbalanced and that the photographs were taken in an unrestricted environment. Therefore, as seen in Table 5, more training data is required to increase accuracy, and the classification accuracy on SFEW is lower than on the other datasets. Nevertheless, the suggested method outperforms the other current descriptors in the literature on SFEW, as shown in Table 5, owing to its capacity to identify sharp edges and its scale- and rotation-invariant characteristics. Most other facial expressions can be mistaken for the disgusted face. As in the confusion matrices provided in Tables 6 and 7, the fear and sadness emotions are confused with the other expressions in the Oulu-CASIA and MMI datasets.

Table 2: Confusion matrix for the JAFFE dataset

Table 3: Confusion matrix for the CK+ dataset

Table 4: Confusion matrix for the MUG dataset

Table 5: Confusion matrix for the SFEW dataset

Table 6: Confusion matrix for the Oulu-CASIA dataset

Table 7: Confusion matrix for the MMI dataset

Figs. 5–10 compare the recognition outcomes. In comparison to more recent methods such as the inter-category distinction feature fusion network [34–38] and the ROI-guided deep architecture [39–42], the suggested study attains greater accuracy. The proposed approach performs better because of a lower likelihood of overfitting [43–45], less data noise, improved discrimination, and improved data visualization. The recommended feature extraction technique automatically selects only the data relevant to this task. This work suggested using a new HR-LDP network to tackle the detection of emotions. The suggested network combines deep learning with hand-crafted features, and it reduces the network parameters by producing statistical histograms. Numerous tests using the databases produced intriguing findings. Furthermore, unlike the majority of modern techniques, the suggested approach produces reliable performance.

The VGG-face network [46] is chosen as the reference network for comparing the efficiency of our network in terms of time and memory consumption. Because VGG-face was designed for face recognition, it is fine-tuned for facial expression recognition. For a fair comparison, identical training data, training parameters, and loss function are employed for both the VGG-face network and our suggested HR-LDP network. The comparison experiments are conducted on an HP workstation running Windows 10 Enterprise Edition and Matlab 2018a, with 64 GB of RAM, two Intel E5-2620 v3 CPUs, and one NVIDIA GeForce GTX 1080 Ti GPU. The results of the time and memory comparison are shown in Table 8. The table shows that, when training, our suggested network requires just 132 MB of memory, which is up to 25 times less than the VGG-face network. Furthermore, the proposed network outperforms the VGG-face network in training rounds. Depending on the input size, the suggested network should be significantly faster than the VGG-face network because of its smaller size.

Figure 5:Classification accuracy of JAFFE dataset

Figure 6: Classification accuracy of CK+ dataset

Table 8:Comparison of mean time and cost of memory

Table 9 lists the different parameters used. Stochastic Gradient Descent (SGD) is used for optimization during the training phase, with learning rate = 0.01 and momentum = 0.9. There are 100 training epochs, and from the thirty-first to the last epoch the learning rate is decayed by a factor of 0.99 in each epoch. The dropout rate is set to 0.5 to prevent over-fitting. The margin hyper-parameter is set to 0.2. The dimension of the feature vector obtained at the output of the histogram computation layer is 512 × 10 = 5120, since the number of histogram bins in HR-LDP is initialized to 10. Ten percent of the training data in each trial is selected at random and used for validation. Table 10 presents the results obtained using different classifiers in the proposed work. The proposed work achieves higher accuracy when using SVM, deep learning techniques and CNN in the final classification layer of the proposed architecture. However, the SVM in the final layer has fewer parameters, saving computational cost while achieving higher accuracy.
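The learning-rate schedule and the feature dimension stated above can be checked with a short sketch in plain Python. It assumes that the 0.99 drop is a multiplicative decay applied once per epoch starting at epoch 31 (epochs counted from 1); the function name and its parameters are illustrative.

```python
def learning_rate(epoch, base_lr=0.01, decay=0.99, decay_start=31):
    """Learning rate for a 1-based epoch index.

    The rate stays at base_lr for the first 30 epochs and is multiplied
    by `decay` once per epoch from epoch `decay_start` onward.
    """
    if epoch < decay_start:
        return base_lr
    return base_lr * decay ** (epoch - decay_start + 1)

# Schedule over the 100 training epochs.
schedule = [learning_rate(e) for e in range(1, 101)]

# Feature vector size: 512 feature maps, each summarized by a 10-bin histogram.
feature_dim = 512 * 10
```

Under this reading, the learning rate at the final epoch is roughly half of its initial value, since 0.99 raised to the 70th power is close to 0.5.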

Figure 7:Classification accuracy of MUG dataset

Figure 8:Classification accuracy of SFEW dataset

Figure 9:Classification accuracy of OULU-CASIA dataset

Table 9:Parameters used in the proposed model

Table 10:Classification outcomes using different classifiers in the classification layer of the proposed work

Figure 10:Classification accuracy of MMI dataset

4.3 Ablation Study: Analysis of Several Proposed Model Components

(i) Eliminating the histogram formation layer from the suggested work: In this experiment, a max pooling layer is used at the output of module 2 in Fig. 1 to construct a 5120-dimensional feature vector. Next, using a loss function, the network is trained for seven-class facial emotion identification.

(ii) Changing from the SoftMax loss function to a chi-squared distance-based loss function: The loss function is defined in Eq. (6) as a SoftMax function. This is replaced by an improved chi-squared distance-based loss function [47], as in Eq. (7).

(iii) Using the whole proposed work for emotion classification: In this experiment, facial expression recognition is accomplished by using the whole HR-LDP network and SVM. The results from the three cases are given in Fig. 11.
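For orientation on case (ii), the standard chi-squared distance between two histograms can be sketched in plain Python. This is only the basic distance and is an assumption for illustration: it is not the improved chi-squared distance-based loss of [47], whose exact form is not reproduced here. The function name and the small `eps` guard against division by zero are hypothetical.

```python
def chi_squared_distance(p, q, eps=1e-12):
    """Standard chi-squared distance between two equal-length histograms."""
    return sum((a - b) ** 2 / (a + b + eps) for a, b in zip(p, q))

# Two 4-bin histograms, each summing to one.
hist_a = [0.1, 0.4, 0.4, 0.1]
hist_b = [0.25, 0.25, 0.25, 0.25]
distance = chi_squared_distance(hist_a, hist_b)
```

The distance is zero for identical histograms, symmetric in its arguments, and weights differences in sparsely populated bins more heavily than a plain squared difference would.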

Figure 11:Ablation study using three different cases

5 Conclusion

A novel HR-LDP network is suggested to tackle facial expression recognition. The suggested network combines deep learning with hand-crafted features, and it reduces the network parameters by producing statistical histograms. Numerous tests using the six databases produced intriguing findings. Furthermore, unlike the majority of modern techniques, the suggested approach produces reliable performance. On SFEW images with significant blur and occlusions, the suggested technique obtains greater classification accuracy than other methodologies in the literature. The results show that the suggested strategy improves classification accuracy across six datasets. Future research will concentrate on micro-expressions and the analysis of dynamic emotions in videos.

Acknowledgement: We thank Vellore Institute of Technology, Chennai for supporting us with the APC.

Funding Statement:The authors received no specific funding for this study.

Author Contributions: The authors confirm contribution to the paper as follows: study conception and design, draft manuscript preparation: Sherly Alphonse; analysis and interpretation of results: Harshit Verma. All authors reviewed the results and approved the final version of the manuscript.

Availability of Data and Materials: Both CK and JAFFE are openly accessible datasets. On request, more datasets from specific authors are available. Access the MUG dataset at https://mug.ee.auth.gr/fed/. Access the Oulu-CASIA dataset at https://paperswithcode.com/dataset/oulu-casia. Access the MMI dataset at https://mmifacedb.eu/. Visit https://paperswithcode.com/dataset/sfew to get the SFEW dataset.

Conflicts of Interest:The authors declare that they have no conflicts of interest to report regarding the present study.
