
Grasshopper KUWAHARA and Gradient Boosting Tree for Optimal Features Classifications

Computers, Materials & Continua, August 2022

Rabab Hamed M.Aly,Aziza I.Hussein and Kamel H.Rahouma

1The Higher Institute for Management and Information Technology,Minya,61768,Egypt

2Department of Electrical and Computer Engineering,Effat University,Jeddah,KSA

3Electrical Engineering Department,Faculty of Engineering,Minia University,Minia,6111,Egypt

Abstract: This paper aims to design an optimizer followed by a Kuwahara filter (KF) for optimal classification and prediction of employees' performance. The algorithm starts by processing data with a modified K-means technique used as a hierarchical clustering method, which quickly obtains the employee features that best characterize performance. The work of this paper consists of two parts. The first part is based on collecting employee data to calculate and illustrate the performance of each employee. The second part is based on the classification and prediction of employee performance. This model is designed to help companies in their decisions about employees' performance. The classification and prediction algorithms use the Gradient Boosting Tree classifier to classify and predict the features. The results give the percentage of employees who are expected to leave the company after predicting their performance for the coming years. They also show that the Grasshopper Optimization followed by the KF, with the Gradient Boosting Tree as classifier and predictor, is characterized by high accuracy. The proposed algorithm is compared with other known techniques, and our results are found to be superior.

Keywords: Metaheuristic algorithm; Kuwahara filter; grasshopper optimization algorithm; gradient boosting tree

1 Introduction

Nowadays, many companies solve problems about their employees' performance by using artificial intelligence for prediction, which supports practical decisions. Many companies depend on the prediction of employees' performance, which helps them make quick and reasonable decisions and, in turn, drives the company to be successful. Organizations also pay attention to reducing paperwork in their decisions, since it costs them considerable resources. The first step toward this goal is identifying which employee will resign by using prediction techniques [1].

Optimization techniques play an important role in prediction. The optimization process helps to obtain prediction values more accurately and faster than other methods. Optimization refers to the process of finding optimal solutions to a specific problem. Optimization techniques are applied in prediction methods using Machine Learning (ML) and Deep Learning (DL) [2]. Prediction with optimization is therefore considered a data-analysis technique.

On the other hand, many datasets are high dimensional and contain irrelevant features. Such datasets hold useless information and degrade the performance of prediction methods. Many authors have introduced methods to solve these problems; feature selection is one of the methods that address high-dimensional datasets [3].

Note that the accuracy of classification and prediction does not depend on selecting a large number of features. Classification is divided into two groups: a) binary classification and b) multi-class classification [4]. Classification becomes more practical when combined with an optimization method. In this paper, we use optimization for classification based on feature selection. The main category is binary classification based on Grasshopper Optimization as a classifier in the prediction model [5,6]. The work of this paper is divided into several parts. The first part collects datasets of the company employees. The second part clusters and visualizes data based on hierarchical clustering with principal component analysis. The optimizer, called the "Grasshopper Optimizer", is then built to select the optimal data features.

A KUWAHARA Filter follows the optimizer (the combination is denoted GOKF). This new design helps to select the optimal features based on the Kuwahara Filter (KF). KF is a non-linear smoothing filter used in image processing for adaptive noise reduction. The fact that edges are preserved when smoothing makes it especially useful for feature extraction and segmentation. KF places a square neighborhood around each pixel of an image (or each data point of a dataset) and divides it into four square sub-regions. The value of the central pixel or data point is replaced by the average over the most homogeneous sub-region, i.e., the sub-region with the lowest standard deviation. This filter helps the optimizer to rapidly select the best solution and reach the best performance.
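To make this filtering step concrete, the following is a minimal NumPy sketch of the Kuwahara idea described above, not the authors' implementation; the window size, the edge-padding mode, and the quadrant layout are illustrative assumptions.

import numpy as np

def kuwahara_filter(data, win=5):
    # Replace each point by the mean of the quadrant of its (win x win)
    # neighborhood that has the lowest standard deviation.
    r = win // 2
    padded = np.pad(data.astype(float), r, mode="edge")
    out = np.empty(data.shape, dtype=float)
    for y in range(data.shape[0]):
        for x in range(data.shape[1]):
            cy, cx = y + r, x + r
            quads = [padded[cy - r:cy + 1, cx - r:cx + 1],   # top-left
                     padded[cy - r:cy + 1, cx:cx + r + 1],   # top-right
                     padded[cy:cy + r + 1, cx - r:cx + 1],   # bottom-left
                     padded[cy:cy + r + 1, cx:cx + r + 1]]   # bottom-right
            stds = [q.std() for q in quads]
            out[y, x] = quads[int(np.argmin(stds))].mean()   # most homogeneous quadrant
    return out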

Both prediction and classification are based on the Gradient Boosting Tree. The results of the proposed technique will be compared with other results based on the Gradient Boosting Classifier Tree (GBT) and on the Quadratic Discriminant Analysis Function (QDF).

The rest of the paper is organized as follows: Section 2 briefly introduces the literature review. Section 3 presents the methodology. Section 4 discusses the empirical results of the design. Finally, conclusions are drawn in Section 5.

2 Literature Review

Several theories have been proposed for optimization techniques. Some techniques focus on how to use them in classification and feature extraction, while others concentrate on prediction. In this section, we review previous research that addresses optimization in different fields. Various authors have focused on applying ML to business studies and predicting work performance [7].

The authors in [7] presented three main experiments to predict employee attrition. The first experiment focused on the Support Vector Machine (SVM) and K-Nearest Neighbors (KNN), the second showed the usage of Adaptive Synthetic sampling (ADASYN) to overcome class imbalance, and the third involved manual under-sampling to balance the classes. The results were achieved using 12 features selected with a random forest as the feature-selection method.

Furthermore, certain authors described ML techniques to classify the best employees in companies. The authors in [8] presented different ML algorithms: K-Nearest Neighbors (KNN), Naïve Bayes, Decision Tree, and Random Forest, in addition to two ensemble techniques called stacking and bagging. The results showed that Random Forest was the best classification method. In addition, the Random Forest, stacking, and bagging methods achieved about 88% in predicting withdrawals.

In [9], the author described prediction techniques based on a hybrid of K-means clustering and a naive Bayes classifier. The method achieved high accuracy in testing employee performance.

In [10], the authors presented the prediction of employee attrition based on several ML models. The models were developed automatically and achieved highly accurate prediction results.

Numerous authors introduced ML algorithms to describe the prediction of employee turnover. In [11], the authors explored the application of the Extreme Gradient Boosting (XGBoost) technique, which showed significantly higher accuracy for predicting employee turnover.

Moreover, the authors in [12] introduced a study of how to design an automatic job-satisfaction system based on an optimized neural network. The study consisted of several parts. The initial part was preprocessing, which converted the data into numeric form. The second part was data analysis using three factors, each describing the details of an individual employee. The third part showed how to determine the correlation between the factors. The authors added a genetic algorithm to enhance the quality of the factors and used a neural network to predict the employee satisfaction level.

On the other hand, DL based on optimization is considered one of the more practical prediction techniques. Optimization has been described in different research and has shown the benefit of several designs, such as pipeline applications. In [13], the authors described DL with pipeline optimization for a Korean-language framework. The paper evaluated entity extraction and classification using accuracy and the F1-score: 98.2% accuracy and 98.4% F1-score for intent classification, and 97.4% and 94.7% for entity extraction. The authors reported this as the best accuracy obtained in their experiments with this model.

ML and DL play a vital role in early diagnosis, which is important in treating diseases. There are different methods to diagnose several cases of different diseases. In [14], the authors surveyed ML techniques for diagnosing several diseases.

Likewise, in [15], the authors described a discrete wavelet method to enhance images in liver-disease datasets, based on Optimization of Support Vector Machines with the Crow Search Algorithm (OSVCSA). OSVCSA is used for accurate diagnosis of liver diseases; the classification accuracy was 99.49%.

LSTM plays a significant role in predicting pandemic diseases. In [16], the authors introduced studies of how to predict COVID-19 data. The prediction is based on the LSTM and GRU methods implemented in Python. The paper showed that LSTM achieved higher accuracy than GRU in predicting COVID-19 data.

In [17], the authors introduced new ML techniques based on supervised learning and genetic optimization for occupational-disease risk prediction. Three ML methods were introduced and compared: one based on K-Means, another based on Support Vector Machines and K-Nearest Neighbours (KNN), and the last based on a genetic algorithm. The results described that the three techniques were clustering-based techniques that allow a deeper knowledge of the data and are helpful for further risk forecasting.

In [18], the authors described a new segmentation technique for COVID-19 in chest X-rays. They introduced a multi-task pipeline with separate classification streams, which benefited from advances in deep neural network models and allowed them to train models separately for specific types of infection manifestation. They evaluated the proposed models on widely adopted datasets and demonstrated an accuracy increase of approximately 2.5%, together with a 60% reduction in computational time.

Recently, certain authors have applied DL to complex medical research such as therapeutic antibodies. The authors in [19] showed that optimization with DL can be used to predict antigen specificity from antibodies.

As is known, ML is of significant benefit in predicting future outcomes. In addition, there are numerous occupational accidents around the world, and some authors have introduced ML to predict them, such as in [20].

In [20], the authors optimized ML to predict outcomes such as injury, near miss, and property damage using occupational accident data. They applied different ML and optimization methods, such as the genetic algorithm (GA) and particle swarm optimization (PSO), to achieve a higher degree of accuracy and robustness. They also introduced a case study to reveal the potential and validity of the approach.

In addition, some filters have been used in different applications, have proved practical, and help in classification and prediction techniques, such as KF [21].

In [21], the authors introduced KF as a filter combined with K-means clustering to extract the optimal features from tumor images and support the segmentation process. The design helps to extract the tumor and aids the classification process, achieving a result near 95%. Based on the review of the literature presented above, the following section identifies the new method, which is based on optimization with a filter for employee performance, and introduces a new optimization technique.

3 Methodology

The work of this paper consists of several stages, shown in Fig. 1, as follows:

1. Data preparation.

2. Building the optimization and prediction model.

Figure 1:The system block diagram

3.1 Data Preparation

The first part of data preparation is based on clustering analysis. As is known, filtration is the most frequent data-manipulation operation. In this part, the data are filtered using the Python library "pandas". Filtration and analysis with pandas are based on summarizing characteristics of the data, such as patterns, trends, outliers, and hypothesis testing, using descriptive statistics and visualization [22,23].
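As an illustration of this filtration step, a typical pandas workflow could look like the sketch below; the file name hr_employees.csv and the column names ("left", "satisfaction_level") are hypothetical, since the raw files used by the paper are not reproduced here.

import pandas as pd

# Hypothetical file and column names used only to illustrate the pandas step.
df = pd.read_csv("hr_employees.csv")
print(df.describe())                # descriptive statistics (trends, outliers)
print(df.isna().sum())              # missing values per column
df = df.drop_duplicates().dropna()  # basic filtration/cleaning
# Example filter: employees who left while reporting low satisfaction.
left_unsatisfied = df[(df["left"] == 1) & (df["satisfaction_level"] < 0.5)]
print(left_unsatisfied.shape)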

The clustering analysis of the data is based on "hierarchical clustering". This method is used to seek and build a hierarchy of clusters, and it can be considered an improvement over K-means clustering. K-means clustering is based on four stages:

• First, decide the number of clusters (k).

• Second, select k random points from the data as centroids.

• Third, assign all the points to the nearest cluster centroid.

• Finally, calculate the centroids of the newly formed clusters, and then repeat the last two steps.

The problem with K-means clustering is the need to predefine the number of clusters, and K-means also tends to force clusters of the same size. Hierarchical clustering was introduced to overcome this problem, so it is more practical, especially for large data. There are two hierarchical clustering methods, as shown in Fig. 2:

Figure 2:General example of agglomerative and divisive hierarchical clustering methods

1. Agglomerative hierarchical clustering.

2. Divisive hierarchical clustering.

In this paper, the most similar points or clusters in hierarchical clustering were processed by a series of fusions of the n objects into groups; this approach is called agglomerative.

The mathematical formulation of an agglomerative method is as follows:

- Pn, Pn-1, ..., P1 are the partitions produced by an agglomerative hierarchical clustering, where Pn contains n single-object clusters and P1 consists of one cluster containing all n cases.

At each stage, the two most similar clusters are combined. Note that, at the introductory stage, each cluster contains an individual object; the clusters are then joined step by step, and there are different ways of defining the distance (or similarity) between clusters [23].

- Single linkage agglomerative method: it uses the distance between the closest pair of objects, where only pairs consisting of one object from each group are considered. The distance D(r,s) is determined as in (1):

D(r,s) = min { d(i,j) : i ∈ r, j ∈ s }     (1)

- Complete linkage agglomerative method: it uses the distance between the furthest pair of objects, one from each group. The distance D(r,s) is measured as in (2):

D(r,s) = max { d(i,j) : i ∈ r, j ∈ s }     (2)

where r and s are the two clusters and d(i,j) is the distance between objects i and j.

- Average linkage agglomerative method: it uses the mean of the distances between all pairs of objects, each pair including one object from each group. The distance D(r,s) is computed as in (3):

D(r,s) = Trs / (Nr · Ns)     (3)

where Trs is the sum of all pairwise distances between cluster r and cluster s, and Nr, Ns are the sizes of the two clusters.

In this paper, the average-linkage method was applied for hierarchical clustering, and then principal component analysis was added to reduce the dimensionality and increase the interpretability of the features, based on the mathematical formulation introduced in [24,25].
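A possible scikit-learn realization of this stage, average-linkage agglomerative clustering followed by PCA, is sketched below; the number of clusters, the number of components, and the input file are illustrative assumptions rather than values taken from the paper.

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering

# Hypothetical numeric feature matrix built from the cleaned HR data.
X = pd.read_csv("hr_employees.csv").select_dtypes("number")
X_scaled = StandardScaler().fit_transform(X)
# Average-linkage agglomerative clustering, as described above.
labels = AgglomerativeClustering(n_clusters=3, linkage="average").fit_predict(X_scaled)
# PCA to reduce dimensionality and increase interpretability.
X_pca = PCA(n_components=2).fit_transform(X_scaled)
print(X_pca[:5], labels[:5])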

3.2 Building Optimization and Prediction Model

This part focuses on the design of the optimizer and of the prediction (classifier) model. The prediction model is built from different parts. One of these parts is visualizing the data to see the performance of employees before the prediction. The visualization is divided into two categories: the first is based on the number of employees and their number of projects over a set of years; the second creates a Label Encoder Object (LEO) and splits the dataset. The last part builds the optimization and prediction model. The optimizer is based on Grasshopper Optimization (GO) followed by the KF to select the optimal features, which help in the classifier stage.
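The LEO and data-splitting step could be realized with scikit-learn as in the sketch below; the target column "left", the file name, and the 80/20 split ratio are assumptions used only for illustration.

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

df = pd.read_csv("hr_employees.csv")              # hypothetical file name
# Label-encode every categorical column (the LEO step).
for col in df.select_dtypes("object").columns:
    df[col] = LabelEncoder().fit_transform(df[col])
X = df.drop(columns=["left"]).to_numpy()          # "left" is an assumed target column
y = df["left"].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)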

The optimization part is based on GOKF. Its first step is extracting the feature values and selecting the optimal features based on the construction of GOKF.

The GO decreases the dimensionality of the data, i.e., it selects the optimal feature vectors with the help of the KF. The last part is the classifier, which uses the GBT as a predictor of employee performance [26].

The datasets in this paper are collected from two online employee databases [23,27]. The data were gathered from the HR department to study the performance of employees, which helps in decisions about employees after four years of working, as shown in detail in the results section. After the collection of data, the clustering method was applied. Then the features were extracted and optimized using GOKF to decrease the dimensionality of the data, i.e., to select the optimal feature vectors. The reason for using KF with GO is that this enhancement is well suited to the feature-extraction technique [21].

The GO depends on three components (gravity Gi, social relationship Si, and horizontal wind movement Wi) which affect the flying route of the grasshoppers.

The search process is based on the following equation:

Xi = Si + Gi + Wi,   with Si = Σ over j=1..M, j≠i of s(Pi,j) · p̂i,j     (4)

where s is the strength of the social forces and Pi,j is the distance between the i-th and j-th grasshoppers, estimated as Pi,j = |xj − xi|. The unit vector pointing from the i-th to the j-th grasshopper is p̂i,j, as shown in (5):

p̂i,j = (xj − xi) / Pi,j     (5)

We replaced this part by using the KF equations as follows:

As known, the KF is applied by dividing the neighborhood of each point into four regions. Each region i is characterized by its arithmetic mean mi(x,y) and standard deviation σi(x,y), and the output of the KF, P(x,y), at any point (x,y) is the mean of the region with the lowest standard deviation, as shown in Eq. (6) [28,29]:

P(x,y) = mk(x,y),   with k = arg min over i of σi(x,y)     (6)

The social relation s defines the direction of the swarm [26-28]. The equation of s can be described as follows:

s(r) = b · e^(−r/L) − e^(−r)     (7)

where b is the attractive force intensity, r is the distance between grasshoppers, and L is the attractive length scale. Fig. 3 shows the primitive corrective patterns of GO. On the other hand, the mathematical expression of the grasshopper interaction can be presented as (8):

Xi^k = c · ( Σ over j=1..M, j≠i of c · ((upk − lpk)/2) · s(|xj^k − xi^k|) · (xj − xi)/Pi,j ) + Tk     (8)

Notably, upk and lpk are the upper and lower bounds of the k-th dimension, Tk is the value of the target (the best solution found so far) in that dimension, and c is a coefficient used for shrinking the comfort, repulsion, and attraction regions so as to reach the best solutions; k indicates the dimension index.

The equation of the parameter c can be described as follows:

c = cmax − l · (cmax − cmin)/N     (9)

where l is the current iteration, N is the maximum number of iterations, and cmax, cmin are the maximum and minimum values of c.
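The two quantities just defined, the social force s(r) of Eq. (7) and the decreasing coefficient c of Eq. (9), translate directly into small Python helpers; the default values of b, L, cmax and cmin below are common GOA settings, not values reported in the paper.

import numpy as np

def social_force(r, b=0.5, L=1.5):
    # Eq. (7): attraction term minus repulsion term (b, L are assumed defaults).
    return b * np.exp(-r / L) - np.exp(-r)

def comfort_coefficient(iteration, N, c_max=1.0, c_min=1e-5):
    # Eq. (9): c decreases linearly with the iteration counter.
    return c_max - iteration * (c_max - c_min) / N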

Then, the GBT is applied to classify the features extracted from the optimal solution of the optimization technique. As known, GBT involves subsampling the training dataset and training individual learners on the random samples created by subsampling. The GBT is designed in a few steps:

• The first step in the GBT is to initialize the model with some constant value. This initial model is used to predict the observations in the training features; for simplicity, the average of the target column is taken as the predicted value.

• The difference for classification is that the average of the target column is computed using the log of the values (the log odds) to obtain the constant value after initializing the model, based on Eq. (10):

L(yi, p) = −[ yi · log(p) + (1 − yi) · log(1 − p) ]     (10)

where L is the loss function, p is the predicted probability, and yi is the observed value. The Python library "SKLEARN" (scikit-learn) is used to obtain the results of the GBT and of the GO with the Kuwahara filter algorithm [30].
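A scikit-learn sketch of this classification stage is given below; the hyperparameters (number of estimators, learning rate, subsample ratio) are illustrative, and X_train, X_test, y_train, y_test are assumed to come from the earlier split, restricted to the GOKF-selected features.

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report

# Subsample < 1.0 reflects the subsampling of the training data mentioned above.
gbt = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                 subsample=0.8, random_state=42)
gbt.fit(X_train, y_train)
print(classification_report(y_test, gbt.predict(X_test)))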

Figure 3:The primitive corrective patterns of Grasshopper optimization

3.3 The General Pseudo-Code of Design of GOKF

Algorithm GOKF

1: Generate the initial population of grasshoppers Pi (i = 1, 2, ..., n) based on the KF, in a few steps:
   - Build sub-windows for the data input, in the same way as for image data.
   - Calculate averages and variances over the sub-windows.
   - Choose the index with the minimum variance.
   - Build the filtered features using a nested loop.
   - Extract P(x,y) for the data input.
2: Initialize cmax, cmin and the maximum number of iterations N.
3: Evaluate the fitness f(Pi) for each grasshopper Pi based on the P(x,y) data points.
4: T is the best solution.
5: While (L < N) do
6:    Update C1, C2 using Eq. (8).
7:    For i = 1 to M (all M grasshoppers in the population, using Eq. (7)) do
         - Normalize the distances between the grasshoppers based on Eqs. (3) and (5).
         - Update the position of the current grasshopper based on Eq. (7).
         - Bring the current grasshopper back if it lies outside the boundaries.
      End for
8:    Update T if there is a better solution.
9:    L = L + 1
10: End While
Return the best solution T (the selected features yi passed to the GBT classifier, Eq. (10)).
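The Python sketch below mirrors the overall structure of this pseudo-code as a simplified wrapper-style feature selector: each grasshopper is a binary feature mask, the fitness is the cross-validated GBT accuracy on the selected columns, and a coefficient analogous to c of Eq. (9) shrinks the moves toward the best mask over the iterations. It is a loose illustration under these assumptions, not the authors' exact GOKF update rules.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def gokf_feature_selection(X, y, n_agents=10, N=20, seed=0):
    # X: NumPy feature matrix, y: labels; returns a boolean mask of selected features.
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    def fitness(mask):
        if not mask.any():
            return 0.0
        clf = GradientBoostingClassifier(random_state=0)
        return cross_val_score(clf, X[:, mask], y, cv=3).mean()
    pop = rng.integers(0, 2, size=(n_agents, n_features)).astype(bool)  # step 1
    fits = np.array([fitness(m) for m in pop])                          # step 3
    best, best_fit = pop[fits.argmax()].copy(), fits.max()              # step 4: target T
    for it in range(N):                                                 # steps 5-10
        c = 1.0 - it * (1.0 - 1e-5) / N                                 # analogous to Eq. (9)
        for i in range(n_agents):
            move = rng.random(n_features) < 0.5 * c   # pull some bits toward the best mask
            pop[i] = np.where(move, best, pop[i])
            f = fitness(pop[i])
            if f > best_fit:                          # step 8: update T
                best, best_fit = pop[i].copy(), f
    return best

selected = gokf_feature_selection(X_train, y_train)   # mask passed on to the GBT stage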

4 Results and Discussion

This paper applies the prediction of employee performance in several stages:

• Collecting the data of employees: data were collected from sample historical data for the departments of the organization throughout the last five years. After that, the data are visualized using a Python library to show the performance of employees, as shown in Figs. 4, 5 and 6. Each figure shows details of the structure of the collected data. Fig. 4 shows the total number of employees who left the company during the last five years, and Fig. 5 shows the total number of years spent in the company. In contrast with the previous figures, Fig. 6 shows the number of employees against the total number of projects which achieved their targets on time during the last five years. Note that the total number of employees is 6000, which includes both current and former employees.

• The next step is extracting the features from the dataset based on the clustering operation. The clustering analysis of the data is based on "hierarchical clustering".

• The classification of the data is carried out in several steps. First, the dismissal of employees depends on a critical factor: the total number of projects over the last five years. If an employee worked on 4-6 projects during these years, he/she is less likely to leave the company. Second, the time worked at the company is an important factor in decisions about employee performance; the decisions are based on the total number of hours an employee spent in the company. Notably, there is a huge drop between employees with 3 and 4 years of experience. On the other hand, the percentage of employees who left is 25% of the total. Most of the employees receive either a medium or a low salary. The tester role of the Information Technology (IT) department has the maximum number of employees, followed by customer support and developer.

• Building the prediction model: this part is based on the GBT:

• First, the features are extracted in Python and the data are saved in a CSV file.

• Second, GOKF is built to extract the optimal features for classification.

• Third, the optimal features obtained from GOKF are passed to the prediction function based on the GBT, using the corresponding Python function. The accuracy of the classification reached 96.7%, computed with Eqs. (11)-(13), which is considered a high and reliable accuracy. The classification report is shown in Tab. 1.

Figure 4: The total number of employees who were dismissed or left

Figure 5: The total number of years spent in the company

Figure 6: The total number of employees per project

Eqs. (11)-(13) correspond to the standard accuracy, precision, and recall measures computed from the confusion matrix:

Accuracy = (TP + TN) / (TP + TN + FP + FN)     (11)

Precision = TP / (TP + FP)     (12)

Recall = TP / (TP + FN)     (13)

where TP is True Positive, TN is True Negative, FP is False Positive and FN is False Negative.
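Given the confusion-matrix counts, these measures can be computed directly, as in the short sketch below (reusing the gbt model and the test split assumed earlier).

from sklearn.metrics import confusion_matrix

y_pred = gbt.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()   # binary case
print("accuracy :", (tp + tn) / (tp + tn + fp + fn))        # Eq. (11)
print("precision:", tp / (tp + fp))                         # Eq. (12)
print("recall   :", tp / (tp + fn))                         # Eq. (13)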

Table 1: The report of the GOKF system based on the confusion matrix

The GO based on hierarchical clustering achieved a higher degree of accuracy in prediction than the CNN and the other methods introduced in [25], which give more practical optimal solutions, as shown in Tab. 2. In Tab. 2, QDF refers to the Quadratic Discriminant Analysis Function with K-means clustering [28] and GBT is the Gradient Boosting Tree with K-means clustering [29].

In [29], the authors introduced two unsupervised pattern-recognition algorithms based on K-means clustering, called QDF and the Gaussian Mixture Model (GMM). An accuracy of 96% was achieved with QDF; when the same method was applied to the data of this paper, it gave the same results, but the GO-KF method is faster and more practical, reaching its accuracy in less time than the other method. Furthermore, in [30], the authors applied the GBT to a diabetes mellitus diagnosis system and achieved an accuracy near 97%; when the same method was applied to our datasets, it achieved a similar performance to the result of this paper. Tab. 2 shows the comparison between the previous methods and the method of this paper.

Table 2: Comparison between the method of this paper and other methods from previous work

5 Conclusion

This paper introduced a technique for optimal classification and prediction of employees' performance. The technique is composed of a grasshopper optimizer followed by a Kuwahara filter (KF). The employees' data are collected and then processed using a modified hierarchical K-means clustering method. The filter is used to obtain the employee features that best match their performance. This is done along two axes. First, data on the employees are collected, and from these data the performance of each employee is calculated and illustrated. Second, classification techniques are applied to classify the employee performance, and prediction techniques are carried out to predict this performance in the future. This is done by obtaining the employees' features. The Gradient Boosting Tree classifier is utilized for feature classification and prediction. The model has been applied, and the percentage of employees who are expected to leave the company, after predicting their performance for the coming years, is calculated. The results were found to be highly accurate. A discussion of the results and a comparison with previous research methods are given, and the proposed algorithm is found to be superior.

Acknowledgement:The author would like to thank the editors and reviewers for their review and recommendations.

Funding Statement:The author received no specific funding for this study.

Conflicts of Interest:The authors declare that they have no conflicts of interest to report regarding the present study.
