
LEARNING RATES OF KERNEL-BASED ROBUST CLASSIFICATION*


Shuhua WANG (王淑華)

School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China

E-mail:w614sh@126.com

Baohuai SHENG (盛寶懷)†

Department of Finance, Zhejiang Yuexiu University, Shaoxing 312030, China; Department of Applied Statistics, Shaoxing University, Shaoxing 312000, China

E-mail:shengbaohuai@163.com

Abstract This paper considers a robust kernel regularized classification algorithm with a non-convex loss function, which is proposed to alleviate the performance deterioration caused by outliers. A comparison relationship between the excess misclassification error and the excess generalization error is provided; from this, along with convex analysis theory, a kind of learning rate is derived. The results show that the performance of the classifier is affected by outliers, and that the extent of the impact can be controlled by choosing the homotopy parameters properly.

Key words Support vector machine; robust classification; quasiconvex loss function; learning rate; right-sided directional derivative

1 Introduction

The problem of classification is one of the most widely studied problems in learning theory. The goal of classification is to construct a classifier with a small misclassification error. Among the many classical learning methods that have been proposed, the support vector machine (SVM) has been one of the most successful learning algorithms (see, for example, [1–6]).

Let X be a given compact set in the d-dimensional Euclidean space R^d (the input space), and let Y = {−1, 1} (representing the two classes). Let ρ be a fixed but unknown probability distribution on Z = X × Y which yields its marginal distribution ρ_X on X and its conditional distribution ρ(·|x) = P(·|x) at x ∈ X. We then denote by z = {(x_i, y_i)}_{i=1}^m the samples drawn i.i.d. (independently and identically) according to the distribution ρ.

As we know, the goal of classification is to produce a binary classifier C: X → Y. The misclassification error of C, which is used to measure the prediction power of C, is defined by

R(C) = Prob_{(x,y)∼ρ}{C(x) ≠ y} = ∫_X P(y ≠ C(x) | x) dρ_X.

The best classifier, called the Bayes rule, is given by

f_c(x) = 1 if P(y = 1 | x) ≥ P(y = −1 | x), and f_c(x) = −1 otherwise.

The classifiers considered in this paper are induced by real-valued functions f: X → R as C_f = sgn(f), which is defined by sgn(f)(x) = 1 if f(x) ≥ 0 and sgn(f)(x) = −1 otherwise. Recall that the regression function of ρ is

f_ρ(x) = ∫_Y y dρ(y | x) = P(y = 1 | x) − P(y = −1 | x), x ∈ X.

It is easy to see that f_c(x) = sgn(f_ρ)(x).

The Tikhonov regularization technique is an effective method for solving ill-posed problems ([7–10]). In recent years, many classification algorithms generated from Tikhonov regularization schemes have been developed, and their convergence rates have been investigated (see, for example, [11–17] and the references therein). We now briefly introduce the general framework of the kernel regularized classification algorithm.

Given a classifying loss function V(r): R → R_+, the generalization error of a measurable function f: X → R is defined as

E^V(f) = ∫_Z V(yf(x)) dρ, f ∈ F,

where F is the set of all measurable functions on X.

We call H_K a reproducing kernel Hilbert space (RKHS) associated with a Mercer kernel K(x,y) (see, for example, [17–23]) if, for all f ∈ H_K and x ∈ X, it holds that

f(x) = ⟨f, K_x⟩_K, where K_x(·) := K(x, ·).

The general kernel regularized classification scheme associated with a given reproducing kernel Hilbert space H_K takes the form

min_{f∈H_K} { (1/m) Σ_{i=1}^m V(y_i f(x_i)) + λ‖f‖_K^2 }, (1.1)

where λ > 0 is the regularization parameter.
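To make scheme (1.1) concrete, the following sketch (in Python) minimizes the penalized empirical error over functions of the form f = Σ_j α_j K(x_j, ·), the span in which a minimizer can be sought by the representer theorem, so that f(x_i) = (Kα)_i and ‖f‖_K^2 = α^⊤Kα. The hinge loss, the Gaussian kernel, the subgradient step size and the toy data are illustrative assumptions and are not taken from this paper.

import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # squared Euclidean distances between rows of X and rows of Y
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_regularized_classifier(X, y, lam=0.1, sigma=1.0, steps=2000, lr=0.05):
    # minimize (1/m) sum_i [1 - y_i f(x_i)]_+ + lam * ||f||_K^2 over f = sum_j alpha_j K(x_j, .)
    m = len(y)
    K = gaussian_kernel(X, X, sigma)
    alpha = np.zeros(m)
    for _ in range(steps):
        margins = y * (K @ alpha)                          # y_i f(x_i)
        loss_subgrad = np.where(margins < 1.0, -y, 0.0)    # subgradient of the hinge w.r.t. f(x_i)
        grad = K @ loss_subgrad / m + 2.0 * lam * (K @ alpha)
        alpha -= lr * grad
    return alpha, K

# toy sample: two Gaussian blobs labelled -1 / +1
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, (20, 2)), rng.normal(1.0, 0.5, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])
alpha, K = kernel_regularized_classifier(X, y)
print("training accuracy:", np.mean(np.sign(K @ alpha) == y))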

Based on model (1.1), a large number of scholars have studied kernel regularized classification algorithms with various convex loss functions (see, for example, [11–14, 16, 17, 19]). However, most of these research results do not consider the impact of outliers on the performance of the algorithms. It is known that, in many practical applications, outliers often occur in real data sets. Outliers can be described as data points that are far away from the pattern set by the majority of the data (see [24–26]). Outliers usually have a large impact on non-robust methods and can even make the whole statistical analysis useless if a non-robust loss is used (see, for example, [27, 28]).

In the learning theory literature, a great deal of effort has been made to improve the robustness of the SVM and other similar learning algorithms. Robust loss functions for SV classification have been introduced and intensively studied (see, for example, [27–30]). Unfortunately, robustness comes with a cost: when minimizing a convex loss, even a single data point can dominate the result; that is, any convex loss function necessarily exhibits unbounded sensitivity to even a single outlier. The classical literature achieves bounded outlier sensitivity by considering redescending loss functions or bounded loss functions [27], both of which are inherently non-convex. As observed, non-convex loss functions are necessary to alleviate the effect of outliers. Along these lines, S. Suzumura, K. Ogawa, et al. [31] introduced a novel homotopy approach for non-convex robust SVM learning. The basic idea is to introduce parameters which bridge the standard SVM and the fully robust SVM so as to express the influence of outliers. The loss function V_{θ,s}(r): R → [0,∞) that they defined is

V_{θ,s}(r) = [1−r]_+ if r > s, and V_{θ,s}(r) = θ(1−r) + (1−θ)(1−s) if r ≤ s, (1.2)

where [1−r]_+ := max{0, 1−r}, θ ∈ [0,1] and s ≤ 0.

It can be seen that V_{θ,s}(r) is characterized by a pair of parameters. We refer to θ and s as the tuning parameters which govern the influence of outliers. Figure 1 shows the loss function for several values of θ and s.

Figure 1 The robust classification loss function with different values of θ and s
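As a quick numerical illustration, the sketch below (Python) evaluates the loss of (1.2), assuming the piecewise form written out above: the plain hinge loss above the threshold s, and a line of the flatter slope θ below it, so that a badly misclassified point contributes a penalty that grows only at rate θ. The parameter values are arbitrary.

import numpy as np

def V_theta_s(r, theta=0.25, s=-1.0):
    # hinge loss for margins above the threshold s, flattened slope theta below it
    r = np.asarray(r, dtype=float)
    hinge = np.maximum(0.0, 1.0 - r)
    flattened = theta * (1.0 - r) + (1.0 - theta) * (1.0 - s)
    return np.where(r > s, hinge, flattened)

r = np.linspace(-4.0, 3.0, 8)
# theta = 1 recovers the plain hinge loss, as noted later for algorithm (2.3)
assert np.allclose(V_theta_s(r, theta=1.0, s=-1.0), np.maximum(0.0, 1.0 - r))
print(V_theta_s(r, theta=0.25, s=-1.0))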

The robust SV classification algorithm defined in [31] is

min_w (1/2)‖w‖^2 + C Σ_{i=1}^m V_{θ,s}(y_i f(x_i)),

where f(x) = w^⊤φ(x) (with φ(x) being the feature map implicitly defined by a kernel K), w is a vector in the feature space, and ⊤ denotes the transpose of vectors and matrices. C > 0 is the penalty parameter.

Algorithms and experiments in [31] illustrate the significance of this homotopy approach. The purpose of this paper is to analyze the convergence rate of the excess misclassification error associated with the non-convex loss function V_{θ,s}(r) defined as in (1.2), and to understand how the choice of the homotopy parameters affects the error. The main difference between our work and reference [31] is that here the error bound of the algorithm is estimated theoretically, while reference [31] mainly studies the solution method of the model from the perspective of numerical computation.

The convex analysis method has become an effective error analysis tool in the theoretical study of machine learning (see, for example, [32–39]). Since the loss function V_{θ,s} is non-convex, the usual convex method cannot be used directly. In [39], the convergence rate of kernel regularized robust SV regression was obtained by using an improved convex analysis method. Unlike the regression problem, the performance of a classification algorithm is usually reflected by the convergence rate of the excess misclassification error rather than by the excess generalization error considered in [39]. Thus, the convergence analysis of classification algorithms is more difficult than that of regression algorithms. In order to overcome this difficulty, we improve the method of reference [39] by the technique of the Fenchel-Legendre conjugate, and an important comparison inequality is derived.

The rest of this paper is organized as follows: in Section 2, the kernel regularized robust SV classification model considered in this paper is given. Section 3 presents the main results of the paper, where a comparison inequality and the learning rate are provided. Some proofs of the main results are given in Section 4.

2 The Kernel Regularized Robust SV Classification

In this section,a regularized robust SV classification model is provided.

With the loss function V_{θ,s}(r) given by (1.2), we define the generalization error of f: X → R as

E^{θ,s}(f) = ∫_Z V_{θ,s}(yf(x)) dρ.

Also, the general kernel regularized classification scheme associated with the reproducing kernel Hilbert space H_K takes the form

min_{f∈H_K} { (1/m) Σ_{i=1}^m V_{θ,s}(y_i f(x_i)) + λ‖f‖_K^2 }. (2.1)

In this paper, we consider a partition of the samples z into two disjoint sets I and O. The samples in I and O are called Inliers and Outliers, respectively. We impose the restriction that the margin y_i f(x_i) of an Inlier should be larger than or equal to s, while that of an Outlier should be smaller than s. In other words,

I := {i ∈ M : y_i f(x_i) ≥ s}, O := {i ∈ M : y_i f(x_i) < s},

where M := {1, ..., m}. Let P = {I, O} ∈ 2^M denote the partition, where 2^M is the power set of M.

With these notions in hand, the kernel regularized homotopy algorithm is defined as (see [31])

where |I|_s and |O|_s are the cardinalities of I and O, respectively, and |I|_s + |O|_s = m.

Take

We know, from the analysis process of [31] and [31, Proposition 2.1], that this functional is convex on H_K. Then, (2.2) can be rewritten as

Note that when θ = 1 or s = −∞, algorithm (2.3) is equivalent to the standard formulation of SV classification with the hinge loss V(r) = [1−r]_+ (see [11]).
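The margin-based split into Inliers and Outliers can be read off directly from a candidate function f. The sketch below (Python) computes this split and evaluates a penalized empirical V_{θ,s}-risk of the form (1/m)Σ_i V_{θ,s}(y_i f(x_i)) + λ‖f‖_K^2; since the precise display of (2.2) is not reproduced here, this objective and the made-up linear-kernel data are only an illustrative reading, not the paper's exact formulation.

import numpy as np

def V_theta_s(r, theta, s):
    # the loss of (1.2): hinge above the threshold s, flattened slope theta below it
    return np.where(r > s, np.maximum(0.0, 1.0 - r),
                    theta * (1.0 - r) + (1.0 - theta) * (1.0 - s))

def partition_and_objective(K, alpha, y, theta, s, lam):
    margins = y * (K @ alpha)                 # margins y_i f(x_i) with f = sum_j alpha_j K(x_j, .)
    inliers = np.flatnonzero(margins >= s)    # indices counted by |I|_s
    outliers = np.flatnonzero(margins < s)    # indices counted by |O|_s
    objective = V_theta_s(margins, theta, s).mean() + lam * alpha @ (K @ alpha)
    return inliers, outliers, objective

# made-up linear-kernel example; the last point is deliberately mislabelled
X = np.array([[1.0], [2.0], [-1.0], [-3.0], [2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0, -1.0])
K = X @ X.T
alpha = np.array([0.3, 0.0, -0.3, 0.0, 0.0])   # gives f(x) = 0.6 x
print(partition_and_objective(K, alpha, y, theta=0.25, s=-0.5, lam=0.1))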

Our goal is to investigate the excess misclassification error for algorithm (2.3). In the present work, the sample error estimate for algorithm (2.3) is investigated with an improved convex approach, and a capacity-independent error bound is provided. The excess misclassification error will be bounded in terms of a K-functional associated with model (2.1).

3 Error Analysis

3.1 The comparison inequality

We need to bound the excess misclassification error to measure the performance of algorithm (2.3). The algorithm is designed by minimizing a penalized empirical error associated with the loss function V_{θ,s}, so relations between the excess misclassification error and the excess generalization error become crucial. A systematic study of this problem is done in [37].

In [11], the following important results are derived:

Lemma 3.1 ([11]) Let V be a classifying loss satisfying V″(0) > 0. Then there exists a constant c_V > 0 such that, for any measurable function f: X → R, it holds that

R(sgn(f)) − R(f_c) ≤ c_V (E^V(f) − inf_{g∈F} E^V(g))^{1/2}.

Let V(r) = [1−r]_+ be the hinge loss. Then, for any measurable function f: X → R, it holds that

R(sgn(f)) − R(f_c) ≤ E^V(f) − E^V(f_c). (3.1)

In this subsection we shall give a comparison inequality for V_{θ,s}(r).

Denote η = η(x) = P(Y = 1 | X = x) and Φ_η(r) = ηV_{θ,s}(r) + (1−η)V_{θ,s}(−r). Then it holds that

E^{θ,s}(f) = ∫_X Φ_{η(x)}(f(x)) dρ_X.

For η ∈ [0,1], we define the optimal conditional V_{θ,s}-error as

H(η) := inf_{r∈R} Φ_η(r).

Then the optimal V_{θ,s}-error satisfies

inf_{f∈F} E^{θ,s}(f) = ∫_X H(η(x)) dρ_X.

We define r*: [0,1] → R as the minimizer

r*(η) := arg min_{r∈R} Φ_η(r),

and we define the constrained optimal conditional V_{θ,s}-error as

H^-(η) := inf{ Φ_η(r) : r(2η − 1) ≤ 0 }. (3.2)

It is easy to see that, for any nonnegative loss function V, H^-(η) ≥ H(η) holds.

Definition 3.2 ([37]) We say that the loss function V is classification-calibrated if, for any η ≠ 1/2, it holds that H^-(η) > H(η).

Definition 3.3 ([37]) Given a loss function V: R → [0,∞), we define the Ψ-transform function Ψ: [−1,1] → [0,∞) by Ψ := Ψ̃**, where

Ψ̃(τ) := H^-((1+τ)/2) − H((1+τ)/2),

and g**: [−1,1] → R is the Fenchel-Legendre biconjugate of g: [−1,1] → R, which is characterized by epi g** = (epi g)‾. Here S‾ is the closure of the convex hull of the set S, and epi g is the epigraph of the function g, that is, the set {(x,t): x ∈ [−1,1], g(x) ≤ t}.
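The Fenchel-Legendre biconjugate in Definition 3.3 can be approximated numerically by applying a discrete Legendre transform twice: g** is then the greatest convex minorant of g, i.e., the lower boundary of the closed convex hull of epi g. The example function g, the grids and the slope range in the Python sketch below are arbitrary choices made for illustration.

import numpy as np

def biconjugate(x, g, slopes):
    # discrete Legendre transform applied twice: g*(p) = max_x (p*x - g(x)), g** = (g*)*
    g_star = (slopes[:, None] * x[None, :] - g[None, :]).max(axis=1)
    g_bistar = (slopes[:, None] * x[None, :] - g_star[:, None]).max(axis=0)
    return g_bistar

x = np.linspace(-1.0, 1.0, 401)
g = np.minimum(np.abs(x) + 0.3, 2.0 * np.abs(x))   # a non-convex piecewise-linear example
slopes = np.linspace(-10.0, 10.0, 2001)
g_bistar = biconjugate(x, g, slopes)
print("g** lies below g:", bool(np.all(g_bistar <= g + 1e-9)))
print("g**(0) =", float(g_bistar[len(x) // 2]))    # the minorant touches g at 0 and at the endpoints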

Lemma 3.4 ([37]) For any nonnegative loss function V, classification-calibration implies that Ψ is invertible on [0,1], and the following statements are equivalent:

(a) V is classification-calibrated;

(b) Ψ(τ) > 0 for all τ ∈ (0,1].
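Definitions 3.2 and 3.3 can be explored numerically for the loss in (1.2). The Python sketch below approximates H(η) and H^-(η) by minimizing Φ_η over a grid of r values (imposing the constraint r(2η−1) ≤ 0 for H^-) and reports the differences Ψ̃(τ) = H^-((1+τ)/2) − H((1+τ)/2); the grid and the parameter values are arbitrary, and the convexification step that turns Ψ̃ into Ψ is not carried out here.

import numpy as np

def V_theta_s(r, theta, s):
    # the loss of (1.2): hinge above the threshold s, flattened slope theta below it
    return np.where(r > s, np.maximum(0.0, 1.0 - r),
                    theta * (1.0 - r) + (1.0 - theta) * (1.0 - s))

def H_and_H_minus(eta, theta, s, r_grid):
    phi = eta * V_theta_s(r_grid, theta, s) + (1.0 - eta) * V_theta_s(-r_grid, theta, s)
    H = phi.min()                                            # unconstrained infimum (on the grid)
    H_minus = phi[r_grid * (2.0 * eta - 1.0) <= 0.0].min()   # infimum under r(2*eta - 1) <= 0
    return H, H_minus

theta, s = 0.25, -0.5
r_grid = np.linspace(-10.0, 10.0, 4001)
for tau in (0.1, 0.5, 0.9):
    eta = (1.0 + tau) / 2.0
    H, Hm = H_and_H_minus(eta, theta, s, r_grid)
    print(f"tau={tau:.1f}: H={H:.3f}  H^-={Hm:.3f}  psi_tilde={Hm - H:.3f}")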

For the loss function V_{θ,s}(r) considered in this paper, we have the following result:

Lemma 3.5 Let V_{θ,s}(r) be the loss function defined as in (1.2), and let τ ∈ [−1,1]. For s ≤ −1, we have

Ψ(τ) = Ψ̃(τ) = |τ|,

and for −1 < s ≤ 0, we have

where d_{θ,s} = 2θ + (1−θ)(1−s).

Proof (1) For s ≤ −1, we discuss H(η) and H^-(η). We consider the form of Φ_η(r), which is a convex combination of V_{θ,s}(r) and V_{θ,s}(−r).

For η = 0, Φ_η(r) = V_{θ,s}(−r), and each r ≤ −1 makes Φ_η(r) = 0. Similarly, for η = 1, Φ_η(r) = V_{θ,s}(r), and each r ≥ 1 makes Φ_η(r) = 0. For 0 < η < 1/2, Φ_η(r) decreases strictly on (−∞,−1] and increases on [−1,+∞). For 1/2 < η < 1, Φ_η(r) decreases strictly on (−∞,1] and increases on [1,+∞). For η = 1/2, Φ_η(r) decreases strictly on (−∞,−1] and increases on [1,+∞), and Φ_η(r) ≡ 1 on [−1,1].

Therefore, by the definition of r*(η), we have

This implies that r*(η) = sgn(η − 1/2) for all η ∈ [0,1] and

By the definition of H(η), we have

On the other hand, we need to derive the concrete expression of H^-(η). According to (3.2), we combine the monotonicity of Φ_η discussed above with the constraint condition r(2η−1) ≤ 0. The detailed derivation is given below.

For η = 0, Φ_η(r) = V_{θ,s}(−r). Since 2η − 1 = −1 < 0, the constraint condition implies that r ≥ 0; Φ_η(r) attains its minimum value at r = 0, and the minimum value is Φ_η(0) = V_{θ,s}(0) = 1. Similarly, for η = 1, Φ_η(r) = V_{θ,s}(r). Since 2η − 1 = 1 > 0, the constraint condition implies that r ≤ 0; Φ_η(r) attains its minimum value at r = 0, and the minimum value is Φ_η(0) = V_{θ,s}(0) = 1. For 0 < η ≤ 1/2, 2η − 1 ≤ 0, so the constraint condition implies that r ≥ 0; since Φ_η(r) increases on [0,+∞), Φ_η(r) attains its minimum value at r = 0, and the minimum value is Φ_η(0) = ηV_{θ,s}(0) + (1−η)V_{θ,s}(0) = 1. For 1/2 < η < 1, the constraint condition implies that r ≤ 0; since Φ_η(r) decreases strictly on (−∞,0], Φ_η(r) attains its minimum value at r = 0, and the minimum value is Φ_η(0) = ηV_{θ,s}(0) + (1−η)V_{θ,s}(0) = 1.

Thus, for any 0 ≤ η ≤ 1, we have

(2) For −1 < s ≤ 0, we discuss H(η) and H^-(η).

For η = 0, Φ_η(r) = V_{θ,s}(−r), and it is easy to see that Φ_η(r) = 0 for each r ≤ −1. Similarly, for η = 1, Φ_η(r) = V_{θ,s}(r), and each r ≥ 1 makes Φ_η(r) = 0. For 0 < η < 1/2, Φ_η(r) decreases on (−∞,−1] ∪ [−s,1] and increases on [−1,−s] ∪ [1,+∞).

By a simple calculation, we get that

Then it follows immediately from 0 < η < 1/2 that Φ_η(−1) < Φ_η(1). Therefore the minimum value must be attained at −1.

According to the above analysis, we have that

Hence, for any 0 ≤ η ≤ 1,

Let d_{θ,s} = 2θ + (1−θ)(1−s). Then

Now we are in a position to discuss H^-(η).

For η = 0 or η = 1, in a fashion similar to the case of s ≤ −1, we have that Φ_η(r) attains the minimum value Φ_η(0) = V_{θ,s}(0) = 1. For 0 < η ≤ 1/2, 2η − 1 ≤ 0, so the constraint condition implies that r ≥ 0. Φ_η(r) increases on [0,−s] ∪ [1,+∞) and decreases on [−s,1], so Φ_η(r) has two local minimum values: Φ_η(0) = 1 and Φ_η(1) = (1−η)V_{θ,s}(−1) = d_{θ,s}(1−η). If η ≤ 1 − 1/d_{θ,s}, then d_{θ,s}(1−η) ≥ 1; otherwise d_{θ,s}(1−η) < 1. It can be concluded that

Combining (3.10) and (3.11), for any η ∈ [0,1], we have that

By (3.9) and (3.12), we get that

and by using elementary facts about the real numbers a, b ∈ R, it follows that

This completes the proof. □

For the loss function V_{θ,s}(r) defined as in (1.2), Figures 2 and 3 show the graphs of Φ_η(r) for some values of the homotopy parameters, while the graphs of Ψ(τ) and H(η) are shown in Figure 4.

Figure 2 Φ_η(r) with θ = 0.25: s = −1.5 in the left subfigure, s = −1 in the middle subfigure, and s = −0.5 in the right subfigure

Figure 3 Φ_η(r) with θ = 1: s = −1.5 in the left subfigure, s = −1 in the middle subfigure, and s = −0.5 in the right subfigure

Figure 4 Ψ(τ) and H(η) for the loss function V_{θ,s} with θ = 0.25 and s ≤ −1 or −1 < s ≤ 0: Ψ(τ) with θ = 0.25 and s ≤ −1 or s = −0.5 in the left subfigure, H(η) with θ = 0.25 and s ≤ −1 or s = −0.5 in the right subfigure

Letting Ψ be the function defined in Definition 3.3, according to the analysis of [37], the following lemma holds:

Lemma 3.6 ([37]) (1) For any measurable function f: X → R, any nonnegative loss function V, and any probability distribution on X × {−1,+1}, it holds that

Ψ(R(sgn(f)) − R(f_c)) ≤ E^V(f) − inf_{g∈F} E^V(g). (3.13)

Proposition 3.7 Let V_{θ,s}(r) be the loss function defined as in (1.2). Then, for any measurable function f: X → R, it holds that

Proof For any measurable function f: X → R, it is immediately apparent from (3.13) that

According to the results given in Lemma 3.5, for all τ ∈ (0,1], we have that Ψ(τ) > 0. By Lemma 3.4, it follows that the loss function V_{θ,s}(r) is classification-calibrated, and that Ψ is invertible on [0,1]. We now analyze the comparison inequality between R(sgn(f)) − R(f_c) and the excess generalization error in two cases, according to the values of the parameters.

Case 1: s ≤ −1. Since Ψ̃(τ) = |τ| is a convex function, by Lemma 3.6 we have that

Taking τ = R(sgn(f)) − R(f_c), it follows that, for s ≤ −1, we have

It is easy to see that, for θ = 1 or s → −∞, the comparison relation given by Proposition 3.7 is in accordance with the comparison relation (3.1) associated with the hinge loss.

3.2 The learning rates

In this subsection, we give results on the excess misclassification error of algorithm (2.3). Let us begin with some concepts from convex analysis (see [40]).

Let (H, ‖·‖_H) be a Hilbert space, and let F(f) be a function defined on a convex set S ⊂ H. We say that F is a quasiconvex function on S if

F(αf_1 + (1−α)f_2) ≤ max{F(f_1), F(f_2)}

holds for all f_1, f_2 ∈ S and any α ∈ [0,1].

If, for all f_1, f_2 ∈ S and any α ∈ [0,1], it holds that

F(αf_1 + (1−α)f_2) ≤ αF(f_1) + (1−α)F(f_2),

then we call F a convex function on S.

Letting B(f,ε) = {g ∈ H : ‖g − f‖_H < ε} be the ε-ball of the Hilbert space H, we call V: H → R locally Lipschitz near f ∈ H if, for some ε > 0, it holds that

|V(g) − V(h)| ≤ L‖g − h‖_H for all g, h ∈ B(f,ε),

where L is called the Lipschitz constant, which depends upon f and ε.

A nonnegative continuous function V: R → [0,∞) is called a locally Lipschitz loss if, for all a ≥ 0, there exists a constant c_a > 0 such that

|V(t) − V(t′)| ≤ c_a|t − t′| for all t, t′ ∈ [−a, a].

Moreover, for a ≥ 0, the smallest such c_a is denoted by L_a. Finally, if we have sup_{a≥0} L_a < ∞, then we say that V is Lipschitz continuous. If a = +∞, then we say that V(t) is a Lipschitz loss on R.

For the loss function V_{θ,s}(r) defined as in (1.2), it is easy to show that V_{θ,s}(r) is a quasiconvex function on R and that V_{θ,s}(r) is Lipschitz continuous with the Lipschitz constant L_{θ,s} = 1 for r > s and L_{θ,s} = θ for r ≤ s, i.e.,
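The two properties just stated can be checked numerically. In the Python sketch below, a triple of points straddling the kink at r = s violates the convexity inequality whenever θ < 1, while random sampling finds no violation of the quasiconvexity inequality (V_{θ,s} is non-increasing, hence quasiconvex); the parameter values and sampling ranges are arbitrary.

import numpy as np

def V(r, theta=0.25, s=-1.0):
    # the loss of (1.2) with illustrative default parameters
    return np.where(r > s, np.maximum(0.0, 1.0 - r),
                    theta * (1.0 - r) + (1.0 - theta) * (1.0 - s))

# convexity fails around the kink at r = s when theta < 1
r1, r2 = -2.0, 0.0                      # the midpoint is s = -1.0
print("convexity violated:", bool(V(-1.0) > 0.5 * (V(r1) + V(r2))))

# quasiconvexity: V(a*r1 + (1-a)*r2) <= max{V(r1), V(r2)} on random samples
rng = np.random.default_rng(3)
r1, r2 = rng.uniform(-5, 5, 10000), rng.uniform(-5, 5, 10000)
a = rng.uniform(0.0, 1.0, 10000)
lhs = V(a * r1 + (1.0 - a) * r2)
print("quasiconvexity holds on samples:",
      bool(np.all(lhs <= np.maximum(V(r1), V(r2)) + 1e-12)))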

Definition 3.8 Let (H, ‖·‖_H) be a Hilbert space, and let F(f) be a quasiconvex function from H to R ∪ {±∞}. We call D^+F(f,h) the right-sided Gateaux directional derivative of F at f ∈ H with respect to h if there exists D^+F(f) ∈ H such that

D^+F(f,h) := lim_{t→0^+} (F(f + th) − F(f))/t exists for every h ∈ H.

Moreover, D^+F(f,h) can be written as

D^+F(f,h) = ⟨D^+F(f), h⟩_H.

It is easy to see that if F is Lipschitz continuous on H, then, for any h ∈ H, we have that

|D^+F(f,h)| ≤ L‖h‖_H,

where L is the Lipschitz constant, i.e.,

‖D^+F(f)‖_H ≤ L.

By the definition of D^+F(f), we know that if F(f) attains its minimal value at f_0 ∈ H, then

D^+F(f_0, h) ≥ 0 for all h ∈ H, and hence D^+F(f_0) = 0.
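The one-sided limit in Definition 3.8 already appears in the scalar analogue: at the kink of the hinge function r ↦ [1−r]_+ (at r = 1) the two-sided derivative does not exist, yet the right-sided limit does, and it is nonnegative in every direction because r = 1 is a minimizer, matching the optimality condition just quoted. The finite-difference Python check below is purely illustrative.

def hinge(r):
    return max(0.0, 1.0 - r)

def right_directional_derivative(F, x, h, t=1e-8):
    # one-sided difference quotient (F(x + t*h) - F(x)) / t for small t > 0
    return (F(x + t * h) - F(x)) / t

# at x = 1 the exact right-sided derivative in direction h is [-h]_+ = max(0, -h) >= 0
for h in (-2.0, -0.5, 0.5, 2.0):
    print(h, right_directional_derivative(hinge, 1.0, h), max(0.0, -h))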

We now state the main result of this paper.

Theorem 3.9 Let (H_K, ‖·‖_K) be the reproducing kernel Hilbert space associated with the kernel K_x(·) = K(x,·), let V_{θ,s}(r) be the loss function defined as in (1.2), and let R(f) and the other quantities be defined as above. Then, for any δ ∈ (0,1), with confidence 1 − δ, it holds that

Remark 3.10 The K-functional in Theorem 3.9 denotes the approximation error. By the Lipschitz continuity of V_{θ,s}, we know that there exists a constant C > 0 such that it satisfies

The decay rate of the right-hand side of (3.23) may be described by a modulus of smoothness ([33]), and the convergence of the K-functional is determined by the capacity of H_K (see [41, 42]).

Remark 3.11 If the parameters θ and λ are chosen such that, for m → +∞, we have that, then, for a given δ ∈ (0,1), with confidence 1 − δ, we have that

and the convergence rate can be controlled by choosing the homotopy parameters properly.

Remark 3.12 When s → −∞, we have that |I|_s → m and |O|_s → 0. Therefore, the learning rates given above have the homotopy continuity property.

4 Proofs

Now we are in a position to prove Theorem 3.9. We need to prove the following lemmas first:

Lemma 4.1 Let F(f): H_K → R ∪ {±∞} be a convex function. Then, for any f, g ∈ H_K, it holds that F(f) − F(g) ≥ D^+F(g, f−g) = ⟨D^+F(g), f−g⟩_{H_K}.

Proof Since F(f) is a convex function, for any f, g ∈ H_K and α ∈ [0,1], it holds that F(αf + (1−α)g) ≤ αF(f) + (1−α)F(g). This implies that F(g + α(f−g)) ≤ F(g) + α(F(f) − F(g)), so we have that

(F(g + α(f−g)) − F(g))/α ≤ F(f) − F(g). (4.1)

Letting α → 0^+ on the left-hand side of inequality (4.1), our conclusion follows. □

Lemma 4.2 Let (H_K, ‖·‖_K) be the reproducing kernel Hilbert space associated with the kernel K_x(·) = K(x,·), and let V_{θ,s}(r) be the loss function defined as in (1.2). Then, for any h ∈ H_K, it holds that

Proof By the definition, we have that

The reproducing property of H_K yields that

h(x) = ⟨h, K_x⟩_K for any h ∈ H_K and x ∈ X.

Substituting the above formula into (4.4), we have that

This proves (4.2). Using the same argument as above, we obtain equation (4.3). The proof is complete. □

By the same token, we have that

Since Ω(f) and Ω_z(f) are also strictly convex functions, we have that

Lemma 4.3 Let the two quantities be defined as in (4.7) and (4.8), respectively. Then,

Proof Combining the definitions of these quantities with Lemma 4.1, we have that

Combining (4.11) with (4.2) and (4.3), we have that

We have thus proven (4.9). □

Lemma 4.4 ([43]) Let ξ be a random variable taking values in a real separable Hilbert space H on a probability space (Ω, F, P). Assume that there is a positive constant L such that

Then, for all n ≥ 1 and 0 < δ < 1, it holds that
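To get a feel for the kind of statement Lemma 4.4 makes, the Python sketch below draws bounded random vectors in a finite-dimensional Hilbert space and records the norm of the deviation of the sample mean from its expectation, which shrinks at roughly the 1/√n rate; the exact constants and the confidence form of Lemma 4.4 are not reproduced, and the distribution is a made-up example.

import numpy as np

rng = np.random.default_rng(2)
d, trials = 5, 200
for n in (10, 100, 1000, 10000):
    deviations = []
    for _ in range(trials):
        xi = rng.uniform(-1.0, 1.0, size=(n, d))       # bounded vectors with E[xi] = 0
        deviations.append(np.linalg.norm(xi.mean(axis=0)))
    print(f"n={n:6d}  average deviation norm = {np.mean(deviations):.4f}")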

The next Lemma gives a bound for the sample error.

Lemma 4.5 Let the two quantities be defined as in (4.7) and (4.8), respectively. Then, for any δ ∈ (0,1), it holds, with confidence 1 − δ, that

Proof According to Lemma 4.3, it follows that

Then (4.12) implies that

where we have used the fact that |D^+[1−·]_+| ≤ 1.

According to Lemma 4.4, for any δ ∈ (0,1), with confidence 1 − δ/2, it holds that

By the same token, for any δ ∈ (0,1), with confidence 1 − δ/2, we have that

(4.13) and (4.15), together with (4.16), yield our conclusion. □

A bound for the generalization error is provided in the following Lemma:

Lemma 4.6 Let V_{θ,s}(r) be the loss function defined as in (1.2), and let the relevant quantities be defined as above. Then, for any δ ∈ (0,1), with confidence 1 − δ, it holds that

Proof Applying the Lipschitz continuity of the loss function V_{θ,s}, whose Lipschitz constant satisfies L_{θ,s} ≤ 1, and according to Lemma 4.5, we obtain that

It is now obvious that the lemma holds. □

For any δ ∈ (0,1), by Lemma 4.6 and Proposition 3.7, with confidence 1 − δ, it holds that

Thus, we have completed the proof of Theorem 3.9. □
