
A Dual Discriminator Method for Generalized Zero-Shot Learning

2024-05-25 14:43:14 Tianshu Wei and Jinjie Huang
Computers, Materials & Continua, 2024, Issue 4

Tianshu Wei and Jinjie Huang2,*

1School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150006, China

2School of Automation, Harbin University of Science and Technology, Harbin, 150006, China

ABSTRACT Zero-shot learning enables the recognition of new class samples by transferring models learned from semantic features and existing sample features to classes that have never been seen before. The consistency of different types of features and the domain shift problem are two critical issues in zero-shot learning. To address both issues, this paper proposes a new model structure. Traditional approaches map semantic features and visual features into the same feature space; building on this, the proposed model uses a dual discriminator approach, which further enhances the consistency between semantic and visual features. This approach also aligns unseen class semantic features with training set samples, providing a portion of the information about the unseen classes. In addition, a new feature fusion method is proposed. This method is equivalent to adding perturbation to the seen class features, which reduces the degree to which the classification results are biased towards the seen classes. The feature fusion method also provides part of the information of the unseen classes, improving classification accuracy in generalized zero-shot learning and reducing domain bias. The proposed method is validated and compared with other methods on four datasets, and the experimental results show that it achieves promising results.

KEYWORDS Generalized zero-shot learning; modality consistent; discriminator; domain shift problem; feature fusion

1 Introduction

Traditional image classification methods need a large number of annotated images for model training. For new things whose training images cannot be collected at scale, these methods cannot classify the new things directly. Zero-shot learning addresses this problem: it learns from existing samples and then infers the categories of new things. Zero-shot learning recognizes new things using linguistic descriptions of them, and we refer to these linguistic descriptions as semantic features in this paper.

Two types of features are needed in zero-shot learning: sample features (visual features) and the semantic features mentioned above. These two types of features belong to different feature spaces, and aligning them is very important. Aligning semantic and visual features is usually done by mapping them to the same feature space [1–5]. We refer to these methods as embedding methods [6,7]. However, these methods sometimes only consider information from the seen classes, which can decrease the accuracy when classifying unseen class samples.

To address the misclassification of unseen classes, some researchers add information about unseen classes to their models; generative models are commonly used for this purpose [8–11]. Although generative models can obtain good classification results, they must be trained first and then used to obtain pseudo-samples of the unseen classes, after which a classifier is trained on the pseudo-samples. This makes the process more complicated than other methods. Incorporating unseen class semantic features into the loss function [12] or adding a calibration term to the classification [13,14] is another technique to increase the classification accuracy of unseen class samples. In addition, some literature has noted that the similarity between features also decreases the zero-shot classification accuracy. Zhang et al. [15] proposed imposing orthogonality constraints between semantic features to differentiate the semantic features of different classes. This approach increased the differences between categories and alleviated the domain shift problem.

We similarly add information about unseen classes to the model. Unlike the methods mentioned above, our model uses a new feature alignment method. In addition to the traditional mapping approach, we use a dual discriminator approach to align the semantic and visual features. Instead of increasing the distance between the visual and semantic features of different categories, we increase the consistency between the hidden space visual features and the semantic features of all classes. This approach not only aligns features but also provides information about unseen classes. A new feature fusion approach is also used for classifier training to alleviate the bias problem. Our contributions are as follows:

(1) We propose a new model structure for solving the alignment problem of different modal features and the domain shift problem.

(2) To better align semantic and visual features, this paper proposes a dual discriminator module, which also provides information about the unseen classes.

(3) We propose a new feature fusion method by which the seen class features are perturbed to reduce the degree to which the classification results in the model are biased toward the seen classes and provide information on the unseen classes.

(4) Our method was validated on four different datasets. The experimental results demonstrate that the proposed model obtains promising results, especially on the aPY dataset (a 5.1% improvement).

2 Related Works

2.1 Zero-Shot Learning

Semantic features and visual features belong to different feature spaces with different dimensions. A common choice is to map these two kinds of features to the same feature space. Figs. 1a and 1b show the two mapping methods: from semantic space to visual space and from visual space to semantic space. Liu et al. [6] proposed a Low-Rank Semantic Autoencoder (LSA) to enhance the zero-shot learning capability. Before classification, they used a mapping matrix to map semantic features to the visual space. Tang et al. [4] mapped visual features to the semantic space and realized feature alignment and classification by calculating the mutual information between semantic features and visual features. In addition to the two mapping methods in Figs. 1a and 1b, a common feature space is used in some literature. Hyperbolic spaces can maintain a hierarchy of features, and Liu et al. [16] proposed to map the visual features and the semantic features into hyperbolic space. Li et al. [17] used direct sum decomposition to decompose the semantic features into subspaces, and embedded semantic features and visual features into the common space. Another method maps semantic features to the visual space while projecting visual features to the semantic space. This method reduces the domain shift problem and allows better alignment of both features [5,18,19]. The methods mentioned above only consider the information of the seen classes when training the models and ignore the information provided by the unseen class semantic features. The compression of the unseen class information leads to the misclassification of unseen class samples. Especially in generalized zero-shot learning, neglecting the unseen class information can cause most samples to be biased towards the seen classes.
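The semantic-to-visual mapping of Fig. 1a can be illustrated with a minimal sketch. It assumes a linear mapping matrix and cosine similarity for the final nearest-prototype decision; the function names, the matrix `W`, and the toy attribute vectors are hypothetical and are not taken from any of the cited methods.

```python
import math

def map_semantic_to_visual(attr, W):
    """Linear map from semantic space to visual space (W has one row per visual dimension)."""
    return [sum(w * a for w, a in zip(row, attr)) for row in W]

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def classify(x, class_attrs, W):
    """Return the class whose mapped semantic prototype is most similar to the visual feature x."""
    prototypes = {c: map_semantic_to_visual(a, W) for c, a in class_attrs.items()}
    return max(prototypes, key=lambda c: cosine(x, prototypes[c]))

# Toy setup (assumed values): 2-D semantic features, 3-D visual features.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
class_attrs = {"horse": [1.0, 0.0], "zebra": [0.0, 1.0]}
x = [0.1, 0.9, 1.0]
predicted = classify(x, class_attrs, W)
print(predicted)
```

Because only class attribute vectors are needed to build a prototype, a class without any training image can still be a candidate at test time, which is what makes embedding methods usable for zero-shot classification.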

Figure 1: Embedding methods

2.2 Domain Shift Problem

Since the unseen class samples only appear in the test set and the distribution of the seen class samples differs from that of the unseen class samples, the model is biased when classifying the unseen class samples. This phenomenon is the domain shift problem. Especially for test sets containing seen class categories, the unseen class samples are more likely to be misclassified as one of the seen classes. Adding information about unseen classes to the model has been proposed to address this problem. Some researchers proposed generative models to generate unseen class samples [8–10,20]. These methods use pseudo-samples instead of real samples for training the classifier. Huynh et al. [12] proposed adding a term about the unseen class information to the loss function so that the information about the unseen classes is not overly compressed. In addition to these two methods, Jiang et al. [21] used class similarity as the coefficients in the loss function to improve the classification accuracy. To make semantic features more distinguishable, some researchers have imposed constraints on the semantic features of all classes, so that the semantic features of different classes can be told apart. In this way, all the features can be better categorized when mapped to the same feature space, which alleviates the domain shift problem. Wang et al. [22] proposed adding orthogonal constraints to all class prototypes. Zhang et al. [15] proposed bi-orthogonal constraints on the latent semantic features and used a discriminator to reduce the modality differences. Zhang et al. [23] proposed corrected attributes for both seen and unseen class semantic features; the corrected attributes are more discriminative in zero-shot learning and alleviate the domain shift problem. Shen et al. [24] used a spherical embedding space to classify the unseen class samples; this method used different radii and spherical alignments on angles to alleviate the prediction bias.

In the literature [15], the authors proposed using an adversarial network to distinguish the semantic features and visual features. Our method also uses a discriminator for the semantic features and visual features. However, our method places no orthogonality restriction on the semantic features, and it employs a dual discriminator approach to align the features of different modalities. This dual discriminator can provide part of the information about the unseen classes. To alleviate the problem that most unseen class samples are classified into seen classes, we propose a feature fusion method that can reduce the information of the seen classes and increase the information of the unseen classes to some extent.

3 A Dual Discriminator Method for Generalized Zero-Shot Learning

3.1 Definition of Problem

The training set can be denoted by $T=\{X_t, A_t, Y_t\}$. We use $U=\{X_u, A_u, Y_u\}$ to represent the unseen classes. $X$ represents the visual features, $A$ represents the semantic features, and $Y$ represents the labels. We use the subscripts $t$ and $u$ to represent seen classes and unseen classes. In conventional zero-shot learning (CZSL), the unseen samples are classified into the unseen classes only. In generalized zero-shot learning (GZSL), test samples are classified into all classes (both seen and unseen classes).
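The difference between the two settings comes down to the candidate label set used at prediction time. The sketch below is illustrative only: the class names and scores are made-up values, and `predict` is a hypothetical helper that takes the highest-scoring class among the allowed candidates.

```python
def predict(scores, candidate_classes):
    """Return the highest-scoring class among the allowed candidates."""
    return max(candidate_classes, key=lambda c: scores[c])

seen_classes = ["horse", "tiger"]
unseen_classes = ["zebra"]

# Hypothetical classifier scores for one test image of an unseen class.
scores = {"horse": 0.7, "tiger": 0.1, "zebra": 0.6}

czsl_prediction = predict(scores, unseen_classes)                 # CZSL: unseen classes only
gzsl_prediction = predict(scores, seen_classes + unseen_classes)  # GZSL: all classes
print(czsl_prediction, gzsl_prediction)
```

With these toy scores the same sample is labelled correctly under CZSL but is pulled to a seen class under GZSL, which is the bias toward seen classes that GZSL methods try to reduce.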

3.2 The Architecture of the Proposed Method

The proposed method is shown in Fig. 2. We only consider GZSL in this paper. The visual features $X_t$ are encoded to get the hidden space features $Z_{t1}$ and $Z_{t2}$, with $Z_{t1}=Z_{t2}$. The hidden space features are aligned with the seen class semantic features $A_t$ and the unseen class semantic features $A_u$ through two discriminators. The features in the hidden space are decoded to get two new visual features, and these new visual features are fused with the original visual features to form the input features $f_1$ and $f_2$ of the classifier. We use lowercase letters to represent a single feature. Each part of the model is described in detail below.

Semantic features and visual features belong to different feature spaces; mapping these two features to the same feature space and maintaining their consistency is an essential issue in zero-shot learning. Inspired by the literature [25], we use the latent space visual features to make the different modality features consistent.

In the literature [15], the authors used a discriminator to discriminate between the different modality features. Unlike the literature [15], we use two discriminators to enhance the consistency of the two modality features. We take one of the discriminators as an example; its structure is shown in Fig. 3. In generative adversarial networks [26], a discriminator distinguishes whether a sample is generated or real, which makes the generated samples more similar to the real samples. In this paper, we regard the hidden space visual features obtained by the encoder as generated samples and the semantic features as real samples, so that the discriminator makes the hidden space visual features more similar to the semantic features and enhances their consistency. In addition, to reduce the domain shift problem and increase the information of the unseen classes, a discriminator is used for the semantic features $A_u$ and the hidden space visual features.
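The adversarial alignment described above can be sketched as a Wasserstein-style critic loss with a gradient penalty evaluated at a random convex combination of a hidden visual feature and a semantic feature. This is a minimal sketch under stated assumptions, not the loss used in the paper: the function names, the coefficient `lam`, the finite-difference gradient, and the linear toy critic are all hypothetical, and the hidden and semantic features are assumed to share one dimension.

```python
import math
import random

def interpolate(z, a, delta):
    """Convex combination delta * z + (1 - delta) * a of two equal-length vectors."""
    return [delta * zi + (1.0 - delta) * ai for zi, ai in zip(z, a)]

def grad_norm(critic, v, eps=1e-5):
    """Norm of the critic's input gradient at v, estimated by central differences."""
    squared = 0.0
    for i in range(len(v)):
        up, down = list(v), list(v)
        up[i] += eps
        down[i] -= eps
        squared += ((critic(up) - critic(down)) / (2.0 * eps)) ** 2
    return math.sqrt(squared)

def critic_loss(critic, z_batch, a_batch, lam, rng):
    """Mean critic score on hidden features minus mean score on semantic features, plus penalty."""
    n = len(z_batch)
    fake = sum(critic(z) for z in z_batch) / n  # hidden visual features, treated as generated
    real = sum(critic(a) for a in a_batch) / n  # semantic features, treated as real
    penalty = 0.0
    for z, a in zip(z_batch, a_batch):
        mixed = interpolate(z, a, rng.random())  # delta drawn uniformly from [0, 1)
        penalty += (grad_norm(critic, mixed) - 1.0) ** 2
    return fake - real + lam * penalty / n

def linear_critic(v):
    """Toy critic (assumption): a fixed linear function of a 2-D input."""
    return 3.0 * v[0] + 4.0 * v[1]

loss = critic_loss(linear_critic, [[1.0, 0.0]], [[0.0, 1.0]], lam=10.0, rng=random.Random(0))
print(loss)
```

In practice the critic would be a neural network and the gradient would come from automatic differentiation; the finite-difference version is used here only to keep the sketch dependency-free.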

Figure 2: The proposed method

The other discriminator has the same structure as in Fig. 3. Inspired by Wasserstein Generative Adversarial Nets (WGAN) [26], we write the loss function of the discriminator in the following form:

Figure 3: The structure of the discriminator

Here, $\lambda_1$ and $\lambda_2$ represent the coefficients. $D_1$ and $D_2$ represent the two discriminators, where $D_1$ denotes the discriminator associated with $Z_{t1}$ and $A_t$, and $D_2$ denotes the discriminator associated with $Z_{t2}$ and $A_u$. The subscript $P$ represents the distribution of the data. In this paper, the interpolated sample is calculated slightly differently from the literature [26]: it is computed as $\delta \cdot Z_{t1}+(1-\delta)\cdot A_t$ with $\delta \sim U(0,1)$. These two discriminators align semantic and visual features and add the information of unseen classes. The encoder in Fig. 2 can be seen as the generator, and $\mathrm{mean}(\cdot)$ represents the mean value. The loss function is shown in Eq. (2):

The hidden visual features $z_{t1}$ are passed through the decoder to get new visual features, which need to be consistent with the original visual features. This relationship can be written as:

Similarly, the hidden space features $z_{t2}$ are passed through the decoder to get new visual features, and the loss function with respect to the original visual features is written as:

Here, $\Delta x$ is obtained by subtracting a term from $x_t$; we first compute $\Delta x$ and then compute Eq. (4). We use $\Delta x$ instead of $x_t$ because $z_{t2}$ contains a portion of the information of the unseen classes, and we want to reduce the compression of the knowledge of the unseen classes after the decoder. In the latent space, we also want the different modality features to be consistent with each other.

If only the features $X_t$ are used as input features to the classifier, the results will be biased towards the seen classes. Therefore, before the features are input into the classifier, feature fusion is applied, as shown in Eqs. (7) and (8):

Here, $\mu_1$ and $\mu_2$ are coefficients. Feature fusion is equivalent to adding perturbations to the original visual features, which can compress the information about the seen classes and provide information about the unseen classes. Cross-entropy is used as the loss function of the classifier, computed between the true label $y_i$ and the predicted label.
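The idea of perturbing the original features before classification can be illustrated as follows. The additive form used in `fuse` is purely an assumption chosen for illustration and should not be read as the paper's fusion equations; the feature values and logits are made up. The cross-entropy part is the standard softmax formulation.

```python
import math

def fuse(x, x_new, mu):
    """Hypothetical additive fusion: original feature plus a scaled decoded feature."""
    return [xi + mu * ni for xi, ni in zip(x, x_new)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(softmax(logits)[label])

x = [1.0, 2.0]        # original visual feature (toy values)
x_new = [0.2, -0.4]   # decoded visual feature (toy values)
f1 = fuse(x, x_new, mu=0.5)
f2 = fuse(x, x_new, mu=1.0)
ce = cross_entropy([2.0, 0.5, 0.1], label=0)
print(f1, f2, ce)
```

A larger coefficient moves the classifier input further from the original seen-class feature, which is one way to read the statement that fusion acts as a perturbation.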

The total loss function is:

where $\beta$ is the coefficient. The proposed model is optimized by an alternating optimization method. The discriminators are first trained by Eq. (1), and then the other networks in the model are trained by Eq. (10).
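The alternating schedule can be sketched as a loop that updates the discriminators before the rest of the model. The sketch assumes the alternation happens once per mini-batch, which the text does not specify; `train`, `discriminator_step`, and `model_step` are hypothetical names, and the step functions here only record the call order instead of updating real networks.

```python
def train(batches, num_epochs, discriminator_step, model_step):
    """Alternate two update steps per batch: discriminators first, then the other networks."""
    history = []
    for _ in range(num_epochs):
        for batch in batches:
            d_loss = discriminator_step(batch)  # stands in for an update using the discriminator loss
            m_loss = model_step(batch)          # stands in for an update using the total loss
            history.append((d_loss, m_loss))
    return history

# Toy step functions (assumptions) that log the order in which they are called.
call_order = []

def toy_discriminator_step(batch):
    call_order.append("D")
    return 0.0

def toy_model_step(batch):
    call_order.append("M")
    return 0.0

history = train([[1, 2], [3, 4]], 1, toy_discriminator_step, toy_model_step)
print(call_order)
```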

4 Experiments

We validate our model on four datasets: Animals with Attributes 1 (AWA1) [27], Animals with Attributes 2 (AWA2) [28], Attribute Pascal and Yahoo (aPY) [29], and Caltech-UCSD Birds-200-2011 (CUB) [30]. The details of these four datasets are shown in Table 1.

Table 1: The details of the four datasets

In the proposed model, we use the RMSProp method to optimize the discriminator modules and the Adam method to optimize the other parts of the model. The learning rate is 0.001 for the AWA1 and AWA2 datasets and 0.006 for the CUB and aPY datasets. The output of the first layer in the encoder contains 512 units, and the output of the first layer in the decoder contains 256 units. The output dimensions of the fully connected layers in the discriminator are 1024 and 256. We set $\mu_1=0.5$ and $\mu_2=1$ in our model. The visual features and semantic features are taken from the literature [28]. The dimension of the visual features is 2048. The complexity of the model is as follows: the FLOPs for AWA1, AWA2, CUB, and aPY are 4.86 M, 4.86 M, 6.77 M, and 4.68 M, and the sizes in bytes are 2.44 M, 2.44 M, 3.39 M, and 2.35 M.

4.1 Results of GZSL

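The proposed method is compared with other methods in the GZSL setting. The evaluation method is taken from the literature [28]. We use $C$ to denote the average per-class top-1 accuracy and $H$ to denote the harmonic mean. The subscripts $s$ and $u$ denote the seen classes and the unseen classes. The equations are as follows:

$$C = \frac{1}{|\mathcal{Y}|}\sum_{c \in \mathcal{Y}} \frac{\#\,\text{correct predictions in class } c}{\#\,\text{samples in class } c}$$

$$H = \frac{2 \times C_s \times C_u}{C_s + C_u}$$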
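The two metrics can be computed as in the sketch below, which follows the usual GZSL evaluation protocol: accuracy is averaged over classes so that large classes do not dominate, and the harmonic mean combines the seen and unseen accuracies. The function names and the toy labels are illustrative.

```python
from collections import defaultdict

def per_class_accuracy(y_true, y_pred):
    """Average of the top-1 accuracies computed separately for each class in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)

def harmonic_mean(c_s, c_u):
    """Harmonic mean of the seen-class and unseen-class accuracies (0.0 if both are zero)."""
    return 0.0 if c_s + c_u == 0 else 2.0 * c_s * c_u / (c_s + c_u)

# Toy labels: class "a" has 2 of 3 correct, class "b" has 1 of 1 correct.
y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "b", "b"]
acc = per_class_accuracy(y_true, y_pred)
h = harmonic_mean(0.8, 0.4)
print(acc, h)
```

The harmonic mean is high only when both accuracies are high, so a model that classifies everything into seen classes scores poorly even if its seen-class accuracy is excellent.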

The results of the proposed method are shown in Table 2. As seen from Table 2, the results of the proposed method on the AWA1 dataset are 2.2% lower than the best results. The proposed method achieves promising results on the AWA2 and aPY datasets. On the aPY dataset in particular, it outperforms the Spherical Zero-Shot Learning (SZSL) [24] method by 5.1%. The methods Semantic Autoencoder+Generic Plug-in Attribute Correction (SAE+GPAC) [23], SZSL [24], Transferable Contrastive Network (TCN) [21], and Modality Independent Adversarial Network (MIANet) [15] all consider the unseen semantic features in their models. SAE+GPAC, SZSL, and MIANet impose constraints on the semantic features, making the features of different classes more distinguishable. TCN uses the relationship between unseen class and seen class semantic features as the coefficients of the loss function. The proposed method achieves better results than these four methods on the AWA1, AWA2, and aPY datasets, while SZSL and TCN are better than the proposed method on the CUB dataset. In summary, the proposed method gives good results on the AWA2 and aPY datasets and is not as good as other methods on the AWA1 and CUB datasets, especially on CUB. This is because CUB is a fine-grained image dataset: although the proposed method can provide features about unseen classes, it is not sufficiently discriminative between the features of different classes, which lowers the classification results.

Table 2: The results in GZSL

4.2 Parameters Influences

Figs. 4–7 show the effects of β in Eq. (10) on the generalized zero-shot classification results.

Figure 4: The effects of β on AWA1

Figure 5: The effects of β on AWA2

In Figs. 4–7, 'tr' and 'ts' denote the average per-class top-1 accuracy of the seen classes and the unseen classes, respectively. For the AWA1 and AWA2 datasets, as β increases, the harmonic mean and the accuracy of the unseen classes increase, while the accuracy of the seen classes decreases. For the aPY dataset, an increase in β has little effect on the harmonic mean, while the accuracy decreases for the seen classes and increases for the unseen classes. For the CUB dataset, the accuracy increases for unseen class samples and decreases for seen class samples. In summary, as β increases, the accuracy of the unseen classes increases, while the accuracy of the seen classes decreases.

Figure 6: The effects of β on aPY

Figure 7: The effects of β on CUB

4.3 Ablation Experiments and tSNE

The results of the ablation experiments are shown in Table 3. The method without discriminators and feature fusion is denoted as the baseline. In the baseline, we use the visual features as the input features $f_1$ and $f_2$ of the classifier. 'baseline+feature fusion' indicates that the model does not contain discriminators and that $f_1$ and $f_2$ are calculated using Eqs. (7) and (8). 'baseline+feature fusion+one discriminator' denotes the method that adds a discriminator related to the semantic features of the seen classes.

Table 3: Ablation experiments

Table 3 shows that for AWA1, AWA2, and CUB, feature fusion drastically improves the harmonic mean. 'baseline+feature fusion' improves the accuracy of the seen classes compared to the baseline method but does not reduce the accuracy of the unseen classes too much, which indicates that 'baseline+feature fusion' can improve the accuracy of the seen classes while keeping the unseen class samples from being massively biased toward the seen classes. On aPY, 'baseline+feature fusion' increases the accuracy of both seen and unseen classes compared to the baseline method. Table 3 also shows that the harmonic mean increases when the discriminator is added; this is because the discriminator not only adds information about the unseen classes but also makes the features of the different modalities more consistent.

Figs. 8a and 8b show the tSNE for the AWA2 dataset. Fig. 8a shows the unseen class visual features in the AWA2 dataset, and Fig. 8b shows the visual features $f_2$ obtained using feature fusion. Since the training set samples are used to obtain $f_2$, the number of samples obtained for each class is different. The figure shows that the proposed method can provide a part of the distribution similar to the original sample features.

Figure 8: The tSNE of AWA2

4.4 The Influence of the Features ΔX

Fig. 9 shows the results of replacing $\Delta X$ in Eq. (4) with the original visual features $X_t$. Fig. 9 shows that although good results can be obtained using the original visual features, they are still lower than those of the proposed method.

Fig. 10 shows the classification accuracy for each unseen class on the aPY dataset when replacing $\Delta X$ with the original features $X_t$. Fig. 10 shows that the accuracy is lower than that of the proposed method, except for a very few classes where the accuracy increases when using the original features.

Figure 9: The harmonic mean of the original train features used in Eq.(4)

Figure 10: The accuracy of the unseen class samples of aPY

5 Conclusions

We propose a new model structure for the consistency problem of different modal features and the domain shift problem in generalized zero-shot learning. The dual discriminator structure in the proposed model leads to a better alignment of semantic and visual features and provides part of the information about the unseen classes. In addition, this paper adopts a new feature fusion method to reduce the information about seen classes and provide information about unseen classes, so that the model is not overly biased towards seen classes in generalized zero-shot classification and the harmonic mean improves. We evaluated the proposed model on four datasets, and the experimental results show the effectiveness of our approach, especially on the aPY dataset. In future work, we will explore using an attention mechanism to extract more discriminative features, which should enable better alignment of features across modalities and improve the accuracy of zero-shot classification.

Acknowledgement: The authors sincerely appreciate the editors and reviewers for their valuable work.

Funding Statement: The authors received no specific funding for this study.

Author Contributions: Study design and draft manuscript preparation: Tianshu Wei; reviewing and editing the manuscript: Jinjie Huang.

Availability of Data and Materials: The datasets used in the manuscript are public datasets. The datasets used in the manuscript are available from https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/zero-shot-learning/zero-shot-learning-the-good-the-bad-and-the-ugly.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
