
Narrow Pooling Clothing Classification Based on Attention Mechanism

2022-09-28 10:08:40

MA Xiao(馬 驍),WANG Shaoyu(王紹宇),YE Shaoping(葉少萍), FAN Jingyi(樊靜宜),XU An(徐 安),XIA Xiaoling(夏小玲)

School of Computer Science and Technology, Donghua University, Shanghai 201620, China

Abstract: In recent years, with the rapid development of e-commerce, the wide variety and large number of clothing images appearing on e-commerce platforms need to be classified. To address the long processing time and unsatisfactory accuracy of classifying large numbers of clothing images, researchers have begun to exploit deep learning techniques instead of traditional learning methods. This paper explores the use of convolutional neural networks (CNNs) for feature learning, enhancing global feature interactions by adding an improved hybrid attention mechanism (HAM) that fully utilizes feature weights in three dimensions: channel, height, and width. Moreover, the improved pooling layer not only captures local feature information but also fuses global and local information, mitigating the misclassification that occurs between similar categories. Experiments on the Fashion-MNIST and DeepFashion datasets show that the proposed method significantly improves the accuracy of clothing classification (93.62% and 67.9%, respectively) compared with the residual network (ResNet) and the convolutional block attention module (CBAM).

Key words: clothing classification; convolutional neural network(CNN); residual network (ResNet); attention mechanism; narrow pooling

Introduction

In recent years, with e-commerce rapidly rising, the use of cell phones and apps has become more and more common. Due to the convenience of online shopping, more and more people prefer buying clothing on shopping sites in their daily lives. When people search for clothing on e-commerce platforms, they usually have two ways. The first is searching for clothing information through a text input engine, which requires sellers to post photographs and classify the clothing in their stores manually beforehand. The other is uploading clothing images to a platform and detecting the relevant attributes of these images. In both ways, clothing classification is an indispensable step, which not only facilitates a store's management of its products, but also helps users narrow the search scope. This classification task differs from other image classification tasks in that clothing products have many attributes, the number of images is large, and some classes are highly similar to one another.

The essence of clothing classification is to determine classes by extracting image features and designing classification models. Traditional classification models mainly include the support vector machine (SVM), the extreme learning machine (ELM), the random forest, and the transfer forest. Salem and Nasari [1] applied SVM to clothing research. Pan et al. [2] proposed using a back-propagation (BP) neural network to discriminate knitted fabrics. Bossard et al. [3] extracted features with the histogram of oriented gradients (HOG) and local binary patterns (LBP), and fed them into a clothing classification system containing classifiers such as SVM, random forest, and transfer forest, obtaining accuracies of 35.05%, 38.29%, and 41.36%, respectively. Thewsuwan and Horio [4] proposed a clothing classification method using two texture features (LBP and Gabor filters) and obtained an average accuracy of 80.27% on a five-category dataset. Zhang et al. [5] added HOG to clothing classification, achieved strong robustness to lighting, and obtained an accuracy of 73.60% on the Tmall buyer-show dataset. Pan et al. [6] proposed an ethnic clothing classification algorithm based on the scale-invariant feature transform (SIFT), HOG, and color features, and obtained an average accuracy of 87.6%. Surakarin and Chongstitvatana [7] used texture features and speeded-up robust features (SURF), an improvement of SIFT, for clothing classification.

Since the rise of deep learning, breakthroughs in the application and improvement of deep learning networks have been made in various fields, and clothing classification is no exception [8]. A large number of deep learning-based clothing classification algorithms have emerged. Nawaz et al. [9] proposed an ethnic clothing classification algorithm based on the inception model: they designed a convolutional neural network (CNN) architecture and added the inception module, which improved classification accuracy. Liu et al. [10] proposed a clothing classification method based on global convolutional features and local key points, improving classification results by adding local information. Zhang et al. [11] proposed a clothing classification method based on the residual network (ResNet), which used ResNet as the baseline and achieved good results. The above studies investigated and improved deep learning for clothing classification, but did not exploit the interaction between global and local information, and the influence of clothing background information still remains.

The contribution of this paper to the above problem is as follows.

(1) We incorporate a hybrid attention mechanism (HAM) that reduces global information loss and enables 3D information interaction.

(2) We use narrow pooling layers that not only acquire and fuse global and local information, but also reduce interference from irrelevant background information.

(3) We enhance interactions between feature information in three dimensions and, through the novel pooling layer, reduce the difficulty of distinguishing similar categories.

1 Related Work

1.1 Deep learning architecture

CNN is a very common deep learning network nowadays. Its principle is derived from the biological vision mechanism. The most basic CNN architecture consists of a convolutional layer, a pooling layer, and a fully connected layer, integrating feature learning and the classifier.

1.2 Attention mechanism

The attention model has been widely used in various deep learning tasks such as natural language processing, image recognition, and speech recognition. It is one of the core techniques in deep learning that deserves attention and in-depth understanding. A neural network trained without an attention mechanism processes all features of an image equivalently. Although the network learns image features for classification, these features are undifferentiated in the "eyes" of the network, so the network does not pay particular attention to any region. The spatial-domain attention proposed by Jaderberg et al. [12] transformed the spatial information in an image into another space while retaining the useful information, obtaining better robustness. Hu et al. [13] proposed the squeeze-and-excitation network (SENet), which extracts global channel information to obtain channel-dimension features, giving more attention to the more informative features and suppressing irrelevant ones. Woo et al. [14] proposed the convolutional block attention module (CBAM), which combines channel attention with spatial attention and is able to compensate for the loss caused by upsampling.

2 Methods

2.1 3D information HAM

The commonly used attention mechanisms are the channel attention mechanism and the spatial attention mechanism. Both apply maximum pooling and average pooling in parallel to compress the feature map (along the spatial dimensions for channel attention, and along the channel dimension for spatial attention). The difference is that the channel attention mechanism sends the two pooled results separately through a multi-layer perceptron (MLP) and then sums them, while the spatial attention mechanism concatenates the results of the maximum and average pooling.

Fig. 1 HAM architecture

Given an intermediate feature map F ∈ R^(C×H×W) as input, the overall attention process can be summarized as

F_c = M_c(F) ⊗ F, (1)

F_s = M_s(F_c) ⊗ F_c, (2)

where M_c is the channel attention map and M_s is the spatial attention map. ⊗ denotes element-wise multiplication, in which channel attention values are broadcast along the spatial dimensions, and vice versa. F_c is the output of the channel attention block, and F_s is the final refined output.
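The broadcasting in Eqs. (1) and (2) can be sketched in numpy. This is a minimal illustration with random placeholder maps; in the actual model, M_c and M_s are produced by the learned HAM branches.

```python
import numpy as np

C, H, W = 4, 8, 8
F = np.random.rand(C, H, W)          # intermediate feature map

Mc = np.random.rand(C, 1, 1)         # channel attention map, broadcast along H and W
Fc = Mc * F                          # Eq. (1): element-wise multiplication

Ms = np.random.rand(1, H, W)         # spatial attention map, broadcast along channels
Fs = Ms * Fc                         # Eq. (2): final refined output

assert Fs.shape == (C, H, W)         # shape is preserved throughout
```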

We found that the pooling operation reduced the use of feature information and had a negative impact on our attention module. To further preserve the feature mapping, we removed the maximum pooling and average pooling from the two attention mechanisms.

2.2 Narrow pooling layer module

Pooling operation is a very efficient way to obtain a large range of perceptual fields in pixel-by-pixel prediction tasks.

Average pooling is used in traditional clothing classification tasks, and a square kernel of N×N is generally taken for feature extraction. We define the input two-dimensional tensor as x with dimension H×W, and we have

y_(i,j) = (1/N²) Σ_(m=0)^(N-1) Σ_(n=0)^(N-1) x_(iN+m, jN+n), (3)

where N is the kernel size of the average pooling, 0 ≤ i < H/N, and 0 ≤ j < W/N.
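Non-overlapping N×N average pooling over an H×W map can be written compactly with a reshape; the helper below is a sketch for illustration, not the paper's implementation.

```python
import numpy as np

def avg_pool(x, N):
    """Non-overlapping N x N average pooling of a 2-D map (edge remainder cropped)."""
    H, W = x.shape
    # group the map into N x N blocks, then average within each block
    return x[:H - H % N, :W - W % N].reshape(H // N, N, W // N, N).mean(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
y = avg_pool(x, 2)
# y[0, 0] averages the top-left 2 x 2 block: (0 + 1 + 4 + 5) / 4 = 2.5
```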

Since clothing images are usually regular and rectangular, square convolution kernels discard useful feature information and may collect useless background information. We therefore propose a new narrow pooling layer, which exploits both global and local information by applying average pooling with 1×W and H×1 kernels in the horizontal and vertical directions. The narrow pooling operation is defined as

y^h_i = (1/W) Σ_(j=0)^(W-1) x_(i,j), (4)

y^v_j = (1/H) Σ_(i=0)^(H-1) x_(i,j). (5)

Parallel horizontal and vertical pooling branches encode the global horizontal or vertical information presented in the obtained feature maps, and then assign weights to the feature information for optimization. Our pooling operation captures feature information in two ways. On one hand, the kernel is able to collect global information more efficiently through a dimension equivalent to height or width. On the other hand, through narrow pooling operation, we can keep local information while discarding extraneous information, which helps to distinguish some highly similar clothing types more precisely. The architecture of embedding narrow pooling module(NPM) in the bottleneck of ResNet[15]is shown in Fig. 2.
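The two strip-pooling branches reduce to axis-wise means in numpy. The sketch below also shows one plausible way (an assumption for illustration, not necessarily the paper's exact fusion) of broadcasting the strips back over the map so each position gains global row/column context.

```python
import numpy as np

x = np.random.rand(6, 8)                 # an H x W feature map

y_h = x.mean(axis=1, keepdims=True)      # Eq. (4): average along the width -> shape (H, 1)
y_v = x.mean(axis=0, keepdims=True)      # Eq. (5): average along the height -> shape (1, W)

# broadcasting the two strips back over the map fuses the global
# horizontal/vertical context with each local position
fused = x + y_h + y_v
assert fused.shape == x.shape
```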

Fig. 2 Bottleneck structure of ResNet which has narrow pooling

2.3 Overall architecture

Our clothing classification model is based on ResNet, with the attention mechanism and narrow pooling embedded in the base network. The model has two main modules: an HAM based on 3D global information interaction, and an NPM for global and local information fusion. We added the hybrid attention module to the first and fourth of ResNet's four main layers for global information interaction. The NPM was added to the last residual block of each layer for global and local information fusion.
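The module placement described above can be sketched as follows. The residual blocks and the two modules are identity-like placeholders here (the internals are assumptions for illustration); only the wiring, NPM closing every layer and HAM after layers 1 and 4, follows the paper.

```python
import numpy as np

def residual_blocks(x):
    return x                                   # placeholder for a layer's residual blocks

def ham(x):
    return x * 0.9                             # placeholder hybrid attention reweighting

def npm(x):
    # placeholder narrow pooling: add back the two strip-pooled contexts
    return x + x.mean(axis=2, keepdims=True) + x.mean(axis=1, keepdims=True)

def backbone(x):
    for idx in (1, 2, 3, 4):                   # ResNet's four main layers
        x = residual_blocks(x)
        x = npm(x)                             # NPM in the last residual block of each layer
        if idx in (1, 4):
            x = ham(x)                         # HAM on the first and fourth layers
    return x

features = backbone(np.random.rand(16, 14, 14))
assert features.shape == (16, 14, 14)          # feature shape is preserved
```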

3 Experiments

In this section, we evaluate the proposed method on popular clothing datasets, including Fashion-MNIST [16] and DeepFashion [10], with classification benchmarks and ablation studies.

3.1 Experiment dataset

The Fashion-MNIST training dataset contains 6 000 samples per category and the test dataset contains 1 000 samples per category. There are 10 categories in total, so the training dataset has 60 000 samples and the test dataset has 10 000 samples. As shown in Fig. 3, each grayscale clothing image is a 28×28 pixel array; the value of each pixel is an 8-bit unsigned integer (uint8) between 0 and 255, stored as a 3D array whose last dimension indicates the number of channels.

Fig. 3 Example images of Fashion-MNIST dataset

DeepFashion is a large-scale dataset released by the Chinese University of Hong Kong, China. As shown in Fig. 4, it contains 800 000 images, including images from different angles, different scenes, buyers' shows, etc. We selected the subset of DeepFashion used for classification, the category and attribute prediction benchmark, which contains 289 222 images in 46 categories, all in JPG format.

Fig. 4 Example images of DeepFashion dataset

Fig. 5 Confusion matrix of baseline

Fig. 6 Confusion matrix of the proposed method

3.2 Experiment preparation

The experiments used a graphics processing unit (GPU) to speed up training, and the Adam optimizer to speed up the convergence of the model. In the training phase, the number of epochs was set to 50 and the batch size to 32. Due to the uneven image sizes of the DeepFashion dataset, its images were augmented by random panning and horizontal or vertical flipping, and uniformly resized to 224×224 pixels, to enhance the generalization ability of the model during training.

3.3 Experiment results

We first performed experiments on the Fashion-MNIST dataset. As Table 1 shows, each part of the proposed method is useful and represents a significant improvement over the ResNet and CBAM baselines.

Table 1 Results on Fashion-MNIST

Table 1 shows that adding HAM alone, adding NPM alone, and using both together improve performance over the baseline by 0.33%, 0.05%, and 0.55%, respectively. The proposed method also outperforms CBAM by 0.40%.

To visualize the accuracy improvement of the improved model, we compare the confusion matrix generated by the baseline with that of the proposed method, shown in Figs. 5 and 6. The vertical axis y_true is the ground truth and the horizontal axis y_pred is the predicted label; each cell gives the sample count. We find that most of the misclassified cases are improved to some extent.

We also obtained excellent results on the DeepFashion dataset, as shown in Table 2. The proposed method shows significant improvements in accuracy, average precision, and average recall, of 1.14%, 4.36%, and 2.49%, respectively, over the baseline.

Table 2 Results on DeepFashion

4 Conclusions

In this paper, a clothing classification model based on an attention mechanism is proposed. The model first obtains global information and enables feature interaction through an improved hybrid attention mechanism; a narrow pooling layer is then added to the convolutional layers to enhance the use of global and local features; finally, clothing images are classified after feature fusion. The proposed method can help industry managers and researchers perform fast and effective automatic classification of clothing images. In addition, the model can help to build image classification models and systems for other scenes.

