
Small objects detection in UAV aerial images based on improved Faster R-CNN

2020-04-21

WANG Ji-wu, LUO Hai-bao, YU Peng-fei, LI Chen-yang

(School of Mechanical, Electronic and Control Engineering, Beijing Jiaotong University, Beijing 100044, China)

Abstract: In order to solve the problem of small objects detection in unmanned aerial vehicle (UAV) aerial images with complex backgrounds, a general detection method for multi-scale small objects based on the Faster region-based convolutional neural network (Faster R-CNN) is proposed. The bird’s nest on the high-voltage tower is taken as the research object. Firstly, we use the improved convolutional neural network ResNet101 to extract object features, and then use multi-scale sliding windows to obtain the object region proposals on the convolution feature maps with different resolutions. Finally, a deconvolution operation is added to further enhance the selected feature map with higher resolution, which is then taken as the feature mapping layer of the region proposals passed to the object detection sub-network. The detection results of the bird’s nest in UAV aerial images show that the proposed method can precisely detect small objects in aerial images.

Key words: Faster region-based convolutional neural network (Faster R-CNN); ResNet101; unmanned aerial vehicle (UAV); small objects detection; bird’s nest

0 Introduction

At present, there are three main object detection frameworks: Faster region-based convolutional neural network (Faster R-CNN)[1], single shot multibox detection (SSD)[2] and you only look once (YOLO)[3]. Compared with the other two frameworks, Faster R-CNN usually has higher detection precision. However, its precision varies with object scale: it performs well on general-scale objects but relatively poorly on small objects, which are easily missed. There are two main reasons for this problem.

On the one hand, the extraction of region proposal positions is not precise enough. Faster R-CNN uses an anchor mechanism in the region proposal network (RPN) to generate nine region proposals of three ratios and three scales at each pixel position of the last convolutional feature map. Compared with the ground-truth box, the region proposals generated by RPN are too large for small objects, which results in imprecise extraction of the region proposals.

On the other hand, repeated maximum pooling operations mean that small-object information in the original image is easily lost on the deep convolutional feature maps. In Faster R-CNN, maximum pooling is generally applied between two or three adjacent convolutional layers. This effectively reduces the computational complexity of the network model and gives the convolutional neural network a degree of translation and rotation invariance, but it also brings problems: the resolution of the deep convolutional feature map is much lower than that of the original image, so part of the original image information is lost on the deep feature map. In addition, the region of interest (ROI) pooling in the Faster R-CNN detection sub-network uses the output feature map of the conv5_3 layer.
For small objects, which undergo multi-layer pooling, the loss of feature information is severe, and insufficient information is retained for subsequent classification and regression. Therefore, for small object detection, deeper convolutional features are not necessarily more conducive to good detection results.
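As context for the anchor mechanism described above, the following sketch generates the nine reference anchors (three ratios × three scales) at a single feature-map position. The base size and the particular ratio/scale values are the common Faster R-CNN defaults, assumed here rather than taken from this paper.

```python
import math

def generate_anchors(base_size=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Nine reference anchors (3 ratios x 3 scales) at one feature-map cell.
    base_size/ratios/scales are assumed Faster R-CNN defaults."""
    anchors = []
    for ratio in ratios:
        for scale in scales:
            area = (base_size * scale) ** 2  # target anchor area in input pixels
            w = math.sqrt(area / ratio)      # choose width so that h / w == ratio
            h = w * ratio
            # (x_center, y_center, width, height), centred on the base cell
            anchors.append((base_size / 2, base_size / 2, w, h))
    return anchors

anchors = generate_anchors()
print(len(anchors))  # 9
```

Note that even the smallest of these anchors (ratio 1, scale 8) spans 128×128 input pixels, which illustrates the point above that such proposals are far too large for very small objects.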

Aiming at the existing problems of small objects detection with the Faster R-CNN method, this paper presents an improved method based on Faster R-CNN. Firstly, we use the improved convolutional neural network ResNet101[4] to extract object features. Secondly, multi-scale sliding windows are used to obtain the object region proposals on the deep convolution feature maps with different resolutions. Finally, a deconvolution operation is added to further enhance the selected feature map with higher resolution, which is then used as the feature mapping layer for the region proposals passed to the object detection sub-network. This method provides a reliable basis for the automatic detection of small objects in UAV aerial images.

1 Design of feature extraction network

Effective extraction of object features is a key step in image object detection. The convolutional neural network[5-6] has strong image classification ability because it is good at mining local features of image data, and it can greatly reduce the number of parameters to be learned through its unique local perception and weight sharing properties, which effectively improves network training performance. In this paper, the ResNet101 network architecture is improved to perform feature extraction for the image objects. Compared with AlexNet[7], VGGNet[8], GoogLeNet[9] and other feature extraction networks, the ResNet network has the following advantages:

1) ResNet normalizes the input data at each layer by adding batch normalization layers, which effectively accelerates the convergence of training and reduces the degree of over-fitting of the network model.

2) ResNet avoids the vanishing gradient problem caused by stacked weight layers by using “shortcut” identity mapping connections, which keeps the network in an optimal state so that performance does not degrade as network depth increases.

3) The ResNet50/101/152 adopts the “Bottleneck design” method. As shown in Fig.1(a), by using 1×1 convolution to control the number of input and output feature maps of 3×3 convolution, the number of convolution parameters will be greatly reduced while the depth and width of the network will be increased.

In order to obtain effective and rich object features, we propose a ResNet variant network structure by referring to the inception network structure, as shown in Fig.1(b). This design enables each layer in the network to learn sparse or non-sparse features, which increases the adaptability of the network to object scale. Meanwhile, using two 3×3 convolutions, the network can obtain a larger receptive field than before with fewer parameters. The numbers of convolution parameters of the two structures in Fig.1 are

256×1×1×64 + 64×3×3×32 + (64+32)×3×3×16 + 64×1×1×256 = 65 024,    (1)

256×1×1×64 + 64×3×3×64 + 64×1×1×256 = 69 632.    (2)

By comparing Eq.(1) with Eq.(2), it can be seen that the improved network structure has fewer training parameters. Furthermore, the subsequent experimental results show that the proposed ResNet variant structure improves the detection precision significantly.
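The two counts can be checked mechanically. The sketch below tallies convolution weights as (input channels) × k × k × (output channels), biases ignored, matching the terms of Eqs.(1) and (2):

```python
def conv_params(c_in, k, c_out):
    """Weight count of a k x k convolution, biases ignored, as in Eqs.(1)-(2)."""
    return c_in * k * k * c_out

# Improved (inception-style) bottleneck of Eq.(1)
variant = (conv_params(256, 1, 64)        # 1x1 channel reduction
           + conv_params(64, 3, 32)       # first 3x3 branch
           + conv_params(64 + 32, 3, 16)  # second 3x3 on concatenated maps
           + conv_params(64, 1, 256))     # 1x1 channel expansion

# Standard ResNet "Bottleneck design" of Eq.(2)
standard = (conv_params(256, 1, 64)
            + conv_params(64, 3, 64)
            + conv_params(64, 1, 256))

print(variant, standard)  # 65024 69632
```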

Fig.1 Comparison of ResNet101 network structures

2 Design of object detection network

2.1 Overall structure of object detection network

In order to solve the problem of multi-scale small objects detection in aerial images, an improved method based on Faster R-CNN method is proposed. The overall structure is shown in Fig.2.

Compared with Faster R-CNN method, the improvements of the method are as follows:

1) To solve the problem of imprecise location of region proposals for small objects, we propose a multi-scale sliding window method to obtain object region proposals on deep convolution feature maps with different resolutions, which is called multi-scale RPN (MS-RPN). According to the actual distribution of the object’s own scale, the network sets reasonable sliding windows with different sizes on different deep feature maps to generate region proposals with abundant scales on the input image, so that MS-RPN can extract more precise region proposals than RPN.

2) Aiming at the problem that the information of small objects in the original image disappears on deep convolution feature maps, firstly, we use the improved ResNet101 network structure to extract object features; then we select the ResNet_4w convolution feature map, which has appropriate depth and high resolution, as the feature mapping layer of the region proposals, and add a deconvolution operation to further enhance the resolution of this feature layer; finally, the region proposals generated by MS-RPN are pooled into fixed-size feature maps by the ROI pooling operation and then fed into the ResNet_5c convolution layer to extract object features once more before the final detection.

Fig.2 Overall framework of small objects detection network

2.2 MS-RPN

The structure of MS-RPN is shown in Fig.3. In order to cover objects of various scales, especially small objects, we set sliding windows of appropriate sizes on the deep convolutional layers ResNet_3d, ResNet_4f and ResNet_5c, respectively, and the region corresponding to each sliding window is mapped to the input image as a proposal window. The subsequent classification and regression processes are consistent with those of the classic RPN network. ResNet_3d is mainly used to extract region proposals for the small objects in the input image, because its resolution is higher and its response to small objects is stronger than those of the other deep convolution feature layers in MS-RPN. Considering the detection speed, we use 5×5 and 7×7 sliding windows on this feature layer with a step size of 2. ResNet_4f mainly handles normal-size objects; besides the 5×5 and 7×7 sliding windows, an additional 9×9 sliding window is added, and all sliding windows have a step size of 1. For ResNet_5c, 7×7, 9×9 and 11×11 sliding windows are used, also with a step size of 1. The experimental results show that the proposed MS-RPN network keeps a high recall rate for small objects in UAV aerial images.
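As an illustration of the sliding-window settings above, the sketch below maps a window centred at a feature-map cell back to the input-image region it covers. The per-layer strides (8/16/32) are assumed typical ResNet stage strides, not values stated in the paper.

```python
def window_to_image_region(cx, cy, win, stride):
    """Map a win x win sliding window centred at feature-map cell (cx, cy)
    back to the input-image region it covers, given the layer's cumulative
    stride. Strides for the ResNet stages are assumptions."""
    half = win * stride / 2.0
    x = (cx + 0.5) * stride  # cell centre in input-image coordinates
    y = (cy + 0.5) * stride
    return (x - half, y - half, x + half, y + half)

# Window sizes and steps per feature layer, as described for MS-RPN;
# the stride values are assumed, not given in the paper.
WINDOWS = {
    "ResNet_3d": {"stride": 8,  "windows": (5, 7),     "step": 2},
    "ResNet_4f": {"stride": 16, "windows": (5, 7, 9),  "step": 1},
    "ResNet_5c": {"stride": 32, "windows": (7, 9, 11), "step": 1},
}

# Under these assumptions, a 5x5 window on ResNet_3d covers a 40x40 pixel
# region of the input image, suitable for small objects.
x1, y1, x2, y2 = window_to_image_region(10, 10, 5, WINDOWS["ResNet_3d"]["stride"])
print(x2 - x1, y2 - y1)  # 40.0 40.0
```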

Fig.3 MS-RPN structure

2.3 Network loss function and training details

In order to train the MS-RPN network, it is necessary to label the region proposals corresponding to each sliding window. We assign a positive label to two kinds of region proposals: (i) the region proposal with the highest intersection-over-union (IoU) overlap with a ground-truth box, or (ii) a region proposal that has an IoU overlap higher than 0.5 with any ground-truth box. We assign a negative label to a region proposal if its IoU ratio is lower than 0.2 for all ground-truth boxes. Region proposals that are neither positive nor negative do not contribute to the training objective. The total loss function of the MS-RPN network follows the calculation of the RPN loss function in Faster R-CNN. Because the region proposals in this paper come from different convolution layers, the calculation is slightly different from that of RPN. It is expressed as
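The labelling rule above can be sketched directly; `label_proposal` is an illustrative helper name, with 1 = positive, 0 = negative and None = ignored:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def label_proposal(proposal, gt_boxes, is_best_match=False):
    """Positive (1) if best match for a ground truth or IoU > 0.5 with any;
    negative (0) if IoU < 0.2 for all; otherwise ignored (None)."""
    best_iou = max(iou(proposal, gt) for gt in gt_boxes)
    if is_best_match or best_iou > 0.5:
        return 1
    if best_iou < 0.2:
        return 0
    return None
```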

L = Σ_{m=1}^{M} w_m · (1/|S_m|) Σ_{i∈S_m} l_m(p_i, t_i),    (3)

where M is the number of convolution layers participating in region proposal generation, which is 3; w_m is the sample weight corresponding to each convolution layer; S_m is the sample set extracted from each convolution layer, whose size is 128; l_m is the loss function of each convolution layer in MS-RPN, which includes a classification loss over {p_i} and a regression loss over {t_i}. The whole object detection network is trained end-to-end using back propagation and stochastic gradient descent.

In the network training stage, a large number of unevenly distributed negative samples would greatly affect the detection precision of the final network model. Therefore, we use the IoU value between each region proposal and the ground-truth box to rank all negative samples, and then select the samples with higher IoU values as the negative samples to join the training set.
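The negative-sample selection described above amounts to ranking negatives by their IoU with the nearest ground-truth box and keeping the hardest (highest-IoU) ones. A minimal sketch, where `select_hard_negatives` is a hypothetical helper and the sample values are made up for illustration:

```python
def select_hard_negatives(negatives, num_keep):
    """Rank negative proposals by their IoU with the nearest ground-truth box
    and keep the hardest (highest-IoU) ones.
    `negatives` is a list of (proposal, iou_with_nearest_gt) pairs."""
    ranked = sorted(negatives, key=lambda pair: pair[1], reverse=True)
    return [proposal for proposal, _ in ranked[:num_keep]]

# Illustrative values: proposal "b" overlaps ground truth most, so it is
# the hardest negative and is selected first.
negs = [("a", 0.05), ("b", 0.18), ("c", 0.12)]
print(select_hard_negatives(negs, 2))  # ['b', 'c']
```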

For an input image with a resolution of 1 000×600, the region proposals are extracted by the MS-RPN method, and about 12 000 region proposals are obtained. However, there will be a lot of overlap among the region proposals, which seriously affects the detection speed. Therefore, based on the confidence value of the region proposals, we use the method of non-maximum suppression (NMS) to select some high-quality region proposals. The IoU threshold is set to be 0.7. After performing the NMS operation, there are only about 1 000 region proposals left for each image. Subsequently, 100 regions with the highest confidence level are selected from the remaining 1 000 region proposals as the final region proposals, and then processed by ROI pooling operations and finally sent to the subsequent convolution feature extraction layer and object detection sub-network.
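The proposal filtering step can be sketched as greedy NMS followed by top-k selection, using the values stated in the text (IoU threshold 0.7, at most 100 survivors); the function names are illustrative.

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def nms(boxes, scores, iou_threshold=0.7, top_k=100):
    """Greedy non-maximum suppression: keep the highest-scoring boxes,
    dropping any box whose IoU with an already-kept box exceeds the
    threshold, then return the indices of at most top_k survivors."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(box_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
            if len(keep) == top_k:
                break
    return keep

# The middle box overlaps the first with IoU 0.9 > 0.7, so it is suppressed.
boxes = [(0, 0, 10, 10), (0, 0, 10, 9), (50, 50, 60, 60)]
print(nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```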

Meanwhile, in order to reduce the training parameters and accelerate the detection speed of the network, global average pooling is used in place of fully connected layers in the detection part to achieve classification and bounding box regression of the object.
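Global average pooling simply collapses each channel of the final feature map to a single value, so the classifier needs no fully connected weight matrix. A minimal sketch (the 7×7×2048 shape is an assumed typical ResNet output, not a value from the paper):

```python
import numpy as np

def global_average_pool(feature_map):
    """Collapse each channel of an (H, W, C) feature map to one value,
    replacing a fully connected layer as described above."""
    return feature_map.mean(axis=(0, 1))

fm = np.ones((7, 7, 2048), dtype=np.float32)  # assumed ResNet output shape
vec = global_average_pool(fm)
print(vec.shape)  # (2048,)
```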

3 Experimental results and analysis

3.1 Building object data set

In our work, the research object is the bird’s nest on the high-voltage tower in UAV aerial images, which is used to verify the proposed method.

In order to enrich the image training data set, we use image enhancement techniques (image flipping, image rotation, increasing image contrast, adding Gaussian noise, etc.) to expand the image data set. Besides images with common complex backgrounds, the sample database also contains images with severe interference from illumination, occlusion and haze. The sizes of all image samples are scaled uniformly to 1 000×600, and the position and label of the bird’s nest in each image are annotated to conform to the Pascal VOC standard data set format. Finally, the images in the sample database are divided into two groups at a ratio of 3 to 1: the training set contains 9 000 samples and the test set contains 3 000 samples.
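The augmentations listed above (flipping, rotation, contrast change, Gaussian noise) can be sketched as follows; all parameter values are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def augment(image, rng):
    """Apply the augmentation kinds listed above: horizontal flip,
    180-degree rotation, contrast jitter and additive Gaussian noise.
    All parameters are illustrative, not the paper's values."""
    out = image.astype(np.float32)
    if rng.random() < 0.5:
        out = out[:, ::-1]                            # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, ::-1]                         # 180-degree rotation
    gain = rng.uniform(0.8, 1.2)                      # contrast jitter
    out = (out - out.mean()) * gain + out.mean()
    out = out + rng.normal(0.0, 5.0, size=out.shape)  # Gaussian noise
    return np.clip(out, 0.0, 255.0)

rng = np.random.default_rng(0)
img = rng.uniform(0.0, 255.0, size=(600, 1000, 3))   # one 1 000x600 sample
aug = augment(img, rng)
print(aug.shape)  # (600, 1000, 3)
```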

3.2 Experimental results and analysis on object test set

The object recall rates under different IoU thresholds are used as evaluation criteria by referring to Ref.[10]. The MS-RPN region proposals method is compared with the RPN region proposals method in Faster R-CNN on the constructed bird’s nest data set. From Fig.4, we can see that both RPN and MS-RPN have a high recall rate when the threshold is set between 0.5 and 0.7. But when the threshold exceeds 0.7, MS-RPN still has an ideal recall rate, while the recall rate corresponding to RPN decreases sharply. The results show that the MS-RPN method is more precise than the RPN method. There are two main reasons: Firstly, the region proposals selected by the RPN method are not precise enough for small objects; Secondly, the RPN method generally extracts the region proposals from the last deep convolution feature map. Because the resolution of this convolution feature layer is low, its detection ability for small objects is limited. But in our work, we set reasonable sliding windows with different sizes on different deep convolution feature maps according to the scale distribution of the research objects, so that the region proposals are extracted with higher precision.

Fig.4 Comparison of recall rates under different IoU thresholds

Table 1 compares the performance of the proposed method with those of the best traditional object detection methods including deformable parts model (DPM), Faster R-CNN VGG16 and Faster R-CNN ResNet101 on the same bird’s nest data set.

Table 1 Comparison of bird’s nest detection results on test set

Detection method       | Test data set | mAP (%) | Miss rate (%) | Speed (frame/s)
DPM                    | 3 000         | 42.34   | 39.77         | 15
Faster R-CNN VGG16     | 3 000         | 71.40   | 18.73         | 9
Faster R-CNN ResNet101 | 3 000         | 73.35   | 13.57         | 7
Proposed method        | 3 000         | 85.55   | 6.65          | 10

Fig.5 Example detection results of bird’s nest on test set

The tests were carried out on an Nvidia Titan X. Compared with the DPM object detection method, the proposed method nearly doubles the detection precision, although its detection speed is slightly slower. Compared with Faster R-CNN VGG16 and Faster R-CNN ResNet101, the detection mAP is improved by 14.15% and 12.2%, respectively. Besides, the miss rate is reduced by about two-thirds and one-half, respectively, which fully verifies the significant advantage of this method in small objects detection. Meanwhile, the detection speed of this method is slightly faster than those of the other two Faster R-CNN methods mentioned above. Fig.5 shows example detection results of bird’s nests on the test set.

In order to further verify the detection performance of the proposed method, network model decomposition experiments were carried out on the object data set, and the effects of the various network design methods proposed in this paper on the detection results were analyzed concretely. The experimental results in Table 2 show that the mAP of bird’s nest detection decreases by 6.4% if MS-RPN is not used to extract the region proposals; the improved ResNet101 network structure improves the mAP by 3.6%; and the deconvolution operation improves the mAP by 2.2%.

Table 2 Comparison of experimental results of network model decomposition

Improved ResNet101 | ×     | √     | √     | √
MS-RPN             | √     | ×     | √     | √
Dec-Conv           | √     | √     | ×     | √
mAP (%)            | 81.95 | 79.15 | 83.35 | 85.55

3.3 Experimental results and analysis on VOC0712 object test set

In order to verify the generality of the proposed method, it was also tested on the VOC0712 data set, and the test results were compared with those of the DPM method, Faster R-CNN VGG16 and Faster R-CNN ResNet101. The four methods were trained and tested on the same data: the training set is composed of VOC2007-train and VOC2012-train, and the test set is VOC2007-test. Table 3 shows the test results of the four methods on some detected object categories. It can be seen from Table 3 that the three detection methods based on convolutional neural networks achieve better detection precision than the traditional DPM method on objects of all scales, because the convolutional neural network can automatically learn to extract object features. The proposed method is basically on a par with Faster R-CNN VGG16 and Faster R-CNN ResNet101 in detecting large objects such as airplanes and cars, mainly because most of the VOC2007 data set is composed of large objects. However, the proposed method has an obvious advantage on small objects such as birds, bottles and plants, improving the overall detection precision by nearly 10%. Meanwhile, the detection precision of Faster R-CNN ResNet101 is slightly higher than that of Faster R-CNN VGG16 on objects of various scales, because better object features can be obtained as network depth increases. In summary, the results show that the proposed MS-RPN network obtains higher-quality region proposals; since the objects in UAV aerial images are generally much smaller than the whole image, the proposed method is reasonable and feasible.

Table 3 Comparison of detection results on VOC2007 test set

4 Conclusion

This paper presents a general multi-scale small objects detection method based on Faster R-CNN for UAV aerial images. The experimental results show that the proposed method can precisely detect small objects in aerial images; its detection precision is higher than those of the best traditional method and the other Faster R-CNN methods, and its speed is also slightly faster than those of the other Faster R-CNN methods.

