聶可卉 劉文哲 童同 杜民 高欽泉



Abstract: The optical flow estimation algorithms commonly used in tasks such as video quality enhancement and super-resolution reconstruction can only estimate linear motion between pixels. To address this problem, a new multi-frame compression artifact removal network architecture was proposed. The network consists of a motion compensation module and a compression artifact removal module. The motion compensation module replaces traditional optical flow estimation algorithms with adaptive separable convolution, and can therefore handle the curvilinear motion between pixels that optical flow methods cannot resolve. For each video frame, the motion compensation module predicts convolution kernels that fit the image structure and the local displacements of its pixels, and performs motion offset estimation and pixel compensation for the next frame by means of local convolution. The resulting compensated frame is concatenated with the original next frame as the input of the compression artifact removal module; by fusing the two frames, which contain different pixel information, the compression artifacts of that frame are removed. Trained and tested on the same datasets as the state-of-the-art Multi-Frame Quality Enhancement (MFQE) algorithm, the proposed network increases the peak signal-to-noise ratio improvement (ΔPSNR) over MFQE by up to 0.44 dB and by 0.32 dB on average. The experimental results demonstrate that the proposed network performs well in removing video compression artifacts.
Key words: video quality enhancement; optical flow estimation; motion compensation; adaptive separable convolution; video compression artifact removal
CLC number: TP391; TP183
Document code: A
0 Introduction
Compression artifact removal is a classic problem in computer vision. Image and video compression algorithms reduce media file sizes to lower transmission bandwidth, saving transmission cost and time; however, such compression inevitably loses information from images and videos and introduces unwanted artifacts that severely degrade the viewing experience. How to remove compression artifacts and restore these images and videos has therefore become an active research problem.
Over the past few years, with the development of deep learning, many methods have been successfully applied to removing image compression artifacts. First, the Artifacts Reduction Convolutional Neural Network (AR-CNN) [1] demonstrated the effectiveness of deep Convolutional Neural Networks (CNNs) in removing JPEG (Joint Photographic Experts Group) compression artifacts from images. Subsequently, the Deep Dual-domain Convolutional Network (DDCN) [2] removed compression artifacts by processing images in the frequency domain and the pixel domain simultaneously. More recently, after generative adversarial networks [3] were proposed and widely adopted, Guo et al. [4] and Galteri et al. [5] used them to remove image compression artifacts. All of the above methods verify the effectiveness of deep neural networks for removing compression artifacts from single images.
However, video frames de-artifacted with only a single frame as input still suffer from noticeably blurred object contours and even information loss, which shows that single-frame methods have serious limitations on consecutive video frames. By fusing multiple consecutive frames of a video and exploiting the pixel correlation between adjacent frames together with the complementary information across frames, the information lost in each frame can be compensated and a better video compression artifact removal result can be obtained.
Existing research on video quality enhancement mainly concerns video denoising, deblurring, and video super-resolution reconstruction [6-10]. Recently, Wang et al. [11] proposed the Deep CNN-based Auto Decoder (DCAD) network for compressed video quality restoration; the network consists of 10 convolutional layers, and its small size limits the reconstruction quality. Yang et al. [12] proposed the Decoder-Side Convolutional Neural Network (DS-CNN) for video quality enhancement, which consists of two subnetworks: DS-CNN-I (Intra Decoder-side CNN) reduces the compression artifacts of intra-frame coding, while DS-CNN-B (Inter Decoder-side CNN) reduces those of inter-frame coding. Since neither method exploits the information between adjacent video frames, both can be regarded as single-frame artifact removal algorithms. Yang et al. [13] proposed the Quality Enhancement Convolutional Neural Network (QE-CNN) method, which processes HEVC (High Efficiency Video Coding) intra- and inter-coded frames with two different networks. Because this method only considers HEVC-coded videos and does not apply to all scenarios, Yang et al. [14] further proposed the Multi-Frame Quality Enhancement (MFQE) network architecture. MFQE consists of four parts: a Support Vector Machine (SVM) that classifies frames into Peak Quality Frames (PQFs) and non-PQFs, a motion compensation network that performs inter-frame motion compensation, and two different quality enhancement networks that reduce the compression artifacts of PQFs and non-PQFs respectively. When a compressed video contains no distinction between PQFs and non-PQFs (for example, when the compression quality is set with a Constant Rate Factor (CRF)), MFQE cannot work well.
Optical flow estimation computes object motion between adjacent frames by exploiting the temporal variation of an image sequence and the correlation between adjacent frames to find the correspondences between the previous frame and the current frame. Traditional optical flow estimation methods [15-18] obtain the predicted frame in two stages, flow map estimation and pixel warping; since ground-truth flow maps are unavailable, these methods suffer from considerable error. Reference [19] points out that flow map estimation amounts to a fixed point-to-point transformation map (transmission map): it assumes that the movement from pixel A to pixel B is a straight line (and vice versa) and does not consider curvilinear pixel motion. Moreover, when occlusion or blur occurs during motion, optical flow methods may fail to find the corresponding pixels in adjacent frames and thus cannot recover an accurate motion path.
The Spatial Transformer Network [20] enables a network to learn the spatial mapping between the pixels of two images and to express this point-to-point mapping as a grid transform, a form analogous to the vector motion of an optical flow field. This spatial transformer was soon used to encode the optical flow features of moving video [14,21] for motion compensation.
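For concreteness, the following is a minimal PyTorch sketch of this grid-transform warping (our illustration, not code from [14] or [20]); the function name `flow_warp` and the pixel-unit flow layout are assumptions, and `F.grid_sample` performs the fixed point-to-point resampling discussed above:

```python
import torch
import torch.nn.functional as F

def flow_warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp `frame` with a dense flow field via a grid transform.
    frame: (B, C, H, W); flow: (B, 2, H, W), displacements in pixels (dx, dy)."""
    B, C, H, W = frame.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(H, device=frame.device, dtype=frame.dtype),
        torch.arange(W, device=frame.device, dtype=frame.dtype),
        indexing="ij",
    )
    # Shift every sampling point by its flow vector (point-to-point mapping).
    x = xs.unsqueeze(0) + flow[:, 0]
    y = ys.unsqueeze(0) + flow[:, 1]
    # Normalize coordinates to [-1, 1], the range grid_sample expects.
    grid = torch.stack((2.0 * x / (W - 1) - 1.0,
                        2.0 * y / (H - 1) - 1.0), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(frame, grid, align_corners=True)
```

Note that each output pixel is sampled from exactly one (bilinearly interpolated) source location, which is precisely the straight-line, point-to-point assumption criticized in [19].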
This paper addresses video compression artifact removal with two cascaded networks forming two modules: a motion compensation module and a compression artifact removal module. Unlike the common grid-mapping-based motion compensation methods, the motion compensation network in this paper is implemented with one-dimensional local separable convolution, which not only estimates pixel offsets effectively but also compensates information between adjacent frames, bringing more pixel information to degraded video frames. The compensated frame that the motion compensation module produces for the next frame is then concatenated with the original next frame as the input of the compression artifact removal module; by fusing the two frames, which contain different pixel information, the next frame is reconstructed with its compression artifacts removed. The whole network can be trained end to end.
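The cascaded data flow can be sketched as follows. This is an illustrative skeleton under our reading of the text, with `TwoStagePipeline` and both submodule names as hypothetical placeholders rather than the authors' implementation; because the two modules are composed in one differentiable graph, gradients flow through both, which is what makes end-to-end training possible:

```python
import torch
import torch.nn as nn

class TwoStagePipeline(nn.Module):
    """Sketch of the cascaded design; the two submodules stand for the
    motion compensation and artifact removal networks described above."""
    def __init__(self, motion_comp: nn.Module, artifact_removal: nn.Module):
        super().__init__()
        self.motion_comp = motion_comp            # predicts per-pixel separable kernels
        self.artifact_removal = artifact_removal  # fuses the two frames

    def forward(self, prev_frame: torch.Tensor, next_frame: torch.Tensor) -> torch.Tensor:
        # Compensated estimate of the next frame built from the previous one.
        compensated = self.motion_comp(prev_frame, next_frame)
        # Channel-wise concatenation of the compensated and original next frame.
        fused = torch.cat((compensated, next_frame), dim=1)
        # The artifact removal module reconstructs the de-artifacted next frame.
        return self.artifact_removal(fused)
```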
The main contributions of this paper are as follows:
1) Separable local convolution is used to estimate and compensate pixels between adjacent frames. Compared with the point-to-point straight-line motion estimation of optical flow methods, this approach can estimate possible curvilinear motion between pixels through nonlinear feature mapping and is therefore more flexible (a minimal sketch follows this list).
2) A novel convolutional neural network model for removing video compression artifacts is proposed, in which a motion compensation module and a compression artifact removal module are connected in cascade; by concatenating multiple frames as network input, the missing information between adjacent frames is fused, yielding better video artifact removal.
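As referenced in 1), the following is a minimal sketch of separable local convolution in the spirit of adaptive separable convolution [21]. It assumes the network has already predicted per-pixel vertical kernels `kv` and horizontal kernels `kh` of length n (names and shapes are our assumptions); the outer product of the two 1D kernels yields the n x n kernel applied at each pixel:

```python
import torch
import torch.nn.functional as F

def separable_local_conv(frame: torch.Tensor,
                         kv: torch.Tensor,
                         kh: torch.Tensor) -> torch.Tensor:
    """Apply a per-pixel separable kernel to `frame`.
    frame: (B, C, H, W); kv, kh: (B, n, H, W), n odd."""
    B, C, H, W = frame.shape
    n = kv.shape[1]
    # Extract an n x n patch around every pixel (zero padding keeps H x W).
    patches = F.unfold(frame, kernel_size=n, padding=n // 2)   # (B, C*n*n, H*W)
    patches = patches.view(B, C, n, n, H, W)
    # Outer product of the two 1D kernels -> per-pixel n x n kernel.
    kernel = kv.view(B, 1, n, 1, H, W) * kh.view(B, 1, 1, n, H, W)
    # Weighted sum over each local patch gives the compensated pixel.
    return (patches * kernel).sum(dim=(2, 3))                  # (B, C, H, W)
```

Note that [21] implements this operation as a dedicated CUDA kernel; the `unfold`-based sketch above materializes every n x n patch and therefore trades memory for readability.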
3 Conclusion
This paper proposes a new multi-frame compression artifact removal network architecture, in which the motion compensation module uses adaptive separable convolution to estimate the motion offsets of the next frame and compensate its missing pixels, and the compression artifact removal module fuses the compensated frame, which carries different pixel information, with the corresponding original frame to produce the final artifact-removed result. In our experiments, the compensated frames produced by the motion compensation network improve PSNR over the corresponding compressed frames by 0.03 dB on average and reduce the inter-frame difference from the corresponding uncompressed frames by 0.04 dB on average compared with the compressed frames, confirming that the motion compensation network compensates for missing pixels; moreover, combining the motion compensation network yields visually much better artifact removal than the artifact removal network alone. On the same test sequences, the proposed network with motion compensation improves the average ΔPSNR over the state-of-the-art AR-CNN, DCAD, DS-CNN, and MFQE algorithms by 1.58 dB, 1.55 dB, 1.42 dB, and 0.32 dB respectively, with a maximum ΔPSNR gain of 0.44 dB over MFQE, and its visual quality clearly surpasses all of the above algorithms, demonstrating that the proposed network removes video compression artifacts effectively.
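For reference, the ΔPSNR reported above measures how much an enhanced frame gains over its compressed counterpart relative to the uncompressed frame; a minimal sketch of this standard metric follows (our illustration with hypothetical function names, not the evaluation code used in the experiments):

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) of `test` against `reference`."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def delta_psnr(raw: np.ndarray, compressed: np.ndarray, enhanced: np.ndarray) -> float:
    """Quality gain of the enhanced frame over the compressed frame."""
    return psnr(raw, enhanced) - psnr(raw, compressed)
```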
In future work, we will study methods for accelerating the network, for example replacing the original two-dimensional convolutions with depthwise separable convolutions and adjusting the network structure so as to speed up the network while preserving its performance.
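As an illustration of that direction, one standard k x k convolution layer could be replaced by the following depthwise separable pair (a generic sketch of the technique, not a tested design for this network):

```python
import torch.nn as nn

def depthwise_separable(in_ch: int, out_ch: int, k: int = 3) -> nn.Sequential:
    """Replace one standard k x k convolution with a depthwise (per-channel
    spatial) convolution followed by a 1x1 pointwise convolution, cutting
    multiply-adds roughly by a factor of k*k for typical channel counts."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch),  # depthwise
        nn.Conv2d(in_ch, out_ch, kernel_size=1),                   # pointwise
    )
```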
References
[1] DONG C, DENG Y, LOY C C, et al. Compression artifacts reduction by a deep convolutional network [C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE, 2015: 576-584.
[2] GUO J, CHAO H. Building dual-domain representations for compression artifacts reduction [C]// ECCV 2016: Proceedings of the 2016 European Conference on Computer Vision. Berlin: Springer, 2016: 628-644.
[3] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks [J/OL]. arXiv Preprint, 2014: arXiv:1406.2661 [2014-06-10]. https://arxiv.org/abs/1406.2661.
[4] GUO J, CHAO H. One-to-many network for visually pleasing compression artifacts reduction [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2017: 4867-4876.
[5] GALTERI L, SEIDENARI L, BERTINI M, et al. Deep generative adversarial compression artifact removal [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE, 2017: 4836-4845.
[6] YANG L L, SHENG G. A mine video image denoising method based on convolutional neural network [J]. Mining Research and Development, 2018, 38(2): 106-109. (in Chinese)
[7] REN W, PAN J, CAO X, et al. Video deblurring via semantic segmentation and pixel-wise non-linear kernel [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE, 2017: 1086-1094.
[8] SAJJADI M S M, VEMULAPALLI R, BROWN M. Frame-recurrent video super-resolution [C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018: 6626-6634.
[9] TAO X, GAO H, LIAO R, et al. Detail-revealing deep video super-resolution [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE, 2017: 4472-4480.
[10] LI L H, DU J P, LIANG M Y, et al. Video super resolution algorithm based on spatio-temporal features and neural networks [J]. Journal of Beijing University of Posts and Telecommunications, 2016, 39(4): 1-6. (in Chinese)
[11] WANG T, CHEN M, CHAO H. A novel deep learning-based method of improving coding efficiency from the decoder-end for HEVC [C]// Proceedings of the 2017 Data Compression Conference. Piscataway, NJ: IEEE, 2017: 410-419.
[12] YANG R, XU M, WANG Z. Decoder-side HEVC quality enhancement with scalable convolutional neural network [C]// Proceedings of the 2017 IEEE International Conference on Multimedia and Expo. Piscataway, NJ: IEEE, 2017: 817-822.
[13] YANG R, XU M, WANG Z, et al. Enhancing quality for HEVC compressed videos [J/OL]. arXiv Preprint, 2018: arXiv:1709.06734 (2017-09-20) [2018-07-06]. https://arxiv.org/abs/1709.06734.
[14] YANG R, XU M, LIU T, et al. Multi-frame quality enhancement for compressed video [C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018: 6664-6673.
[15] DOSOVITSKIY A, FISCHER P, ILG E, et al. FlowNet: learning optical flow with convolutional networks [C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE, 2015: 2758-2766.
[16] BAILER C, TAETZ B, STRICKER D. Flow fields: dense correspondence fields for highly accurate large displacement optical flow estimation [C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE, 2015: 4015-4023.
[17] REVAUD J, WEINZAEPFEL P, HARCHAOUI Z, et al. EpicFlow: edge-preserving interpolation of correspondences for optical flow [C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2015: 1164-1172.
[18] ILG E, MAYER N, SAIKIA T, et al. FlowNet 2.0: evolution of optical flow estimation with deep networks [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2017: 2462-2470.
[19] MAHAJAN D, HUANG F C, MATUSIK W, et al. Moving gradients: a path-based method for plausible image interpolation [J]. ACM Transactions on Graphics, 2009, 28(3): Article No. 42.
[20] JADERBERG M, SIMONYAN K, ZISSERMAN A, et al. Spatial transformer networks [C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge, MA: MIT Press, 2015: 2017-2025.
[21] NIKLAUS S, MAI L, LIU F. Video frame interpolation via adaptive separable convolution [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Washington, DC: IEEE Computer Society, 2017: 261-270.
[22] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2016: 770-778.
[23] HE K, ZHANG X, REN S, et al. Identity mappings in deep residual networks [C]// ECCV 2016: Proceedings of the 2016 European Conference on Computer Vision. Berlin: Springer, 2016: 630-645.
[24] DROZDZAL M, VORONTSOV E, CHARTRAND G, et al. The importance of skip connections in biomedical image segmentation [M]// Deep Learning and Data Labeling for Medical Applications. Berlin: Springer, 2016: 179-187.
[25] BOSSEN F. Common test conditions and software reference configurations [S/OL]. [2013-06-20]. http://wftp3.itu.int/avarch/jctvcsite/2010_07_B_Geneva/JCTVCB300.doc.
[26] GLOROT X, BENGIO Y. Understanding the difficulty of training deep feedforward neural networks [C]// Proceedings of the 13th International Conference on Artificial Intelligence and Statistics. Sardinia, Italy: JMLR, 2010: 249-256.
[27] KINGMA D, BA J. Adam: a method for stochastic optimization [EB/OL]. [2018-03-20]. http://yeolab.weebly.com/uploads/2/5/5/0/25509700/a_method_for_stochastic_optimization_.pdf.
[28] BARRON J T. A more general robust loss function [J/OL]. arXiv Preprint, 2017: arXiv:1701.03077 (2017-01-11) [2017-01-11]. https://arxiv.org/abs/1701.03077.
[29] LAI W S, HUANG J B, AHUJA N, et al. Deep Laplacian pyramid networks for fast and accurate super-resolution [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2017: 5835-5843.