









Multi-modal video scene segmentation optimization algorithm based on convolutional neural network
Huang Qing(a), Feng Hongcai(b), Liu Li(a)
(a. School of Mathematics & Computer Sciences; b. Network & Information Center, Wuhan Polytechnic University, Wuhan 430023, China)
Abstract: To address the limited efficiency of scene segmentation in content-based video retrieval, this paper proposes a multi-modal video scene segmentation optimization algorithm based on features extracted by a convolutional neural network. First, an improved VGG19 network extracts multi-modal low-level features and semantic features from the video shots, and these features are combined into vectors. Then, triplet loss learning and shot similarity computation convert the scene segmentation task into a binary classification problem over shot boundaries. Finally, a scoring mechanism refines the results to obtain the segmented video scenes and their corresponding scene boundaries. Experimental results show that the algorithm segments video scenes effectively, with overall recall and precision reaching 85.77% and 87.01%, respectively.
Key words: scene segmentation; multi-modal; convolutional neural network (CNN); similarity measure; VGG19
CLC number: TP37    Document code: A
Article ID: 1001-3695(2022)05-054-1595-06
doi:10.19734/j.issn.1001-3695.2021.10.0404
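Video scene segmentation is a key step in constructing and retrieving video scenes. Taking video shots as the object of study, it groups similar consecutive shots into the same scene, thereby dividing a video into a number of semantically related logical story units [1]. Higher-level tasks such as video summarization and retrieval require the scene as their basic unit [2], and in recent years researchers in China and abroad have studied video scene segmentation extensively. Sidiropoulos et al. [3] segmented video scenes by introducing the shot transition graph (STG), in which each node represents a shot and the edges between nodes are weighted by the similarity of the shots they connect; normalized cuts are then used to decompose the STG into subgraphs and detect the scene boundaries. However, this method does not fully consider the semantic correlation between shots, so the resulting subgraphs rarely form scenes at the semantic level and the segmentation results are not accurate enough. Kumar et al. [4] addressed scene segmentation with a shot similarity graph (SSG) and added a sliding window to control the scene detection process; by dynamically adjusting the window length according to the computed shot similarity, they avoided producing too many or too few scenes. ……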
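The graph-based formulation reviewed above (shots as nodes, similarity-weighted edges, a normalized-cut split) can be illustrated with a minimal sketch. This is not the procedure of [3] or [4]: cosine similarity as the edge weight, the restriction to a single temporally contiguous split, the toy two-dimensional feature vectors, and the names `build_shot_graph`, `ncut_value`, and `best_boundary` are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two shot feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_shot_graph(features):
    """Weight matrix of a shot graph: entry (i, j) is the similarity of shots i and j."""
    n = len(features)
    return [
        [0.0 if i == j else cosine_similarity(features[i], features[j])
         for j in range(n)]
        for i in range(n)
    ]


def ncut_value(weights, k):
    """Normalized-cut value of splitting the shots into [0, k) and [k, n)."""
    n = len(weights)
    left, right = range(k), range(k, n)
    # Total weight of edges crossing the candidate boundary.
    cut = sum(weights[i][j] for i in left for j in right)
    # Total weight attached to each side (its association with the whole graph).
    assoc_left = sum(weights[i][j] for i in left for j in range(n))
    assoc_right = sum(weights[i][j] for i in right for j in range(n))
    return cut / assoc_left + cut / assoc_right


def best_boundary(weights):
    """Index k of the contiguous split with the smallest normalized-cut value."""
    return min(range(1, len(weights)), key=lambda k: ncut_value(weights, k))


# Hypothetical features: three shots from one scene, then three from another.
shot_features = [
    [1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
    [0.1, 1.0], [0.0, 0.9], [0.2, 1.0],
]
graph = build_shot_graph(shot_features)
boundary = best_boundary(graph)
print(f"scene boundary before shot index {boundary}")
```

Restricting the search to contiguous splits keeps the sketch dependency-free and reflects the fact that a scene consists of consecutive shots. A general normalized cut would instead be solved through the spectral relaxation over the whole graph, and several scene boundaries would require applying the split recursively.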