FENG Li-qi, JIANG Hua, YAN Ge, MIN Chang-wei, LI Ling-xiang



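Abstract: To address noise interference in traditional DNN-based speech separation, this paper proposes a speech separation method that combines a DNN with spectral subtraction in the post-processing stage of DNN separation. First, acoustic features are extracted from the speech, and a DNN learns the mapping from the noisy features to the separation target, yielding the separated target speech. Next, the noise energy in each time-frequency unit of the separated target speech is estimated and subtracted. Finally, the waveform of the spectrally subtracted separated speech is obtained by the inverse fast Fourier transform (IFFT). Experiments on speech signals mixed with different types of noise at different input signal-to-noise ratios (SNRs) show that, compared with speech output by the DNN alone, speech separated with the added spectral subtraction achieves significantly higher intelligibility and SNR, and is closer to the clean speech signal.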
Key Words: speech separation; neural networks; spectral subtraction; target speech; noise energy estimation
Speech Separation Combined with DNN and Spectral Subtraction
FENG Li-qi, JIANG Hua, YAN Ge, MIN Chang-wei, LI Ling-xiang
(1. Key Laboratory of Granular Computing and Application, Minnan Normal University;
2. School of Computer Science, Minnan Normal University, Zhangzhou 363000, China;
3. School of Electronics and Information Engineering, Hunan University of Science and Engineering, Yongzhou 425199, China)
Abstract: To address noise interference in traditional DNN speech separation, a speech separation method that combines DNN and spectral subtraction in the post-processing stage of DNN speech separation is proposed. First, speech features are extracted, and a DNN is used to learn the mapping from the noisy features to the separated target speech. Then, the noise energy of each time-frequency unit in the separated target speech is estimated and subtracted. Finally, the speech waveform is obtained by the inverse fast Fourier transform. Tests on speech signals mixed with different types of noise at different input SNRs show that, compared with the speech signal output by the DNN alone, the speech separated after adding spectral subtraction has significantly improved intelligibility and signal-to-noise ratio, and its similarity to the original clean speech is also greatly improved.
Key Words: speech separation; neural networks; spectral subtraction; target speech; noise energy estimation
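The post-processing stage summarized in the abstract (estimate the noise energy per time-frequency unit, subtract it, then return to the time domain with an inverse FFT) can be sketched in Python with NumPy as classic power spectral subtraction with a spectral floor. This is a minimal sketch under stated assumptions, not the authors' implementation: the excerpt does not describe their noise estimator, so `noise_power` is simply passed in, and the helper names `stft`, `istft` and `spectral_subtract`, the 512-sample Hann frames with 50% overlap, the over-subtraction factor `alpha`, the floor `beta`, and the synthetic tone-plus-noise signals are all our own hypothetical choices.

```python
import numpy as np

N_FFT, HOP = 512, 256  # assumed frame length and hop (50% overlap)
# Periodic Hann window: copies shifted by N_FFT/2 sum to exactly 1.
WINDOW = np.hanning(N_FFT + 1)[:-1]


def stft(x):
    """Hypothetical helper: window each frame and return its rfft (frames x bins)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WINDOW for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(X):
    """Hypothetical helper: inverse FFT of every frame, then overlap-add."""
    frames = np.fft.irfft(X, n=N_FFT, axis=1)
    out = np.zeros(HOP * (len(frames) - 1) + N_FFT)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
    return out


def spectral_subtract(S, noise_power, alpha=1.0, beta=0.01):
    """Power spectral subtraction on the separated STFT S.

    noise_power: estimated noise energy per time-frequency unit (broadcastable to S).
    alpha: over-subtraction factor (assumed value).
    beta: spectral floor so that no unit's power becomes negative (assumed value).
    The phase of S is reused unchanged; only the magnitude is modified.
    """
    power = np.abs(S) ** 2
    clean_power = np.maximum(power - alpha * noise_power, beta * power)
    return np.sqrt(clean_power) * np.exp(1j * np.angle(S))


# Synthetic demo (assumption): a 1 kHz tone stands in for the target speech,
# and white noise stands in for the residual noise left in the DNN output.
fs = 16000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 1000 * t)
dnn_output = clean + rng.normal(0.0, 0.3, fs)

# Assumed estimator: mean noise power per frequency bin from a noise-only
# segment, broadcast over all frames. The paper estimates noise per T-F unit,
# which would correspond to passing a full (frames x bins) array instead.
noise_psd = np.mean(np.abs(stft(rng.normal(0.0, 0.3, fs))) ** 2, axis=0)

enhanced = istft(spectral_subtract(stft(dnn_output), noise_psd))
```

Reusing the phase of the separated spectrum and modifying only its magnitude is the usual choice in spectral subtraction; the floor `beta` keeps units whose estimated noise exceeds their own energy from going negative, at the cost of some residual "musical" noise.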
0 Introduction
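In recent years, deep learning has been widely applied in speech signal processing. The speech separation problem originates from the "cocktail party effect" [1], that is, extracting the desired speech from a complex mixture of sounds. The main goal of speech separation is to recover the useful signal from a speech signal corrupted by interference, and the task can essentially be formulated as a supervised learning problem. With the rapid development of Internet technology and the steadily increasing speech processing capability of electronic devices, speech separation plays an important role in many fields, such as automatic speech recognition, hearing aids, and mobile speech communication [2].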
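Under the supervised formulation, each training example pairs a noisy mixture (the network input) with its clean speech (the target). The sketch below shows one way such pairs could be generated at a chosen input SNR; the function names `snr_db` and `mix_at_snr`, the synthetic signals, and the SNR grid of -5, 0 and 5 dB are hypothetical. It scales the noise by sqrt(Ps / (Pn · 10^(SNR/10))), which follows directly from SNR = 10·log10(Ps/Pn), where Ps and Pn are the mean powers of the speech and the noise.

```python
import numpy as np


def snr_db(signal, noise):
    """SNR in dB: 10 * log10(mean signal power / mean noise power)."""
    return 10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))


def mix_at_snr(clean, noise, target_snr_db):
    """Hypothetical helper: scale `noise` so clean + scaled noise has the requested SNR.

    Returns (mixture, clean): the network input and its supervised target.
    Assumes the noise clip is at least as long as the clean speech.
    """
    noise = noise[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (target_snr_db / 10)))
    return clean + scale * noise, clean


# Synthetic stand-ins (assumption): a 440 Hz tone for clean speech and
# Gaussian noise for a noise recording, both at 16 kHz.
rng = np.random.default_rng(1)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.normal(size=20000)

# One (mixture, target) pair per input SNR in an assumed grid.
pairs = {snr: mix_at_snr(clean, noise, snr) for snr in (-5, 0, 5)}
```

Because the noise is rescaled rather than the speech, the target keeps the same level across all input SNRs, which keeps the regression target consistent for the network.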
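Monaural speech separation has been studied extensively in speech signal processing. With the advent of deep learning, many deep models have been widely applied to speech and image processing [3-5]. Among them, deep neural networks (DNNs) have proven highly effective for speech separation [6-9]. ……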