AI Hu, LI Fei

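Abstract: Dialect identification can provide important leads in criminal investigations. This paper proposes an effective dialect identification model for Guizhou dialects. Speech samples of varying duration were collected from six regions of Guizhou Province, and Mel-frequency cepstral coefficients (MFCC) were extracted from them. A multi-level two-dimensional discrete wavelet transform then extracts the low-frequency components of the MFCC and compresses them at the same time. Next, a sliding window divides the result into overlapping blocks; singular value decomposition is applied to each block, and the feature vectors with high contribution rates are retained. The blocks are then merged and converted into a three-dimensional matrix that serves as the input to the dialect identification model. The convolutional neural network is first improved, and the dialect identification model is then built on it. The model is trained and validated with cross experiments, which are also used to optimize the number of levels of the two-dimensional discrete wavelet transform and the width of the sliding window. The experimental results show that the model is efficient for identifying Guizhou dialects.
Keywords: Chinese dialect identification; Mel-frequency cepstral coefficients; two-dimensional discrete wavelet transform; singular value decomposition; convolutional neural network
CLC number: TP391.4  Document code: A  Article ID: 2096-4706(2019)01-0005-06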
Identification of Guizhou Dialect Based on Improved Convolutional Neural Network
AI Hu1, LI Fei2
(1. Department of Criminal Technology, Guizhou Police College, Guiyang 550005, China;
2. The Education University of Hong Kong, Hong Kong 999077, China)
Abstract: Chinese dialect identification may provide an important clue for forensic investigation. This paper proposes an effective dialect identification model for Guizhou dialects. Mel-frequency cepstral coefficients (MFCC) were extracted from speech samples of different durations collected from six regions of Guizhou Province. A multi-level two-dimensional discrete wavelet transform (2-DWT) was then used to extract the low-frequency components of the MFCC and compress them, and a sliding window was used to split the result into overlapping blocks. Singular value decomposition was applied to each block and the feature vectors with high contribution rates were retained; the blocks were then combined and converted into a three-dimensional matrix that serves as the input data of the dialect identification model. The convolutional neural network (CNN) was first improved and the dialect identification model was then constructed. The model was trained and verified with a cross experiment, through which the number of levels of the 2-DWT and the width of the sliding window were optimized. The experimental results show that the model is efficient for Guizhou dialect identification.
Keywords: Chinese dialect identification; Mel-frequency cepstral coefficients; two-dimensional discrete wavelet transform; singular value decomposition; convolutional neural network
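The preprocessing chain summarized in the abstract (low-frequency 2-DWT compression, overlapping sliding-window blocks, per-block singular value decomposition, stacking into a three-dimensional matrix) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The Haar wavelet, the window width and hop, the 0.9 contribution threshold, the rank-k reconstruction of each block, and all function names are assumptions chosen for illustration, and a random matrix stands in for real MFCC features.

```python
import numpy as np


def haar_lowpass_2d(x, levels):
    """Keep only the approximation (low-frequency) sub-band of a
    multi-level 2-D Haar DWT. The Haar wavelet is an assumption."""
    for _ in range(levels):
        # Trim odd rows/columns so that 2x2 cells tile the matrix exactly.
        rows, cols = (x.shape[0] // 2) * 2, (x.shape[1] // 2) * 2
        x = x[:rows, :cols]
        # Orthonormal Haar approximation: sum of each 2x2 cell divided by 2.
        x = (x[0::2, 0::2] + x[0::2, 1::2]
             + x[1::2, 0::2] + x[1::2, 1::2]) / 2.0
    return x


def overlapping_blocks(x, width, hop):
    """Split x (coefficients x frames) into blocks of `width` frames,
    advancing by `hop` frames so that neighbouring blocks overlap."""
    starts = range(0, x.shape[1] - width + 1, hop)
    return [x[:, s:s + width] for s in starts]


def svd_reduce(block, contribution=0.9):
    """Rank-k approximation of a block, where k is the smallest number of
    singular components whose cumulative share of the singular values
    reaches `contribution` (hypothetical threshold)."""
    u, s, vt = np.linalg.svd(block, full_matrices=False)
    share = np.cumsum(s) / np.sum(s)
    k = min(int(np.searchsorted(share, contribution)) + 1, len(s))
    return (u[:, :k] * s[:k]) @ vt[:k, :]


def build_input(mfcc, levels=1, width=8, hop=4, contribution=0.9):
    """Chain the steps and stack the blocks into a 3-D array of shape
    (num_blocks, coefficients, width)."""
    low = haar_lowpass_2d(mfcc, levels)
    blocks = [svd_reduce(b, contribution)
              for b in overlapping_blocks(low, width, hop)]
    return np.stack(blocks, axis=0)


# Random stand-in for a 13 x 200 MFCC matrix (coefficients x frames).
rng = np.random.default_rng(0)
features = build_input(rng.standard_normal((13, 200)))
print(features.shape)  # (24, 6, 8)
```

Reconstructing each block at reduced rank keeps every block the same shape, which makes stacking straightforward; a variant that stores only the retained singular vectors would need a fixed k per block instead.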
0 Introduction
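Modern communication tools play an important role in criminal investigations. Identifying the dialect in the speech they carry can indicate a suspect's place of origin or long-term residence, and so provide important leads for solving a case. Because pronunciation differences between dialects show up mainly in how the spectral structure varies over time [1], Mel-Frequency Cepstral Coefficients (MFCC) [2] are widely used in speech recognition models as the feature parameters extracted from speech samples.
Speech recognition models are now too numerous to list. To improve the robustness of speech recognition, the methods applied to such models include Discriminative Training (DT) [3,4], Factor Analysis (FA) [5,6], and Total Variability (TV) [7,8], among others. The models used for acoustic classification include the hidden Markov model (HMM) [9], the deep neural network (DNN) [10], the convolutional neural network (CNN), and the recurrent neural network (RNN) [11], among others. ……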