ZHANG Lulu (章露露), LV Xiaowei (呂曉偉)
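Abstract: Query expansion is an important research topic in information retrieval. To address inaccurate query descriptions and query-term mismatch when users submit queries, a semantic query expansion method based on Word2vec is proposed. The distributed neural probabilistic language model Word2vec is used to train low-dimensional word vectors and to select a candidate set of expansion terms. An expansion-term-oriented query vector generation method then filters the candidate set, so that the selected expansion terms better reflect the semantic and syntactic relevance of the whole query. Experimental results show that the method improves both recall and precision, which indicates that it is well suited to query expansion.
Keywords: query expansion; distributed neural probabilistic language model; Word2vec; expansion-term-oriented; semantic relevance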
DOI: 10.11907/rjdk.181044
CLC number: TP301
Document code: A    Article ID: 1672-7800(2018)009-0048-04
Semantic Query Expansion Method Based on Word2vec
ZHANG Lulu, LV Xiaowei
(Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China)
Abstract: Query expansion is an important research issue in the field of information retrieval. To solve the problems of inaccurate description and term mismatch when users submit queries, we propose a new semantic query expansion method based on Word2vec. The distributed neural probabilistic language model Word2vec is used to train low-dimensional word vectors and to select a candidate set of expansion terms, and a new expansion-term-oriented query vector generation method is proposed to filter the candidate set, so that the selected expansion terms more effectively reflect the semantic and syntactic relevance of the whole query. The experimental results show that the semantic query expansion method based on Word2vec improves both recall and precision. Therefore, the method can be applied well to query expansion.
Key Words: query expansion; distributed neural probabilistic language model; Word2vec; expansion-term-oriented; semantic relevance
0 Introduction
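A complete information retrieval system typically comprises a database containing a collection of documents, an index that associates each document with its terms, and a matching mechanism that maps user queries, which consist of words, to relevant documents. The main purpose of building an information retrieval system is to find, in a given indexed database, the documents that contain the information the searcher needs [1].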
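The three components described above (a document collection, an index associating documents with terms, and a matching mechanism) can be sketched in a few lines of Python. This is only a minimal illustration: the toy documents, the helper names `build_index` and `match`, and the whitespace tokenization are assumptions made here and are not taken from the paper.

```python
from collections import defaultdict

# Toy document collection (illustrative data, not from the paper).
DOCS = {
    1: "query expansion improves information retrieval",
    2: "word vectors capture semantic similarity",
    3: "retrieval systems match queries to documents",
}

def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def match(index, query):
    """Return the sorted IDs of documents containing at least one query term."""
    hits = set()
    for term in query.lower().split():
        hits |= index.get(term, set())
    return sorted(hits)

INDEX = build_index(DOCS)
print(match(INDEX, "semantic retrieval"))  # [1, 2, 3]
```

A real system would rank the matched documents rather than return an unordered match set; that step is omitted here to keep the sketch short.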
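When processing a query, a traditional information retrieval system requires the user to describe the information need precisely. In practice, however, users usually cannot describe precisely what they are looking for, so the system may return a large number of unintended results. This is known as the "dictionary problem" [2-3]. To overcome the dictionary problem, users would have to describe the search content with enough keywords when submitting a query before the system could return satisfactory results. ……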
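The term mismatch behind the dictionary problem can be illustrated with a small sketch. It assumes exact-term matching and uses hand-made two-dimensional vectors in place of trained Word2vec embeddings. The vocabulary, the vector values, the `expand` and `search` helpers, and the 0.9 similarity threshold are all hypothetical, and the sketch does not reproduce the paper's query vector generation method.

```python
import math

# Hand-made 2-D vectors standing in for trained word embeddings (assumed values).
VECTORS = {
    "car": (1.0, 0.1),
    "automobile": (0.9, 0.2),
    "banana": (0.1, 1.0),
}

# Toy documents (illustrative data).
DOCS = {1: "automobile repair manual", 2: "banana bread recipe"}

def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def expand(term, threshold=0.9):
    """Return the term plus vocabulary words whose similarity meets the threshold."""
    base = VECTORS[term]
    similar = sorted(
        w for w, v in VECTORS.items()
        if w != term and cosine(base, v) >= threshold
    )
    return [term] + similar

def search(terms):
    """Exact-term matching: sorted IDs of documents containing any of the terms."""
    wanted = set(terms)
    return sorted(d for d, text in DOCS.items() if wanted & set(text.split()))

print(search(["car"]))        # [] because no document contains the word "car"
print(search(expand("car")))  # [1] because the expanded query includes "automobile"
```

The unexpanded query misses a relevant document only because the document uses a different word for the same concept, which is the gap that query expansion is meant to close.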