
Deep Neural Network and Pseudo Relevance Feedback Based Query Expansion

Computers, Materials & Continua, 2022, Issue 5

Abhishek Kumar Shukla and Sujoy Das

Department of Mathematics, Bio-Informatics and Computer Applications, Maulana Azad National Institute of Technology Bhopal, Bhopal, Madhya Pradesh, 462003, India

Abstract: The neural network has attracted researchers immensely in the last couple of years due to its wide applications in various areas such as data mining, natural language processing, image processing, and information retrieval. Word embedding has been applied by many researchers to information retrieval tasks. In this paper, a word embedding-based skip-gram model has been developed for the query expansion task. Vocabulary terms are obtained from the top "k" initially retrieved documents using the pseudo relevance feedback model, and they are then trained using the skip-gram model to find the expansion terms for the user query. The performance of the model, measured by mean average precision (MAP), is 0.3176. The proposed model is compared with other existing models. An improvement of 6.61%, 6.93%, and 9.07% in MAP is observed compared to the original query, the BM25 model, and query expansion with the Chi-Square model respectively. The proposed model also retrieves 84, 25, and 81 additional relevant documents compared to the original query, query expansion with the Chi-Square model, and the BM25 model respectively, and thus improves recall as well. The per-query analysis reveals that the proposed model performs better on 30, 36, and 30 queries compared to the original query, query expansion with the Chi-Square model, and the BM25 model respectively.

Keywords: Information retrieval; query expansion; word embedding; neural network; deep neural network

1 Introduction

Over the years, the web has been growing exponentially, and it has become difficult to retrieve the documents relevant to a user query. The information retrieval system tries to minimize the gap between the user query and the relevant documents. Various phases of the retrieval process are affected by the vagueness of the user query. For example, a novice user, during the formulation of a query, might be uncertain in selecting the keywords to express his/her information need. The user has only a fuzzy idea of what he/she is looking for. Due to this, the retrieval system retrieves irrelevant documents along with relevant ones. Query expansion appends additional terms to the original query and helps in retrieving those additional relevant documents that were left out. The query expansion technique tries to minimize the word mismatch problem. Generally, queries are categorized into the following three main categories [1]:

(1) Navigational queries

(2) Informational queries

(3) Transactional queries

Navigational queries are those that search for a particular URL or website. Informational queries are those that search a broad area of the given topic and may match thousands of documents. Transactional queries are those in which the user intends to execute some task, such as downloading or buying some item. In information retrieval, one method of query expansion is the use of terms semantically similar to the original query. WordNet [2] based methods are among the oldest methods for query expansion. It is a semantic-based approach that finds terms semantically similar to the original query terms by using synonyms, hyponyms, and meronyms of the query terms. Word embedding is a technique to find terms similar to the original query. Word2vec [3] and GloVe [4] are two well-known word embedding techniques for finding terms semantically similar to the original query terms for query expansion. Word2vec and GloVe learn the word embedding vectors in an unsupervised way using a deep neural network. They find the semantically similar terms of the original query terms using the global document collection or external resources such as Wikipedia [5] or a similarity thesaurus [6]. The local method of query expansion searches for terms similar to the original query using the pseudo relevance feedback method, which assumes that the top "k" retrieved documents are relevant to the original query. It is observed that the local method of query expansion performs better than the global method [7].

The proposed method is a deep neural network-based query expansion method using the skip-gram model. Semantically similar terms of the original query are retrieved from the top "k" initially retrieved documents obtained via the pseudo relevance feedback method, by training the terms in these documents with the skip-gram model. In the skip-gram model, we predict the context words of a given center word. The skip-gram model uses an unsupervised, deep neural network-based training method that successively updates the weights between successive layers. It assigns each term a vector, in a semantic vector space, whose dimension is low compared to the vocabulary size. The proposed method predicts the context words of each query term and then takes the union of these context words. The combined context words are treated as expansion terms for the given query. Fig. 1 shows the architecture of the proposed model.
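As a rough sketch of this pipeline, the snippet below trains a skip-gram model on the top-k pseudo-relevant documents and collects nearest neighbours in the learned embedding space as expansion terms. It assumes the gensim library (4.x API); the function name `expand_query`, the parameter values, and the input format are illustrative choices, not the authors' implementation.

```python
# Minimal sketch of the proposed pipeline: train a skip-gram model on the
# top-k pseudo-relevant documents and use nearest neighbours in the learned
# embedding space as expansion terms. Assumes gensim 4.x.
from gensim.models import Word2Vec

def expand_query(query_terms, top_k_docs, per_term=3):
    # top_k_docs: list of documents, each a list of token lists (sentences)
    sentences = [sent for doc in top_k_docs for sent in doc]

    # sg=1 selects the skip-gram architecture (predict context from center).
    model = Word2Vec(sentences, vector_size=100, window=5,
                     min_count=1, sg=1, epochs=10)

    expansion = []
    for term in query_terms:
        if term in model.wv:
            # Nearest neighbours act as predicted context words for the term.
            expansion.extend(w for w, _ in
                             model.wv.most_similar(term, topn=per_term))

    # Union of context words, appended to the original query terms.
    return query_terms + sorted(set(expansion) - set(query_terms))
```

Section 3 describes the explicit network formulation that such a library call trains internally.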

2 Related Work

Query expansion plays an important role in improving the performance of the retrieval system. The most common method of query expansion is to extract the expansion terms from an external data source such as anchor text, query logs, or an external corpus. References [8,9] used anchor text as a data source. References [10,11] used query logs for query expansion; they applied the correlation between query terms and document terms, collecting data from click-throughs of documents on URLs. Reference [12] used the query log as a bipartite graph where query nodes are connected to URL nodes by click edges, and showed an improvement of 10%. Reference [13] proposed a co-occurrence-based, document-centric probabilistic model for query expansion. A continuous word embedding-based technique for documents was proposed by [14]; they reported that their model performs better than an LSI-based model but does not outperform TF-IDF and the divergence from randomness model. Reference [15] proposed a supervised embedding-based term weighting technique for language modeling. Reference [16] proposed semantic similarities between vocabulary terms to improve the performance of the retrieval system. Reference [17] proposed a word embedding technique in a supervised manner for query expansion. Reference [18] proposed a Word2vec-based word embedding model for expanding the query terms; using this model they extracted terms similar to the query terms with the k-nearest neighbor approach, and reported considerable improvement on TREC ad-hoc data. Reference [19] used Word2vec and GloVe for query expansion in ad-hoc retrieval. Reference [20] used a fuzzy method to reformulate and expand the user query using the pseudo relevance feedback method, which uses the top 'k' ranked documents as a data source. Reference [21] proposed a hybrid method that uses both local and global sources, combining an external corpus and the top 'k' ranked documents. Reference [22] used a combination of top retrieved documents and anchor text as a data source for query expansion. Reference [23] used query logs and web search results as data sources for query reformulation and expansion. Reference [24] used Wikipedia and Freebase to expand the initial query. Reference [25] used a fuzzy-based machine learning technique to classify liver disease patients. Reference [26] proposed a machine learning technique that diagnoses breast cancer patients using different classifiers.

3 Query Expansion Using Deep Learning

Deep learning is a technique that is used in almost every area of computer science. In information retrieval, continuous word embedding is widely used to improve the mean average precision (MAP). There are the following two deep learning approaches to word embedding:

(1) The Continuous Bag of Words model (CBOW) [27]

(2) The Skip-gram model

The CBOW and skip-gram models are widely used in query expansion methods [28,29]. The CBOW model predicts the center word from given context words. The skip-gram model is just the opposite of the CBOW model: it predicts the context words of a given center word, as the small example below illustrates. In this paper, the skip-gram model is used to expand the query. The proposed method predicts context words for each query term; these are then combined and treated as expansion terms.
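For instance, with a window size of 1, the (center, context) training pairs that a skip-gram model would be trained on look as follows; the helper `skipgram_pairs` is a hypothetical illustration, not code from the paper.

```python
# Generate (center, context) skip-gram training pairs from a token sequence.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))  # predict context from center
    return pairs

print(skipgram_pairs(["query", "expansion", "improves", "recall"], window=1))
# [('query', 'expansion'), ('expansion', 'query'), ('expansion', 'improves'),
#  ('improves', 'expansion'), ('improves', 'recall'), ('recall', 'improves')]
```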

The proposed model has a three-layer architecture: an input layer, a hidden layer, and an output layer. It uses both feed-forward propagation and back-propagation to predict the context words of a given center word. In the skip-gram architecture, each query word is represented as a one-hot encoding at the input layer: if the vocabulary size is 7000 words, a 7000 × 1 vector is created with 0 at every index except the index of the center word, which is set to 1. The architecture of the skip-gram model is shown in Fig. 2. In this diagram, the weight matrix is initialized randomly. The hidden layer represents the one-hot encoding as a dense representation, obtained through the dot product of the one-hot vector and the weight matrix. At the next layer, we initialize another weight matrix with random weights and take the dot product of the hidden vector and this new weight matrix. The softmax activation function is then applied to the resulting output values. During training, we adjust the weights of both matrices so that the words surrounding the center word receive higher probabilities at the softmax layer. Let N be the number of unique terms in our corpus of text, x the one-hot encoding of a query word at the input layer, N' the number of neurons in the hidden layer, W (N × N') the weight matrix between the input layer and hidden layer, W' (N' × N) the weight matrix between the hidden layer and output layer, and y a softmax layer holding the probability of every word in the vocabulary. Using feed-forward propagation we have

h = W^T · x

and

u = W'^T · h

Let u_j be the j-th neuron of layer u, w_j be the j-th word in our vocabulary, and V_{w_j} be the j-th column of the matrix W'; then we have

u_j = V_{w_j}^T · h

y = softmax(u)

y_j = softmax(u_j)
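A minimal numpy sketch of this forward pass is given below; the vocabulary size N, hidden size N', random seed, and the chosen center-word index are illustrative assumptions.

```python
import numpy as np

N, N_h = 7000, 100                      # vocabulary size N and hidden size N'
rng = np.random.default_rng(0)
W  = rng.normal(scale=0.01, size=(N, N_h))   # input -> hidden weights, W
Wp = rng.normal(scale=0.01, size=(N_h, N))   # hidden -> output weights, W'

center = 42                             # vocabulary index of the center word
x = np.zeros(N); x[center] = 1.0        # one-hot encoding of the center word

h = W.T @ x                             # h = W^T x, dense representation (N',)
u = Wp.T @ h                            # u = W'^T h, one score per word (N,)
y = np.exp(u - u.max()); y /= y.sum()   # y = softmax(u): P(w_j | w_center)
```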

y_j denotes the probability that w_j is a context word.

P(w_j | w_i) is the probability that w_j is a context word given that w_i is the input word. The goal is to maximize P(w_{j*} | w_i), where j* ranges over the indices of the context words. That is, we have to maximize

∏_{c=1}^{C} P(w_{j*_c} | w_i) = ∏_{c=1}^{C} y_{j*_c}    (1)

where j*_1, ..., j*_C are the vocabulary indices of the context words, c = 1, 2, ..., C. The loss function E is defined as the negative log of Eq. (1):

E = −log ∏_{c=1}^{C} y_{j*_c} = −∑_{c=1}^{C} u_{j*_c} + C · log ∑_{j'=1}^{N} e^{u_{j'}}

Using back propagation we have

∂E/∂w'_{ij} = e_j · h_i, where e_j = y_j − t_j    (2)

∂E/∂w_{ki} = x_k · ∑_{j=1}^{N} e_j · w'_{ij}    (3)

The loss function is propagated from the output layer to the hidden layer and from the hidden layer to the input layer using Eqs. (2) and (3). The weights W and W' are updated as

w_{ki}^{New} = w_{ki}^{Old} − η · ∂E/∂w_{ki}

w'_{ij}^{New} = w'_{ij}^{Old} − η · ∂E/∂w'_{ij}

where w_{ki}^{New} and w'_{ij}^{New} are the updated weights between the input and hidden layer and between the hidden and output layer respectively, and η is the learning rate. The algorithm of the proposed method follows; a numerical sketch of one training step is given after it.

Algorithm 1: SKIP-GRAM BASED QUERY EXPANSION
1. Create a one-hot vector x for each query term "t" of the user query Q.
2. Initialize skip_window_size = l, epoch = k, voc_size = N, hid_size = N', weight_matrix1 = W (N × N'), weight_matrix2 = W' (N' × N), learning rate η.
3. Using feed-forward propagation:
   3.1 Compute h = W^T · x and u = W'^T · h
   3.2 Compute softmax(u_j) = e^{u_j} / ∑_{j'=1}^{N} e^{u_{j'}}
   3.3 y = softmax(u)
   3.4 for j ← 1 to N:
       3.4.1 y_j ← softmax(u_j)
4. Using back propagation:
   4.1 for i ← 1 to N':
       4.1.1 for j ← 1 to N:
             4.1.1.1 Compute e_j ← y_j − t_j
       4.1.2 Compute e_{ji} ← e_j · h_i
       4.1.3 Compute e'_{ij} ← e_j · w'_{ij} · x_i
   4.2 for m ← 1 to epoch:
       4.2.1 for i ← 1 to N':
             4.2.1.1 for j ← 1 to N:
                     4.2.1.1.1 W'_{ji} ← W'_{ji} − η · e_{ji}
                     4.2.1.1.2 W_{ij} ← W_{ij} − η · e'_{ij}
5. len ← Length(Q)
6. leq ← ⌈l / len⌉
7. For each term t in query Q, retrieve the indices at y that have the top "leq" values.
8. For each term t in query Q, retrieve the words corresponding to those indices at y and merge them into Q_m.
9. Append these words to the query: exp ← Q + Q_m
10. return exp
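As a numerical sketch of steps 3 and 4 of Algorithm 1, the snippet below performs one feed-forward/back-propagation update for a single (center, context) pair; the sizes, indices, and the learning rate η are illustrative assumptions, not values from the paper.

```python
import numpy as np

N, N_h, eta = 7000, 100, 0.025          # sizes and learning rate (assumed)
rng = np.random.default_rng(0)
W  = rng.normal(scale=0.01, size=(N, N_h))   # input -> hidden weights, W
Wp = rng.normal(scale=0.01, size=(N_h, N))   # hidden -> output weights, W'

center, context = 42, 17                # assumed vocabulary indices
x = np.zeros(N); x[center] = 1.0        # one-hot center word

# Feed-forward (steps 3.1-3.3)
h = W.T @ x
u = Wp.T @ h
y = np.exp(u - u.max()); y /= y.sum()   # softmax over the vocabulary

# Back propagation (step 4)
t = np.zeros(N); t[context] = 1.0       # one-hot target context word
e = y - t                               # e_j = y_j - t_j
EH = Wp @ e                             # error propagated to the hidden layer
Wp -= eta * np.outer(h, e)              # Eq. (2): dE/dW'_{ij} = e_j * h_i
W  -= eta * np.outer(x, EH)             # Eq. (3): only the `center` row changes
```

Steps 5–10 then simply read off, for each query term, the vocabulary indices with the largest softmax probabilities and merge the corresponding words into the expanded query.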

4 Experimental Results and Discussion

Precision and recall are the two metrics used to check the performance of the retrieval system. A retrieval system with high precision and recall indicates to the evaluators that the proposed system is highly effective. Precision is defined as

Precision = (number of relevant documents retrieved) / (total number of documents retrieved)

Mean average precision (MAP) is defined as

MAP = (1/N) ∑_{j=1}^{N} (1/Q_j) ∑_{i=1}^{Q_j} P(doc_i)

where

Q_j : number of relevant documents for query j

N : number of queries

P(doc_i) : precision at the i-th relevant document
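A small sketch of this MAP computation follows; the function name and the input format (a list of 0/1 relevance judgments per query, in ranked order) are assumptions for illustration, and Q_j is taken here as the number of relevant documents actually retrieved.

```python
def mean_average_precision(rankings):
    # rankings: for each query, the 0/1 relevance of retrieved docs in rank order
    ap_values = []
    for rels in rankings:
        num_rel, precisions = 0, []
        for rank, rel in enumerate(rels, start=1):
            if rel:                                  # i-th relevant doc found
                num_rel += 1
                precisions.append(num_rel / rank)    # P(doc_i)
        # average precision for query j: (1 / Q_j) * sum of P(doc_i)
        ap_values.append(sum(precisions) / num_rel if num_rel else 0.0)
    return sum(ap_values) / len(ap_values)           # mean over the N queries

# Two queries: AP = (1/1 + 2/3)/2 and AP = (1/2)/1, so MAP = 0.666...
print(mean_average_precision([[1, 0, 1], [0, 1]]))
```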

We have performed our experiment on the FIRE 2011 English test collection [30]. The dataset is 1.1 GB in size and contains 392,577 documents. We have used the Terrier 3.5 [31] search engine as the retrieval engine. The documents are pre-processed through the following steps.

o Text Segmentation: Split the text into sentences and then split each sentence into tokens.

o Stop Word Removal: This step removes all the stop words contained in the documents.

o Stemming: This step stems all the terms contained in the documents to their root words.
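A minimal sketch of these three steps using NLTK is shown below; this is an assumption for illustration, since the paper's experiments rely on Terrier 3.5's built-in pipeline, as described next.

```python
# Assumes NLTK with its 'punkt' and 'stopwords' resources downloaded.
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

STOPWORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def preprocess(text):
    tokens = []
    for sentence in sent_tokenize(text):                    # text segmentation
        for token in word_tokenize(sentence.lower()):
            if token.isalpha() and token not in STOPWORDS:  # stop-word removal
                tokens.append(STEMMER.stem(token))          # stemming
    return tokens

print(preprocess("The retrieval system retrieves relevant documents."))
# e.g. ['retriev', 'system', 'retriev', 'relev', 'document']
```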

We have performed pre-processing on the underlying dataset by applying the Porter stemmer and a stopword list, to stem words to their roots and to remove stop words respectively. In the proposed method, documents are retrieved using the InL2 model (c = 1.0). We have performed our experiment on 50 queries. The mean average precision values of the proposed method, query expansion with the Chi-Square model, the BM25 model, and the original query are 0.3176, 0.2912, 0.2970, and 0.2979 respectively. An improvement of 6.61% over the original query is observed. The performance of the retrieval system on the original query, Chi-Square based query expansion, query expansion using the proposed model, and the BM25 model is shown in Tabs. 1–4 respectively. The performance of the original query vs. query expansion using Chi-Square, the original query vs. the proposed model, Chi-Square vs. the proposed model, and the original query vs. Chi-Square vs. the proposed model is shown in Figs. 3–6 respectively. From Tabs. 1–4 it is observed that the proposed model outperforms the original query, query expansion with the Chi-Square model, and the BM25 model. The MAP improvement of the proposed model over the original query and over query expansion with Chi-Square is 6.61% and 9.07% respectively. The query-by-query analyses in Figs. 3–6 reveal that the proposed model retrieves 84 and 35 more relevant documents in comparison to the original query and query expansion with the Chi-Square model. The proposed model also performs better on 30 queries in comparison to the original query and on 36 queries in comparison to query expansion with the Chi-Square model. Sample queries and their expansion terms using the proposed model are shown in Tab. 6. In the following figures, the x-axis represents the query number and the y-axis the MAP value.

Figure 1: Architecture of proposed model

Figure 2: Deep skip-gram model architecture

Figure 3: Performance of Chi-Square based query expansion vs. original query

Figure 4: Performance of proposed model vs. original query

Figure 5: Performance of proposed model vs. Chi-Square model

Figure 6: Performance of proposed model vs. Chi-Square model vs. original query

Figure 7: Performance of proposed model vs. BM25 model

Table 1: Original query

Table 2: Query expansion using Chi-Square model

Table 3: Query expansion using proposed model

Table 4: BM25 model

5 Discussion

From Tabs. 1–5 it is clear that the proposed model performs well compared with the other models. The proposed model improves the MAP by 6.61%, 6.93%, and 9.07% with respect to the original query, the BM25 model, and query expansion with the Chi-Square model respectively. It also improves the R-precision by 8.47%, 7.02%, and 12.13% with respect to the original query, the BM25 model, and query expansion with the Chi-Square model respectively. The proposed model improves recall by retrieving 84, 25, and 81 additional documents compared to the original query, query expansion with the Chi-Square model, and the BM25 model respectively. From Figs. 4, 5, and 7 it is clear that, out of 50 queries, the proposed model performs better on 30, 36, and 30 queries compared to the original query, query expansion with the Chi-Square model, and the BM25 model respectively. The per-query analysis shows that on more than 60% of the queries the proposed model performs better than the other models. The proposed model performs better than the original query on queries Q128, Q129, Q130, Q131, Q133, Q139, Q140, Q141, Q143, Q144, Q146, Q147, Q148, Q150, Q154, Q155, Q156, Q157, Q159, Q160, Q162, Q163, Q165, Q166, Q169, Q170, Q171, Q172, Q173, and Q174. It performs better than query expansion with the Chi-Square model on queries Q127, Q128, Q129, Q130, Q137, Q138, Q139, Q140, Q142, Q143, Q144, Q145, Q146, Q147, Q150, Q151, Q152, Q155, Q156, Q157, Q158, Q159, Q160, Q162, Q163, Q164, Q165, Q166, Q167, Q169, Q170, Q171, Q172, Q173, Q174, and Q175. It performs better than the BM25 model on queries Q127, Q128, Q129, Q130, Q132, Q133, Q139, Q140, Q141, Q143, Q144, Q146, Q147, Q148, Q150, Q154, Q155, Q156, Q157, Q159, Q160, Q162, Q163, Q165, Q166, Q169, Q171, Q172, Q173, and Q174.

Table 5: Comparison of relative performance of proposed model with other models

Table 6: Sample queries and their expansion terms using proposed model

6 Conclusion

In this paper, the word mismatch problem is minimized by applying a combination of pseudo relevance feedback and a deep neural network-based method. In the proposed method, we have applied the skip-gram-based neural method for selecting the expansion terms. The mean average precision of the proposed method is 0.3176. An improvement of 6.61% and 9.07% is observed in the MAP parameter in comparison to the original query and query expansion with the Chi-Square model respectively. The proposed model also retrieves 84 and 35 more documents in comparison to the original query and query expansion with the Chi-Square model respectively. In the near future, we will try to further improve the performance of the proposed method by tuning the parameters.

Acknowledgement: One of the authors is pursuing a full-time Ph.D. at the Department of Mathematics, Bio-Informatics and Computer Applications, Maulana Azad National Institute of Technology (MANIT) Bhopal (MP), India. He expresses sincere thanks to the Institute for providing him the opportunity to pursue his Ph.D. work. The author also thanks the Forum for Information Retrieval Evaluation (FIRE) for providing the dataset used in the experimental work.

Funding Statement: The authors received no specific funding for this study.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
