Multi-turn dialogue response selection model based on deep multi-matching network

2023-12-31  Liu Chao, Li Wan

Application Research of Computers, 2023, Issue 8

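Abstract: Existing work has built a variety of retrieval models with neural networks and achieved some success, but problems remain: the information fed into the model is not filtered sufficiently, which introduces noise, and the latent semantic information and temporal relationships of the known content are not mined sufficiently. To address these problems, this paper proposes a multi-turn dialogue response selection model based on a deep multi-matching network (DMMN). The model treats the context and the knowledge as queries for the candidate responses. After all three are encoded, a pre-matching layer applies a one-way cross-attention mechanism to obtain the knowledge-aware context and the context-aware knowledge, identifying the important information in each. After the candidate response has interacted with both, the feature aggregation stage strengthens the matching features in two ways: an additional BiLSTM network captures the temporal information among the response-based context utterances, and a gated attention mechanism mines the semantic information among the response-based knowledge entries. Finally, these representation features are fused. Evaluation on the original and revised Persona-Chat datasets shows that, compared with existing methods, the model further improves recall and retrieves better responses.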

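Keywords: multi-turn response selection; deep multi-matching network; semantic mining; gated attention mechanism

CLC number: TP391    Document code: A

Article ID: 1001-3695(2023)08-023-2393-06

doi: 10.19734/j.issn.1001-3695.2022.11.0783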

Multi-turn dialogue response selection model based on deep multi-matching network

Liu Chao, Li Wan

(School of Computer Science & Engineering, Chongqing University of Technology, Chongqing 400054, China)

Abstract: Existing works have constructed a variety of retrieval models using neural networks with some success, but issues remain: insufficient screening of the information fed into the model, which introduces noise, and insufficient mining of the latent semantic information and temporal relationships of known content. To overcome these problems, this paper proposes a multi-turn dialogue response selection model based on a deep multi-matching network (DMMN). The model takes context and knowledge as queries to candidate responses, introduces a pre-matching layer after encoding all three, and uses a one-way cross-attention mechanism to filter out the knowledge-aware context and the context-aware knowledge, respectively, identifying the important information in both. After the candidate response has interacted with these two, a feature aggregation phase enhances the matching features: on the one hand, an additional BiLSTM network captures the temporal information among the response-based context utterances; on the other hand, an attention mechanism with gating mines the semantic information among the response-based knowledge. Finally, the representation features are combined. Performance evaluation on the original and revised Persona-Chat datasets shows that, compared with existing approaches, the model further increases recall and retrieves better responses.

Key words: multi-turn response selection; deep multi-matching network; semantic mining; attention mechanism with gating

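0 Introduction

Building an intelligent dialogue system has long been a challenging research area in artificial intelligence, and intelligent conversation between humans and machines is one of the field's goals. A retrieval-based dialogue system takes the user's input and selects the most relevant response from a set of response candidates; its key task is to measure the degree of match between the input and each candidate [1]. Large dialogue platforms such as Baidu's Xiaodu, Microsoft's XiaoIce [2], and Alibaba's AliMe [3] still favor retrieval-based dialogue models, mainly because their responses come from real human conversations and are therefore high in quality, contain few grammatical errors, and read more fluently.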

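Open-domain dialogue is characterized by broad and diverse topics, and a given user input may have several appropriate responses. Early research focused on single-turn dialogue, which treats the user input as the query; however, the limited information carried by a single input sentence severely restricts the machine's ability to understand what the user means. Compared with single-turn retrieval models, a multi-turn retrieval dialogue system must integrate the current user input with the dialogue context. The sequential matching network (SMN) [4] matches the response against the context, and showed that introducing context substantially improves multi-turn response selection. The deep utterance aggregation (DUA) network [5] refines the dialogue utterances and then uses an attention mechanism to mine key information and ignore redundant information. The deep attention matching (DAM) network [6] uses stacked self-attention to obtain multi-granularity semantic representations and cross-attention to capture dependency information for matching. Each of these improves on SMN in its own way. The interactive matching network (IMN) [7] enriches the word-level and sentence-level representations of context-response pairs and lets the context and the response interact bidirectionally and globally to obtain the matching feature vector. Qin et al. [8] proposed an extended DAM model (ex-DAM) that improves DAM by introducing multi-head attention, making the model better suited to data with subtle variations. Whang et al. [9] used a pre-trained language model to cast response retrieval as a binary classification task over dialogue-response pairs, and proposed utterance manipulation strategies to address the neglected temporal dependencies between utterances. Liu et al. [10] used a masking mechanism in a Transformer-based pre-trained model to decouple the contextualized word representations, so that each word attends separately to the current utterance, the other utterances, and the speaker role. Zhang et al. [11] applied contrastive learning to response selection through a supervised contrastive loss; the learned positive and negative examples are separated further in the embedding space, which improves matching performance.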

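Conversations usually revolve around background knowledge of the topic. For example, when two people discuss a book, they already hold a good deal of prior knowledge about it. Without such prior knowledge, a dialogue system's interactions with people suffer from semantic and consistency problems [12]. Studies have shown that grounding dialogue in external knowledge helps generate more reasonable and more informative responses [13-15]. Research on dialogue systems has therefore gradually shifted toward incorporating external knowledge. Zhang et al. [16] used persona descriptions as personalized knowledge to enhance the context representation and improve response selection. Gu et al. [17] adopted IMN as the base architecture and the persona fusion method of Zhang et al. [16] to propose an improved personalized response model. Thulke et al. [18] retrieved unstructured text knowledge to improve dialogue generation. Yang et al. [19] proposed a new graph structure, the ground graph (G²), which models the semantic structure of both the dialogue context and the knowledge documents to facilitate knowledge selection and integration for the task. Wu et al. [20] proposed knowledge source aware multi-head decoding (KSAM), which infuses multi-source knowledge into dialogue generation more effectively. Gu et al. [21] designed several persona fusion strategies and examined in depth how to use the speaker's own persona and the partner's persona to improve response retrieval.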

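Existing retrieval-based response selection work has built a variety of text matching models with neural networks, but two prominent problems remain: a) knowledge and context are fed into the matching process directly and in full, although not all of this information is useful, so noise is inevitably introduced and the useless information interferes with matching; b) the latent semantic information and temporal relationships of the known content are not mined sufficiently. To address these problems, this paper introduces personalized persona knowledge and proposes a deep multi-matching network (DMMN) for multi-turn dialogue response selection. The model combines the encoded knowledge and context so that the two are softly aligned, and filters them to obtain knowledge-aware context information and context-aware knowledge information. Both are then matched against the candidate response through a bidirectional cross-attention mechanism (dual interactive matching). BiLSTM (bi-directional long short-term memory) [22, 23] networks first aggregate the matching features of each; a separate BiLSTM network and a gated attention mechanism then further enhance the response-based context matching information and the response-based knowledge matching information, respectively. Finally, the aggregated information is passed to the prediction layer for response selection.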
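The soft alignment between context and knowledge can be illustrated with a minimal sketch. It assumes scaled dot-product scores followed by a softmax, which is one common way to realize one-way cross-attention; the paper's own equations are not reproduced in this text, so the function names (`softmax`, `cross_attend`) and the toy vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, keys):
    """One-way cross-attention: each query vector is re-expressed as a
    softmax-weighted sum of the key vectors (assumed dot-product scoring)."""
    dim = len(keys[0])
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        attended.append([sum(w * k[j] for w, k in zip(weights, keys)) for j in range(dim)])
    return attended

# Toy encodings (hypothetical): 3 context tokens, 2 knowledge tokens, dimension 2.
context = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
knowledge = [[2.0, 0.0], [0.0, 2.0]]

# Context attends to knowledge, and knowledge attends to context.
knowledge_aware_context = cross_attend(context, knowledge)
context_aware_knowledge = cross_attend(knowledge, context)
```

Each direction is computed independently, which is what makes the attention one-way: the output always has one vector per query token, regardless of how many key tokens there are.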

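The main contributions of this paper are: a) DMMN is proposed for multi-turn dialogue response selection, and its effectiveness is demonstrated on the original and revised Persona-Chat datasets; b) a pre-matching layer is proposed that uses cross-attention to filter the important information in the knowledge and in the context and softly aligns the two, which addresses the highly asymmetric information between knowledge and context and allows better dual interactive matching; c) during feature aggregation, a BiLSTM network enhances the response-based context matching information, helping the model learn the semantic relationships and temporal information in the sequence, while a gated attention mechanism enhances the response-based knowledge matching information, mining useful information and discarding redundant information.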
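As a rough illustration of how a gate can keep useful information and suppress redundant information during knowledge aggregation, the sketch below mixes each feature vector with an attention summary of all vectors through a scalar sigmoid gate. This is an assumed, generic form of gated attention, not the formulation used in DMMN; `gated_self_attention`, `gate_weights`, and the toy features are hypothetical, and in a real model the gate parameters would be learned.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gated_self_attention(vectors, gate_weights, gate_bias):
    """For each vector x_i: attend over all vectors to get a summary a_i,
    then mix x_i and a_i with a scalar sigmoid gate g_i in (0, 1)."""
    outputs = []
    for x in vectors:
        weights = softmax([dot(x, other) for other in vectors])
        summary = [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(len(x))]
        # The gate sees the concatenation [x_i; a_i].
        gate = sigmoid(dot(gate_weights, x + summary) + gate_bias)
        outputs.append([gate * s + (1.0 - gate) * xi for s, xi in zip(summary, x)])
    return outputs

# Hypothetical response-based knowledge features: 3 persona sentences, dimension 2.
knowledge_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
gate_weights = [0.5, -0.5, 0.5, -0.5]  # fixed here for illustration only
enhanced = gated_self_attention(knowledge_features, gate_weights, gate_bias=0.0)
```

A gate close to 1 lets the attention summary dominate, while a gate close to 0 keeps the original feature nearly unchanged.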

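1 Task definition

2 DMMN model

2.1 Model overview

2.2 Representation layer

2.3 Encoding layer

2.4 Pre-matching layer

2.5 Matching layer

2.6 Aggregation layer

2.7 Prediction layer

2.8 Loss function and optimization strategy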

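3 Experiments

This section describes the experiments used to validate the model, including the datasets, experimental settings, evaluation metrics, baseline models, ablation study, and analysis of the results.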

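3.1 Datasets

The model is compared with the baselines mainly on the original and revised Persona-Chat datasets. The data contains 162,064 utterances from human speakers, with at most 15 words per utterance. The dataset authors paired two people at random; each person knows only their own persona and not the other's, and each converses naturally according to the assigned persona while getting to know the other during the conversation. The original Persona-Chat dataset is split into 8,939 complete dialogues for training, 1,000 for validation, and 968 for testing. As shown in Table 1 (the persona sentences used are in bold), people were found to unconsciously repeat words from their persona descriptions during conversation; the dataset authors rewrote the persona descriptions to create the revised Persona-Chat dataset. In the original Persona-Chat dataset, each persona profile contains 4.49 sentences on average. The revised dataset has the same number of sentences as the original; each sentence averages 7.33 words in the original profiles and 7.32 words in the revised ones.

Experiments were run separately on the original and the revised Persona-Chat datasets.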

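3.2 Baseline models

The following models serve as baselines and are compared with the proposed model on the Persona-Chat dataset.

a) Profile memory [16]. Uses the context as a query to attend over the profile sentences, and measures the similarity between the fused query and the response with cosine similarity.

b) KV profile memory [16]. Extends profile memory to a multi-hop model. The first hop uses the profile to obtain the fused query; the second hop then uses the dialogue history as keys to help predict the current turn.

c) Transformer [27]. A Transformer variant that encodes the context and the response candidates; it performs well on the Persona-Chat dataset.

d) DGMN [1]. Encodes the dialogue history and the knowledge with self-attention, and lets each interact with the candidate response through hierarchical attention.

e) DIM [17]. Uses cross-attention to let the context and the document each interact bidirectionally with the response candidates. This model is the strongest of the baselines listed above.

f) BERT-based. Gu et al. [21] made in-depth use of personas for response selection and designed several persona fusion strategies, applying them to a hierarchical recurrent encoder (HRE) [28], the interactive matching network (IMN) [7], and BERT [29]. Of the three, the BERT-based models performed best, so this paper takes three of the configurations deployed on BERT as baselines: BERT-NA (none-aware), BERT-CA (context-aware), and BERT-RA (response-aware).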
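The profile memory baseline in item a) can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation from [16]: sentence vectors are given directly as toy lists, the fused query is taken to be the query plus a softmax-weighted sum of profile vectors, and the names `fuse_query` and `rank_candidates` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

def fuse_query(query, profile):
    """Attend over profile sentences with the query and add the result to it."""
    sims = [cosine(query, p) for p in profile]
    peak = max(sims)
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [q + sum(w * p[j] for w, p in zip(weights, profile)) for j, q in enumerate(query)]

def rank_candidates(query, profile, candidates):
    """Return candidate indices sorted from most to least similar to the fused query."""
    fused = fuse_query(query, profile)
    return sorted(range(len(candidates)), key=lambda i: cosine(fused, candidates[i]), reverse=True)

# Toy vectors (hypothetical): one context query, two profile sentences, three candidates.
query = [1.0, 0.0, 0.0]
profile = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
candidates = [[0.0, 0.0, 1.0], [1.0, 0.5, 0.0], [0.0, 1.0, 0.0]]
ranking = rank_candidates(query, profile, candidates)
```

In this toy setting the second candidate ranks first because it shares direction with both the query and the attended profile content.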

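3.3 Experimental settings

The experiments were implemented in TensorFlow [30] and accelerated on a machine with a Tesla T4 GPU. The model's hyperparameters follow those of DIM. In the representation layer, GloVe, word2vec, and character embeddings jointly produce the word representation: the GloVe embeddings have 300 dimensions, the word2vec embeddings trained on the training set have 100 dimensions, and the character-level embeddings, computed with window sizes {3, 4, 5}, have 150 dimensions. Both the encoding layer and the aggregation layer use BiLSTM networks. The batch size (batch_size) is set to 16 during training and can be increased as GPU capacity allows; the hidden state of every BiLSTM has 200 dimensions, and dropout_keep_prob is set to 0.8. The hidden layer of the multi-layer perceptron (MLP) classifier has 256 units. The number of epochs (num_epochs) is set to 10, and the model is evaluated every 100 training steps.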
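For reference, the settings above can be collected into a single configuration object. The sketch below is only a convenient summary: the class and most field names are hypothetical (only `batch_size`, `dropout_keep_prob`, and `num_epochs` appear in the text), and the combined word dimension assumes the three embeddings are concatenated, which the text does not state explicitly.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    glove_dim: int = 300               # GloVe embedding size
    word2vec_dim: int = 100            # word2vec trained on the training set
    char_embedding_dim: int = 150      # character-level embedding size
    char_window_sizes: tuple = (3, 4, 5)
    batch_size: int = 16               # may be raised on larger GPUs
    lstm_hidden_dim: int = 200         # hidden size of every BiLSTM
    dropout_keep_prob: float = 0.8
    mlp_hidden_units: int = 256
    num_epochs: int = 10
    evaluate_every: int = 100          # evaluation interval (hypothetical field name)

    @property
    def word_representation_dim(self) -> int:
        # Assumption: the three embeddings are concatenated per word.
        return self.glove_dim + self.word2vec_dim + self.char_embedding_dim

config = TrainingConfig()
```

Freezing the dataclass keeps the values from being changed accidentally during a run; a different batch size can be requested with `TrainingConfig(batch_size=32)`.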

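For the data-related settings, each context holds at most 15 utterances with at most 20 tokens each; each example has at most 20 response candidates with at most 20 tokens each; and each knowledge set holds at most 5 sentences with at most 15 tokens each. Anything shorter than these limits is padded with zeros.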
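The truncation and zero-padding described above can be sketched as follows. The helper names `pad_sequence` and `pad_block` are hypothetical, token ids are made up, and keeping the first sentences when truncating is an assumption; the text does not say which sentences are kept when a context exceeds the limit.

```python
def pad_sequence(token_ids, max_len, pad_id=0):
    """Clip a token-id list to max_len and right-pad it with pad_id."""
    clipped = token_ids[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))

def pad_block(sentences, max_sentences, max_len, pad_id=0):
    """Clip/pad every sentence, then clip/pad the number of sentences."""
    kept = [pad_sequence(s, max_len, pad_id) for s in sentences[:max_sentences]]
    kept += [[pad_id] * max_len for _ in range(max_sentences - len(kept))]
    return kept

# Toy token ids (hypothetical): a 2-utterance context and a 1-sentence persona.
context = [[5, 8, 2], [7, 1]]
persona = [[3, 4, 9, 6]]

padded_context = pad_block(context, max_sentences=15, max_len=20)  # 15 x 20
padded_persona = pad_block(persona, max_sentences=5, max_len=15)   # 5 x 15
```

After padding, every example has the same shape, which is what allows contexts and knowledge sets of different sizes to be batched together.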

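3.4 Comparative results and analysis

3.5 Ablation study

Taken together, the ablation results in Tables 3 and 4 show that, on the original and revised Persona-Chat datasets, adding the pre-matching layer improves the metrics by up to 0.7% and 1.2%, respectively, and using the gated attention module in the aggregation layer improves them by 1.7% and 0.2%. Models with pre-matching outperform those without it, and the gated attention mechanism plays an important role in aggregating knowledge: adding this module raises the results on both datasets. Each added module improves performance, and the full model, which combines both, improves by at most 2.3% and at least 0.6% on one dataset and by at most 1.3% and at least 0.4% on the other, respectively. This demonstrates the effectiveness and necessity of each component of the model.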

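3.6 Case study

To better understand the role of pre-matching in DMMN, a sample was randomly drawn from the Persona-Chat dataset, the relevance scores of its context utterances were computed, and the scores were visualized. Table 5 shows the similarity score between each utterance and the correct response, and Table 6 shows the similarity scores between the correct response and the knowledge set that corresponds to the dialogue context in Table 5. Different sentences contribute differently to response selection, so matching and selecting among them in advance is necessary.

Figure 5 visualizes the relevance scores between the context and the knowledge set, and Figure 6 visualizes the similarity scores between the knowledge and the selected response. U1 and U5 obtain large attention weights with K1 and K4, respectively. Meanwhile, irrelevant pairs, such as K4 relative to U1 and U3, receive small relevance scores and can be filtered out with an appropriate threshold. These results demonstrate the effectiveness of pre-matching.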
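Threshold-based filtering of low relevance scores can be sketched in a few lines. The score matrix below is invented for illustration and does not reproduce the values in Figure 5; the function name `filter_by_threshold` and the threshold of 0.3 are likewise hypothetical.

```python
def filter_by_threshold(scores, threshold):
    """Return (utterance, knowledge) index pairs whose score meets the threshold."""
    return [
        (i, j)
        for i, row in enumerate(scores)
        for j, score in enumerate(row)
        if score >= threshold
    ]

# Made-up relevance scores: rows are utterances, columns are knowledge sentences.
scores = [
    [0.62, 0.10, 0.20, 0.08],
    [0.25, 0.30, 0.25, 0.20],
    [0.15, 0.40, 0.35, 0.10],
]
kept = filter_by_threshold(scores, threshold=0.3)
```

Pairs that fall below the threshold are simply dropped, so the matching stage only sees utterance-knowledge pairs that the pre-matching step considers relevant.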

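4 Conclusion

This paper proposed a knowledge-grounded deep multi-matching network for multi-turn dialogue response selection. Compared with DIM, it adds a pre-matching layer: before all the data is passed to the matching layer for interaction, the knowledge and the context are first aligned and the data about to enter matching is filtered. In the aggregation layer, the response-based context and the response-based knowledge are each enhanced to mine deeper information. Experimental results show that these modifications and additions improve the accuracy of response selection.

Future work may further refine the data filtered in the pre-matching layer, improve the ability to mine global and local information in the dialogue history, and explore better ways of fusing the different sources of information.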

References:

[1] Zhao Xueliang, Tao Chongyang, Wu Wei, et al. A document-grounded matching network for response selection in retrieval-based chatbots [C]//Proc of the 28th International Joint Conference on Artificial Intelligence. 2019: 5443-5449.

[2] Shum X Y, He Xiaodong, Li Di. From Eliza to Xiao-Ice: challenges and opportunities with social chatbots [J]. Frontiers of Information Technology & Electronic Engineering, 2018, 19(1): 10-26.

[3] Li Fenglin, Qiu Minghui, Chen Haiqing, et al. AliMe assist: an intelligent assistant for creating an innovative ecommerce experience [C]//Proc of ACM on Conference on Information and Knowledge Management. New York: ACM Press, 2017: 2495-2498.

[4] Yu Wu, Wei Wu, Chen Xing, et al. Sequential matching network: a new architecture for multi-turn response selection in retrieval-based chatbots [C]//Proc of the 55th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2017: 496-505.

[5] Zhang Zhuosheng, Li Jiangtong, Zhu Pengfei, et al. Modeling multi-turn conversation with deep utterance aggregation [C]//Proc of the 27th International Conference on Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2018: 3740-3752.

[6] Zhou Xiangyang, Li Lu, Dong Daxiang, et al. Multi-turn response selection for chatbots with deep attention matching network [C]//Proc of the 56th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2018: 1118-1127.

[7] Gu Jiachen, Ling Zhenhua, Liu Quan. Interactive matching network for multi-turn response selection in retrieval-based chatbots [C]//Proc of the 28th ACM International Conference on Information and Knowledge Management. New York: ACM Press, 2019: 2321-2324.

[8] Qin Hanzhong, Yu Chongchong, Jiang Weijie, et al. Improved DAM model based on multi-headed attention and Bi-LSTM for Chinese question and answer matching [J]. Journal of Chinese Information Processing, 2021, 35(11): 118-126. (in Chinese)

[9] Whang T, Lee D, Oh D, et al. Do response selection models really know what's next? Utterance manipulation strategies for multi-turn response selection [C]//Proc of AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2021: 14041-14049.

[10] Liu Longxiang, Zhang Zhuosheng, Zhao Hai, et al. Filling the gap of utterance-aware and speaker-aware representation for multi-turn dialogue [C]//Proc of AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2021: 13406-13414.

[11] Zhang Wentao, Xu Shuang, Huang Haoran. Two-level supervised contrastive learning for response selection in multi-turn dialogue [EB/OL]. (2022-03-01). https://arxiv.org/abs/2203.00793.

[12] Kai Hua, Feng Zhiyuan, Chong Yangtao, et al. Learning to detect relevant contexts and knowledge for response selection in retrieval-based dialogue systems [C]//Proc of the 29th ACM International Conference on Information & Knowledge Management. New York: ACM Press, 2020: 525-534.

[13] Majumder B P, Jhamtani H, Berg-Kirkpatrick T, et al. Like hiking? You probably enjoy nature: persona-grounded dialog with commonsense expansions [C]//Proc of Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2020: 9194-9206.

[14] Xu Lin, Zhou Qixian, Fu Jinlan, et al. CorefDiffs: co-referential and differential knowledge flow in document grounded conversations [C]//Proc of the 29th International Conference on Computational Linguistics. [S.l.]: International Committee on Computational Linguistics, 2022: 471-484.

[15] Oh M S, Kim M S. Persona-knowledge dialogue multi-context retrieval and enhanced decoding methods [EB/OL]. (2022-07-28). https://arxiv.org/abs/2207.13919.

[16] Zhang Saizheng, Dinan E, Urbanek J, et al. Personalizing dialogue agents: I have a dog, do you have pets too? [C]//Proc of the 56th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2018: 2204-2213.

[17] Gu Jiachen, Ling Zhenhua, Zhu Xiaodan, et al. Dually interactive matching network for personalized response selection in retrieval-based chatbots [C]//Proc of Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2019: 1845-1854.

[18] Thulke D, Daheim N, Dugast C, et al. Efficient retrieval augmented generation from unstructured knowledge for task-oriented dialog [EB/OL]. (2021-02-09). https://arxiv.org/abs/2102.04643.

[19] Yang Yizhe, Gao Yang, Li Jiawei, et al. G2: enhance knowledge grounded dialogue via ground graph [EB/OL]. (2022-04-27). https://arxiv.org/abs/2204.12681.

[20] Wu Sixing, Li Ying, Zhang Dawei, et al. KSAM: infusing multi-source knowledge into dialogue generation via knowledge source aware multi-head decoding [C]//Proc of Findings of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2022: 353-363.

[21] Gu Jiachen, Liu Hui, Ling Zhenhua, et al. Partner matters! An empirical study on fusing personas for personalized response selection in retrieval-based chatbots [C]//Proc of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021: 565-574.

[22] Graves A, Mohamed A R, Hinton G. Speech recognition with deep recurrent neural networks [C]//Proc of IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway, NJ: IEEE Press, 2013: 6645-6649.

[23] Hochreiter S, Schmidhuber J. Long short-term memory [J]. Neural Computation, 1997, 9(8): 1735-1780.

[24] Pennington J, Socher R, Manning C. GloVe: global vectors for word representation [C]//Proc of Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2014: 1532-1543.

[25] Mikolov T, Sutskever I, Chen Kai, et al. Distributed representations of words and phrases and their compositionality [C]//Proc of the 26th International Conference on Neural Information Processing Systems. 2013: 3111-3119.

[26] Hao Yanchao, Zhang Yuanzhe, Liu Kang, et al. An end-to-end model for question answering over knowledge-base with cross-attention combining global knowledge [C]//Proc of the 55th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2017: 221-231.

[27] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need [C]//Proc of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 5998-6008.

[28] Serban I V, Sordoni A, Bengio Y, et al. Building end-to-end dialogue systems using generative hierarchical neural network models [C]//Proc of the 30th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2016: 3776-3783.

[29] Devlin J, Chang Mingwei, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding [C]//Proc of Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2019: 4171-4186.

[30] Abadi M, Barham P, Chen Jianmin, et al. TensorFlow: a system for large-scale machine learning [C]//Proc of the 12th USENIX Conference on Operating Systems Design and Implementation. [S.l.]: USENIX Association, 2016: 265-283.

