2019-10-15 Dialogue Systems Survey

1. Jianfeng Gao, Microsoft. Research areas: natural language processing, information retrieval, machine learning, deep learning.
h-index paper list (title | citations | year):

[1] A user simulator for task-completion dialogues | 43 | 2016 |
[2] End-to-end joint learning of natural language understanding and dialogue manager | 39 | 2017 |
[3] Deep reinforcement learning for dialogue generation | 459 | 2016 |
[4] Towards end-to-end reinforcement learning of dialogue agents for information access | 121 | 2016 |
[5] End-to-end task-completion neural dialogue systems | 120 | 2017 |
[6] Efficient exploration for dialogue policy learning with BBQ networks & replay buffer spiking | 76 | 2016 |
[7] Composite task-completion dialogue policy learning via hierarchical deep reinforcement learning | 52 | 2017 |

1. A user simulator for task-completion dialogues

Abstract:
Despite widespread interest in reinforcement learning for task-oriented dialogue systems, several obstacles can frustrate research and development progress.
First, reinforcement learners typically require interaction with the environment, so conventional dialogue corpora cannot be used directly.
Second, each task presents specific challenges, requiring a separate corpus of task-specific annotated data.
Third, collecting and annotating human-machine or human-human conversations for task-oriented dialogues requires extensive domain knowledge.
Because building an appropriate dataset can be both financially costly and time-consuming, one popular approach is to build a user simulator based upon a corpus of example dialogues.
Then, one can train reinforcement learning agents in an online fashion as they interact with the simulator.
Dialogue agents trained on these simulators can serve as an effective starting point. Once agents master the simulator, they may be deployed in a real environment to interact with humans, and continue to be trained online. To ease empirical algorithmic comparisons in dialogues, this paper introduces a new, publicly available simulation framework, where our simulator, designed for the movie-booking domain, leverages both rules and collected data.
The simulator supports two tasks: movie ticket booking and movie seeking. Finally, we demonstrate several agents and detail the procedure to add and test your own agent in the proposed framework.
A user simulator for task-completion dialogues
1. Although reinforcement learning is widely used in task-oriented dialogue systems, the approach faces several obstacles.
2. Reinforcement learning requires interaction with an environment, so static dialogue corpora cannot be used directly.
3. Each task needs its own corpus of task-specific annotated data.
4. Collecting task-oriented human-human or human-machine dialogues requires extensive domain knowledge.
5. Building a suitable dataset costs a great deal of labor and time.
6. A practical way around this: build a user simulator from a corpus of example dialogues.
7. Reinforcement learning agents can then be trained online as they interact with the simulator.
8. Once an agent masters the simulator, it can be deployed in a real environment to interact with humans and continue training online.
9. Main contribution of this paper:
a rule- and data-driven simulator for the movie domain, covering movie ticket booking and movie seeking (a minimal sketch of the rule-based idea follows below).
* Source code available at: https://github.com/MiuLab/UserSimulator
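
A minimal Python sketch of what such a rule-based simulator can look like. The class name, slot names, and dialogue-act format below are invented for illustration and do not reflect the actual MiuLab/UserSimulator interface.

```python
# A minimal sketch of a rule-based user simulator for movie-ticket booking.
# All names, slots, and the dialogue-act format are hypothetical.
class RuleBasedUserSimulator:
    def __init__(self, goal):
        # goal: slot -> value the simulated user wants to convey.
        self.goal = dict(goal)
        self.remaining = set(goal)  # slots not yet conveyed to the agent

    def initial_utterance(self):
        # Open the dialogue by stating the task and revealing one slot.
        slot = self.remaining.pop()
        return {"intent": "request_ticket", slot: self.goal[slot]}

    def respond(self, agent_action):
        # Rule 1: answer agent requests for slots the user knows.
        if agent_action["intent"] == "request":
            slot = agent_action["slot"]
            if slot in self.goal:
                self.remaining.discard(slot)
                return {"intent": "inform", slot: self.goal[slot]}
            return {"intent": "deny"}
        # Rule 2: accept a booking only if it matches the full goal.
        if agent_action["intent"] == "confirm_booking":
            ok = all(agent_action.get(s) == v for s, v in self.goal.items())
            return {"intent": "thanks" if ok else "deny", "done": True}
        # Otherwise volunteer one not-yet-mentioned slot, if any remain.
        slot = self.remaining.pop() if self.remaining else None
        return {"intent": "inform", **({slot: self.goal[slot]} if slot else {})}

# One simulated exchange:
sim = RuleBasedUserSimulator({"moviename": "Inception", "date": "Friday", "numberofpeople": "2"})
print(sim.initial_utterance())
print(sim.respond({"intent": "request", "slot": "date"}))
```

A real simulator would also model user patience, mistakes, and natural-language surface forms; per the abstract, the paper's simulator combines such rules with statistics from collected dialogues.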

2. End-to-end joint learning of natural language understanding and dialogue manager

Abstract:
Natural language understanding and dialogue policy learning are both essential in conversational systems that predict the next system actions in response to a current user utterance.
Conventional approaches aggregate separate models of natural language understanding (NLU) and system action prediction (SAP) as a pipeline that is sensitive to noisy outputs of error-prone NLU.
To address the issues, we propose an end-to-end deep recurrent neural network with limited contextual dialogue memory by jointly training NLU and SAP on DSTC4 multi-domain human-human dialogues.
Experiments show that our proposed model significantly outperforms the state-of-the-art pipeline models for both NLU and SAP, which indicates that our joint model is capable of mitigating the effects of noisy NLU outputs, and that the NLU model can be refined by error flows backpropagating from the extra supervised signals of system actions.
1. A dialogue system predicts the next system action in response to the current user utterance.
2. Its two essential components: natural language understanding (NLU) and dialogue policy learning.
3. The conventional approach pipelines separately trained NLU and system action prediction (SAP) models.
4. But the NLU stage is error-prone, and its noisy outputs hurt everything downstream.
5. Main contribution of this paper:
an end-to-end deep recurrent neural network with limited contextual dialogue memory that trains NLU and SAP jointly (see the sketch below).
* Source code available at: https://github.com/XuesongYang/end2end_dialog
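
To make the joint-training idea concrete, here is a hedged PyTorch sketch, assuming a single-utterance input and mean pooling; the paper's actual architecture (with limited contextual dialogue memory over DSTC4 dialogues) is more elaborate. One shared recurrent encoder feeds the NLU heads (intent, slot tags) and the SAP head, so the extra supervision from system actions backpropagates into NLU.

```python
import torch
import torch.nn as nn

# One shared encoder, three heads: intent and slot tagging (NLU) plus next
# system action (SAP). Joint training lets SAP errors flow back into the
# encoder, refining NLU. All sizes are illustrative.
class JointNluSap(nn.Module):
    def __init__(self, vocab, n_intents, n_slots, n_actions, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.intent_head = nn.Linear(2 * dim, n_intents)  # utterance-level intent
        self.slot_head = nn.Linear(2 * dim, n_slots)      # token-level slot tags
        self.action_head = nn.Linear(2 * dim, n_actions)  # next system action

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))           # (B, T, 2*dim)
        pooled = h.mean(dim=1)                            # crude utterance summary
        return self.intent_head(pooled), self.slot_head(h), self.action_head(pooled)

model = JointNluSap(vocab=1000, n_intents=10, n_slots=20, n_actions=15)
tokens = torch.randint(0, 1000, (4, 12))                  # toy batch: 4 utterances x 12 tokens

intent_logits, slot_logits, action_logits = model(tokens)
loss = (nn.functional.cross_entropy(intent_logits, torch.randint(0, 10, (4,)))
        + nn.functional.cross_entropy(slot_logits.reshape(-1, 20), torch.randint(0, 20, (48,)))
        + nn.functional.cross_entropy(action_logits, torch.randint(0, 15, (4,))))
loss.backward()  # SAP gradients flow through the shared encoder into NLU
```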

3. Deep reinforcement learning for dialogue generation

Abstract:
Recent neural models of dialogue generation offer great promise for generating responses for conversational agents, but tend to be short-sighted, predicting utterances one at a time while ignoring their influence on future outcomes.
Modeling the future direction of a dialogue is crucial to generating coherent, interesting dialogues, a need which led traditional NLP models of dialogue to draw on reinforcement learning.
In this paper, we show how to integrate these goals, applying deep reinforcement learning to model future reward in chatbot dialogue.
The model simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity, coherence, and ease of answering (related to forward-looking function).
We evaluate our model on diversity and length, as well as with human judges, showing that the proposed algorithm generates more interactive responses and manages to foster a more sustained conversation in dialogue simulation.
This work marks a first step towards learning a neural conversational model based on the long-term success of dialogues.
Deep reinforcement learning for dialogue generation
1. Single-turn models only consider the current user utterance; multi-turn dialogue must also account for what the user said earlier.
2. Conversations jump around yet remain coherent; reinforcement learning inside the dialogue system can foster longer, more sustained exchanges.
3. Main contribution of this paper:
shows how to integrate these goals by applying deep reinforcement learning to model future reward in chatbot dialogue,
producing a system capable of sustained conversation (the reward shaping is sketched below).
(Figure in the original post: an example of a sustained dialogue.)
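
The reward design is the heart of the method: dialogues between two virtual agents are scored turn by turn, and policy gradients push the generator toward high-reward sequences. The sketch below mirrors the linear combination of the three rewards; the word-overlap scorers are crude stand-ins for the seq2seq likelihood scores the paper uses, and the weights shown follow the values reported there.

```python
# Toy stand-ins for the three reward components; the paper computes these
# with seq2seq log-likelihoods, not word overlap.
def ease_of_answering(response, dull_responses):
    # Penalize responses similar to a hand-picked list of dull turns that
    # are hard to continue ("i don't know", ...).
    overlap = max(len(set(response.split()) & set(d.split())) for d in dull_responses)
    return -float(overlap)

def information_flow(response, previous_turn):
    # Penalize repeating the agent's own previous turn.
    a, b = set(response.split()), set(previous_turn.split())
    return -len(a & b) / max(len(a | b), 1)

def coherence(response, user_turn):
    # Reward staying on-topic with the user's last turn.
    a, b = set(response.split()), set(user_turn.split())
    return len(a & b) / max(len(a | b), 1)

def total_reward(response, user_turn, previous_turn, dull_responses,
                 lambdas=(0.25, 0.25, 0.5)):
    # Linear combination of the three rewards; this score drives the
    # policy-gradient (REINFORCE) update for each generated turn.
    return (lambdas[0] * ease_of_answering(response, dull_responses)
            + lambdas[1] * information_flow(response, previous_turn)
            + lambdas[2] * coherence(response, user_turn))

print(total_reward("let us plan the trip on friday",
                   user_turn="when do we plan the trip",
                   previous_turn="i am not sure",
                   dull_responses=["i don't know", "i am not sure"]))
```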

4. Towards end-to-end reinforcement learning of dialogue agents for information access

Abstract:
This paper proposes KB-InfoBot, a multi-turn dialogue agent which helps users search Knowledge Bases (KBs) without composing complicated queries.
Such goal-oriented dialogue agents typically need to interact with an external database to access real-world knowledge.
Previous systems achieved this by issuing a symbolic query to the KB to retrieve entries based on their attributes.
However, such symbolic operations break the differentiability of the system and prevent end-to-end training of neural dialogue agents.
In this paper, we address this limitation by replacing symbolic queries with an induced "soft" posterior distribution over the KB that indicates which entities the user is interested in.
Integrating the soft retrieval process with a reinforcement learner leads to higher task success rate and reward in both simulations and against real users.
We also present a fully neural end-to-end agent, trained entirely from user feedback, and discuss its application towards personalized dialogue agents.
1. A multi-turn dialogue agent helps users search a knowledge base (KB) without writing complicated queries.
2. Earlier systems answer by issuing a symbolic query to the KB and retrieving entries by their attributes.
However, such symbolic operations break the differentiability of the system and prevent end-to-end training of neural dialogue agents.
3. That is, the query-and-retrieve step sits outside the neural network's training loop; it relies directly on database query statements.
4. Main contributions of this paper:
ties the retrieval sequence of actions to a reinforcement learner via a "soft" posterior over the KB (sketched below);
trains the model entirely from user feedback.
Source code available at: https://github.com/MiuLab/KB-InfoBot
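
A small sketch of the "soft" retrieval idea, assuming a toy KB and independent per-slot beliefs: instead of executing a hard symbolic query, compute a differentiable posterior over KB rows from per-slot belief distributions, so gradients flow through retrieval. The data and function names are illustrative, not KB-InfoBot's actual computation.

```python
import torch

# A toy KB and per-slot belief distributions. In KB-InfoBot the beliefs come
# from a neural belief tracker and rows may have missing values.
kb = [
    {"movie": "inception", "date": "friday"},
    {"movie": "inception", "date": "saturday"},
    {"movie": "arrival", "date": "friday"},
]
slot_values = {"movie": ["inception", "arrival"], "date": ["friday", "saturday"]}
belief = {
    "movie": torch.tensor([0.9, 0.1]),  # P(user wants "inception") = 0.9
    "date": torch.tensor([0.3, 0.7]),   # leaning toward "saturday"
}

def soft_posterior(kb, belief, slot_values):
    # P(row) proportional to the product over slots of P(slot = row's value):
    # a differentiable stand-in for "SELECT * WHERE movie = ? AND date = ?".
    scores = []
    for row in kb:
        p = torch.tensor(1.0)
        for slot, values in slot_values.items():
            p = p * belief[slot][values.index(row[slot])]
        scores.append(p)
    scores = torch.stack(scores)
    return scores / scores.sum()  # normalize into a posterior over rows

print(soft_posterior(kb, belief, slot_values))
# Row (inception, saturday) dominates; gradients can flow back into `belief`.
```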

5. End-to-end task-completion neural dialogue systems

Abstract:
One of the major drawbacks of modularized task-completion dialogue systems is that each module is trained individually, which presents several challenges.
For example, downstream modules are affected by earlier modules, and the performance of the entire system is not robust to the accumulated errors.
This paper presents a novel end-to-end learning framework for task-completion dialogue systems to tackle such issues.
Our neural dialogue system can directly interact with a structured database to assist users in accessing information and accomplishing certain tasks.
The reinforcement learning based dialogue manager offers robust capabilities to handle noises caused by other components of the dialogue system.
Our experiments in a movie-ticket booking domain show that our end-to-end system not only outperforms modularized dialogue system baselines for both objective and subjective evaluation, but also is robust to noises as demonstrated by several systematic experiments with different error granularity and rates specific to the language understanding module.
1. Modularized task-completion dialogue systems train a separate model for each module, which weakens the system as a whole.
2. One harmful effect: errors from an earlier module accumulate in the modules behind it.
Main contribution of this paper:
a dialogue system built around a reinforcement-learning-based dialogue manager (a DQN-style sketch follows below),
with experiments in the movie-ticket booking domain.
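
A hedged sketch of the dialogue-manager side: a small deep Q-network chooses system dialogue acts from a state vector summarizing noisy language-understanding output, which is the robustness mechanism the paper credits. The environment stub and all dimensions and hyperparameters below are placeholders, not the paper's setup.

```python
import random
import torch
import torch.nn as nn

# A small DQN over dialogue acts. The state vector stands in for the belief
# over (possibly noisy) NLU output plus dialogue history.
STATE_DIM, N_ACTIONS = 30, 8
q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay, gamma, epsilon = [], 0.95, 0.1

def env_step(state, action):
    # Stub: return (next_state, reward, done). A real environment would run
    # the user simulator plus a noisy language-understanding channel.
    done = random.random() < 0.1
    reward = 1.0 if (done and random.random() < 0.5) else -0.05
    return torch.randn(STATE_DIM), reward, done

state = torch.randn(STATE_DIM)
for step in range(500):
    # Epsilon-greedy choice among system dialogue acts.
    if random.random() < epsilon:
        action = random.randrange(N_ACTIONS)
    else:
        action = q_net(state).argmax().item()
    next_state, reward, done = env_step(state, action)
    replay.append((state, action, reward, next_state, done))
    state = torch.randn(STATE_DIM) if done else next_state  # reset on episode end

    if len(replay) >= 32:
        # One-step temporal-difference update on a sampled minibatch.
        batch = random.sample(replay, 32)
        s = torch.stack([b[0] for b in batch])
        a = torch.tensor([b[1] for b in batch])
        r = torch.tensor([b[2] for b in batch])
        s2 = torch.stack([b[3] for b in batch])
        d = torch.tensor([float(b[4]) for b in batch])
        with torch.no_grad():
            target = r + gamma * (1 - d) * q_net(s2).max(dim=1).values
        q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = nn.functional.mse_loss(q, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```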

6. Efficient exploration for dialogue policy learning with BBQ networks & replay buffer spiking

Abstract:
When rewards are sparse and action spaces large, Q-learning with greedy exploration can be inefficient.
This poses problems for otherwise promising applications such as task-oriented dialogue systems, where the primary reward signal, indicating successful completion of a task, requires a complex sequence of appropriate actions.
Under these circumstances, a randomly exploring agent might never stumble upon a successful outcome in reasonable time.
We present two techniques that significantly improve the efficiency of exploration for deep Q-learning agents in dialogue systems.
First, we introduce an exploration technique based on Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop neural network, demonstrating marked improvement over common approaches such as ε-greedy and Boltzmann exploration.
Second, we show that spiking the replay buffer with experiences from a small number of successful episodes, as are easy to harvest for dialogue tasks, can make Q-learning feasible when it might otherwise fail.
1. When rewards are sparse and the action space is large, Q-learning with greedy exploration can be inefficient.
2. The primary reward signal, which indicates successful task completion, requires a long sequence of appropriate actions.
3. Under these conditions, a randomly exploring agent may never produce a successful dialogue in reasonable time.
4. Main contribution of this paper:
two techniques that improve exploration efficiency for deep Q-learning dialogue agents: Thompson sampling from a Bayes-by-Backprop network, and replay buffer spiking (sketched below).
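
Of the two techniques, replay buffer spiking is the simplest to illustrate: before training begins, pre-fill the experience replay buffer with transitions from a few successful dialogues (the paper harvests them from a naive rule-based agent), so the sparse success reward appears in early minibatches. The harvesting stub below is hypothetical.

```python
import random
from collections import deque

# Stub standing in for running a naive rule-based agent against the user
# simulator and keeping only dialogues that ended in success: each episode
# is a list of (state, action, reward, next_state, done) transitions with
# the sparse success reward (+1) at the final step.
def harvest_successful_episode(length=5):
    return [("s%d" % t, "act", 1.0 if t == length - 1 else 0.0,
             "s%d" % (t + 1), t == length - 1) for t in range(length)]

buffer = deque(maxlen=10000)
for _ in range(10):                  # "spike" with a handful of successes
    buffer.extend(harvest_successful_episode())

# Ordinary DQN training then appends its own transitions to the same buffer;
# uniform minibatch sampling now sees success rewards from the first update.
minibatch = random.sample(list(buffer), 4)
print(len(buffer), "pre-filled transitions; sample:", minibatch[0])
```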

7. Composite task-completion dialogue policy learning via hierarchical deep reinforcement learning

Abstract:
Building a dialogue agent to fulfill complex tasks, such as travel planning, is challenging because the agent has to learn to collectively complete multiple subtasks.
For example, the agent needs to reserve a hotel and book a flight so that there is enough time for commute between arrival and hotel check-in.
This paper addresses this challenge by formulating the task in the mathematical framework of options over Markov Decision Processes (MDPs), and proposing a hierarchical deep reinforcement learning approach to learning a dialogue manager that operates at different temporal scales.
The dialogue manager consists of: (1) a top-level dialogue policy that selects among subtasks or options, (2) a low-level dialogue policy that selects primitive actions to complete the subtask given by the top-level policy, and (3) a global state tracker that helps ensure all cross-subtask constraints are satisfied.
Experiments on a travel planning task with simulated and real users show that our approach leads to significant improvements over three baselines, two based on hand-crafted rules and the other based on flat deep reinforcement learning.
Main contribution of this paper:
optimizes dialogue systems for composite tasks such as travel planning;
proposes a hierarchical deep reinforcement learning approach that learns a dialogue manager operating at different temporal scales (the two-level control loop is sketched below).
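
A toy sketch of that two-level loop, with random policies standing in for the learned ones: the top-level policy picks a subtask (an option), the low-level policy emits primitive dialogue acts until the subtask finishes, and a global tracker checks one cross-subtask time constraint. Subtask and act inventories and the commute constraint are invented for illustration.

```python
import random

SUBTASKS = ["book_flight", "reserve_hotel"]
PRIMITIVE_ACTS = ["request_slot", "inform_slot", "confirm"]

def top_level_policy(state):
    # Pick an unfinished subtask (an "option" over the dialogue MDP).
    return random.choice([s for s in SUBTASKS if s not in state["done"]])

def low_level_policy(state, subtask):
    # Pick a primitive dialogue act for the current subtask.
    return random.choice(PRIMITIVE_ACTS)

def constraints_satisfied(state):
    # Global state tracker: require at least 2 hours between flight
    # arrival and hotel check-in (a cross-subtask constraint).
    arr, checkin = state["slots"].get("arrival"), state["slots"].get("checkin")
    return arr is None or checkin is None or checkin >= arr + 2

state = {"done": set(), "slots": {}}
while len(state["done"]) < len(SUBTASKS):
    subtask = top_level_policy(state)           # coarse temporal scale
    for _ in range(3):                          # fine temporal scale
        act = low_level_policy(state, subtask)  # would update state for real
    # Pretend the subtask filled its key time slot (hour of day).
    slot = "arrival" if subtask == "book_flight" else "checkin"
    state["slots"][slot] = random.randint(8, 20)
    state["done"].add(subtask)

print(state["slots"], "constraints ok:", constraints_satisfied(state))
```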

8. Summary

Scope: movie-seeking dialogues, movie-ticket-booking dialogues, and travel-planning dialogues each solve one specific problem in one specific domain. The goal of dialogue systems is to serve people's lives intelligently, and a great many problems remain open. Reinforcement learning is the direction in which dialogue systems are developing.