Luxi Xing - Blog

What I cannot create,
I do not understand.

Text Summarization - Main Problems

Posted on 2019-01-09 | Post modified: 2019-01-09 | In Text Summarization

Main Problems and Challenges in Text Summarization

Problems:

Distinguish which items are challenges and which are technical problems.

1. Focusing on the important words and sentences

- Attention, POS (part-of-speech) features, TF-IDF
- intra-temporal attention
- encoder-less self-attention [Generating Wikipedia by Summarizing Long Sequences]
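Of the cues listed above, TF-IDF is the easiest to illustrate without a trained model. The sketch below is a minimal illustration, not part of the original notes. It assumes whitespace tokenization, treats each sentence as a "document" for the IDF statistics, uses the smoothed IDF variant `ln((1 + n) / (1 + df)) + 1` (the same formula scikit-learn uses when `smooth_idf=True`), and introduces a hypothetical helper name `tfidf_sentence_scores`. A sentence scores higher when its tokens are rare across the other sentences.

```python
import math
from collections import Counter


def tfidf_sentence_scores(sentences):
    """Score each sentence by sum over its tokens of (normalized TF * IDF).

    Hypothetical helper: each sentence is treated as one "document" when
    computing document frequencies, and tokens are split on whitespace.
    """
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    # Document frequency: in how many sentences each token appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        # Smoothed IDF: a token present in every sentence still gets weight 1.
        score = sum(
            (count / len(tokens)) * (math.log((1 + n) / (1 + df[tok])) + 1)
            for tok, count in tf.items()
        )
        scores.append(score)
    return scores
```

Sorting sentences by these scores and keeping the top k gives a crude extractive baseline, or a salience prior that an abstractive model could attend to.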

2. Inaccurately copying/generating factual details

- copy
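One widely used form of copying is the pointer-generator network (See et al., 2017, "Get To The Point: Summarization with Pointer-Generator Networks"). Its final word distribution mixes a vocabulary softmax with the attention weights over source positions: P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of a_i over the positions i where the source token equals w. The sketch below computes that mixture for a single decoder step using plain Python dicts. The function name and the toy inputs are hypothetical; in a real model `p_gen`, `vocab_dist`, and `attention` would be produced by the network.

```python
from collections import defaultdict


def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generation and copy distributions for one decoder step.

    p_gen: probability of generating from the fixed vocabulary (0..1).
    vocab_dist: dict word -> probability over the fixed vocabulary.
    attention: attention weights, one per source position.
    source_tokens: source words aligned with `attention`.
    """
    final = defaultdict(float)
    for word, prob in vocab_dist.items():
        final[word] += p_gen * prob
    # Copy mass: each source position passes its attention weight to its word,
    # so a word occurring several times accumulates all of its positions' weights,
    # and a source word outside the vocabulary can still receive probability.
    for weight, word in zip(attention, source_tokens):
        final[word] += (1.0 - p_gen) * weight
    return dict(final)
```

With `p_gen = 0.8`, a source word that is absent from `vocab_dist` but receives attention 0.6 ends up with probability 0.2 * 0.6 = 0.12, which is how factual details such as names can be reproduced exactly.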

3. Repeated phrases and sentences

- temporal Attention
- coverage
- intra-attention
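Two of the listed anti-repetition techniques can be sketched on toy attention values. The coverage loss follows See et al. (2017): the coverage vector c_t is the sum of all earlier attention distributions, and the per-step loss sum_i min(a_t[i], c_t[i]) penalizes attending again to positions that are already covered. The intra-temporal normalization follows Paulus et al. (2017): each exponentiated raw score is divided by the sum of that position's exponentiated scores at previous steps, then the result is renormalized. Both functions are hypothetical helpers on plain lists, not library code.

```python
import math


def coverage_loss(attention_steps):
    """Sum over decoder steps of sum_i min(a_t[i], c_t[i]).

    c_t is the running sum of the attention distributions of all earlier steps.
    """
    if not attention_steps:
        return 0.0
    coverage = [0.0] * len(attention_steps[0])
    total = 0.0
    for attn in attention_steps:
        total += sum(min(a, c) for a, c in zip(attn, coverage))
        coverage = [c + a for c, a in zip(coverage, attn)]
    return total


def intra_temporal_attention(score_steps):
    """Turn raw attention scores e[t][i] into intra-temporal attention weights."""
    history = [0.0] * len(score_steps[0])  # sum of exp(e[j][i]) over steps j < t
    outputs = []
    for t, scores in enumerate(score_steps):
        exp_scores = [math.exp(e) for e in scores]
        if t == 0:
            penalized = exp_scores
        else:
            # Positions that were heavily attended before are divided by a large sum.
            penalized = [e / h for e, h in zip(exp_scores, history)]
        z = sum(penalized)
        outputs.append([p / z for p in penalized])
        history = [h + e for h, e in zip(history, exp_scores)]
    return outputs
```

With raw scores `[[2, 0], [2, 0]]`, the first step favors position 0 (weight about 0.88), but the second step returns `[0.5, 0.5]`: the position that was already attended to loses its advantage, which discourages generating the same phrase again.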

4. Handling long documents

- Sentence-level Attention
- selective gate
- self-attention: alleviates long-range dependency problems
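Sentence-level attention can be combined with word-level attention by rescaling. In the hierarchical attention of Nallapati et al. (2016), each word's attention weight is multiplied by the attention weight of the sentence that contains it, and the products are renormalized, so words in unimportant sentences of a long document receive little weight. The sketch below assumes both distributions have already been computed and uses a hypothetical helper name.

```python
def hierarchical_attention(word_attn, sent_attn, sent_of_word):
    """Rescale word-level attention by the attention of each word's sentence.

    word_attn: attention weight per source word.
    sent_attn: attention weight per source sentence.
    sent_of_word: index of the sentence each word belongs to.
    """
    scaled = [w * sent_attn[s] for w, s in zip(word_attn, sent_of_word)]
    z = sum(scaled)
    # Renormalize so the result is again a distribution over words.
    return [x / z for x in scaled]
```

For example, uniform word attention over four words split across two sentences with sentence attention `[0.9, 0.1]` becomes `[0.45, 0.45, 0.05, 0.05]`.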

5. Generating readable summaries

- RL (reinforcement learning)
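As an illustration of the RL item, the sketch below uses self-critical sequence training (Rennie et al., 2017), the baseline scheme that Paulus et al. (2017) adopted for summarization. The reward of a sampled summary is compared with the reward of the greedy decode, and that difference scales the negative log-likelihood of the sample. The reward here is a simplified unigram-overlap F1 on whitespace tokens standing in for ROUGE. It is an assumption for illustration, not the official ROUGE implementation, and both function names are hypothetical.

```python
from collections import Counter


def unigram_f1(candidate, reference):
    """ROUGE-1-style F1 on whitespace tokens (simplified stand-in for ROUGE)."""
    cand, ref = candidate.split(), reference.split()
    # Clipped overlap: each token counts at most min(count in cand, count in ref) times.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def self_critical_loss(sampled_log_probs, sampled_text, greedy_text, reference):
    """Policy-gradient loss with the greedy decode's reward as the baseline.

    Minimizing this raises the log-probability of the sampled summary when it
    scores better than the greedy summary, and lowers it when it scores worse.
    """
    advantage = unigram_f1(sampled_text, reference) - unigram_f1(greedy_text, reference)
    return -advantage * sum(sampled_log_probs)
```

In a real model `sampled_log_probs` would be differentiable tensors, so the gradient of this loss is the REINFORCE estimate; the greedy baseline only reduces its variance and receives no gradient.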

6. Generating novel words (grounded in an understanding of the source)

7. Vocabulary problems

Rare (but important) words and out-of-vocabulary (OOV) words:
- add an n-gram match term to the loss [A Neural Attention Model for Abstractive Sentence Summarization]
- pointer
- use a large vocabulary; candidate sampling
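For the rare-word and OOV problem, pointer-style models commonly extend the vocabulary per input: each source word outside the fixed vocabulary gets a temporary id after the last regular id, so the copy distribution can point at it even though the softmax cannot produce it. The sketch below shows only that bookkeeping step; the function name and the toy vocabulary are hypothetical.

```python
def extend_vocab(source_tokens, vocab):
    """Map source tokens to ids, giving each distinct source OOV a temporary id.

    vocab: dict word -> id for the fixed vocabulary.
    Returns (ids, oovs); the k-th distinct OOV gets id len(vocab) + k.
    """
    oov_ids = {}  # source OOV word -> temporary id (insertion-ordered)
    ids = []
    for tok in source_tokens:
        if tok in vocab:
            ids.append(vocab[tok])
        else:
            if tok not in oov_ids:
                oov_ids[tok] = len(vocab) + len(oov_ids)
            ids.append(oov_ids[tok])
    return ids, list(oov_ids)
```

At decoding time, an output id at or above `len(vocab)` is mapped back to the word `oovs[id - len(vocab)]` from the same input.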

Questions to focus on:

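Because the input sequence is relatively long, how can we keep the decoder's attention concentrated during generation, i.e. make the attention more focused? Even with an attention model, the decoder does not focus well on the corresponding source tokens (it loses focus):
- the encoder outputs used in the attention computation contain noise
- make the key content more salient rather than filtering the rest out? What is the difference?
How much does the copy mechanism contribute?
- and likewise coverage
How can information redundancy and information loss be determined?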
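One way to make the "losing focus" question measurable, an assumption added here rather than something stated in these notes, is to track the entropy of each decoder step's attention distribution. Low entropy means the weight sits on a few source tokens, while entropy near log(n) means it is spread almost uniformly over all n tokens. The helpers below are hypothetical names for that diagnostic.

```python
import math


def attention_entropy(weights):
    """Shannon entropy (in nats) of one attention distribution."""
    # Zero weights contribute nothing (the limit of w * log(w) as w -> 0 is 0).
    return -sum(w * math.log(w) for w in weights if w > 0)


def normalized_entropy(weights):
    """Entropy divided by its maximum log(n): 0 = fully focused, 1 = uniform."""
    n = len(weights)
    return attention_entropy(weights) / math.log(n) if n > 1 else 0.0
```

Averaging `normalized_entropy` over decoder steps, and comparing it between short and long inputs, would show whether attention actually becomes more diffuse as documents get longer.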

Post author: Luxi Xing
Post link: http://xingluxi.github.io/2019/01/09/nlp-challenge-ts/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 3.0 unless stated otherwise.