E: Enriched / Enhanced
Main Related Work
- Pre-trained Models for Natural Language Processing: A Survey. 2020.
- ERNIE (THU)
- ERNIE: enhanced language representation with informative entities. In ACL, 2019.
- KnowBERT
- Knowledge enhanced contextual word representations. EMNLP-IJCNLP, 2019.
- K-BERT
- K-BERT: Enabling language representation with knowledge graph. AAAI, 2020.
- KEPLER
- KEPLER: A unified model for knowledge embedding and pre-trained language representation. 2019.
- WKLM
- Pretrained encyclopedia: Weakly supervised knowledge-pretrained language model. ICLR, 2020.
Introduction of KnE-PTM
PTMs typically learn universal language representations from general-purpose large-scale text corpora, but they lack domain-specific knowledge.
Injecting domain knowledge from external knowledge bases into PTMs is an effective way to fill this gap.
Note 1: Advantages of pre-training:
- Pre-training on large-scale text corpora learns universal language representations that benefit downstream tasks;
- Pre-training provides a better model initialization, which usually leads to better generalization and faster convergence on the target task;
- Pre-training can be viewed as a kind of regularization that helps avoid overfitting on small datasets;
Note 2: Pre-training tasks:
- LM with maximum-likelihood estimation (MLE)
- Masked Language Modeling (MLM) (see the masking sketch after this list)
  - uni-/bidirectional MLM
  - Seq2Seq MLM
  - E-MLM: span boundary objective, span order recovery
- Permuted Language Modeling (PLM)
- Denoising Autoencoder (DAE)
  - token masking
  - token deletion
  - text infilling
  - sentence permutation
  - document rotation
- Contrastive Learning (CTL)
  - Deep InfoMax
  - Replaced Token Detection (RTD)
  - Next Sentence Prediction (NSP)
  - Sentence Order Prediction (SOP)
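As a concrete illustration of the MLM objective above, here is a minimal sketch of BERT-style token masking. The 80/10/10 replacement split follows the commonly reported BERT recipe; the helper name `mlm_mask`, the token IDs, and the vocabulary size are illustrative placeholders rather than values from any specific tokenizer.

```python
# Minimal sketch of BERT-style masking for the MLM objective (illustrative;
# token IDs and vocab size are placeholders, not tied to a real tokenizer).
import random

def mlm_mask(token_ids, mask_id=103, vocab_size=30522,
             mask_prob=0.15, rng=random.Random(0)):
    inputs, labels = list(token_ids), [-100] * len(token_ids)  # -100 = ignored
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:
            labels[i] = tok                            # model must recover this token
            r = rng.random()
            if r < 0.8:
                inputs[i] = mask_id                    # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.randrange(vocab_size)  # 10%: random token
            # remaining 10%: keep the original token unchanged
    return inputs, labels

corrupted, targets = mlm_mask([7592, 2088, 2003, 1037, 3231, 2154])
print(corrupted, targets)
```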
Types of external knowledge and related work:
- Injecting linguistic knowledge:
  - Injecting semantic knowledge (which is, in effect, linguistic knowledge):
    - SenseBERT [04]: uses WordNet; besides predicting masked tokens, the model also predicts their supersenses in WordNet;
- Injecting commonsense knowledge:
  - A knowledge-enhanced pretraining model for commonsense story generation. 2020.
    - targets a specific downstream task (commonsense story generation);
- Injecting factual knowledge:
  - ERNIE (THU): enhances text representations with pre-trained knowledge graph embeddings (KGE), linked to the text through entity mentions;
  - KnowBERT: jointly trains BERT with an entity-linking task;
  - K-BERT: explicitly extracts triples from the KG and expands the input sentence into a tree structure (see the sketch after this list);
  - WKLM: uses an entity-replacement detection task to encourage the model to be aware of factual knowledge;
  - KEPLER: jointly optimizes the knowledge-embedding (KE) and language-modeling objectives;
    - like ERNIE (THU), it injects the structural information of the KG through entity representations;
- Injecting domain-specific knowledge [05][06][07][08]
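To make the K-BERT item above more concrete, here is a toy sketch of expanding an input sentence with KG triples, in the spirit of K-BERT's sentence tree. The tiny KG dictionary, the function name `expand_with_triples`, and the bracketed branch format are illustrative assumptions; the real model additionally uses soft position indices and a visibility matrix so the injected branches do not disturb the original sentence.

```python
# Toy sketch (not the official K-BERT code): append KG triples as branches
# right after the entity mentions they attach to. The KG below is made up
# for illustration.
from typing import Dict, List, Tuple

KG: Dict[str, List[Tuple[str, str]]] = {
    "Beijing": [("capital_of", "China")],
    "Apple": [("founded_by", "Steve_Jobs")],
}

def expand_with_triples(tokens: List[str],
                        kg: Dict[str, List[Tuple[str, str]]] = KG) -> List[str]:
    """Return the token list with KG branches inserted after each entity mention."""
    expanded: List[str] = []
    for tok in tokens:
        expanded.append(tok)
        for relation, tail in kg.get(tok, []):
            expanded += ["(", relation, tail, ")"]   # branch attached to `tok`
    return expanded

print(expand_with_triples("Beijing is a big city".split()))
# ['Beijing', '(', 'capital_of', 'China', ')', 'is', 'a', 'big', 'city']
```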
Methods for Injecting Knowledge
- Injecting external knowledge during the pre-training stage
  - Before BERT: early work jointly learned knowledge graph embeddings (KGE) and word embeddings;
  - After BERT: add auxiliary pre-training tasks;
    - cons.:
      - updating the PTM's parameters can lead to catastrophic forgetting when multiple kinds of knowledge are injected;
      - usually accommodates only a single kind of knowledge;
- Injecting external knowledge without retraining the PTM from scratch (see the adapter sketch after this list)
  - Applied in downstream tasks:
    - K-BERT: focuses on how the input is formatted;
    - KT-NET: leverages pre-trained KGE for question answering (QA);
  - Via intermediate pre-training tailored to the needs of the downstream task;
  - Extending the LM to a KG-conditioned (knowledge-graph-conditioned) LM [14][15]
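The "inject knowledge without retraining the PTM from scratch" route can be illustrated with a small bottleneck adapter, similar in spirit to K-Adapter [03]: the backbone parameters stay frozen and only the adapter is trained on the knowledge task. This is a minimal sketch assuming PyTorch; the class name and the hidden/bottleneck sizes are placeholders, not the original implementation.

```python
# Minimal adapter sketch (assumes PyTorch): the pre-trained backbone is kept
# frozen and only this bottleneck module is trained on the knowledge objective,
# so different kinds of knowledge can each get their own adapter.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection: with near-zero adapter weights, the frozen
        # PTM's representation passes through almost unchanged.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

adapter = BottleneckAdapter()
h = torch.randn(2, 10, 768)        # (batch, seq_len, hidden) from a frozen PTM
print(adapter(h).shape)            # torch.Size([2, 10, 768])
```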
Probing Knowledge in LMs
Probing Linguistic Knowledge
- probing studies of the syntactic and semantic knowledge captured in contextual representations [1][2][3][4][5][6][7];
Probing World Knowledge
- Construct fill-in-the-blank (cloze-style) queries as inputs to probe BERT for factual knowledge [8][9];
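As a quick illustration of this cloze-style probing (in the spirit of LAMA [8]), the snippet below queries a masked LM through the Hugging Face `transformers` fill-mask pipeline; the specific checkpoint and query sentence are illustrative choices.

```python
# Sketch of a fill-in-the-blank probe for world knowledge (assumes the
# `transformers` library is installed; downloads bert-base-uncased on first run).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill_mask("The capital of France is [MASK]."):
    print(f'{pred["token_str"]:>10}  {pred["score"]:.3f}')
# A high-ranked "paris" suggests the fact is stored in the model's parameters.
```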
Knowledge-enhanced PTM references [01]–[08]:
- [01] Informing unsupervised pretraining with external linguistic knowledge. 2019.
- [02] Linguistic knowledge enhanced language representation for sentiment analysis. 2019.
- [03] K-Adapter: Infusing knowledge into pre-trained models with adapters. 2020.
- [04] SenseBERT: Driving some sense into BERT. 2019.
- [05] Integrating graph contextualized knowledge into pre-trained language models. 2019.
- [06] BioBERT: a pre-trained biomedical language representation model for biomedical text mining. 2019.
- [07] SciBERT: A pretrained language model for scientific text. EMNLP, 2019.
- [08] PatentBERT: Patent classification with fine-tuning a pre-trained BERT model. 2019.
Probing and KG-conditioned LM references [1]–[15]:
- [1] What do you learn from context? Probing for sentence structure in contextualized word representations. ICLR, 2019.
- [2] Linguistic knowledge and transferability of contextual representations. NAACL, 2019.
- [3] Assessing BERT's syntactic abilities. 2019.
- [4] What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models. TACL, 2020.
- [5] A structural probe for finding syntax in word representations. NAACL, 2019.
- [6] What does BERT learn about the structure of language? ACL, 2019.
- [7] Are pre-trained language models aware of phrases? Simple but strong baselines for grammar induction. 2020.
- [8] Language models as knowledge bases? EMNLP, 2019.
- [9] How can we know what language models know? 2019.
- [10] BERT is not a knowledge base (yet): Factual knowledge vs. name-based reasoning in unsupervised QA. 2019.
- [11] Negated LAMA: birds cannot fly. 2019.
- [12] Inducing relational knowledge from BERT. 2019.
- [13] Commonsense knowledge mining from pretrained models. EMNLP, 2019.
- [14] Barack's Wife Hillary: Using knowledge graphs for fact-aware language modeling. ACL, 2019.
- [15] Latent relation language models. 2019.