EMNLP2019 | KagNet - Knowledge-Aware Graph Networks for Commonsense Reasoning

Title: KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning
Author: Bill Yuchen Lin, Xinyue Chen, Jamin Chen, Xiang Ren
Org.: University of Southern California, Shanghai Jiao Tong University
Published: EMNLP, 2019
Code: https://github.com/INK-USC/KagNet

Motivation

The paper targets the CommonsenseQA dataset.

Goal:

empowering machines with the ability to perform commonsense reasoning/inferences
- Definitions of reasoning:
  - reasoning is the process of combining facts and beliefs to make new decisions ^[1].
  - reasoning is the ability to manipulate knowledge to draw inferences ^[2].
- Definition of commonsense reasoning:
  - commonsense reasoning utilizes the basic knowledge that reflects our natural understanding of the world and human behaviors, which is common to all humans.
- Naive Physics: Humans’ natural understanding of the physical world
- Folk Psychology: Humans’ innate ability to reason about people’s behavior and intentions
gap between baselines and human performance
lack of transparency and interpretability
- how the machines manage to answer commonsense questions and make their inferences.
why exploit commonsense knowledge bases
- knowledge-aware models can explicitly incorporate external knowledge as relational inductive biases
  - enhance reasoning capacity
  - increase the transparency of model behaviors for more interpretable results
- challenges
  - How can we find the most relevant paths in the KG? (noisy)
  - What if the best path does not exist in the KG? (incomplete)

This work

Knowledge-aware reasoning framework, two major steps:

schema graph grounding (see figure below)
graph modeling for inference

Knowledge-aware graph network module: KAGNET

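GCN-LSTM-HPA architecture:
- Composed of a GCN, an LSTM, and a hierarchical path-based attention (HPA) mechanism
- Used for path-based relational graph representation
overall workflow
- First, recognize the concepts mentioned in $q$ and in $a$, find the paths between these concepts, and build the (grounded) schema graph;
- Encode the QA pair with an LM encoder to produce the statement vector, which serves as an input to GCN-LSTM-HPA for computing the graph vector;
- Finally, use the graph vector to compute the score of the QA pair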

Model

question $q$ with $N$ candidate answers $\{a_i\}$

schema graph $g=(V, E)$

Schema Graph Grounding

Concept Recognition

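N-gram matching: the tokens of the sentence are matched against the vertex set of ConceptNet as n-grams
Note: effectively extracting contextually relevant knowledge from a noisy knowledge source remains an open problem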
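As a rough illustration of the n-gram matching step, the sketch below scans a tokenized sentence for n-grams that appear in a concept vocabulary. The function name `match_concepts`, the `max_n` limit, the toy vocabulary, and the underscore-joined concept naming are all assumptions made for this example; it is not the authors' implementation.

```python
def match_concepts(tokens, vocab, max_n=3):
    """Return the concepts in `vocab` that occur as n-grams in `tokens`.

    Longer n-grams are tried first; results keep first-seen order.
    """
    found = []
    for n in range(max_n, 0, -1):
        for start in range(len(tokens) - n + 1):
            # Assumption: multi-word concepts are stored joined by underscores.
            candidate = "_".join(tokens[start:start + n]).lower()
            if candidate in vocab and candidate not in found:
                found.append(candidate)
    return found


# Toy vocabulary standing in for the ConceptNet vertex set (hypothetical).
vocab = {"revolving_door", "door", "bank", "security"}
question = "A revolving door is convenient but also serves as a security measure at a bank".split()
concepts = match_concepts(question, vocab)
print(concepts)
```

Note that plain matching keeps both `revolving_door` and the less specific `door`, which is one way noise enters the grounded graph.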

Schema Graph Construction

sub-graph matching via path finding

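A straightforward yet effective approach is taken: directly find paths between the concepts mentioned in Q and A ($\mathcal{C}_q \cup \mathcal{C}_a$)
- For a question concept $c_i \in \mathcal{C}_q$ and a candidate-answer concept $c_j \in \mathcal{C}_a$, find the paths between them whose length is less than $k$ and add them to the graph
  - The paper sets $k=4$, i.e., paths of up to 3 hops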
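The bounded path search can be sketched as a depth-first enumeration of simple paths with at most 3 hops. The adjacency format, the function name `find_paths`, and the toy graph are assumptions made for illustration; the source only states that paths shorter than $k$ are collected.

```python
def find_paths(graph, source, target, max_hops=3):
    """Enumerate simple paths from source to target with at most max_hops edges.

    `graph` maps a concept to a list of (relation, neighbor) pairs.
    Each path is returned as a list of (head, relation, tail) triples.
    """
    paths = []

    def dfs(node, visited, triples):
        if node == target and triples:
            paths.append(list(triples))
            return
        if len(triples) == max_hops:
            return
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                triples.append((node, relation, neighbor))
                dfs(neighbor, visited, triples)
                triples.pop()
                visited.remove(neighbor)

    dfs(source, {source}, [])
    return paths


# Toy knowledge graph (hypothetical edges, not taken from ConceptNet).
graph = {
    "revolving_door": [("AtLocation", "bank"), ("RelatedTo", "door")],
    "door": [("AtLocation", "building")],
    "building": [("RelatedTo", "bank")],
}
paths = find_paths(graph, "revolving_door", "bank", max_hops=3)
for p in paths:
    print(p)
```

The schema graph would then be the union of the triples found over all pairs of question and answer concepts.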

Path Pruning via KG Embedding

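The goal is to remove irrelevant paths from the potentially noisy schema graph.

Use a KGE method such as TransE to pre-train concept embeddings and relation-type embeddings (these can also be used to initialize KAGNET)
Assess the quality of each path
- Decompose a path into a set of triples; the score of the path is the product of the scores of its triples, and paths are filtered with a threshold.
- The score of a triple is computed with the scoring function of the KGE method (e.g., the confidence of triple classification)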
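The pruning rule above can be sketched as follows. The per-triple scores here are made-up numbers standing in for the output of a KGE scoring function, and the threshold value and the choice to keep paths scoring at or above it are assumptions for this example.

```python
import math


def path_score(path, triple_score):
    """Score a path as the product of the scores of its triples."""
    return math.prod(triple_score[triple] for triple in path)


def prune_paths(paths, triple_score, threshold):
    """Keep only paths whose score reaches the threshold (assumed rule: >=)."""
    return [p for p in paths if path_score(p, triple_score) >= threshold]


# Hypothetical triple scores in [0, 1]; a real system would obtain them
# from the KGE model's scoring function.
triple_score = {
    ("revolving_door", "AtLocation", "bank"): 0.9,
    ("revolving_door", "RelatedTo", "door"): 0.8,
    ("door", "AtLocation", "building"): 0.5,
    ("building", "RelatedTo", "bank"): 0.2,
}
short_path = [("revolving_door", "AtLocation", "bank")]
long_path = [
    ("revolving_door", "RelatedTo", "door"),
    ("door", "AtLocation", "building"),
    ("building", "RelatedTo", "bank"),
]
kept = prune_paths([short_path, long_path], triple_score, threshold=0.5)
print(kept)
```

Because the score is a product of values below 1, longer paths are penalized unless every triple on them is scored highly.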

Knowledge-Aware Graph Network

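overview:
First, encode the graph with a GCN
Then, encode the paths between $\mathcal{C}_q$ and $\mathcal{C}_a$ with an LSTM to capture multi-hop relational information
Finally, use hierarchical path-based attention to model the relational schema graph with respect to the paths between question and answer concepts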

Graph Convolution Networks

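Purposes of using a GCN:

contextually refine the concept vector
- Here, context means a node's context within the graph, i.e., its adjacency relations
- Neighbors are used to disambiguate the pre-trained concept embeddings
capture structural patterns of schema graphs for generalization
The pattern of a schema graph provides potentially valuable information for reasoning
- Shorter and denser connections between the concepts of a QA pair may indicate higher plausibility in a particular context.
- This helps assess the plausibility of a candidate answer
The layer-wise update for concept $c_i$ with neighbor set $N_i$ is:
$h_i^{(l+1)} = \sigma(W_{self}^{(l)} h_i^{(l)} + \sum_{j \in N_i} \frac{1}{|N_i|} W^{(l)} h_j^{(l)})$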
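A minimal sketch of one such update in plain Python is given below. It assumes ReLU as the non-linearity $\sigma$ and uses identity weight matrices in the usage example; both choices, and the helper names, are assumptions for illustration only.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gcn_layer(h, neighbors, W_self, W_nbr):
    """Apply h_i' = sigma(W_self h_i + mean over j in N_i of W_nbr h_j).

    `h` is a list of node vectors; `neighbors[i]` lists the neighbor ids of node i.
    sigma is taken to be ReLU here (an assumption).
    """
    out = []
    for i, h_i in enumerate(h):
        total = matvec(W_self, h_i)
        nbrs = neighbors[i]
        for j in nbrs:
            msg = matvec(W_nbr, h[j])
            total = [t + m / len(nbrs) for t, m in zip(total, msg)]
        out.append([max(0.0, t) for t in total])
    return out


identity = [[1.0, 0.0], [0.0, 1.0]]
h0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three concept vectors
neighbors = [[1, 2], [0], [0]]              # node 0 is linked to nodes 1 and 2
h1 = gcn_layer(h0, neighbors, identity, identity)
print(h1)
```

With identity weights, each node's new vector is its own vector plus the average of its neighbors' vectors, which makes the neighborhood averaging in the formula easy to check by hand.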

Relational Path Encoding

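Let $P_{i,j}[k]$ denote the $k$-th path between the $i$-th question concept $c_i^{(q)}$ and the $j$-th candidate-answer concept $c_j^{(a)}$. A path is a sequence of triples:

$P_{i,j}[k]=[(c_i^{(q)}, r_0, t_0), \dots, (t_{n-1}, r_n, c_j^{(a)})]$
- relation vectors come from KGE pre-training
- concept vectors are the top-layer outputs of the GCN from the previous step
Each triple is represented as the concatenation of its three vectors, giving a triple vector
An LSTM encodes the sequence of triple vectors into a path vector
- $R_{i,j} = \frac{1}{|P_{i,j}|} \sum_k LSTM(P_{i,j}[k])$
$R_{i,j}$ can be viewed as the latent relation between the question concept and the candidate-answer concept

The representations of all paths are aggregated into the final graph vector $g$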

The Relation Network approach is used here:
- $T_{i,j} = MLP([s;c_i^{(q)};c_j^{(a)}])$
- the statement vector $s$ is the [CLS] representation from the LM encoder
Mean-pooling is then applied; this variant is called GCN-LSTM-mean
- $g=\frac{\sum_{i,j}[R_{i,j};T_{i,j}]}{|\mathcal{C}_q| \times |\mathcal{C}_a|}$
This simple scheme fuses the relational representations computed in the symbolic space and in the semantic space

Plausibility score of a candidate answer:

$\text{score}(q,a)=\text{sigmoid}(MLP(g))$
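The mean-pooling variant and the scoring step can be traced with a small numeric sketch. Path vectors are given directly as lists (standing in for LSTM outputs), $T_{i,j}$ is given directly (standing in for the MLP output), and the final MLP is replaced by a single linear layer; all of these simplifications, and every name in the code, are assumptions for illustration. The sketch also assumes every concept pair has at least one path.

```python
import math


def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def graph_vector_mean(path_vecs, T):
    """Compute g as the mean over concept pairs of [R_ij ; T_ij].

    path_vecs[(i, j)] is the list of path vectors for pair (i, j);
    T[(i, j)] is the pair's Relation Network vector.
    """
    pair_vecs = []
    for key, vecs in path_vecs.items():
        R_ij = mean_vectors(vecs)        # R_{i,j}: average over the pair's paths
        pair_vecs.append(R_ij + T[key])  # list concatenation = [R_{i,j}; T_{i,j}]
    return mean_vectors(pair_vecs)


def plausibility(g, w, b=0.0):
    """sigmoid(w . g + b), a one-layer stand-in for sigmoid(MLP(g))."""
    logit = sum(wi * gi for wi, gi in zip(w, g)) + b
    return 1.0 / (1.0 + math.exp(-logit))


path_vecs = {(0, 0): [[1.0, 3.0], [3.0, 1.0]], (0, 1): [[0.0, 2.0]]}
T = {(0, 0): [1.0], (0, 1): [3.0]}
g = graph_vector_mean(path_vecs, T)
print(g, plausibility(g, [0.0, 0.0, 0.0]))
```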

Hierarchical Attention Mechanism

Since different paths matter to different degrees for reasoning, mean-pooling is not a desirable choice.

Based on this, the paper proposes a hierarchical path-based attention mechanism that selectively aggregates the important path vectors and, in turn, the more important QA concept pairs.

Path-level and concept-pair-level attention are used to learn a context-dependent model of the graph representation
path-level
- $\alpha_{(i,j,k)} = T_{i,j} W_1 LSTM(P_{i,j}[k])$
- $\hat{\alpha}_{(i,j,\cdot)} = softmax(\alpha_{(i,j,\cdot)})$
- $\hat{R}_{i,j} = \sum_k \hat{\alpha}_{(i,j,k)} \cdot LSTM(P_{i,j}[k])$
concept-pair level
- $\beta_{(i,j)} = s W_2 T_{i,j}$
- $\hat{\beta}_{(\cdot, \cdot)} = softmax(\beta_{(\cdot, \cdot)})$
- $\hat{g} = \sum_{i,j} \hat{\beta}_{(i,j)} [\hat{R}_{i,j}; T_{i,j}]$
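The two attention levels can be sketched in plain Python as below. As in the formulas, both levels use bilinear scores followed by a softmax. Path vectors and $T_{i,j}$ are supplied directly as lists instead of being produced by an LSTM and an MLP, and all function names and toy values are assumptions for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def bilinear(u, W, v):
    """Compute the scalar u^T W v for W with len(u) rows and len(v) columns."""
    return sum(u[a] * W[a][b] * v[b] for a in range(len(u)) for b in range(len(v)))


def weighted_sum(weights, vectors):
    """Sum of vectors, each scaled by its weight."""
    return [sum(w * x for w, x in zip(weights, col)) for col in zip(*vectors)]


def hierarchical_attention(s, T, path_vecs, W1, W2):
    """Return g_hat using path-level, then concept-pair-level attention."""
    pair_vecs, betas = [], []
    for key, vecs in path_vecs.items():
        alphas = softmax([bilinear(T[key], W1, p) for p in vecs])  # path level
        R_hat = weighted_sum(alphas, vecs)
        pair_vecs.append(R_hat + T[key])                           # [R_hat; T]
        betas.append(bilinear(s, W2, T[key]))                      # pair level
    return weighted_sum(softmax(betas), pair_vecs)


s = [1.0, 1.0]                                    # toy statement vector
path_vecs = {(0, 0): [[1.0, 3.0], [3.0, 1.0]], (0, 1): [[0.0, 2.0]]}
T = {(0, 0): [1.0], (0, 1): [3.0]}
uniform = hierarchical_attention(s, T, path_vecs, [[0.0, 0.0]], [[0.0], [0.0]])
sharp = hierarchical_attention(s, T, path_vecs, [[10.0, 0.0]], [[0.0], [0.0]])
print(uniform, sharp)
```

With all-zero $W_1$ and $W_2$ every attention weight is uniform, so the result coincides with mean-pooling; a non-zero $W_1$ shifts weight toward the paths that score higher against $T_{i,j}$.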

Experiments

Transferability

Case Study on Interpretability


Error Analysis

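negative reasoning
- graph grounding is not sensitive to negation words
comparative reasoning strategy
- candidate answers are not compared with one another
subjective reasoning
- some answers are reached through subjective reasoning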

Analysis & Summary

KagNet can be viewed as a knowledge-augmented Relation Network (RN) module

1. Philip N. Johnson-Laird. 1980. Mental models in cognitive science. Cognitive Science, 4(1):71–115.
2. Drew A. Hudson and Christopher D. Manning. 2018. Compositional attention networks for machine reasoning. In Proc. of ICLR.