Recent Advances on Multi-Hop RC - WikiHop

WikiHop, part of the QAngaroo benchmark, is a dataset for multi-hop reading comprehension across documents.
Style: multiple choice

Reference papers on the WikiHop task:

Neural Models for Reasoning over Multiple Mentions Using Coreference. NAACL, 2018.

Exploring Graph-Structured Passage Representation for Multi-hop Reading Comprehension with Graph Neural Networks. 2018.

Exploiting Explicit Paths for Multi-hop Reading Comprehension. 2018.

BAG: Bi-directional Attention Entity Graph Convolutional Network for Multi-hop Reasoning Question Answering. NAACL, 2019.

Question Answering by Reasoning Across Documents with Graph Convolutional Networks. NAACL, 2019.

Multi-hop Reading Comprehension across Multiple Documents by Reasoning over Heterogeneous Graphs. ACL, 2019.

Coarse-Grain Fine-Grain Coattention Network for Multi-Evidence Question Answering. ICLR, 2019.

Explore, Propose, and Assemble: An Interpretable Model for Multi-Hop Reading Comprehension. ACL, 2019.

Task Definition

Input: $\langle q, P, C_q \rangle$

query $q$: a triple whose tail entity is missing, $\langle h_e, r, ? \rangle$
a set of supporting documents: $P = \{P_1, \ldots, P_M\}$
a set of candidate answers (all of which are entities mentioned in $P$): $C_q = \{C_1, \ldots, C_N\}$

Goal:

select $a^{\star} \in C_q$, which is the entity that correctly answers the question
this requires aggregating multiple pieces of evidence spread across documents
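A single instance $\langle q, P, C_q \rangle$ fits in a small record type. This is a hedged sketch: it assumes each released JSON record exposes `query`, `supports`, `candidates`, and `answer` keys and that the query string is the relation followed by the head entity, so check both against the files you actually download. `WikiHopSample`, `parse_query`, `from_record`, and the toy record are hypothetical names and invented data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WikiHopSample:
    """One multiple-choice instance <q, P, C_q> plus its gold answer a*."""
    query: str             # e.g. "relation head entity words"
    supports: List[str]    # supporting documents P_1 ... P_M
    candidates: List[str]  # candidate answers C_1 ... C_N
    answer: str            # gold answer, one of the candidates


def parse_query(query: str) -> Tuple[str, str]:
    """Split a query into (relation r, head entity h_e).

    Assumes the relation is the first whitespace-separated token.
    """
    relation, _, head = query.partition(" ")
    return relation, head


def from_record(record: dict) -> WikiHopSample:
    """Build a sample from one JSON record (key names are assumed)."""
    return WikiHopSample(
        query=record["query"],
        supports=record["supports"],
        candidates=record["candidates"],
        answer=record["answer"],
    )


# Toy record invented for illustration; it is not taken from the dataset.
record = {
    "query": "country_of_citizenship jane doe",
    "supports": ["Jane Doe is a painter born in Lyon.", "Lyon is a city in France."],
    "candidates": ["france", "spain"],
    "answer": "france",
}
sample = from_record(record)
print(parse_query(sample.query))  # ('country_of_citizenship', 'jane doe')
```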

Coref-GRU

Neural models for reasoning over multiple mentions using coreference.
NAACL,2018.
Bhuwan Dhingra. (William W. Cohen)
CMU.

1. Motivation

existing RNN layers are biased towards short-term dependencies

This work:

adapt a standard RNN layer by introducing a bias towards coreferent recency

2. Model Details

coreference relationships between words, together with the word sequence, form a directed acyclic graph (DAG)
introduce a term in the GRU update equations that depends on the hidden state of the current token's coreferent antecedent
- hidden states are propagated along coreference chains and the original sequence in parallel
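A minimal NumPy sketch of the idea, not the paper's exact parametrization: when token $t$ has a coreferent antecedent $a$, the previous state fed into a standard GRU cell is an attention-weighted mix of the sequential state $h_{t-1}$ and the antecedent's state $h_a$. The key vectors `k1`/`k2` that produce the attention weights, the random weights, and all function names are assumptions for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_cell(x, h, p):
    """Standard GRU update using weights W*, U* and biases b* from p."""
    r = sigmoid(x @ p["Wr"] + h @ p["Ur"] + p["br"])  # reset gate
    z = sigmoid(x @ p["Wz"] + h @ p["Uz"] + p["bz"])  # update gate
    h_tilde = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"] + p["bh"])
    return (1.0 - z) * h + z * h_tilde


def coref_gru(xs, antecedent, p):
    """Run a GRU over token inputs xs (T x d_in) with coreference edges.

    antecedent[t] is the index of token t's coreferent antecedent, or -1.
    With an antecedent, the state fed to the cell is an attention-weighted
    mix of h_{t-1} and h_antecedent (key vectors k1, k2: an assumption).
    """
    T, d = xs.shape[0], p["Ur"].shape[0]
    hs = np.zeros((T, d))
    h_prev = np.zeros(d)
    for t in range(T):
        x, a = xs[t], antecedent[t]
        if a >= 0:
            if a >= t:
                raise ValueError("an antecedent must precede its token")
            s = np.array([x @ p["k1"], x @ p["k2"]])  # one logit per edge type
            w = np.exp(s - s.max())
            w /= w.sum()
            h_in = w[0] * h_prev + w[1] * hs[a]
        else:
            h_in = h_prev
        hs[t] = gru_cell(x, h_in, p)
        h_prev = hs[t]
    return hs


rng = np.random.default_rng(0)
d_in, d = 4, 3
p = {k: rng.normal(scale=0.5, size=(d_in, d)) for k in ("Wr", "Wz", "Wh")}
p.update({k: rng.normal(scale=0.5, size=(d, d)) for k in ("Ur", "Uz", "Uh")})
p.update({k: np.zeros(d) for k in ("br", "bz", "bh")})
p.update({k: rng.normal(size=d_in) for k in ("k1", "k2")})
xs = rng.normal(size=(5, d_in))
hs = coref_gru(xs, [-1, -1, -1, -1, 0], p)  # token 4 corefers with token 0
print(hs.shape)  # (5, 3)
```

Without any coreference edges the function reduces to a plain GRU, so only states at and after a coreferent token differ.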

MHQA-GCN/GRN

Exploring graph-structured passage representation for multihop reading comprehension with graph neural networks.
2018.
Linfeng Song. (Yue Zhang)
University of Rochester. (Westlake University)

1. Motivation

local coreference information alone gives limited support for rich inference and for collecting global evidence

This work:

build more complex graphs that better connect global evidence
consider two more types of edges in addition to coreference:
- same entity mentions (cross-document)
- window-typed (within-document)
  - two mentions of different entities within a context window
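The three edge types can be sketched over a flat list of entity mentions. The mention tuples, the externally supplied coreference pairs, and the `window` size are hypothetical; the paper's actual window and preprocessing may differ.

```python
from itertools import combinations


def build_mhqa_edges(mentions, coref_pairs, window=3):
    """Build coreference, same-entity and window-typed edges.

    mentions: list of (doc_id, token_position, entity_name) tuples.
    coref_pairs: (i, j) index pairs into `mentions` from a coreference
        resolver (assumed to be given).
    window: maximum token distance for a window-typed edge (hypothetical).
    Returns a set of (i, j, edge_type) with i < j.
    """
    edges = {(min(i, j), max(i, j), "coref") for i, j in coref_pairs}
    for i, j in combinations(range(len(mentions)), 2):
        doc_i, pos_i, ent_i = mentions[i]
        doc_j, pos_j, ent_j = mentions[j]
        if ent_i == ent_j and doc_i != doc_j:
            edges.add((i, j, "same"))    # same entity, across documents
        elif doc_i == doc_j and ent_i != ent_j and abs(pos_i - pos_j) <= window:
            edges.add((i, j, "window"))  # different entities, close together
    return edges


# Toy mentions invented for illustration.
mentions = [
    (0, 2, "ada"), (0, 4, "metro club"),          # document 0
    (1, 0, "metro club"), (1, 9, "springfield"),  # document 1
]
edges = build_mhqa_edges(mentions, coref_pairs=[], window=3)
print(sorted(edges))  # [(0, 1, 'window'), (1, 2, 'same')]
```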

2. Model Details

Encoding: evidence integration with graph networks
- graph recurrent network (GRN)
- graph convolutional network (GCN)
Prediction: the probability of an entity sums the weights of all of its mentions
- $Pr_{\epsilon}=\frac{\sum_{k\in \mathcal{N}_{\epsilon}}\alpha_k}{\sum_{k^{\prime} \in \mathcal{N}} \alpha_{k^{\prime}}}$
- $\alpha_k = \frac{\exp(e^k)}{\sum_{k^{\prime} \in \mathcal{N}} \exp(e^{k^{\prime}})}$
- $\mathcal{N}_{\epsilon}$ is the set of mentions of entity $\epsilon$, and $\mathcal{N}$ the set of mentions of all candidates in $C_q$
- $e^k$ is a scalar score computed from the representation of the $k$-th entity mention
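The prediction rule is a softmax over mention scores followed by a per-entity sum. A small sketch, assuming the scalar scores $e^k$ have already been computed for every candidate mention; `entity_probs` is a hypothetical helper.

```python
import numpy as np


def entity_probs(scores, mention_entity):
    """Softmax over mention scores e^k, then sum alpha_k per entity.

    scores: one scalar score per candidate mention.
    mention_entity: entity name for each mention (same length).
    """
    scores = np.asarray(scores, dtype=float)
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                               # alpha_k over all mentions
    probs = {}
    for a, ent in zip(alpha, mention_entity):
        probs[ent] = probs.get(ent, 0.0) + float(a)    # sum over mentions of entity
    total = sum(probs.values())                        # denominator of Pr_epsilon
    return {ent: v / total for ent, v in probs.items()}


probs = entity_probs([0.0, 0.0, np.log(2.0)], ["a", "b", "a"])
print(probs)  # approximately {'a': 0.75, 'b': 0.25}
```

Summing rewards entities that are mentioned often with moderately high scores; Entity-GCN instead takes a max over mentions.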

Path-based (Kundu et al., 2018)

Exploiting Explicit Paths for Multi-hop Reading Comprehension
2018

1. Motivation

graph-based models only implicitly combine knowledge from all the passages
- unable to provide explicit reasoning paths for the selected answer.

This work

present a path-based reasoning approach for textual reading comprehension
- generating potential paths across multiple passages
- extracting implicit relations along each path
- composing relations to encode each path

2. Model Details

Path definition:
- only two-hop paths are considered, e.g. $path_{kj}=h_e \rightarrow e_1 \rightarrow c_k$, the $j$-th path ending in candidate $c_k$
- $e_1$ is an intermediate entity; the approach can be extended to more hops
- each extracted path is a sequence of entities
Path Extraction: for each candidate
- step #1: find a passage $P_1$ that contains the head entity $h_e$ of the query
- step #2: find intermediate entities: all named entities and noun phrases that appear in the same sentence as $h_e$ or in the following sentence
- step #3: find another passage $P_2$ that contains any of the intermediate entities found in step #2
  - to distinguish mentions in different passages, $e_1^{\prime}$ denotes the mention of $e_1$ in the second passage
- step #4: check whether passage $P_2$ contains any of the candidate answer choices
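The four extraction steps can be sketched as plain set operations. This assumes named entities and noun phrases have already been extracted per sentence (the sketch does no NLP itself); the passages, names, and `extract_paths` helper are hypothetical.

```python
def extract_paths(passages, head, candidates):
    """Two-hop path extraction h_e -> e_1 -> c_k.

    passages: list of passages; each passage is a list of sentences and
        each sentence is a set of extracted entity strings (assumed given).
    Returns {candidate: [(p1_index, e1, p2_index), ...]}.
    """
    paths = {c: [] for c in candidates}
    for p1, sents in enumerate(passages):
        for s_idx, sent in enumerate(sents):
            if head not in sent:                  # step 1: P_1 contains h_e
                continue
            nearby = set(sent)                    # step 2: same sentence ...
            if s_idx + 1 < len(sents):
                nearby |= sents[s_idx + 1]        # ... or the next one
            nearby -= {head}
            for e1 in nearby:
                for p2, sents2 in enumerate(passages):
                    if p2 == p1:
                        continue
                    ents2 = set().union(*sents2)
                    if e1 not in ents2:           # step 3: P_2 contains e_1'
                        continue
                    for c in candidates:          # step 4: P_2 contains c_k
                        if c in ents2 and (p1, e1, p2) not in paths[c]:
                            paths[c].append((p1, e1, p2))
    return paths


# Toy passages invented for illustration.
passages = [
    [{"ada", "metro club"}, {"chess"}],  # P_0: two sentences
    [{"metro club", "springfield"}],     # P_1
    [{"chess", "rome"}],                 # P_2
]
paths = extract_paths(passages, "ada", ["springfield", "rome"])
print(paths)  # {'springfield': [(0, 'metro club', 1)], 'rome': [(0, 'chess', 2)]}
```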
Path Encoding:
- context-based path encoding
  - an entity's location encoding is the concatenation of the passage-encoding vectors at the entity's start and end positions
    - $g_{e} = [s_{p_1,i_1};s_{p_1,i_2}]$
  - extract implicit relations with a feed-forward layer
    - $r_{h_e,e_1}=FFL(g_{h_e}, g_{e_1})$ in $P_1$
    - and likewise $r_{e_1^{\prime}, c_k}$ in $P_2$
  - compose the two implicit relation vectors with a feed-forward layer
    - $x_{ctx}=FFL(r_{h_e,e_1}, r_{e_1^{\prime},c_k})$
  - feed-forward layer: $FFL(a,b)=\tanh(aW_a + bW_b + bias)$
- passage-based path encoding
  - question-weighted passage representation
    - query-aware passage representation: $S_p^1$ and $S_p^2$
  - aggregate each passage representation into a single vector with self-attention
    - $\tilde{s}_{p_1}$ and $\tilde{s}_{p_2}$
  - $x_{psg} = FFL(\tilde{s}_{p_1}, \tilde{s}_{p_2})$
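The context-based path encoding above reduces to boundary-vector lookups and two applications of $FFL$. The sketch uses random weights and random "passage encodings" in place of trained BiLSTM outputs, and it shares one relation extractor for both hops, which is an assumption rather than a stated detail; `make_ffl` and `location_encoding` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # hidden size of the passage encoder (illustrative)


def make_ffl(d_a, d_b, d_out):
    """Return FFL(a, b) = tanh(a W_a + b W_b + bias) with random weights."""
    W_a = rng.normal(scale=0.1, size=(d_a, d_out))
    W_b = rng.normal(scale=0.1, size=(d_b, d_out))
    bias = np.zeros(d_out)
    return lambda a, b: np.tanh(a @ W_a + b @ W_b + bias)


def location_encoding(s_p, start, end):
    """g_e = [s_{p,i1}; s_{p,i2}]: boundary states of the entity's span."""
    return np.concatenate([s_p[start], s_p[end]])


# Random stand-ins for BiLSTM passage encodings (length x d).
s_p1 = rng.normal(size=(10, d))
s_p2 = rng.normal(size=(12, d))

g_he = location_encoding(s_p1, 0, 1)   # h_e spans tokens 0..1 of P_1
g_e1 = location_encoding(s_p1, 5, 5)   # e_1 is token 5 of P_1
g_e1p = location_encoding(s_p2, 2, 3)  # e_1' spans tokens 2..3 of P_2
g_ck = location_encoding(s_p2, 8, 9)   # c_k spans tokens 8..9 of P_2

relation = make_ffl(2 * d, 2 * d, d)   # shared for both hops (assumption)
compose = make_ffl(d, d, d)

r1 = relation(g_he, g_e1)    # r_{h_e, e_1}
r2 = relation(g_e1p, g_ck)   # r_{e_1', c_k}
x_ctx = compose(r1, r2)      # context-based path encoding
print(x_ctx.shape)  # (4,)
```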
Path Scorer:
- context-based path scoring
  - $\tilde{q}=([q_0;q_L])W_q$
  - $y_{ctx,q}=FFL(x_{ctx},\tilde{q})$
  - $z_{ctx}=y_{ctx,q}W_{ctx}^T$
- passage-based path scoring
  - self-attention pools the encoding of candidate $c_k$ into a single vector $\tilde{c}_k$
  - $z_{psg}=\tilde{c}_k x_{psg}^T$
- unnormalized score $z = z_{ctx} + z_{psg}$
  - a softmax over all paths of all candidates gives $score(path_{kj})$
Prediction
- $prob(c_k)=\sum_j score(path_{kj})$
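Scoring and prediction amount to a softmax over every path of every candidate followed by a per-candidate sum. A sketch, assuming `z_ctx` and `z_psg` have already been computed for each path; `candidate_probs` is a hypothetical name.

```python
import numpy as np


def candidate_probs(z_ctx, z_psg, path_candidate, num_candidates):
    """Combine path scores and aggregate them per candidate.

    z_ctx, z_psg: unnormalized scores, one per extracted path.
    path_candidate: index k of the candidate each path ends in.
    """
    z = np.asarray(z_ctx, dtype=float) + np.asarray(z_psg, dtype=float)
    score = np.exp(z - z.max())
    score /= score.sum()                      # softmax over all paths of all candidates
    prob = np.zeros(num_candidates)
    np.add.at(prob, path_candidate, score)    # prob(c_k) = sum_j score(path_kj)
    return prob


prob = candidate_probs([1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0, 0, 1], 3)
print(prob)  # approximately [2/3, 1/3, 0]
```

In this sketch a candidate with no extracted path receives probability 0.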

BAG

BAG: Bi-directional Attention Entity Graph Convolutional Network for Multi-hop Reasoning Question Answering.
NAACL,2019.

1. Motivation

comprehend the relationships of entities across documents before answering questions

2. Model Details

Graph Construction
- nodes: all mentions of candidates
- edges: undirected
  - 1) cross-document edges
  - 2) within-document edges
Multi-Level Features:
- input node embedding: concatenation of GloVe, ELMo (+ linear layer), NER, and POS features
Query Encoding: BiLSTM + linear layer
Graph Encoding: relational graph convolutional network, the same as in Entity-GCN
Bi-directional Attention Between a Graph and a Query
- similarity matrix $S = pooling_{mean}(f_a ([h_n; f_q; h_n \circ f_q] ))$
  - $f_q$: query representation
  - $h_n$: representations of all nodes
- the node-to-query and query-to-node attention are computed as in BiDAF
Prediction: the probability of each node being the answer
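A NumPy sketch of the bi-directional attention that follows the BiDAF recipe for both directions. It assumes $f_a$ is a single linear layer, that mean pooling runs over its output dimension, and that the result concatenates the node states, the node-to-query vectors, and their element-wise products as in BiDAF; BAG's exact layout may differ, and `bi_attention` is a hypothetical helper.

```python
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def bi_attention(h_n, f_q, W_a):
    """Graph-query bi-directional attention in the BiDAF style.

    h_n: (N, d) node representations; f_q: (M, d) query token states.
    W_a: (3d, d_a) weights of f_a, assumed to be one linear layer.
    """
    N, M = h_n.shape[0], f_q.shape[0]
    hh = np.repeat(h_n[:, None, :], M, axis=1)          # (N, M, d)
    qq = np.repeat(f_q[None, :, :], N, axis=0)          # (N, M, d)
    feats = np.concatenate([hh, qq, hh * qq], axis=-1)  # [h_n; f_q; h_n o f_q]
    S = (feats @ W_a).mean(axis=-1)                     # mean pooling -> (N, M)

    n2q = softmax(S, axis=1) @ f_q                      # node-to-query: (N, d)
    b = softmax(S.max(axis=1))                          # one weight per node: (N,)
    q2n = np.tile(b @ h_n, (N, 1))                      # query-to-node: (N, d)
    return np.concatenate([h_n, n2q, h_n * n2q, h_n * q2n], axis=1)  # (N, 4d)


rng = np.random.default_rng(0)
h_n = rng.normal(size=(6, 8))   # 6 graph nodes
f_q = rng.normal(size=(5, 8))   # 5 query tokens
W_a = rng.normal(size=(24, 8))
out = bi_attention(h_n, f_q, W_a)
print(out.shape)  # (6, 32)
```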
- the probability of each candidate is the sum of all corresponding nodes.

Entity-GCN

Question Answering by Reasoning Across Documents with Graph Convolutional Networks.
NAACL,2019.

1. Motivation

This work

frame question answering as an inference problem on a graph representing the document collection.

2. Model Details

Graph Construction
- nodes: mentions of candidate choices and of the head entity in the query
- edges:
  - co-occurrence in the same document
  - mentions that exactly match across documents
Encoding
- ELMo: a concatenation of three 1024-dimensional vectors, resulting in 3072-dimensional input vectors
Graph Encoding: a relational GCN models the message-passing process
- at layer $l$:
- aggregation: aggregate information from neighbors of each node
  - $z_i^l = \frac{1}{|N_i|} \sum_{j\in N_i} \sum_{r \in R_{ij}} f_r(h_j^l)$
  - $N_i$ is the set of indices of nodes neighbouring $i$-th node
  - $R_{ij}$ is the set of edge annotations between $i$ and $j$
- combination
  - $u_i^l = f_s(h_i^l) + z_i^l$
- gating: controls how much of the update message propagates to the next layer
  - $g_i^l = sigmoid(f_g ( [z_i^l; h_i^l] ))$
  - $h_i^{l+1} = tanh(u_i^l) \odot g_i^l + h_i^l \odot (1-g_i^l)$
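One gated R-GCN layer from the equations above, in NumPy. Every $f_*$ is assumed to be a bias-free linear map, edges are treated as undirected, and a node with no neighbours gets $z_i^l = 0$; these are simplifying assumptions, and `rgcn_layer` plus the relation names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def rgcn_layer(h, edges, W_r, W_s, W_g):
    """One gated R-GCN layer (every f_* taken as a bias-free linear map).

    h: (n, d) node states h^l.
    edges: dict relation name -> list of undirected (i, j) pairs, i != j.
    W_r: dict relation name -> (d, d) matrix for f_r.
    W_s: (d, d) matrix for f_s; W_g: (2d, d) matrix for f_g.
    """
    n, d = h.shape
    msg = np.zeros((n, d))
    neighbours = [set() for _ in range(n)]
    for r, pairs in edges.items():
        for i, j in pairs:
            msg[i] += h[j] @ W_r[r]       # sum over r in R_ij of f_r(h_j)
            msg[j] += h[i] @ W_r[r]
            neighbours[i].add(j)
            neighbours[j].add(i)
    deg = np.array([max(len(s), 1) for s in neighbours])[:, None]
    z = msg / deg                          # z_i = (1/|N_i|) * sum ...
    u = h @ W_s + z                        # u_i = f_s(h_i) + z_i
    g = sigmoid(np.concatenate([z, h], axis=1) @ W_g)  # gate g_i
    return np.tanh(u) * g + h * (1.0 - g)  # h_i^{l+1}


rng = np.random.default_rng(0)
n, d = 5, 6
h = rng.uniform(-1.0, 1.0, size=(n, d))
edges = {"doc": [(0, 1), (1, 2)], "match": [(0, 3)]}  # node 4 is isolated
W_r = {r: rng.normal(scale=0.3, size=(d, d)) for r in edges}
W_s = rng.normal(scale=0.3, size=(d, d))
W_g = rng.normal(scale=0.3, size=(2 * d, d))
h_next = rgcn_layer(h, edges, W_r, W_s, W_g)
print(h_next.shape)  # (5, 6)
```

Because $g_i^l \in (0,1)$ interpolates between $\tanh(u_i^l)$ and $h_i^l$, node states that start in $[-1, 1]$ stay in that range.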
Prediction
- $Prob(c|q,C_q,P) \propto \exp(\max_{i\in M_c} f_o([q; h_i^L]))$
  - $M_c$ is the set of nodes that are mentions of candidate choice $c$
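The prediction takes a max over each candidate's mention nodes before normalizing. A sketch assuming $f_o$ is a single linear layer and every candidate has at least one mention node (in this sketch a candidate with none would get probability 0); `candidate_distribution` and the toy numbers are made up.

```python
import numpy as np


def candidate_distribution(q, h_final, mention_of, num_candidates, w_o):
    """Prob(c | q, C_q, P) proportional to exp(max_{i in M_c} f_o([q; h_i^L])).

    q: (d_q,) query vector; h_final: (n, d) node states after the last layer.
    mention_of[i]: candidate index of node i, or -1 if node i is not a
        candidate mention (e.g. a mention of the query's head entity).
    w_o: (d_q + d,) weights of f_o, assumed here to be one linear layer.
    """
    inputs = np.concatenate([np.tile(q, (len(h_final), 1)), h_final], axis=1)
    scores = inputs @ w_o                        # f_o([q; h_i^L]) for every node
    best = np.full(num_candidates, -np.inf)
    for i, c in enumerate(mention_of):
        if c >= 0:
            best[c] = max(best[c], scores[i])    # max over M_c
    e = np.exp(best - best.max())
    return e / e.sum()


# Toy values: nodes 0-1 mention candidate 0, node 2 mentions candidate 1,
# node 3 is the query's head entity.
q = np.zeros(2)
h_final = np.array([[1.0, 0.0], [3.0, 0.0], [2.0, 0.0], [0.0, 0.0]])
w_o = np.array([0.0, 0.0, 1.0, 0.0])  # score = first feature of h_i^L
probs = candidate_distribution(q, h_final, [0, 0, 1, -1], 2, w_o)
print(probs)  # approximately [0.731 0.269]
```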

Heterogeneous Document-Entity

Multi-hop Reading Comprehension across Multiple Documents by Reasoning over Heterogeneous Graphs.
ACL,2019.

1. Motivation

mainly compared against Entity-GCN
the graph should contain information at different levels of granularity, including candidates, documents, and entities in specific document contexts

This work

design a heterogeneous graph that contains
- three kinds of nodes, corresponding to candidates, documents, and entities
- seven types of edges

2. Model Details

Graph Construction
- nodes:
  - 1) candidate entity nodes
  - 2) entity nodes extracted from documents
  - 3) document nodes
- edges:
  - between (doc, candidate)
    - 1) if the candidate appears in the document at least once
  - between (doc, entity)
    - 2) if the entity is extracted from the document
  - between (entity, candidate)
    - 3) if the entity is a mention of the candidate
  - between (entity, entity)
    - 4) if they are extracted from the same document
    - 5) if they are mentions of the same candidate or query subject and are extracted from different documents
    - 7) any remaining pair of entity nodes that meets neither condition above is connected
  - between (candidate, candidate)
    - 6) all candidate nodes are connected with each other
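The seven edge types can be generated from a list of entity nodes. This sketch assumes entity nodes have already been found by matching candidates and the query subject in each document; `build_hde_edges`, the tuple node ids, and the toy data are hypothetical.

```python
from itertools import combinations


def build_hde_edges(candidates, entities):
    """Typed edges of the heterogeneous document-entity graph.

    entities: list of (doc_index, mentioned_name) pairs, one per entity
        node; mentioned_name is a candidate or the query subject.
    Node ids: ("doc", i), ("cand", k), ("ent", m).
    Returns a set of (node_a, node_b, edge_type), edge_type in 1..7.
    """
    edges = set()
    cand_index = {c: k for k, c in enumerate(candidates)}
    for m, (d, name) in enumerate(entities):
        edges.add((("doc", d), ("ent", m), 2))       # entity extracted from doc
        if name in cand_index:
            k = cand_index[name]
            edges.add((("doc", d), ("cand", k), 1))  # candidate appears in doc
            edges.add((("ent", m), ("cand", k), 3))  # entity mentions candidate
    for (m1, (d1, n1)), (m2, (d2, n2)) in combinations(enumerate(entities), 2):
        if d1 == d2:
            t = 4                                    # same document
        elif n1 == n2:
            t = 5                                    # same name, different docs
        else:
            t = 7                                    # every remaining entity pair
        edges.add((("ent", m1), ("ent", m2), t))
    for k1, k2 in combinations(range(len(candidates)), 2):
        edges.add((("cand", k1), ("cand", k2), 6))   # candidates fully connected
    return edges


candidates = ["rome", "paris"]
# (document index, mentioned name); "ada" plays the query subject
entities = [(0, "ada"), (0, "rome"), (1, "rome"), (1, "paris")]
edges = build_hde_edges(candidates, entities)
print(len(edges))  # 17
```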
Graph Encoding
- Relational GCN, the same as in Entity-GCN
Prediction
- $a = f_C(H^C) + ACC_{max}(f_E(H^E))$
- $H^C$: node representations of all candidate nodes
- $H^E$: node representations of all entity nodes that correspond to candidates
- $ACC_{max}$: max pooling over the entity nodes that correspond to the same candidate
- $f(\cdot)$: two-layer MLP with tanh activation
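The scoring rule adds a candidate-node score to the best score among that candidate's entity nodes. A sketch with randomly initialized two-layer tanh MLPs standing in for trained $f_C$ and $f_E$; it uses NumPy's `np.maximum.at` for the per-candidate max pooling, and `make_mlp`/`hde_scores` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_mlp(d_in, d_hidden):
    """Two-layer MLP with a tanh hidden layer and one scalar output per row."""
    W1 = rng.normal(scale=0.3, size=(d_in, d_hidden))
    w2 = rng.normal(scale=0.3, size=d_hidden)
    return lambda H: np.tanh(H @ W1) @ w2


def hde_scores(H_C, H_E, ent_cand, f_C, f_E):
    """a = f_C(H^C) + ACC_max(f_E(H^E)), one score per candidate.

    H_C: (K, d) candidate-node states. H_E: (n, d) states of entity nodes
    that correspond to candidates; ent_cand[m] is entity m's candidate index.
    """
    acc = np.full(H_C.shape[0], -np.inf)
    np.maximum.at(acc, ent_cand, f_E(H_E))  # max-pool entity scores per candidate
    return f_C(H_C) + acc


d = 8
H_C = rng.normal(size=(3, d))
H_E = rng.normal(size=(5, d))
ent_cand = np.array([0, 0, 1, 2, 2])
a = hde_scores(H_C, H_E, ent_cand, make_mlp(d, 16), make_mlp(d, 16))
print(int(np.argmax(a)))  # index of the predicted candidate
```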

Summary

Related work can be categorized as follows:
- graph-based
  - coreference
  - co-occurrence
  - heterogeneous
- path-based
- neural network based
official leaderboard: http://qangaroo.cs.ucl.ac.uk/leaderboard.html

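| Models | Unmasked Dev | Unmasked Test | Masked Dev | Masked Test |
| --- | --- | --- | --- | --- |
| BiDAF | 49.7 | 42.9 | 59.8 | - |
| Coref-GRU | 56.0 | 59.3 | - | - |
| MHQA-GCN | 62.6 | - | - | - |
| MHQA-GRN | 62.8 | 65.4 | - | - |
| Entity-GCN | 64.8 | 67.6 | - | - |
| CFC | 66.4 | 70.6 | - | - |
| Kundu.2018 | 67.1 | - | - | - |
| BAG | 66.5 | 69.0 | 70.9 | 68.9 |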