WikiHop, part of the QAngaroo benchmark, is a dataset for multi-hop reading comprehension across documents.
Style: Multiple-Choice
Reference papers on the WikiHop task:
- Neural models for reasoning over multiple mentions using coreference. NAACL, 2018.
- Exploring graph-structured passage representation for multihop reading comprehension with graph neural networks. 2018.
- Exploiting explicit paths for multihop reading comprehension. 2018.
- BAG: Bi-directional Attention Entity Graph Convolutional Network for Multi-hop Reasoning Question Answering. NAACL, 2019.
- Question Answering by Reasoning Across Documents with Graph Convolutional Networks. NAACL, 2019.
- Multi-hop Reading Comprehension across Multiple Documents by Reasoning over Heterogeneous Graphs. ACL, 2019.
- Coarse-Grain Fine-Grain Coattention Network for Multi-Evidence Question Answering. ICLR, 2019.
- Explore, Propose, and Assemble: An Interpretable Model for Multi-Hop Reading Comprehension. ACL, 2019.
Task Definition
Input: $< q, P, C_q >$
- query: $q$, a triple with the tail entity missing, $< h_e, r, ? >$
- a set of supporting documents: $P = \{P_1, …, P_M\}$
- a set of candidate answers (all of which are entities mentioned in $P$): $C_q = \{C_1,…,C_N\}$
Goal:
- select $a^{\star} \in C_q$, which is the entity that correctly answers the question
- this requires aggregating evidence from multiple documents
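
To make the input concrete, here is a minimal sketch of one instance as a Python data structure. The field names (`query`, `supports`, `candidates`, `answer`) and the convention that the query string is the relation followed by the head entity are assumptions about the released data format, and the toy instance is invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WikiHopInstance:
    query: str             # "<relation> <head entity>", tail entity missing
    supports: List[str]    # supporting documents P_1..P_M
    candidates: List[str]  # candidate answers C_1..C_N
    answer: str            # gold answer a*, one of the candidates

    def query_triple(self) -> Tuple[str, str, str]:
        """Split the query into (h_e, r, ?), assuming the relation comes first."""
        relation, head = self.query.split(" ", 1)
        return head, relation, "?"


# Toy instance, invented for illustration only.
example = WikiHopInstance(
    query="country_of_origin example band",
    supports=[
        "Example Band is a rock group formed in Springfield.",
        "Springfield is a city in Freedonia.",
    ],
    candidates=["freedonia", "sylvania"],
    answer="freedonia",
)

head, relation, tail = example.query_triple()
```

Neither supporting document alone links the head entity to the answer, which is what makes the toy instance multi-hop.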
Coref-GRU
Neural models for reasoning over multiple mentions using coreference.
NAACL, 2018.
Bhuwan Dhingra. (William W. Cohen)
CMU.
1.Motivation
- existing RNN layers are biased towards short-term dependencies
This work:
- adapt a standard RNN layer by introducing a bias towards coreferent recency
2.Model Details
- coreference relationships between words are represented as a directed acyclic graph (DAG)
- introduce a term in the update equations for GRU which depends on the hidden state of the coreferent antecedent of the current token/word
- hidden states are propagated along coreference chains and the original sequence in parallel
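
The idea of an extra coreference term can be sketched with a deliberately simplified cell. This is not the Coref-GRU update itself: it uses a scalar state and a plain tanh recurrence instead of GRU gates, and the weights `w`, `u`, `v` and the `antecedent` list are hypothetical names chosen for the illustration.

```python
import math
from typing import List, Optional


def coref_rnn(inputs: List[float],
              antecedent: List[Optional[int]],
              w: float = 0.5, u: float = 0.5, v: float = 0.5) -> List[float]:
    """Run a scalar tanh RNN with an extra coreference term.

    antecedent[t] is the index of the most recent earlier token that is
    coreferent with token t, or None. Edges only point backwards, so the
    coreference structure is a DAG.
    """
    hidden: List[float] = []
    for t, x in enumerate(inputs):
        h_prev = hidden[t - 1] if t > 0 else 0.0                 # sequential path
        ant = antecedent[t]
        h_coref = hidden[ant] if ant is not None else 0.0        # coreference path
        hidden.append(math.tanh(w * x + u * h_prev + v * h_coref))
    return hidden


# Token 3 corefers with token 0, so it reads the state of token 0 directly.
with_coref = coref_rnn([1.0, 0.0, 0.0, 0.0], [None, None, None, 0])
without_coref = coref_rnn([1.0, 0.0, 0.0, 0.0], [None, None, None, None])
```

In the run with the coreference link, the signal from token 0 reaches token 3 in one step instead of decaying through the two tokens in between.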
MHQA-GCN/GRN
Exploring graph-structured passage representation for multihop reading comprehension with graph neural networks.
2018.
Linfeng Song. (Yue Zhang)
University of Rochester. (Westlake University)
1.Motivation
- local coreference information alone is too limited to support rich inference over global evidence
This work:
- build more complex graphs to better connect global evidence
- consider two more types of edges in addition to coreference
- same entity mentions (cross-document)
- window-typed (within-document)
- two mentions of different entities within a context window
2.Model Details
- Encoding: evidence integration with graph network
- graph recurrent network
- graph convolutional network
- Prediction: summing the probabilities over all occurrences of the same entity mention
- $e^k$ is the representation of entity mention $\epsilon_k$
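
The prediction step can be sketched as follows, assuming the model has already produced one probability per mention. The function name and the toy numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def sum_over_mentions(mention_probs: List[Tuple[str, float]]) -> Dict[str, float]:
    """Sum mention-level probabilities over all occurrences of the same entity."""
    totals: Dict[str, float] = defaultdict(float)
    for entity, p in mention_probs:
        totals[entity] += p
    return dict(totals)


# Toy mention-level distribution: "paris" is mentioned twice.
mentions = [("paris", 0.25), ("london", 0.25), ("paris", 0.5)]
probs = sum_over_mentions(mentions)
prediction = max(probs, key=probs.get)
```

Summing rewards entities that are supported by several mentions, even when no single mention is scored highly.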
Path-based (Kundu, 2018)
Exploiting Explicit Paths for Multi-hop Reading Comprehension
2018
1.Motivation
- graph-based models only implicitly combine knowledge from all the passages
- unable to provide explicit reasoning paths for the selected answer.
This work
- present a path-based reasoning approach for textual reading comprehension
- generating potential paths across multiple passages
- extracting implicit relations along this path
- composing relations to encode each path
2.Model Details
- Path Definition:
- only two-hop paths are considered, e.g. $path_{kj}=h_e \rightarrow e_1 \rightarrow c_k$
- $e_1$ is the intermediate entity; the definition can be extended to more hops
- an extracted path is a set of entity sequences
- Path Extraction: for each candidate
- step #1: find a passage $P_1$ that contains the head entity $h_e$ of the query
- step #2: find intermediate entities: all named entities and noun phrases that appear in the same sentence as $h_e$ or in the subsequent sentence
- step #3: find another passage $P_2$ that contains any of the intermediate entities found in step #2
- to distinguish the mentions of $e_1$ in different passages, use $e_1^{\prime}$ to denote the same entity mentioned in the second passage
- step #4: check whether passage $P_2$ contains any of the candidate answer choices
- Path Encoding:
- context-based path encoding
- use the concatenation of the boundary vectors of the passage encoding as the location encoding vector of entity
- $g_{e} = [s_{p_1,i_1};s_{p_1,i_2}]$
- extract implicit relation with a feed forward layer
- $r_{h_e,e_1}=FFL(g_{h_e}, g_{e_1})$
- as well as $r_{e_1^{\prime}, c_k}$
- compose implicit relation vector with a feed forward layer
- $x_{ctx}=FFL(r_{h_e,e_1}, r_{e_1^{\prime},c_k})$
- feed forward layer $FFL(a,b)=tanh(aW_a + bW_b + bias)$
- passage-based path encoding
- question-weighted passage representation
- query-aware passage representation: $S_p^1$ and $S_p^2$
- aggregate passage representation: get single passage vector
- self-attention
- $\tilde{s}_{p_1}$ and $\tilde{s}_{p_2}$
- $x_{psg} = FFL(\tilde{s}_{p_1}, \tilde{s}_{p_2})$
- Path Scorer:
- context-based path scoring
- $\tilde{q}=([q_0;q_L])W_q$
- $y_{ctx,q}=FFL(x_{ctx},\tilde{q})$
- $z_{ctx}=y_{ctx,q}W_{ctx}^T$
- passage-based path scoring
- self-attention yields a single vector $\tilde{c}_k$ for each candidate answer choice
- $z_{psg}=\tilde{c}_k x_{psg}^T$
- unnormalized score $z = z_{ctx} + z_{psg}$
- a softmax over all paths and candidates gives $score(path_{kj})$
- Prediction
- $prob(c_k)=\sum_j score(path_{kj})$
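
The extraction steps and the final aggregation can be sketched as below. This is a simplified illustration, not the authors' code: each passage is assumed to come with a precomputed entity list (so the same-sentence restriction of step #2 is dropped), the unnormalized scores stand in for $z = z_{ctx} + z_{psg}$, and all names and data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Path = Tuple[str, str, str]  # (h_e, e_1, c_k)


def extract_paths(head: str, candidates: List[str],
                  passage_entities: List[List[str]]) -> List[Path]:
    """Enumerate two-hop paths h_e -> e_1 -> c_k across two different passages."""
    paths: List[Path] = []
    for i, p1 in enumerate(passage_entities):
        if head not in p1:                    # step 1: P_1 mentions h_e
            continue
        for e1 in p1:                         # step 2: intermediate entities
            if e1 == head:
                continue
            for j, p2 in enumerate(passage_entities):
                if j == i or e1 not in p2:    # step 3: P_2 mentions e_1
                    continue
                for c in candidates:          # step 4: P_2 mentions a candidate
                    if c in p2:
                        paths.append((head, e1, c))
    return paths


def candidate_probs(paths: List[Path], scores: List[float]) -> Dict[str, float]:
    """Softmax over all path scores, then sum path probabilities per candidate."""
    m = max(scores)
    exps = [math.exp(z - m) for z in scores]  # subtract max for stability
    total = sum(exps)
    probs: Dict[str, float] = {}
    for (_, _, c), e in zip(paths, exps):
        probs[c] = probs.get(c, 0.0) + e / total
    return probs


passage_entities = [
    ["example band", "springfield"],
    ["springfield", "freedonia"],
    ["springfield", "sylvania"],
]
paths = extract_paths("example band", ["freedonia", "sylvania"], passage_entities)
probs = candidate_probs(paths, [2.0, 0.0])   # hypothetical scores z, one per path
```

Each extracted path stays attached to its score, so the paths that contribute most to the selected candidate can be shown as an explanation.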
BAG
BAG: Bi-directional Attention Entity Graph Convolutional Network for Multi-hop Reasoning Question Answering.
NAACL, 2019.
1.Motivation
- comprehend the relationships of entities across documents before answering questions
2.Model Details
- Graph Construction
- node: all mentions of candidates
- edge: undirected
- 1)cross-document edge
- 2)within-document edge
- Multi-Level Features:
- input node embedding
- concatenation of GloVe/ELMo(+linear)/NER/POS
- Query Encoding: BiLSTM + linear
- Relational Graph Convolutional Network, the same as in Entity-GCN.
- Bi-directional Attention Between a Graph and a Query
- similarity matrix $S = pooling_{mean}(f_a ([h_n; f_q; h_n \circ f_q] ))$
- $f_q$: query representation
- $h_n$: all node representation
- the attention in each direction is computed in the same way as in BiDAF
- Prediction: the probability of each node being the answer.
- the probability of each candidate is the sum over all of its corresponding nodes.
Entity-GCN
Question Answering by Reasoning Across Documents with Graph Convolutional Networks.
NAACL, 2019.
1.Motivation
This work
- frame question answering as an inference problem on a graph representing the document collection.
2.Model Details
- Graph Construction
- node: mentions of candidate choices and head entity in the query
- edge:
- co-occurrence in the same document
- mentions that exactly match across documents
- Encoding
- ELMo: a concatenation of three 1024-dimensional vectors resulting in 3072-dimensional input vectors
- Graph Encoding: Relational GCN to model message passing process
- at layer $l$:
- aggregation: aggregate information from neighbors of each node
- $N_i$ is the set of indices of nodes neighbouring $i$-th node
- $R_{ij}$ is the set of edge annotations between $i$ and $j$
- combination
- updating: how much of the update message propagates to the next step
- Prediction
- $Prob(c|q,C_q,P) \propto \exp(\max_{i\in M_c} f_o( [q, h_i^L] ) )$
- $M_c$ is the set of node indices such that $i \in M_c$ only if node $i$ is a mention of candidate choice $c$
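
Graph construction with the two edge types listed above can be sketched as follows. The representation of a mention as a `(document index, entity string)` pair, the function name, and the toy mentions are assumptions made for the illustration.

```python
from itertools import combinations
from typing import List, Set, Tuple

Mention = Tuple[int, str]  # (document index, entity string)


def build_entity_graph(mentions: List[Mention]):
    """Return two edge sets over mention indices, one per relation type."""
    doc_edges: Set[Tuple[int, int]] = set()    # co-occurrence in the same document
    match_edges: Set[Tuple[int, int]] = set()  # exact match across documents
    for i, j in combinations(range(len(mentions)), 2):
        (doc_i, ent_i), (doc_j, ent_j) = mentions[i], mentions[j]
        if doc_i == doc_j:
            doc_edges.add((i, j))
        elif ent_i == ent_j:
            match_edges.add((i, j))
    return doc_edges, match_edges


# Nodes: mentions of the head entity and of the candidates (toy data).
mentions = [
    (0, "example band"),  # head entity of the query
    (0, "springfield"),   # candidate
    (1, "springfield"),   # candidate, same string in another document
    (1, "freedonia"),     # candidate
]
doc_edges, match_edges = build_entity_graph(mentions)
```

Keeping one edge set per relation type matches what a relational GCN expects, since it applies a separate transformation to each type.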
Heterogeneous Document-Entity
Multi-hop Reading Comprehension across Multiple Documents by Reasoning over Heterogeneous Graphs.
ACL, 2019.
1.Motivation
- compared mainly with Entity-GCN
- the graph contains information at different levels of granularity, including candidates, documents and entities in specific document contexts
This work
- designs a heterogeneous graph that contains
- three kinds of nodes
- seven types of edges
- include nodes corresponding to candidates, documents and entities.
2.Model Details
- Graph Construction
- node:
- 1) candidate entity nodes
- 2) entity nodes extracted from documents
- 3) document nodes
- edge
- between (doc, candidate)
- 1) if the candidate appears in the document at least once
- between (doc, entity)
- 2) if the entity is extracted from the document
- between (entity, candidate)
- 3) if the entity is a mention of the candidate
- between (entity, entity)
- 4) if they are extracted from the same document
- 5) if they are mentions of the same candidate or query subject and they are extracted from different documents
- 7) entity nodes that do not meet any of the previous conditions are connected
- between (candidate, candidate)
- 6) all candidate nodes are connected with each other
- Graph Encoding
- Relational GCN, the same with Entity-GCN
- Prediction
- $a = f_C(H^C) + ACC_{max}(f_E(H^E))$
- $H^C$ : node representations of all candidate nodes
- $H^E$: node representations of all entity nodes that correspond to candidates
- $ACC_{max}$: max pooling over the entities that belong to the same candidate
- $f(\cdot)$: two-layer MLP with tanh activation
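
The prediction formula can be sketched with precomputed scalar scores, assuming the two MLPs $f_C$ and $f_E$ have already been applied. The function name and the toy values are hypothetical, and giving a candidate without entity nodes no entity term is an assumption of this sketch.

```python
from typing import Dict, List, Tuple


def hde_candidate_scores(cand_scores: Dict[str, float],
                         entity_scores: List[Tuple[str, float]]) -> Dict[str, float]:
    """Add each candidate-node score to the max score of its entity nodes.

    cand_scores holds f_C outputs per candidate; entity_scores holds
    (candidate, f_E output) pairs, one per entity node that mentions it.
    """
    best: Dict[str, float] = {}
    for cand, s in entity_scores:
        best[cand] = max(s, best.get(cand, float("-inf")))  # ACC_max
    # Assumption: a candidate with no entity node contributes 0 from f_E.
    return {c: s + best.get(c, 0.0) for c, s in cand_scores.items()}


scores = hde_candidate_scores(
    {"freedonia": 1.0, "sylvania": 0.5},
    [("freedonia", 0.25), ("freedonia", 2.0), ("sylvania", 1.0)],
)
prediction = max(scores, key=scores.get)
```

Unlike the summation used by MHQA and BAG, max pooling lets a single strongly supported mention decide the entity-level term.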
Summary
- related works can be categorized as follows:
- graph-based
- coreference
- co-occurrence
- heterogeneous
- path-based
- neural network based
- official leaderboard: http://qangaroo.cs.ucl.ac.uk/leaderboard.html
| Models | Unmasked Dev | Unmasked Test | Masked Dev | Masked Test |
|---|---|---|---|---|
| BiDAF | 49.7 | 42.9 | 59.8 | - |
| Coref-GRU | 56.0 | 59.3 | - | - |
| MHQA-GCN | 62.6 | - | - | - |
| MHQA-GRN | 62.8 | 65.4 | - | - |
| Entity-GCN | 64.8 | 67.6 | - | - |
| CFC | 66.4 | 70.6 | - | - |
| Kundu.2018 | 67.1 | - | - | - |
| BAG | 66.5 | 69 | 70.9 | 68.9 |