Recent Advances on Multi-Hop RC - WikiHop

WikiHop, part of the QAngaroo benchmark, is a dataset for multi-hop reading comprehension across documents.
Style: Multiple-Choice

Reference papers on the WikiHop task:

  1. Neural models for reasoning over multiple mentions using coreference. NAACL, 2018.
  2. Exploring graph-structured passage representation for multihop reading comprehension with graph neural networks. 2018.
  3. Exploiting explicit paths for multihop reading comprehension. 2018.
  4. BAG: Bi-directional Attention Entity Graph Convolutional Network for Multi-hop Reasoning Question Answering. NAACL, 2019.
  5. Question Answering by Reasoning Across Documents with Graph Convolutional Networks. NAACL, 2019.
  6. Multi-hop Reading Comprehension across Multiple Documents by Reasoning over Heterogeneous Graphs. ACL, 2019.
  7. Coarse-Grain Fine-Grain Coattention Network for Multi-Evidence Question Answering. ICLR, 2019.
  8. Explore, Propose, and Assemble: An Interpretable Model for Multi-Hop Reading Comprehension. ACL, 2019.

Task Definition

Input: $\langle q, P, C_q \rangle$

  • query $q$: a triple with the tail entity missing, $\langle h_e, r, ? \rangle$
  • a set of supporting documents: $P = \{P_1, \ldots, P_M\}$
  • a set of candidate answers (all of which are entities mentioned in $P$): $C_q = \{C_1, \ldots, C_N\}$

Goal:

  • select $a^{\star} \in C_q$, which is the entity that correctly answers the question
  • this requires aggregating multiple pieces of evidence spread across documents
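
To make the input and goal concrete, here is a minimal Python sketch of one instance. The class and function names are hypothetical, the toy data is made up, and splitting a raw query string such as `country_of_citizenship jane doe` into relation and head entity assumes the relation is the first token; check that against the actual dataset files before relying on it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WikiHopInstance:
    """One multiple-choice instance <q, P, C_q> (hypothetical container)."""
    query: Tuple[str, str]   # (r, h_e); the missing tail entity is the answer
    supports: List[str]      # supporting documents P_1 ... P_M
    candidates: List[str]    # candidate answers C_1 ... C_N


def parse_query(raw: str) -> Tuple[str, str]:
    """Split 'relation head entity words' into (r, h_e).
    Assumption: the relation is the first whitespace-separated token."""
    relation, _, head = raw.partition(" ")
    return relation, head


def select_answer(inst: WikiHopInstance, scores: List[float]) -> str:
    """Return a* = argmax over C_q, given one model score per candidate."""
    if len(scores) != len(inst.candidates):
        raise ValueError("expected one score per candidate")
    best = max(range(len(scores)), key=scores.__getitem__)
    return inst.candidates[best]


# toy data, made up for illustration
inst = WikiHopInstance(
    query=parse_query("country_of_citizenship jane doe"),
    supports=["Jane Doe was born in Oslo.", "Oslo is the capital of Norway."],
    candidates=["norway", "sweden"],
)
```

Answering this toy query needs both documents: the first links the head entity to Oslo, the second links Oslo to a candidate.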

Coref-GRU

Neural models for reasoning over multiple mentions using coreference.
NAACL,2018.
Bhuwan Dhingra. (William W. Cohen)
CMU.

1. Motivation

  • existing RNN layers are biased towards short-term dependencies

This work:

  • adapt a standard RNN layer by introducing a bias towards coreferent recency

2. Model Details

  • represent coreference relationships between words as a directed acyclic graph (DAG)
  • add a term to the GRU update equations that depends on the hidden state of the coreferent antecedent of the current token
    • hidden states are propagated along coreference chains and the original sequence in parallel
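
A minimal NumPy sketch of the idea, not the paper's exact parameterization: at each token, the GRU's recurrent input blends the previous token's hidden state with the hidden state of the token's coreferent antecedent through a scalar gate. The gate `w_mix`, the class name and the random initialisation are illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class CorefGRUSketch:
    """GRU whose recurrent input mixes h_{t-1} with the antecedent's state (sketch)."""

    def __init__(self, d_in, d_h, seed=0):
        rng = np.random.default_rng(seed)
        self.d_h = d_h
        # standard GRU weights for the update (z), reset (r) and candidate (n) paths
        self.W = {k: rng.normal(0.0, 0.1, (d_in, d_h)) for k in "zrn"}
        self.U = {k: rng.normal(0.0, 0.1, (d_h, d_h)) for k in "zrn"}
        # hypothetical gate scoring [h_prev; h_ante]
        self.w_mix = rng.normal(0.0, 0.1, 2 * d_h)

    def step(self, x, h_prev, h_ante=None):
        if h_ante is None:
            m = h_prev  # no antecedent: plain GRU recurrence
        else:
            alpha = sigmoid(self.w_mix @ np.concatenate([h_prev, h_ante]))
            m = alpha * h_ante + (1.0 - alpha) * h_prev
        z = sigmoid(x @ self.W["z"] + m @ self.U["z"])
        r = sigmoid(x @ self.W["r"] + m @ self.U["r"])
        n = np.tanh(x @ self.W["n"] + (r * m) @ self.U["n"])
        return (1.0 - z) * m + z * n

    def run(self, xs, antecedent):
        """xs: (T, d_in) token inputs; antecedent[t] is the index of token t's
        coreferent antecedent, or -1. Antecedents must precede t (DAG order)."""
        hs = np.zeros((len(xs), self.d_h))
        h = np.zeros(self.d_h)
        for t, x in enumerate(xs):
            a = antecedent[t]
            if a >= t:
                raise ValueError("antecedent must come before the current token")
            h = self.step(x, h, hs[a] if a >= 0 else None)
            hs[t] = h
        return hs
```

Both paths are active at once: the sequential state always flows forward, and a coreference link adds a second, possibly long-range, predecessor.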

MHQA-GCN/GRN

Exploring graph-structured passage representation for multihop reading comprehension with graph neural networks.
2018.
Linfeng Song. (Yue Zhang)
University of Rochester. (Westlake University)

1. Motivation

  • local coreference information alone is too limited to support rich inference over global evidence

This work:

  • build richer graphs that better connect global evidence
  • consider two more types of edges in addition to coreference
    • same-entity mentions (cross-document)
    • window-typed (within-document)
      • two mentions of different entities within a context window
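
The edge types above can be sketched as follows, assuming each mention is given as `(doc_id, token_position, entity)` and the context window is measured in tokens. The function name, the window size and the edge labels are hypothetical, and coreference pairs are assumed to come from an external resolver.

```python
from itertools import combinations


def build_mhqa_edges(mentions, window=2, coref_pairs=()):
    """mentions: list of (doc_id, token_position, entity); node i is mentions[i].
    Returns a set of (i, j, edge_type) with i < j."""
    edges = set()
    for i, j in combinations(range(len(mentions)), 2):
        doc_i, pos_i, ent_i = mentions[i]
        doc_j, pos_j, ent_j = mentions[j]
        if ent_i == ent_j and doc_i != doc_j:
            edges.add((i, j, "same"))      # same entity, different documents
        elif doc_i == doc_j and ent_i != ent_j and abs(pos_i - pos_j) <= window:
            edges.add((i, j, "window"))    # different entities close together in one document
    for i, j in coref_pairs:               # coreference links supplied externally
        edges.add((min(i, j), max(i, j), "coref"))
    return edges
```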

2. Model Details

  • Encoding: evidence integration with a graph network
    • graph recurrent network (GRN)
    • graph convolutional network (GCN)
  • Prediction: sum the probabilities over all mentions of the same entity
    • $e^k$ is the representation of the entity mention $\epsilon_k$
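
A small sketch of the prediction step. It assumes each mention already has a scalar logit computed from $e^k$ and that one softmax is taken over all mentions before the per-entity sum; `entity_probs` is a hypothetical helper.

```python
import math
from collections import defaultdict


def entity_probs(mention_logits, mention_entities):
    """Softmax over all mentions, then sum the probabilities of mentions of the same entity."""
    top = max(mention_logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in mention_logits]
    total = sum(exps)
    probs = defaultdict(float)
    for e, ent in zip(exps, mention_entities):
        probs[ent] += e / total
    return dict(probs)
```

With equal logits, an entity mentioned twice gets twice the probability of an entity mentioned once, so frequently mentioned entities are favoured unless the logits say otherwise.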

Path-based (Kundu et al., 2018)

Exploiting Explicit Paths for Multi-hop Reading Comprehension
2018

1. Motivation

  • graph-based models only implicitly combine knowledge from all the passages
    • unable to provide explicit reasoning paths for the selected answer.

This work

  • present a path-based reasoning approach for textual reading comprehension
    • generating potential paths across multiple passages
    • extracting implicit relations along each path
    • composing relations to encode each path

2. Model Details

  • Path Definition:
    • only two-hop paths are considered, e.g. $path_{kj} = h_e \rightarrow e_1 \rightarrow c_k$
    • $e_1$ is the intermediate entity; the approach can be extended to more hops
    • the extracted paths form a set of entity sequences
  • Path Extraction (for each candidate):
    • step #1: find a passage $P_1$ that contains the head entity $h_e$ of the query
    • step #2: find intermediate entities: all named entities and noun phrases that appear in the same sentence as $h_e$ or in the following sentence
    • step #3: find another passage $P_2$ that contains any of the intermediate entities found in step #2
      • to distinguish mentions in different passages, $e_1^{\prime}$ denotes the mention of $e_1$ in the second passage
    • step #4: check whether $P_2$ contains any of the candidate answer choices
  • Path Encoding:
    • context-based path encoding
      • the location encoding of an entity is the concatenation of the passage-encoding vectors at its boundary (start and end) positions
        • $g_{e} = [s_{p_1,i_1}; s_{p_1,i_2}]$
      • extract each implicit relation with a feed-forward layer
        • $r_{h_e,e_1} = FFL(g_{h_e}, g_{e_1})$
        • and similarly $r_{e_1^{\prime}, c_k} = FFL(g_{e_1^{\prime}}, g_{c_k})$
      • compose the two relation vectors with another feed-forward layer
        • $x_{ctx} = FFL(r_{h_e,e_1}, r_{e_1^{\prime},c_k})$
      • feed-forward layer: $FFL(a, b) = \tanh(aW_a + bW_b + \text{bias})$
    • passage-based path encoding
      • question-weighted passage representation
        • query-aware passage representations $S_p^1$ and $S_p^2$
      • aggregate each passage representation into a single vector with self-attention
        • $\tilde{s}_{p_1}$ and $\tilde{s}_{p_2}$
      • $x_{psg} = FFL(\tilde{s}_{p_1}, \tilde{s}_{p_2})$
  • Path Scorer:
    • context-based path scoring
      • $\tilde{q}=([q_0;q_L])W_q$
      • $y_{ctx,q}=FFL(x_{ctx},\tilde{q})$
      • $z_{ctx}=y_{ctx,q}W_{ctx}^T$
    • passage-based path scoring
      • self-attention yields a single vector $\tilde{c}_k$ for candidate answer $c_k$
      • $z_{psg} = \tilde{c}_k x_{psg}^T$
    • unnormalized score: $z = z_{ctx} + z_{psg}$
      • a softmax over all paths of all candidates gives $score(path_{kj})$
  • Prediction
    • $prob(c_k)=\sum_j score(path_{kj})$
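
The encoding and scoring equations above can be wired together as below. This is a NumPy sketch under several assumptions: weights are random rather than learned, the query-aware encodings $S_p^1$, $S_p^2$ are taken as given, self-attentive pooling is a simple dot-product attention with one shared vector, and $\tilde{c}_k$ is linearly projected to the path dimension so that $z_{psg}$ is defined. All class and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def init(*shape):
    # small random weights stand in for learned parameters
    return rng.normal(0.0, 0.1, shape)


class FFL:
    """FFL(a, b) = tanh(a W_a + b W_b + bias); every use site gets its own weights."""
    def __init__(self, d_a, d_b, d_out):
        self.W_a, self.W_b, self.bias = init(d_a, d_out), init(d_b, d_out), np.zeros(d_out)

    def __call__(self, a, b):
        return np.tanh(a @ self.W_a + b @ self.W_b + self.bias)


def boundary(S, i1, i2):
    """g_e = [s_{i1}; s_{i2}]: passage encodings at a mention's start and end tokens."""
    return np.concatenate([S[i1], S[i2]])


def self_attend(S, w):
    """Collapse a (length, dim) matrix into one vector (assumed attention form)."""
    logits = S @ w
    a = np.exp(logits - logits.max())
    return (a / a.sum()) @ S


class PathScorerSketch:
    def __init__(self, d, h):
        self.rel1, self.rel2 = FFL(2 * d, 2 * d, h), FFL(2 * d, 2 * d, h)
        self.compose, self.with_q = FFL(h, h, h), FFL(h, h, h)
        self.psg = FFL(d, d, h)
        self.W_q, self.w_ctx = init(2 * d, h), init(h)
        self.w_att, self.W_c = init(d), init(d, h)

    def score(self, S1, S2, spans, q, C):
        """Unnormalized z for one path. S1, S2: encodings of P_1, P_2; spans: (start, end)
        of h_e, e_1 in S1 and of e_1', c_k in S2; q: query encoding; C: candidate tokens."""
        (h0, h1), (a0, a1), (b0, b1), (c0, c1) = spans
        r1 = self.rel1(boundary(S1, h0, h1), boundary(S1, a0, a1))  # r_{h_e,e_1}
        r2 = self.rel2(boundary(S2, b0, b1), boundary(S2, c0, c1))  # r_{e_1',c_k}
        x_ctx = self.compose(r1, r2)
        q_tilde = np.concatenate([q[0], q[-1]]) @ self.W_q          # ([q_0; q_L]) W_q
        z_ctx = self.with_q(x_ctx, q_tilde) @ self.w_ctx            # y_{ctx,q} W_ctx^T
        x_psg = self.psg(self_attend(S1, self.w_att), self_attend(S2, self.w_att))
        c_tilde = self_attend(C, self.w_att) @ self.W_c             # projection is an assumption
        return float(z_ctx + c_tilde @ x_psg)                       # z = z_ctx + z_psg


def candidate_probs(path_scores):
    """Softmax over every path of every candidate, then prob(c_k) = sum_j score(path_kj)."""
    flat = [(c, z) for c, zs in path_scores.items() for z in zs]
    z = np.array([s for _, s in flat])
    p = np.exp(z - z.max())
    p /= p.sum()
    probs = dict.fromkeys(path_scores, 0.0)
    for (c, _), pj in zip(flat, p):
        probs[c] += float(pj)
    return probs
```

Because the softmax is shared across all paths, a candidate supported by several plausible paths accumulates probability from each of them.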

BAG

BAG: Bi-directional Attention Entity Graph Convolutional Network for Multi-hop Reasoning Question Answering.
NAACL,2019.

1. Motivation

  • comprehend the relationships of entities across documents before answering questions

2. Model Details

  • Graph Construction
    • node: all mentions of candidates
    • edge: undirected
      • 1) cross-document edge
      • 2) within-document edge
  • Multi-Level Features:
    • input node embedding
    • concatenation of GloVe / ELMo (+linear) / NER / POS features
  • Query Encoding: BiLSTM + linear
  • Relational Graph Convolutional Network, the same as in Entity-GCN
  • Bi-directional Attention Between a Graph and a Query
    • similarity matrix $S = pooling_{mean}(f_a([h_n; f_q; h_n \circ f_q]))$
      • $f_q$: query representation
      • $h_n$: representations of all nodes
    • the attention in both directions is then computed as in BiDAF
  • Prediction: the probability of each node being the answer
    • the probability of each candidate is the sum over all of its corresponding nodes
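
A NumPy sketch of the graph-query attention, assuming $f_a$ is a single linear layer and that the per-node output follows BiDAF's $[h; \tilde{u}; h \circ \tilde{u}; h \circ \tilde{h}]$ layout. Shapes and names are illustrative, not taken from the BAG code.

```python
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def bag_bi_attention(h_n, f_q, W_a, b_a):
    """h_n: (N, d) node reps; f_q: (M, d) query reps; W_a: (3d, k), b_a: (k,) form f_a."""
    N, M, d = h_n.shape[0], f_q.shape[0], h_n.shape[1]
    H = np.broadcast_to(h_n[:, None, :], (N, M, d))
    Q = np.broadcast_to(f_q[None, :, :], (N, M, d))
    feats = np.concatenate([H, Q, H * Q], axis=-1)   # [h_n; f_q; h_n ∘ f_q]
    S = (feats @ W_a + b_a).mean(axis=-1)            # mean pooling -> (N, M)
    # graph-to-query: each node attends over query tokens
    q_att = softmax(S, axis=1) @ f_q                 # (N, d)
    # query-to-graph: attend over nodes using each node's best query match
    n_att = softmax(S.max(axis=1), axis=0) @ h_n     # (d,), shared by all nodes
    G = np.concatenate([h_n, q_att, h_n * q_att, h_n * n_att], axis=1)
    return S, G
```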

Entity-GCN

Question Answering by Reasoning Across Documents with Graph Convolutional Networks.
NAACL,2019.

1. Motivation

This work

  • frame question answering as an inference problem on a graph representing the document collection

2. Model Details

  • Graph Construction
    • node: mentions of the candidate choices and of the head entity in the query
    • edge:
      • co-occurrence in the same document
      • mentions that exactly match across documents
  • Encoding
    • ELMo: a concatenation of three 1024-dimensional vectors, giving 3072-dimensional input vectors
  • Graph Encoding: a Relational GCN models the message-passing process
    • at each layer $l$:
    • aggregation: aggregate information from the neighbors of each node
      • $N_i$ is the set of indices of the nodes neighbouring the $i$-th node
      • $R_{ij}$ is the set of edge annotations between $i$ and $j$
    • combination
    • updating: control how much of the update message propagates to the next layer
  • Prediction
    • $Prob(c \mid q, C_q, P) \propto \exp(\max_{i \in M_c} f_o([q, h_i^L]))$
      • $M_c$ is the set of nodes that are mentions of candidate choice $c$, i.e. $i \in M_c$ only if node $i$ is a mention of $c$
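
The per-layer steps and the max-based prediction can be sketched in NumPy as below. The gated update follows our reading of the Entity-GCN paper: $u_i = f_s(h_i) + \frac{1}{|N_i|}\sum_{j \in N_i}\sum_{r \in R_{ij}} f_r(h_j)$, a gate $a_i = \sigma(f_a([u_i, h_i]))$, and $h_i^{l+1} = \tanh(u_i) \odot a_i + h_i \odot (1 - a_i)$. Treating every $f$ (including $f_o$) as a bias-free linear map, the relation names `"doc"` / `"match"` and the helper names are simplifying assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def entity_gcn_layer(H, edges, params):
    """One gated R-GCN layer (sketch).

    H: (n, d) node states at layer l.
    edges: (i, j, r) triples meaning node j sends a message to node i along relation r;
           add both directions for an undirected edge, one triple per r in R_ij.
    params: "W_self" (d, d) for f_s, "W_rel" dict r -> (d, d) for f_r, "W_gate" (2d, d) for f_a.
    """
    n, d = H.shape
    msg = np.zeros((n, d))
    neighbours = [set() for _ in range(n)]
    for i, j, r in edges:
        msg[i] += H[j] @ params["W_rel"][r]    # sum over j in N_i and r in R_ij
        neighbours[i].add(j)
    size = np.array([max(len(s), 1) for s in neighbours], dtype=float)[:, None]
    U = H @ params["W_self"] + msg / size                           # aggregation: u_i
    A = sigmoid(np.concatenate([U, H], axis=1) @ params["W_gate"])  # gate a_i from [u_i, h_i]
    return np.tanh(U) * A + H * (1.0 - A)                           # gated update: h_i^{l+1}


def entity_gcn_predict(H_L, q, mention_sets, w_o):
    """P(c) ∝ exp(max_{i in M_c} f_o([q, h_i^L])), with f_o assumed linear (weights w_o)."""
    names = list(mention_sets)
    best = np.array([max(float(np.concatenate([q, H_L[i]]) @ w_o) for i in mention_sets[c])
                     for c in names])
    p = np.exp(best - best.max())
    p /= p.sum()
    return dict(zip(names, p.tolist()))
```

Taking the max over $M_c$ means a candidate is scored by its single best-supported mention rather than by how often it is mentioned.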

Heterogeneous Document-Entity

Multi-hop Reading Comprehension across Multiple Documents by Reasoning over Heterogeneous Graphs.
ACL,2019.

1. Motivation

  • mainly compared against Entity-GCN
  • a graph should contain information at different levels of granularity, including candidates, documents and entities in specific document contexts

This work

  • design a heterogeneous graph containing
    • three kinds of nodes
    • seven types of edges
  • include nodes corresponding to candidates, documents and entities

2. Model Details

  • Graph Construction
    • node:
      • 1) candidate entity nodes
      • 2) entity nodes extracted from documents
      • 3) document nodes
    • edge
      • between (document, candidate)
        • 1) if the candidate appears in the document at least once
      • between (document, entity)
        • 2) if the entity is extracted from the document
      • between (entity, candidate)
        • 3) if the entity is a mention of the candidate
      • between (entity, entity)
        • 4) if they are extracted from the same document
        • 5) if they are mentions of the same candidate or query subject and are extracted from different documents
        • 7) entity nodes that do not meet the previous conditions are connected
      • between (candidate, candidate)
        • 6) all candidate nodes are connected with each other
  • Graph Encoding
    • Relational GCN, the same as in Entity-GCN
  • Prediction
    • $a = f_C(H^C) + ACC_{max}(f_E(H^E))$
    • $H^C$: node representations of all candidate nodes
    • $H^E$: node representations of all entity nodes that correspond to candidates
    • $ACC_{max}$: max pooling over the entity nodes that belong to the same candidate
    • $f(\cdot)$: two-layer MLP with tanh activation
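
The seven edge types can be enumerated with a short Python sketch. It assumes entity extraction is already done: every entity node is a `(doc_id, label)` pair, where `label` is the candidate or query subject the mention matches, and a candidate "appears in" a document exactly when one of its mentions was extracted from it. Node ids and the function name are hypothetical.

```python
from itertools import combinations


def build_hde_edges(candidates, entities):
    """candidates: list of candidate strings.
    entities: list of (doc_id, label) for each extracted entity node.
    Returns a set of (u, v, edge_type) with u < v; node ids are tagged tuples."""
    edges = set()

    def add(u, v, edge_type):
        edges.add((min(u, v), max(u, v), edge_type))

    cand_set = set(candidates)
    for e, (doc, label) in enumerate(entities):
        add(("doc", doc), ("ent", e), 2)              # entity extracted from the document
        if label in cand_set:
            add(("doc", doc), ("cand", label), 1)     # candidate appears in the document
            add(("cand", label), ("ent", e), 3)       # entity is a mention of the candidate
    for e1, e2 in combinations(range(len(entities)), 2):
        (doc1, lab1), (doc2, lab2) = entities[e1], entities[e2]
        if doc1 == doc2:
            edge_type = 4                              # same document
        elif lab1 == lab2:
            edge_type = 5                              # same candidate/subject, different documents
        else:
            edge_type = 7                              # every remaining entity pair
        add(("ent", e1), ("ent", e2), edge_type)
    for c1, c2 in combinations(candidates, 2):
        add(("cand", c1), ("cand", c2), 6)            # candidates form a clique
    return edges
```

Types 4, 5 and 7 partition all entity pairs, so every pair of entity nodes ends up connected by exactly one entity-entity edge.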

Summary

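Accuracy (%) on WikiHop:

| Models | Unmasked Dev | Unmasked Test | Masked Dev | Masked Test |
|---|---|---|---|---|
| BiDAF | 49.7 | 42.9 | 59.8 | - |
| Coref-GRU | 56.0 | 59.3 | - | - |
| MHQA-GCN | 62.6 | - | - | - |
| MHQA-GRN | 62.8 | 65.4 | - | - |
| Entity-GCN | 64.8 | 67.6 | - | - |
| CFC | 66.4 | 70.6 | - | - |
| Kundu.2018 | 67.1 | - | - | - |
| BAG | 66.5 | 69 | 70.9 | 68.9 |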
**** END of This Post. Thanks for Reading ****
If you have any questions, feel free to email me or leave a comment below.