WIP| Analysis of FEVER

FEVER Introduction

  • FEVER data: link
    • Contains 185,445 claims
  • FEVER Shared Task (FEVER workshop at EMNLP 2018) info: details here
  • FEVER official baseline code: github repo

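In the FEVER (Fact Extraction and VERification) task, a system is given an unverified claim (a single sentence) and must find evidence sentences in Wikipedia to decide whether the claim is SUPPORTED, REFUTED, or cannot be judged because there is NOT ENOUGH INFO. For SUPPORTED and REFUTED verdicts, the evidence sentences must also be returned. In 16.82% of the examples, more than one evidence sentence is needed to reach a verdict, and in 12.15% the evidence sentences come from more than one document.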
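To make the task format concrete, here is a minimal sketch of how one annotated claim could be represented, and how the "multiple sentences" and "multiple documents" cases can be told apart. The class and function names (`Claim`, `evidence_sets`, `needs_multiple_sentences`, `needs_multiple_pages`) are hypothetical and are not the official FEVER schema; the example claim and page names are made up.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# One evidence sentence is identified by (Wikipedia page title, sentence index).
Sentence = Tuple[str, int]


@dataclass
class Claim:
    text: str
    label: str  # "SUPPORTED", "REFUTED", or "NOT ENOUGH INFO"
    # Each inner list is one complete evidence set: its sentences,
    # taken together, are enough to verify the claim.
    evidence_sets: List[List[Sentence]] = field(default_factory=list)


def needs_multiple_sentences(claim: Claim) -> bool:
    """True if every evidence set contains more than one sentence."""
    return bool(claim.evidence_sets) and all(
        len(ev) > 1 for ev in claim.evidence_sets
    )


def needs_multiple_pages(claim: Claim) -> bool:
    """True if every evidence set draws on more than one page."""
    return bool(claim.evidence_sets) and all(
        len({page for page, _ in ev}) > 1 for ev in claim.evidence_sets
    )


# Made-up example: one evidence set spanning two pages.
example = Claim(
    text="A made-up claim about two entities.",
    label="SUPPORTED",
    evidence_sets=[[("Page_A", 0), ("Page_B", 3)]],
)
print(needs_multiple_sentences(example), needs_multiple_pages(example))
```

A NOT ENOUGH INFO claim has no evidence sets in this sketch, so both helpers return False for it.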

Data Statistics

Split   SUPPORTED   REFUTED   NOT ENOUGH INFO (NEI)
Train   80,035      29,775    35,639
Dev     6,666       6,666     6,666
Test    6,666       6,666     6,666
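As a quick sanity check on the table above, the short script below sums the counts per split and computes the label distribution within the training split. The variable names are illustrative; the numbers are copied from the table.

```python
# Claim counts per split and label, copied from the table above.
counts = {
    "Train": {"SUPPORTED": 80035, "REFUTED": 29775, "NEI": 35639},
    "Dev": {"SUPPORTED": 6666, "REFUTED": 6666, "NEI": 6666},
    "Test": {"SUPPORTED": 6666, "REFUTED": 6666, "NEI": 6666},
}

# Total number of claims in each split, and overall.
split_totals = {split: sum(by_label.values()) for split, by_label in counts.items()}
grand_total = sum(split_totals.values())

# Share of each label within the training split.
train_share = {
    label: n / split_totals["Train"] for label, n in counts["Train"].items()
}

print(split_totals)
print(grand_total)
for label, share in train_share.items():
    print(f"{label}: {share:.1%}")
```

The three splits add up to 185,445 claims, which matches the dataset size quoted in the introduction. The training split is imbalanced (more than half of its claims are SUPPORTED), while Dev and Test are balanced across the three labels.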

Baseline System

The baseline system is a pipeline with three components: (1) document retrieval, (2) sentence-level evidence selection, and (3) textual entailment.

The recognizing textual entailment (RTE) component uses Google's Decomposable Attention model.
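The three pipeline stages can be sketched as follows. This is a toy illustration, not the official baseline code: the corpus `WIKI`, the word-overlap scoring, and all function names are assumptions, and the final stage is a crude rule standing in for a trained entailment model such as Decomposable Attention (it can never output REFUTED).

```python
import re
from typing import Dict, List, Set, Tuple

# Toy corpus: page title -> list of sentences (made up for illustration).
WIKI: Dict[str, List[str]] = {
    "Paris": ["Paris is the capital of France.", "Paris lies on the Seine."],
    "Berlin": ["Berlin is the capital of Germany."],
}


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(a: str, b: str) -> int:
    """Number of distinct tokens shared by two strings."""
    return len(tokens(a) & tokens(b))


def retrieve_documents(claim: str, k: int = 1) -> List[str]:
    """Stage 1: rank pages by word overlap between the claim and the page."""
    def page_text(page: str) -> str:
        return page + " " + " ".join(WIKI[page])
    ranked = sorted(WIKI, key=lambda p: overlap(claim, page_text(p)), reverse=True)
    return ranked[:k]


def select_sentences(claim: str, pages: List[str], k: int = 1) -> List[Tuple[str, int]]:
    """Stage 2: rank sentences of the retrieved pages by overlap with the claim."""
    candidates = [(page, i) for page in pages for i in range(len(WIKI[page]))]
    candidates.sort(key=lambda c: overlap(claim, WIKI[c[0]][c[1]]), reverse=True)
    return candidates[:k]


def classify(claim: str, evidence: List[Tuple[str, int]]) -> str:
    """Stage 3: placeholder rule where a trained RTE model would go."""
    evidence_text = " ".join(WIKI[page][i] for page, i in evidence)
    if evidence and tokens(claim) <= tokens(evidence_text):
        return "SUPPORTED"
    return "NOT ENOUGH INFO"


def verify(claim: str) -> Tuple[str, List[Tuple[str, int]]]:
    """Run the three stages in sequence and return (label, evidence)."""
    pages = retrieve_documents(claim)
    evidence = select_sentences(claim, pages)
    return classify(claim, evidence), evidence


print(verify("Paris is the capital of France."))
```

Because the stages run in sequence, a miss in document retrieval cannot be recovered later: the entailment stage only ever sees the sentences that the first two stages pass along.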

Score Metrics

official scorer: https://github.com/sheffieldnlp/fever-scorer

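Predicting the label of a claim (SUPPORTED/REFUTED/NOT ENOUGH INFO) is a three-way classification problem, which is evaluated with accuracy.
For SUPPORTED and REFUTED claims, the system must also return the evidence, which is evaluated with F1.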
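The two metrics can be illustrated with a simplified sketch. This is not the official scorer linked above: it assumes a single gold evidence set per claim and micro-averages over sentences, and the function names and toy predictions are made up for illustration.

```python
from typing import List, Set, Tuple

Sentence = Tuple[str, int]  # (page title, sentence index)


def label_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of claims whose predicted label equals the gold label."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


def evidence_f1(gold: List[Set[Sentence]], pred: List[Set[Sentence]]) -> float:
    """Micro-averaged F1 between predicted and gold evidence sentences."""
    true_positives = sum(len(g & p) for g, p in zip(gold, pred))
    if true_positives == 0:
        return 0.0
    precision = true_positives / sum(len(p) for p in pred)
    recall = true_positives / sum(len(g) for g in gold)
    return 2 * precision * recall / (precision + recall)


# Toy labels for three claims.
gold_labels = ["SUPPORTED", "REFUTED", "NOT ENOUGH INFO"]
pred_labels = ["SUPPORTED", "SUPPORTED", "NOT ENOUGH INFO"]

# Toy evidence for the two verifiable claims only (NEI claims have none).
gold_evidence = [{("A", 0)}, {("B", 1), ("B", 2)}]
pred_evidence = [{("A", 0), ("A", 1)}, {("B", 1)}]

print(label_accuracy(gold_labels, pred_labels))
print(evidence_f1(gold_evidence, pred_evidence))
```

In this toy run, two of three labels are correct, and two of the three predicted sentences match two of the three gold sentences, so accuracy, precision, recall, and F1 all come out at 2/3.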

FEVER Shared Task Top-3 Systems Solutions

The Fact Extraction and VERification (FEVER) Shared Task

Top-1: UNC-NLP

Combining Fact Extraction and Verification with Neural Semantic Matching Networks
AAAI 2019

Top-2: UCL Machine Reading Group

UCL Machine Reading Group: Four Factor Framework For Fact Finding (HexaF)

Top-3: Athene UKP TU Darmstadt

Multi-Sentence Textual Entailment for Claim Verification

Shared Task Overview

FEVER Workshop Notes
