WIP| Analysis of FEVER

FEVER Introduction

FEVER data: link
- Contains 185,445 claims
FEVER Shared Task (FEVER workshop at EMNLP 2018; the dataset paper appeared at NAACL 2018) info: details here
FEVER official baseline code: github repo

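In the FEVER (Fact Extraction and VERification) task, the system is given an unverified claim (a single sentence) and must find evidence sentences in Wikipedia to decide whether the claim is SUPPORTED, REFUTED, or whether there is NOT ENOUGH INFO to judge. For SUPPORTED and REFUTED verdicts, the evidence sentences must be returned as well. In 16.82% of the examples, several evidence sentences have to be combined to reach a verdict, and in 12.15% the evidence comes from more than one document.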
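To make the task format concrete, here is a minimal sketch of reading one claim record. It assumes the JSON Lines layout of the public FEVER release (fields `id`, `label`, `claim`, `evidence`, with each evidence item shaped as `[annotation_id, evidence_id, page, sentence_index]`) and the label string `SUPPORTS`; the record itself and the `summarize` helper are made up for illustration.

```python
import json

# Hypothetical record written to mirror the assumed FEVER JSON Lines layout;
# the claim text, page names and ids are invented for this example.
RAW_LINE = json.dumps({
    "id": 1,
    "label": "SUPPORTS",
    "claim": "Example City is the capital of Example Land.",
    # One evidence set; each item: [annotation_id, evidence_id, page, sentence_index]
    "evidence": [[[101, 201, "Example_City", 0], [101, 202, "Example_Land", 3]]],
})


def summarize(line):
    """Return the label plus, per evidence set, its (page, sentence_index) pairs."""
    record = json.loads(line)
    evidence_sets = [
        [(page, sent) for _, _, page, sent in evidence_set]
        for evidence_set in record["evidence"]
    ]
    return {
        "label": record["label"],
        "evidence_sets": evidence_sets,
        # True when any evidence set combines more than one sentence
        "multi_sentence": any(len(s) > 1 for s in evidence_sets),
        # True when any evidence set draws on more than one Wikipedia page
        "multi_page": any(len({p for p, _ in s}) > 1 for s in evidence_sets),
    }


print(summarize(RAW_LINE))
```

The two boolean flags correspond to the multi-sentence and multi-document cases described above.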

Data Statistics

Split	SUPPORTED	REFUTED	NOT ENOUGH INFO (NEI)
Train	80035	29775	35639
Dev	6666	6666	6666
Test	6666	6666	6666
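
As a quick sanity check on the table above, the sketch below sums the splits and computes the label shares in the training set. The numbers are copied from the table; the helper names `split_totals` and `label_shares` are only for illustration.

```python
# Label counts copied from the table above.
COUNTS = {
    "Train": {"SUPPORTED": 80035, "REFUTED": 29775, "NEI": 35639},
    "Dev": {"SUPPORTED": 6666, "REFUTED": 6666, "NEI": 6666},
    "Test": {"SUPPORTED": 6666, "REFUTED": 6666, "NEI": 6666},
}


def split_totals(counts):
    """Number of claims in each split."""
    return {split: sum(labels.values()) for split, labels in counts.items()}


def label_shares(labels):
    """Fraction of claims per label within one split."""
    total = sum(labels.values())
    return {label: n / total for label, n in labels.items()}


totals = split_totals(COUNTS)
print(totals)                # Train: 145449, Dev: 19998, Test: 19998
print(sum(totals.values()))  # 185445, matching the claim count quoted earlier

for label, share in label_shares(COUNTS["Train"]).items():
    print(f"{label}: {share:.1%}")
```

Dev and Test are balanced across the three labels, while the training set is skewed: roughly 55% SUPPORTED, 20.5% REFUTED and 24.5% NEI.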

Baseline System

The baseline system is a pipeline with three components: 1. document retrieval, 2. sentence-level evidence selection, 3. textual entailment.

The recognizing textual entailment (RTE) component uses Google's Decomposable Attention model.
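
The three stages can be sketched as plain functions chained together. This is only a toy illustration of the pipeline shape, not the official baseline: the corpus is made up, retrieval and sentence selection use simple word overlap as a stand-in, and `classify` is a placeholder where the entailment model would go (it is not Decomposable Attention and never predicts REFUTED).

```python
import re

# Tiny made-up corpus standing in for Wikipedia: page title -> list of sentences.
CORPUS = {
    "Example_City": ["Example City is the capital of Example Land.",
                     "It has a large harbour."],
    "Other_Town": ["Other Town is a village in the mountains."],
}


def tokens(text):
    """Lower-cased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve_documents(claim, corpus, k=1):
    """Stage 1: rank pages by word overlap with the claim (toy scoring)."""
    def score(title):
        return len(tokens(claim) & tokens(" ".join(corpus[title])))
    return sorted(corpus, key=score, reverse=True)[:k]


def select_sentences(claim, corpus, pages, k=1):
    """Stage 2: rank sentences of the retrieved pages by word overlap."""
    candidates = [(page, i, s) for page in pages for i, s in enumerate(corpus[page])]
    candidates.sort(key=lambda c: len(tokens(claim) & tokens(c[2])), reverse=True)
    return candidates[:k]


def classify(claim, evidence):
    """Stage 3: placeholder for an entailment model (overlap threshold only)."""
    if not evidence:
        return "NOT ENOUGH INFO"
    claim_tokens = tokens(claim)
    overlap = len(claim_tokens & tokens(evidence[0][2])) / max(len(claim_tokens), 1)
    return "SUPPORTED" if overlap > 0.8 else "NOT ENOUGH INFO"


def verify(claim, corpus=CORPUS):
    """Run the three stages and return (label, [(page, sentence_index), ...])."""
    pages = retrieve_documents(claim, corpus)
    evidence = select_sentences(claim, corpus, pages)
    return classify(claim, evidence), [(p, i) for p, i, _ in evidence]


print(verify("Example City is the capital of Example Land."))
```

Because each stage only sees the output of the previous one, an error in document retrieval cannot be recovered later, which is the usual weakness of a pipelined design.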

Score Metrics

official scorer: https://github.com/sheffieldnlp/fever-scorer

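Classifying a claim as SUPPORTED, REFUTED, or NOT ENOUGH INFO is a three-way classification problem, which is evaluated with accuracy.
For the SUPPORTED and REFUTED classes, the evidence sentences must also be provided; these are evaluated with F1.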
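
The two metrics can be illustrated with a simplified sketch. This is not the official scorer: it treats evidence as a flat set of (page, sentence_index) pairs for a single claim, and the way the official scorer handles multiple evidence sets and aggregates over claims may differ. The function names and sample data are made up.

```python
def label_accuracy(predicted, gold):
    """Fraction of claims whose predicted label equals the gold label."""
    assert len(predicted) == len(gold)
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def evidence_f1(predicted, gold):
    """Set-based precision, recall and F1 over (page, sentence_index) pairs."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


labels_pred = ["SUPPORTED", "REFUTED", "NOT ENOUGH INFO", "SUPPORTED"]
labels_gold = ["SUPPORTED", "REFUTED", "REFUTED", "NOT ENOUGH INFO"]
print(label_accuracy(labels_pred, labels_gold))  # 0.5 (2 of 4 labels match)

pred_evidence = [("Example_City", 0), ("Example_City", 1)]
gold_evidence = [("Example_City", 0)]
# precision 0.5, recall 1.0, F1 about 0.667
print(evidence_f1(pred_evidence, gold_evidence))
```

Returning an extra, irrelevant sentence lowers precision but not recall, so F1 penalizes systems that pad their evidence.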

FEVER Shared Task Top-3 Systems Solutions

The Fact Extraction and VERification (FEVER) Shared Task

Top-1: UNC-NLP

Combining Fact Extraction and Verification with Neural Semantic Matching Networks
AAAI 2019

Top-2: UCL Machine Reading Group

UCL Machine Reading Group: Four Factor Framework For Fact Finding (HexaF)

Top-3: Athene UKP TU Darmstadt

Multi-Sentence Textual Entailment for Claim Verification

Shared Task Overview

FEVER Workshop Notes

FEVER workshop info: details here