The Effect of Embeddings on SQuAD v2.0
Published in pre-print, 2018
Recommended citation: Perez, Luis. “The Effect of Embeddings on SQuAD v 2 . 0.” (2019). https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1194/reports/default/15709657.pdf
In this paper, we explore several extensions of BERT for the Stanford Question and Answer v2. 0 task in NLP, and summarize our analyzes of these results. In particular, we explore the benefits of using a fined-tuned BERT model for word-embdeddings underlying different, more complex question-and-answer architectures. We continue by exploring different more traditional QA architectures, such as a BiDAF and a few other customized models. Additionally, we experiment with replacing the BERT embeddings with GPT-2 [6] embeddings. We find that model complexity provides smaller benefit to performance, compared to improved embeddings. We also find that fine-tuning for longer epochs leads to higher performance. We conclude by submitting our single-best performing model achieving a test F1 75.449, EM 72.206 on never-before seen held-out test-set.