Compositional De-Attention Networks

Abstract

Attentional models are distinctly characterized by their ability to learn relative importance, i.e., to assign different weights to input values. This paper proposes a new quasi-attention that is compositional in nature, i.e., it learns whether to add, subtract, or nullify a certain vector when learning representations. This is in strong contrast to vanilla attention, which simply re-weights input tokens. Our proposed Compositional De-Attention (CoDA) is fundamentally built upon the intuition of both similarity and dissimilarity (negative affinity) when computing affinity scores, benefiting from a greater extent of expressiveness. We evaluate CoDA on six NLP tasks, i.e., open-domain question answering, retrieval/ranking, natural language inference, machine translation, sentiment analysis, and text-to-code generation. We obtain promising experimental results, achieving state-of-the-art performance on several tasks/datasets.
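The sketch below is only an illustrative reading of the abstract, not the paper's exact formulation: it assumes a tanh-squashed scaled dot product as the signed similarity (add vs. subtract) and a sigmoid over a negative L1 distance as the nullifying gate, with hypothetical temperature parameters alpha and beta and a made-up function name coda_quasi_attention. The precise centering and scaling terms should be taken from the paper itself.

import numpy as np

def coda_quasi_attention(Q, K, V, alpha=1.0, beta=1.0):
    """Illustrative sketch of a compositional quasi-attention step.

    Q: (n_query, d), K: (n_key, d), V: (n_key, d).
    alpha and beta are hypothetical temperature terms for the two affinities.
    """
    d = Q.shape[-1]

    # Signed similarity: scaled dot product squashed by tanh, so each
    # affinity lies in [-1, 1] (add vs. subtract a value vector).
    sim = np.tanh((Q @ K.T) / (alpha * np.sqrt(d)))

    # Dissimilarity gate: sigmoid of the negative L1 distance, so distant
    # query-key pairs gate toward 0 (nullify) and close pairs toward 1.
    l1 = np.abs(Q[:, None, :] - K[None, :, :]).sum(-1)
    gate = 1.0 / (1.0 + np.exp(l1 / (beta * np.sqrt(d))))

    # Element-wise composition yields weights in [-1, 1]:
    # positive -> add, negative -> subtract, near zero -> nullify.
    weights = sim * gate
    return weights @ V

# Toy usage: 3 queries attending over 4 keys/values of dimension 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
out = coda_quasi_attention(Q, K, V)
print(out.shape)  # (3, 8)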

Publication
Advances in Neural Information Processing Systems (NeurIPS 2019)

Link: https://papers.nips.cc/paper_files/paper/2019/hash/16fc18d787294ad5171100e33d05d4e2-Abstract.html

Luu Anh Tuan
Assistant Professor

My research interests lie at the intersection of Artificial Intelligence and Natural Language Processing.