
SoBigData Articles

Natural Language Processing: Attention is Explanation

Sentiment Analysis is a sub-field of Natural Language Processing (NLP) that combines tools and techniques from Linguistics and Computer Science to systematically identify, extract, and study emotional states and personal opinions expressed in natural language.
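
For readers unfamiliar with the task, here is a quick illustration using the Hugging Face `transformers` sentiment-analysis pipeline; the model it downloads is only a convenient default for the example, not the model discussed later in this article.

```python
# Quick illustration of sentiment analysis: classify short texts as
# positive or negative (the default pipeline model is just an example).
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
for text in ["I loved every minute of this film.",
             "We never really feel involved with the story."]:
    result = classifier(text)[0]
    print(f"{result['label']:>8s} ({result['score']:.2f})  {text}")
```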

Machine Learning models have recently made enormous performance gains in this field. In particular, Transformer models have outperformed every competitor, but this superiority comes at a cost: complexity. The state of the art in this area is GPT-2, published by OpenAI with a whopping 1.5 billion parameters, a nightmare for any GPU. This massive number of parameters makes these models an inscrutable black box. If we want to use them for real applications, we need to explain their decisions.

Fortunately, there are several approaches to explaining Transformers. The most common is to produce a heatmap over the sentence that highlights which words contributed most to the sentiment prediction. The most relevant frameworks for creating such a heatmap are LIME and Integrated Gradients (INTGRAD). LIME is the oldest framework for producing insights into a model's decisions, but it is slow and sometimes unstable. INTGRAD is a more recent method that computes the heatmap from a path integral of the model's gradients. Both techniques need to query the model many times to produce an explanation, which becomes impractical for big models like Transformers.
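
To make the query-heavy nature of these methods concrete, here is a minimal sketch using the `lime` package to attribute a sentiment prediction to individual words. The `predict_proba` wrapper and the label names are placeholders for this example, not part of the article; in practice it would wrap the sentiment classifier being explained.

```python
# Word-level sentiment attribution with LIME (placeholder classifier wrapper).
# LIME perturbs the sentence many times and queries the model on each variant,
# which is what makes it expensive for large Transformer models.
import numpy as np
from lime.lime_text import LimeTextExplainer

def predict_proba(texts):
    """Placeholder: should return an (n_samples, n_classes) array of
    class probabilities from the sentiment model being explained."""
    return np.tile([0.5, 0.5], (len(texts), 1))

explainer = LimeTextExplainer(class_names=["negative", "positive"])
sentence = ("We never really feel involved with the story, "
            "as all of its ideas remain just that: abstract ideas.")

# num_samples controls how many perturbed sentences are sent to the model.
explanation = explainer.explain_instance(
    sentence, predict_proba, num_features=10, num_samples=1000)

for word, weight in explanation.as_list():
    print(f"{word:>12s}  {weight:+.3f}")
```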

Transformer models rely on a particular mechanism called attention. The attention mechanism lets the model look over all the information the input sentence holds and then produce an output appropriate to the context. Transformers combine this with an encoding of each word's position, so even two very distant words can be linked directly. These attention scores are powerful, since they relate concepts, but they raise a question: can attention scores be used as explanations? The question is controversial, and some authors argue that attention is not viable as an explanation (Jain and Wallace, 2019).
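
To make the mechanism concrete, here is a minimal NumPy sketch of scaled dot-product attention, the building block inside Transformers; the toy dimensions are arbitrary and only illustrate how each position attends to every other position.

```python
# Scaled dot-product attention: each position attends to every other
# position, so distant words can be linked directly in one step.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v).
    Returns the context vectors and the attention weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V, weights

# Toy example: 4 tokens with 8-dimensional representations.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
context, attn = scaled_dot_product_attention(Q, K, V)
print(attn.round(2))  # each row sums to 1: how much token i attends to token j
```

In a full Transformer, the queries, keys, and values are linear projections of the token embeddings plus positional encodings, which is what lets the model relate distant words.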

I think it can. I explored this possibility during my master's thesis, and we recently published an article studying how attention-based techniques can be exploited to extract meaningful sentiment scores at a lower computational cost than existing XAI methods.

Figure. Comparison between attention scores (orange), LIME scores (blue), and Integrated Gradients scores (green) for the sentence: “We never really feel involved with the story, as all of its ideas remain just that: abstract ideas.”

 

We inserted an additional attention layer before the classification layer of a general-purpose Transformer model called BERT and then used this layer to produce explanations. Our results are consistent with those provided by LIME and INTGRAD. However, since we read the scores directly from the model's weights, our explanations are produced almost instantly, without any additional model calls.
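
The sketch below conveys the general idea rather than the exact architecture from the paper: an additive attention pooling layer is placed on top of BERT's token representations, and its per-token weights double as explanation scores. The layer sizes and the two-class head are assumptions made for the example, and the model is untrained here (it would be fine-tuned on a sentiment dataset).

```python
# Sketch: BERT encoder + an extra attention layer whose per-token weights
# can be read off as explanation scores (assumed architecture, not
# necessarily the one used in the paper).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class AttentionPoolingClassifier(nn.Module):
    def __init__(self, model_name="bert-base-uncased", num_classes=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.attn = nn.Sequential(nn.Linear(hidden, 128), nn.Tanh(), nn.Linear(128, 1))
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        scores = self.attn(hidden).squeeze(-1)                  # (batch, seq_len)
        scores = scores.masked_fill(attention_mask == 0, -1e9)  # ignore padding
        weights = torch.softmax(scores, dim=-1)                 # per-token attention
        pooled = (weights.unsqueeze(-1) * hidden).sum(dim=1)    # weighted sum of tokens
        return self.classifier(pooled), weights                 # logits + explanation

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AttentionPoolingClassifier()
batch = tokenizer(["We never really feel involved with the story."],
                  return_tensors="pt")
logits, token_weights = model(batch["input_ids"], batch["attention_mask"])
for tok, w in zip(tokenizer.convert_ids_to_tokens(batch["input_ids"][0]),
                  token_weights[0].tolist()):
    print(f"{tok:>12s}  {w:.3f}")
```

Because the per-token weights come out of the same forward pass that produces the prediction, the explanation is essentially free, which is the efficiency argument made above.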

We demonstrated that attention layers can be used as an explanation method, and we believe attention deserves further exploration as a source of useful insight into model decisions.

 

References:
 

  • (LIME) Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. "'Why Should I Trust You?': Explaining the Predictions of Any Classifier." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.
  • (INTGRAD) Sundararajan, Mukund, Ankur Taly, and Qiqi Yan. "Axiomatic Attribution for Deep Networks." Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 2017.
  • Jain, Sarthak, and Byron C. Wallace. "Attention is not Explanation." arXiv preprint arXiv:1902.10186 (2019).
  • Bodria, Francesco, André Panisson, Alan Perotti, and Simone Piaggesi. "Explainability Methods for Natural Language Processing: Applications to Sentiment Analysis." SEBD. 2020.

 

Written by: Francesco Bodria