An adversarial training method for text classification

Fiumara G.;De Meo P.
2023-01-01

Abstract

Text classification is an emerging topic in text data mining, but current methods for inferring sentence polarity have two major shortcomings: on the one hand, large, well-curated corpora are scarce; on the other hand, current deep-learning solutions are particularly vulnerable to adversarial examples. To overcome these limitations, we propose HNN-GRAT (Hierarchical Neural Network and Gradient Reversal), an adversarial training method for text classification. First, a Robustly Optimized BERT Pretraining Approach (RoBERTa) pretrained model is used to extract text features and their gradient information; second, the original gradients are passed through a purpose-built gradient reversal layer to obtain reversed gradients; finally, the original and reversed gradients are fused to obtain the model's new gradients. HNN-GRAT is evaluated on three real-world datasets against five attack methods; compared with the RoBERTa pretrained model, it improves robust accuracy and reduces the probability that the model is successfully attacked. Moreover, compared with six text defense methods, HNN-GRAT achieves the best Boa (accuracy under attack) and Succ (attack success rate): under the DeepWordBug attack, for example, Boa improves by up to 41.50%, 67.50%, and 28.15% on the AGNEWS, IMDB, and SST-2 datasets, while Succ drops to 55.90%, 27.45%, and 69.89%, respectively.
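The gradient reversal layer described in the abstract follows the standard construction from domain-adversarial training: it acts as the identity in the forward pass and negates (and scales) the gradient in the backward pass. Below is a minimal PyTorch sketch of such a layer, together with a hypothetical fusion of the original and reversed gradient streams. The names GradReverse and fuse_gradients, and the mixing weights alpha and lambd, are illustrative assumptions; the abstract does not specify how HNN-GRAT actually performs the fusion.

    import torch
    from torch.autograd import Function

    class GradReverse(Function):
        """Identity in the forward pass; reversed, scaled gradient in the
        backward pass (the classic gradient reversal layer)."""

        @staticmethod
        def forward(ctx, x, lambd):
            ctx.lambd = lambd
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            # Negate the incoming gradient; lambd scales its magnitude.
            # The second return value (for lambd) is None, as it is not a tensor.
            return -ctx.lambd * grad_output, None

    def grad_reverse(x, lambd=1.0):
        return GradReverse.apply(x, lambd)

    def fuse_gradients(features, alpha=0.7, lambd=1.0):
        # Hypothetical fusion: since grad_reverse is the identity forward,
        # the output equals the input features, while in the backward pass
        # the encoder receives alpha * g - (1 - alpha) * lambd * g.
        return alpha * features + (1.0 - alpha) * grad_reverse(features, lambd)

    if __name__ == "__main__":
        feats = torch.randn(2, 8, requires_grad=True)  # stand-in for RoBERTa features
        fuse_gradients(feats).sum().backward()
        print(feats.grad)  # every entry equals alpha - (1 - alpha) * lambd

With alpha = 1 the fusion reduces to ordinary backpropagation, and with alpha = 0 it becomes a pure gradient reversal layer; intermediate values mix the two gradient streams as the abstract describes.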
Use this identifier to cite or link to this item: https://hdl.handle.net/11570/3275269