TextGuard: Identifying and Neutralizing Adversarial Threats in Textual Data
Keywords:
TextGuard, Natural Language Processing (NLP), Local Outlier Factor (LOF), Textual Data

Abstract
Adversarial attacks in the text domain pose a serious risk to the integrity of Natural Language Processing (NLP) systems. In this study, we propose "TextGuard," a novel approach for detecting adversarial examples in NLP based on the Local Outlier Factor (LOF) algorithm. We compare TextGuard's performance against that of traditional NLP classifiers such as LSTM, CNN, and transformer-based models, and experimentally verify its effectiveness on a variety of real-world datasets. TextGuard significantly surpasses earlier state-of-the-art methods such as DISP and FGWS, achieving F1 detection scores as high as 94.8%. To the best of our knowledge, this is the first application of the LOF technique to adversarial example identification in the text domain, setting a new benchmark in the field.
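The core idea of LOF-based detection is to treat adversarial inputs as density outliers relative to the distribution of clean examples in an embedding space. The following is a minimal, illustrative sketch using scikit-learn's LocalOutlierFactor; the random vectors stand in for sentence embeddings of clean and perturbed texts, and the cluster locations, dimensionality, and `n_neighbors` value are assumptions for demonstration, not the paper's actual configuration.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Stand-in for embeddings of clean training texts: a tight
# cluster in a 50-dimensional embedding space.
clean_embeddings = rng.normal(loc=0.0, scale=1.0, size=(500, 50))

# Stand-in for embeddings of adversarially perturbed texts,
# shifted away from the clean distribution.
adversarial_embeddings = rng.normal(loc=6.0, scale=1.0, size=(20, 50))

# novelty=True fits the detector on clean data only, so it can
# score previously unseen samples with predict().
lof = LocalOutlierFactor(n_neighbors=20, novelty=True)
lof.fit(clean_embeddings)

# predict() returns +1 for inliers (clean) and -1 for outliers
# (flagged as adversarial).
clean_pred = lof.predict(clean_embeddings[:20])
adv_pred = lof.predict(adversarial_embeddings)
```

In this toy setting the perturbed cluster lies far from the clean density region, so its points receive large local-outlier-factor scores and are flagged with label -1; in practice the separation depends on how strongly the attack displaces inputs in the chosen embedding space.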
References
Asghar, N. (2016). Yelp dataset challenge: Review rating prediction. arXiv preprint arXiv:1605.05362.
Bai, M., Wang, X., Xin, J., & Wang, G. (2016). An efficient algorithm for distributed density-based outlier detection on big data. Neurocomputing, 181, 19-28.
Cheng, Z., Zou, C., & Dong, J. (2019). Outlier detection using isolation forest and local outlier factor. In Proceedings of the conference on research in adaptive and convergent systems (pp. 161-168).
Farabet, C., Couprie, C., Najman, L., & LeCun, Y. (2012). Learning hierarchical features for scene labeling. IEEE transactions on pattern analysis and machine intelligence, 35(8), 1915-1929.
Gao, J., Lanchantin, J., Soffa, M. L., & Qi, Y. (2018). Black-box generation of adversarial text sequences to evade deep learning classifiers. In 2018 IEEE Security and Privacy Workshops (SPW) (pp. 50-56). IEEE.
Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
Graves, A. (2012). Long short-term memory. In Supervised sequence labelling with recurrent neural networks (pp. 37-45). Springer.
Jin, D., Jin, Z., Zhou, J. T., & Szolovits, P. (2020). Is bert really robust? a strong baseline for natural language attack on text classification and entailment. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, No. 05, pp. 8018-8025).
Jones, E., Jia, R., Raghunathan, A., & Liang, P. (2020). Robust encodings: A framework for combating adversarial typos. arXiv preprint arXiv:2005.01229.
Keller, Y., Mackensen, J., & Eger, S. (2021). BERT-defense: A probabilistic model based on BERT to combat cognitively inspired orthographic adversarial attacks. arXiv preprint arXiv:2106.01452.
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Li, D., Zhang, Y., Peng, H., Chen, L., Brockett, C., Sun, M. T., & Dolan, B. (2020). Contextualized perturbation for textual adversarial attack. arXiv preprint arXiv:2009.07502.
Lozano, E., & Acuña, E. (2005). Parallel algorithms for distance-based and density-based outliers. In Fifth IEEE International Conference on Data Mining (ICDM'05) (4 pp.). IEEE.
Ma, X., Jin, R., Paik, J. Y., & Chung, T. S. (2018). Large scale text classification with efficient word embedding. In Mobile and Wireless Technologies 2017: ICMWT 2017 4 (pp. 465-469). Springer Singapore.
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083.
Mika, S., Schölkopf, B., Smola, A., Müller, K. R., Scholz, M., & Rätsch, G. (1998). Kernel PCA and de-noising in feature spaces. Advances in neural information processing systems, 11.
Morris, J. X., Lifland, E., Lanchantin, J., Ji, Y., & Qi, Y. (2020). Reevaluating adversarial examples in natural language. arXiv preprint arXiv:2004.14174.
Morris, J. X., Lifland, E., Yoo, J. Y., Grigsby, J., Jin, D., & Qi, Y. (2020). Textattack: A framework for adversarial attacks, data augmentation, and adversarial training in nlp. arXiv preprint arXiv:2005.05909.
Mozes, M., Stenetorp, P., Kleinberg, B., & Griffin, L. D. (2020). Frequency-guided word substitutions for detecting textual adversarial examples. arXiv preprint arXiv:2004.05887.
Mrkšić, N., Ó Séaghdha, D., Thomson, B., Gašić, M., Rojas-Barahona, L., Su, P. H., ... & Young, S. (2016). Counter-fitting word vectors to linguistic constraints. arXiv preprint arXiv:1603.00892.
Omar, M. (2022). Machine learning for cybersecurity: Innovative deep learning solutions. Springer Nature.
Omar, M. (2023). VulDefend: A Novel Technique based on Pattern-exploiting Training for Detecting Software Vulnerabilities Using Language Models. In 2023 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT) (pp. 287-293). IEEE.
Omar, M., & Sukthankar, G. (2023). Text-defend: detecting adversarial examples using local outlier factor. In 2023 IEEE 17th international conference on semantic computing (ICSC) (pp. 118-122). IEEE.
Omar, M., Choi, S., Nyang, D., & Mohaisen, D. (2022). Robust natural language processing: Recent advances, challenges, and future directions. IEEE Access, 10, 86038-86056.
Omar, M., Jones, R., Burrell, D. N., Dawson, M., Nobles, C., Mohammed, D., & Bashir, A. K. (2023). Harnessing the power and simplicity of decision trees to detect IoT Malware. In Transformational Interventions for Business, Technology, and Healthcare (pp. 215-229). IGI Global.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12, 2825-2830.
Pennington, J., Socher, R., & Manning, C. D. (2014). Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532-1543).
Poth, C., Pfeiffer, J., Rücklé, A., & Gurevych, I. (2021). What to pre-train on? efficient intermediate task selection. arXiv preprint arXiv:2104.08247.
Pruthi, D., Dhingra, B., & Lipton, Z. C. (2019). Combating adversarial misspellings with robust word recognition. arXiv preprint arXiv:1905.11268.
Sakaguchi, K., Post, M., & Van Durme, B. (2017). Grammatical error correction with neural reinforcement learning. arXiv preprint arXiv:1707.00299.
Sun, G., Su, Y., Qin, C., Xu, W., Lu, X., & Ceglowski, A. (2020). Complete defense framework to protect deep neural networks against adversarial examples. Mathematical Problems in Engineering, 2020(1), 8319249.
Topal, M. O., Bas, A., & van Heerden, I. (2021). Exploring transformers in natural language generation: Gpt, bert, and xlnet. arXiv preprint arXiv:2102.08036.
Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., & Madry, A. (2018). Robustness may be at odds with accuracy. arXiv preprint arXiv:1805.12152.
Wang, T., Wang, X., Qin, Y., Packer, B., Li, K., Chen, J., ... & Chi, E. (2020). Cat-gen: Improving robustness in nlp models via controlled adversarial text generation. arXiv preprint arXiv:2010.02338.
Wang, W., Wang, R., Ke, J., & Wang, L. (2021). Textfirewall: Omni-defending against adversarial texts in sentiment classification. IEEE Access, 9, 27467-27475.
Wang, X., Yang, Y., Deng, Y., & He, K. (2021, May). Adversarial training with fast gradient projection method against synonym substitution based text attacks. In Proceedings of the AAAI conference on artificial intelligence (Vol. 35, No. 16, pp. 13997-14005).
Zhang, H., Yu, Y., Jiao, J., Xing, E., El Ghaoui, L., & Jordan, M. (2019). Theoretically principled trade-off between robustness and accuracy. In International conference on machine learning (pp. 7472-7482). PMLR.
Zhang, X., Zhao, J., & LeCun, Y. (2015). Character-level convolutional networks for text classification. Advances in neural information processing systems, 28.
Zhou, Y., Jiang, J. Y., Chang, K. W., & Wang, W. (2019). Learning to discriminate perturbations for blocking adversarial attacks in text classification. arXiv preprint arXiv:1909.03084.