Optimization of Data Augmentation Based on Synonym Replacement in News Text Classification Using Neural Network

Iffah Risma Huriah
Amelia Ismania Sita Widianingrum
Azlina
Taslim

Abstract

In the digital era, online news has become a primary source of information, spanning categories such as politics, technology, entertainment, and business. The growing volume of news makes it difficult to organize articles into relevant categories. This study aims to improve the accuracy of news text classification through data augmentation based on synonym replacement. The method comprises text preprocessing for data cleaning, augmentation by synonym replacement to increase data diversity, feature representation using TF-IDF and Word2Vec, and modeling with a neural network. Performance was assessed using accuracy, precision, recall, and F1-score. The results indicate that, with data augmentation, model accuracy reaches up to 95%, with balanced training and validation data distributions. The confusion matrix shows that most samples are classified correctly, although some errors occur between categories with similar features. This study demonstrates that synonym replacement-based data augmentation is effective in improving news text classification performance, particularly for datasets with limited training data.
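
The synonym-replacement step named in the abstract can be illustrated with a minimal sketch. The abstract does not state which synonym source, replacement rate, or tokenizer the study used, so the `SYNONYMS` table, the `n_replacements` parameter, and the function names below are hypothetical choices made only for illustration.

```python
import random

# Hypothetical toy synonym table (assumption): the study's actual synonym
# source is not given in the abstract.
SYNONYMS = {
    "company": ["firm", "business"],
    "shares": ["stocks"],
    "rose": ["climbed", "increased"],
    "new": ["fresh", "recent"],
}


def synonym_replace(text, n_replacements=2, rng=None):
    """Return a copy of `text` with up to `n_replacements` words swapped for synonyms."""
    if rng is None:
        rng = random.Random()
    tokens = text.split()
    # Positions of tokens that have at least one known synonym.
    candidates = [i for i, tok in enumerate(tokens) if tok.lower() in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n_replacements]:
        tokens[i] = rng.choice(SYNONYMS[tokens[i].lower()])
    return " ".join(tokens)


def augment(samples, copies=1, seed=0):
    """Keep each (text, label) pair and add `copies` augmented variants with the same label."""
    rng = random.Random(seed)
    out = []
    for text, label in samples:
        out.append((text, label))
        for _ in range(copies):
            out.append((synonym_replace(text, rng=rng), label))
    return out


data = [("the company shares rose after the new product launch", "business")]
augmented = augment(data, copies=2)
for text, label in augmented:
    print(label, "|", text)
```

Each augmented sentence keeps the label of its source sentence, which is what lets this kind of augmentation enlarge a small training set without new annotation. In a pipeline like the one the abstract describes, the enlarged set would then be passed to the feature-extraction and classification stages.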

How to Cite

Optimization of Data Augmentation Based on Synonym Replacement in News Text Classification Using Neural Network. (2025). Komputa : Jurnal Ilmiah Komputer Dan Informatika, 14(1), 100-107. https://doi.org/10.34010/komputa.v14i1.15339
