Improving Sentiment Classification using Ensemble Learning
DOI:
https://doi.org/10.34010/injiiscom.v6i2.13921Keywords:
Ensemble Learning, Logistic Regression, Random Forest, Bidirectional Long Short-Term Memory, Sentiment ClassificationAbstract
This paper presents an ensemble learning-based approach to improve sentiment classification accuracy in the IMDB movie reviews dataset. To this end, we tap three diversified models: Logistic Regression, Random Forest, and a Bidirectional Long Short-Term Memory neural network. Each one contributes its unique strengths to the ensemble, enhancing the overall performance. The text data has been processed using a statistical formula that converts the text document into a vector from the relevancy of the word with bigrams; data have been transformed to make it useful for Logistic Regression and Random Forest classifiers. The LSTM neural network is designed to capture sequential dependencies through an embedding layer followed by a bidirectional LSTM and dense layers with dropout regularization. The ensemble method then combines predictions of these models by majority voting, thus the interpretability and robustness of conventional classifiers are preserved, while advanced capabilities from neural networks are maintained. Our experiments prove that this ensemble approach does obtain an accuracy of 89.2% on the test dataset, which outperforms individual models. This study realizes some possible ways to combine traditional machine learning techniques with deep learning models in sentiment analysis tasks.