Seeing the Unseen? Opportunities and Challenges in Understanding Tourist Behaviors from UGC using Large Language Models (LLMs)

Authors

  • Fondina Gusriza Postgraduate Tourism Studies, Gadjah Mada University, Indonesia
  • Tri Kuntoro Priyambodo Department of Computer Science and Electronics, Universitas Gadjah Mada, Indonesia https://orcid.org/0000-0003-1906-7224
  • Khabib Mustofa Department of Computer Science and Electronics, Gadjah Mada University, Indonesia
  • Dyah Mutiarin Department of Government Affairs and Administration, Muhammadiyah University of Yogyakarta, Indonesia

Keywords:

Opportunities, Challenges, Tourist Behaviours, User-Generated Content (UGC), Large Language Models (LLMs), Big Data

Abstract

The era of digital tourism has produced many forms of User-Generated Content (UGC), such as online reviews, social media posts, and travel blogs. UGC is an important source of data for understanding tourist behaviours, but it is often overlooked. This study explores the opportunities and challenges of using Large Language Models (LLMs) as a new approach to analysing UGC. The data were collected from TripAdvisor and comprise more than 13,000 user reviews of 15 cultural heritage sites in Yogyakarta, Indonesia. OpenAI’s Large Language Model was applied through API integration in Python, and a Natural Language Processing (NLP) approach was used to analyse tourist motivations, experiences, and satisfaction. The analytical process includes web scraping, text pre-processing, theory-driven classification, prompt engineering, and lexicon mapping. This paper contributes to the emerging discourse on the use of Artificial Intelligence in tourism studies, particularly in understanding tourist behaviours through textual data.
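
The pre-processing, prompt engineering, and lexicon mapping steps named in the abstract could look roughly like the Python sketch below. It is an illustration only, not the authors' implementation: the lexicon terms, the prompt wording, and the function names (`preprocess`, `map_to_lexicon`, `build_prompt`) are all assumptions, and the call to the language model itself is left out so that the example runs offline.

```python
import re

# Hypothetical lexicon (not taken from the paper): each behavioural
# category is mapped to a few indicative review terms.
LEXICON = {
    "motivation": {"history", "culture", "learn", "heritage"},
    "experience": {"guide", "crowded", "sunrise", "view"},
    "satisfaction": {"amazing", "worth", "disappointing", "recommend"},
}


def preprocess(review):
    """Lowercase a review, strip URLs and non-letter characters, and tokenise."""
    text = re.sub(r"https?://\S+", " ", review.lower())
    text = re.sub(r"[^a-z\s]", " ", text)
    return text.split()


def map_to_lexicon(tokens):
    """Count how many tokens fall into each lexicon category."""
    return {
        category: sum(1 for token in tokens if token in terms)
        for category, terms in LEXICON.items()
    }


def build_prompt(review):
    """Build a classification prompt (assumed wording) to send to an LLM."""
    categories = ", ".join(LEXICON)
    return (
        "Classify the following tourist review into one or more of these "
        f"categories: {categories}. Answer with the category names only.\n\n"
        f"Review: {review}"
    )


if __name__ == "__main__":
    sample = "Amazing sunrise view! Worth it: http://x.co/1"
    tokens = preprocess(sample)
    print(tokens)
    print(map_to_lexicon(tokens))
    print(build_prompt(sample))
```

In a setup like this, the string returned by `build_prompt` would be sent to the model through the API, and the lexicon counts could serve as a simple cross-check on the model's labels.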

Published

2026-02-27

How to Cite

[1]
F. Gusriza, T. K. Priyambodo, K. Mustofa, and D. Mutiarin, “Seeing the Unseen? Opportunities and Challenges in Understanding Tourist Behaviors from UGC using Large Language Models (LLMs)”, Int. J. Inform. Inf. Sys. and Comp. Eng., vol. 8, no. 1, pp. 63–74, Feb. 2026, Accessed: Jun. 06, 2026. [Online]. Available: https://ojs.unikom.ac.id/index.php/injiiscom/article/view/19447