Experimental Evaluation of CLIP-Based Zero-Shot Classification of Imbalanced Remote Sensing Scenes: Addressing Quantity Disparities in Data

Authors

  • Tanvir Ahmed, Nanjing University of Information Science and Technology
  • Asfika Jaman Tanha, Nanjing University of Information Science and Technology, Nanjing, China
  • Shekh Ifteesham Iftee, Nanjing University of Information Science and Technology, Nanjing, China
  • Tanjoy Mahmud
  • Ekra MD Emadur Rahman
  • Hossain MD Maruf

DOI:

https://doi.org/10.34010/injiiscom.v6i1.14164

Keywords:

Remote Sensing Scene Classification (RSSC), Contrastive Language-Image Pretraining (CLIP), Zero-Shot Learning (ZSL)

Abstract

This paper presents a zero-shot learning framework based on Contrastive Language-Image Pretraining (CLIP) for Remote Sensing Scene Classification (RSSC). The proposed method addresses the challenge of imbalanced image quantities across categories, which is often encountered in practical applications. Traditional zero-shot learning methods in RSSC use pre-trained word embeddings to extract semantic features from category names or descriptions. These features are then kept fixed during learning and are not adapted to the visual features, which leaves a gap between the visual and semantic representations. We integrate the Vision Transformer with CLIP to improve the alignment between visual and semantic features. Extensive experiments on the WHU-RS19 dataset demonstrate the effectiveness of the proposed framework, showing improved classification performance and generalization capabilities.
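As a rough illustration of the idea in the abstract, the sketch below shows how CLIP-style zero-shot classification is commonly scored: an image embedding is compared with one text embedding per category by cosine similarity, and the most similar category is predicted. It is not the authors' implementation. The embeddings are hand-made toy vectors rather than outputs of a real CLIP model, the class names are illustrative, the temperature of 100 is an assumed value, and the helper names (`zero_shot_predict`, `per_class_accuracy`) are hypothetical. The macro-averaged accuracy is included only to show why imbalanced categories can make overall accuracy look better than per-class performance; the abstract does not say which metric the paper reports.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def zero_shot_predict(image_emb, text_embs, temperature=100.0):
    """Return (predicted class index, softmax probabilities).

    temperature is an assumed logit scale, not a value taken from the paper.
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        logits.append(temperature * sum(a * b for a, b in zip(img, txt)))
    peak = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs.index(max(probs)), probs

def per_class_accuracy(y_true, y_pred, num_classes):
    """Accuracy computed separately for each class (None if a class has no samples)."""
    correct = [0] * num_classes
    count = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        count[t] += 1
        correct[t] += int(t == p)
    return [c / n if n else None for c, n in zip(correct, count)]

# Toy stand-ins for text embeddings of prompts such as "a satellite photo of a {class}".
class_names = ["airport", "beach", "forest"]  # illustrative names only
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Imbalanced toy test set: four "airport" images, one "beach", one "forest".
image_embs = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.7, 0.1, 0.3],
    [0.6, 0.5, 0.1],
    [0.6, 0.5, 0.2],  # a "beach" image that sits closer to "airport"
    [0.1, 0.2, 0.9],
]
y_true = [0, 0, 0, 0, 1, 2]

y_pred = [zero_shot_predict(emb, text_embs)[0] for emb in image_embs]
overall = sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)
per_class = per_class_accuracy(y_true, y_pred, len(class_names))
present = [a for a in per_class if a is not None]
macro = sum(present) / len(present)

print("predictions:", [class_names[i] for i in y_pred])
print("overall accuracy:", round(overall, 3))
print("macro-averaged per-class accuracy:", round(macro, 3))
```

In this toy set the single misclassified "beach" image barely moves overall accuracy because "airport" images dominate, while the macro average drops noticeably. That gap is the kind of effect an evaluation on imbalanced categories has to account for.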

Published

2024-12-17

How to Cite

[1] “Experimental Evaluation of CLIP-Based Zero-Shot Classification of Imbalanced Remote Sensing Scenes: Addressing Quantity Disparities in Data”, Int. J. Inform. Inf. Sys. and Comp. Eng., vol. 6, no. 1, pp. 130–143, Dec. 2024, doi: 10.34010/injiiscom.v6i1.14164.