Research on Image Representation Learning Method Based on Self-Supervised Learning

Juanpeng Zhang1
1 Department of Electrical Engineering, Cheongju University, Cheongju, Republic of Korea
International Scientific Technical and Economic Research 2026, Vol. 4, No. 2, pp. 78-97
DOI: 10.71451/ISTAER2616
Received: 20 January 2026; Revised: 27 February 2026; Accepted: 3 April 2026; Published: 12 April 2026
Abstract

To address negative-sample dependence, representation collapse, and insufficient cross-scale modeling in self-supervised image representation learning, this paper proposes a self-supervised framework that combines multi-view consistency learning with cross-scale feature fusion. The method builds a multi-branch collaborative structure and introduces a negative-sample-free optimization strategy together with a feature-distribution constraint mechanism, enabling efficient mining and stable expression of image semantic information. On the ImageNet dataset, linear-evaluation accuracy reaches 77.8%, exceeding SimCLR and SwAV by 8.5% and 2.5%, respectively. In downstream tasks, object-detection mAP improves by about 2.5% and semantic-segmentation mIoU by about 2.5%; under noise perturbation, accuracy improves by 7.5%, demonstrating stronger robustness. The experimental results show that the method outperforms existing mainstream approaches in representation quality, generalization ability, and training stability, and has good application potential.
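The negative-sample-free objective and feature-distribution constraint described above can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes a VICReg-style formulation in which an invariance term pulls paired view embeddings together while a variance hinge on each embedding dimension guards against representation collapse. The function name `consistency_loss` and the weights `lam` and `mu` are illustrative placeholders.

```python
import numpy as np

def consistency_loss(z1, z2, std_target=1.0, lam=25.0, mu=25.0):
    """Sketch of a negative-sample-free loss (assumed formulation).

    z1, z2 : (batch, dim) embeddings of two augmented views.
    The invariance term aligns the two views; the variance hinge
    penalizes embedding dimensions whose spread collapses below
    std_target, acting as a feature-distribution constraint.
    """
    # Invariance: mean squared distance between paired view embeddings.
    inv = np.mean((z1 - z2) ** 2)

    # Variance constraint: hinge on the per-dimension standard deviation.
    def var_term(z):
        std = np.sqrt(z.var(axis=0) + 1e-4)  # epsilon for stability
        return np.mean(np.maximum(0.0, std_target - std))

    var = var_term(z1) + var_term(z2)
    return lam * inv + mu * var

# Toy usage: well-spread embeddings of two slightly perturbed views
# incur a much smaller loss than fully collapsed (constant) embeddings.
rng = np.random.default_rng(0)
z1 = rng.normal(size=(256, 128))
z2 = z1 + 0.1 * rng.normal(size=(256, 128))
spread_loss = consistency_loss(z1, z2)
collapsed_loss = consistency_loss(np.zeros((256, 128)), np.zeros((256, 128)))
```

Because the variance hinge alone rules out the trivial constant solution, no negative pairs are needed; this is the property the abstract's "negative-sample-free optimization strategy" refers to, though the paper's exact loss may differ.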

Keywords
Self-supervised learning; Image representation learning; Cross-scale feature fusion; Negative-sample-free learning; Deep learning