Research on Image Representation Learning Method Based on Self-Supervised Learning

Authors

Zhang, J.

DOI:

https://doi.org/10.71451/ISTAER2616

Keywords:

Self-supervised learning; Image representation learning; Cross-scale feature fusion; Negative-sample-free learning; Deep learning

Abstract

To address the problems of negative-sample dependence, representation degradation, and insufficient cross-scale modeling in self-supervised image representation learning, this paper proposes a self-supervised learning framework that combines multi-view consistency learning with cross-scale feature fusion. The method constructs a multi-branch collaborative structure and introduces a negative-sample-free optimization strategy and a feature-distribution constraint mechanism, achieving efficient mining and stable expression of image semantic information. On the ImageNet dataset, linear-evaluation accuracy reaches 77.8%, which is 8.5% and 2.5% higher than SimCLR and SwAV, respectively. In downstream tasks, object-detection mAP increases by about 2.5% and semantic-segmentation mIoU by about 2.5%; under noise perturbation, accuracy improves by 7.5%, demonstrating stronger robustness. Experimental results show that the method outperforms existing mainstream approaches in representation quality, generalization ability, and training stability, and has good application potential.
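The abstract names two ingredients whose exact formulations are not given here: a negative-sample-free consistency objective between augmented views, and a feature-distribution constraint that prevents representation collapse. The sketch below illustrates these two ideas in a minimal, generic form (a negative cosine-similarity loss as in BYOL/SimSiam-style methods, and a VICReg-style variance hinge penalty); the function names and the specific penalty are illustrative assumptions, not the paper's actual losses.

```python
import math

def cosine_consistency_loss(z_online, z_target):
    """Negative cosine similarity between embeddings of two augmented views.
    In negative-sample-free training, z_target is typically produced by a
    momentum/target branch and treated as a constant (stop-gradient)."""
    dot = sum(a * b for a, b in zip(z_online, z_target))
    norm_a = math.sqrt(sum(a * a for a in z_online))
    norm_b = math.sqrt(sum(b * b for b in z_target))
    return -dot / (norm_a * norm_b)

def variance_penalty(batch, eps=1e-4, target_std=1.0):
    """Hinge penalty keeping each embedding dimension's std above target_std,
    discouraging all samples from collapsing to one point (VICReg-style)."""
    n, dims = len(batch), len(batch[0])
    penalty = 0.0
    for d in range(dims):
        col = [row[d] for row in batch]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        penalty += max(0.0, target_std - math.sqrt(var + eps))
    return penalty / dims

# Identical views attain the minimum consistency loss of -1.0,
# while a collapsed batch incurs a large variance penalty.
z = [0.5, -0.2, 0.8]
print(round(cosine_consistency_loss(z, z), 6))       # → -1.0
print(variance_penalty([z, z]) > variance_penalty([[1.0, 0, 0], [-1.0, 0, 0]]))
```

In practice the two terms are summed with weighting coefficients, so the consistency term aligns the branches while the distribution constraint keeps the embedding space from degenerating.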

References

[1] Bayoudh, K., Knani, R., Hamdaoui, F., & Mtibaa, A. (2022). A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets. The Visual Computer, 38(8), 2939-2970. DOI: https://doi.org/10.1007/s00371-021-02166-7

[2] Mahadevkar, S. V., Khemani, B., Patil, S., Kotecha, K., Vora, D. R., Abraham, A., & Gabralla, L. A. (2022). A review on machine learning styles in computer vision—techniques and future directions. IEEE Access, 10, 107293-107329. DOI: https://doi.org/10.1109/access.2022.3209825

[3] Ericsson, L., Gouk, H., Loy, C. C., & Hospedales, T. M. (2022). Self-supervised representation learning: Introduction, advances, and challenges. IEEE Signal Processing Magazine, 39(3), 42-62. DOI: https://doi.org/10.1109/MSP.2021.3134634

[4] Wang, H., Liu, Z., Ge, Y., & Peng, D. (2022). Self-supervised signal representation learning for machinery fault diagnosis under limited annotation data. Knowledge-Based Systems, 239, 107978. DOI: https://doi.org/10.1016/j.knosys.2021.107978

[5] Rani, V., Kumar, M., Gupta, A., Sachdeva, M., Mittal, A., & Kumar, K. (2024). Self-supervised learning for medical image analysis: a comprehensive review. Evolving Systems, 15(4), 1607-1633. DOI: https://doi.org/10.1007/s12530-024-09581-w

[6] Liu, X., Zhang, F., Hou, Z., Mian, L., Wang, Z., Zhang, J., & Tang, J. (2021). Self-supervised learning: Generative or contrastive. IEEE Transactions on Knowledge and Data Engineering, 35(1), 857-876. DOI: https://doi.org/10.1109/TKDE.2021.3090866

[7] Yin, J., Wu, H., & Sun, S. (2023). Effective sample pairs based contrastive learning for clustering. Information Fusion, 99, 101899. DOI: https://doi.org/10.1016/j.inffus.2023.101899

[8] Hu, H., Wang, X., Zhang, Y., Chen, Q., & Guan, Q. (2024). A comprehensive survey on contrastive learning. Neurocomputing, 610, 128645. DOI: https://doi.org/10.1016/j.neucom.2024.128645

[9] Xiao, B., Tang, Y., & Liu, Y. (2025). Integrating Materials Representations Into Feature Engineering in Machine Learning for Crystalline Materials: From Local to Global Chemistry‐Structure Information Coupling. Wiley Interdisciplinary Reviews: Computational Molecular Science, 15(4), e70044. DOI: https://doi.org/10.1002/wcms.70044

[10] Kumar, P., Rawat, P., & Chauhan, S. (2022). Contrastive self-supervised learning: review, progress, challenges and future research directions. International Journal of Multimedia Information Retrieval, 11(4), 461-488. DOI: https://doi.org/10.1007/s13735-022-00245-6

[11] Gui, J., Chen, T., Zhang, J., Cao, Q., Sun, Z., Luo, H., & Tao, D. (2024). A survey on self-supervised learning: Algorithms, applications, and future trends. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12), 9052-9071. DOI: https://doi.org/10.1109/TPAMI.2024.3415112

[12] Chen, Z., Hu, B., Chen, Z., & Zhang, J. (2024). Progress and thinking on self-supervised learning methods in computer vision: A review. IEEE Sensors Journal, 24(19), 29524-29544. DOI: https://doi.org/10.1109/jsen.2024.3443885

[13] Jaiswal, A., Babu, A. R., Zadeh, M. Z., Banerjee, D., & Makedon, F. (2020). A survey on contrastive self-supervised learning. Technologies, 9(1), 2. DOI: https://doi.org/10.3390/technologies9010002

[14] Yang, Z., Ding, M., Huang, T., Cen, Y., Song, J., Xu, B., ... & Tang, J. (2024). Does negative sampling matter? a review with insights into its theory and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(8), 5692-5711. DOI: https://doi.org/10.1109/TPAMI.2024.3371473

[15] Li, P., Shao, B., Zhao, G., & Liu, Z. P. (2025). Negative sampling strategies impact the prediction of scale-free biomolecular network interactions with machine learning. BMC Biology, 23(1), 123. DOI: https://doi.org/10.1186/s12915-025-02231-w

[16] Chen, C., Ma, W., Zhang, M., Wang, C., Liu, Y., & Ma, S. (2023). Revisiting negative sampling vs. non-sampling in implicit recommendation. ACM Transactions on Information Systems, 41(1), 1-25. DOI: https://doi.org/10.1145/3522672

[17] Iliadis, D., De Baets, B., & Waegeman, W. (2022). Multi-target prediction for dummies using two-branch neural networks. Machine Learning, 111(2), 651-684. DOI: https://doi.org/10.1007/s10994-021-06104-5

[18] Wang, S., Cheng, X., Li, Y., Song, X., Guo, R., Zhang, H., & Liang, Z. (2023). Rapid visual simulation of the progressive collapse of regular reinforced concrete frame structures based on machine learning and physics engine. Engineering Structures, 286, 116129. DOI: https://doi.org/10.1016/j.engstruct.2023.116129

[19] Zhou, S., Xu, H., Zheng, Z., Chen, J., Li, Z., Bu, J., ... & Ester, M. (2024). A comprehensive survey on deep clustering: Taxonomy, challenges, and future directions. ACM Computing Surveys, 57(3), 1-38. DOI: https://doi.org/10.1145/3689036

[20] Xu, J., Ren, Y., Tang, H., Yang, Z., Pan, L., Yang, Y., ... & He, L. (2022). Self-supervised discriminative feature learning for deep multi-view clustering. IEEE Transactions on Knowledge and Data Engineering, 35(7), 7470-7482. DOI: https://doi.org/10.1109/TKDE.2022.3193569

[21] Jiao, L., Gao, J., Liu, X., Liu, F., Yang, S., & Hou, B. (2021). Multiscale representation learning for image classification: A survey. IEEE Transactions on Artificial Intelligence, 4(1), 23-43. DOI: https://doi.org/10.1109/tai.2021.3135248

[22] Jiao, L., Wang, M., Liu, X., Li, L., Liu, F., Feng, Z., ... & Hou, B. (2024). Multiscale deep learning for detection and recognition: A comprehensive survey. IEEE Transactions on Neural Networks and Learning Systems, 36(4), 5900-5920. DOI: https://doi.org/10.1109/TNNLS.2024.3389454

[23] Zhang, Z., Yang, Q., & Zi, Y. (2021). Multi-scale and multi-pooling sparse filtering: a simple and effective representation learning method for intelligent fault diagnosis. Neurocomputing, 451, 138-151. DOI: https://doi.org/10.1016/j.neucom.2021.04.066

[24] Chen, N., Yang, R., Zhao, Y., Dai, Q., & Wang, L. (2025). Remote Sensing Image Segmentation Network That Integrates Global–Local Multi-Scale Information with Deep and Shallow Features. Remote Sensing, 17(11), 1880. DOI: https://doi.org/10.3390/rs17111880

[25] Qin, J., Huang, Y., & Wen, W. (2020). Multi-scale feature fusion residual network for single image super-resolution. Neurocomputing, 379, 334-342. DOI: https://doi.org/10.1016/j.neucom.2019.10.076

[26] Bian, K., & Priyadarshi, R. (2024). Machine learning optimization techniques: a survey, classification, challenges, and future research issues. Archives of Computational Methods in Engineering, 31(7), 4209-4233. DOI: https://doi.org/10.1007/s11831-024-10110-w

[27] Kim, D., Sohn, C. B., Kim, D. Y., & Kim, D. Y. (2025). A Taxonomy and Theoretical Analysis of Collapse Phenomena in Unsupervised Representation Learning. Mathematics, 13(18), 2986. DOI: https://doi.org/10.3390/math13182986

[28] Ribas, L. C., Casaca, W., & Fares, R. T. (2025). Conditional generative adversarial networks and deep learning data augmentation: a multi-perspective data-driven survey across multiple application fields and classification architectures. AI, 6(2), 32. DOI: https://doi.org/10.3390/ai6020032

[29] Lin, J., Hu, G., & Chen, J. (2025). A data augmentation method for computer vision task with feature conversion between class. Computers and Electronics in Agriculture, 231, 109909. DOI: https://doi.org/10.1016/j.compag.2025.109909

[30] Khan, A., Rauf, Z., Sohail, A., Khan, A. R., Asif, H., Asif, A., & Farooq, U. (2023). A survey of the vision transformers and their CNN-transformer based variants. Artificial Intelligence Review, 56(Suppl 3), 2917-2970. DOI: https://doi.org/10.1007/s10462-023-10595-0

[31] Kim, J. W., Khan, A. U., & Banerjee, I. (2025). Systematic review of hybrid vision transformer architectures for radiological image analysis. Journal of Imaging Informatics in Medicine, 1-15. DOI: https://doi.org/10.1007/s10278-024-01322-4

Published

2026-04-12

Data Availability Statement

The data that support the findings of this study are available upon request from the corresponding author, J.Z.

How to Cite

Zhang, J. (2026). Research on Image Representation Learning Method Based on Self-Supervised Learning. International Scientific Technical and Economic Research, 4(2), 78-97. https://doi.org/10.71451/ISTAER2616
