<?xml version="1.0"?>
<!DOCTYPE ArticleSet PUBLIC "-//NLM//DTD PubMed 2.0//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/static/PubMed.dtd">
<ArticleSet>
  <Article>
    <Journal>
      <PublisherName>Sichuan Knowledgeable Intelligent Sciences</PublisherName>
      <JournalTitle>International Scientific Technical and Economic Research</JournalTitle>
      <Issn>2959-1309</Issn>
      <Volume>4</Volume>
      <Issue>2</Issue>
      <PubDate PubStatus="epublish">
        <Year>2026</Year>
        <Month>04</Month>
        <Day>12</Day>
      </PubDate>
    </Journal>
    <ArticleTitle>Research on Image Representation Learning Method Based on Self-Supervised Learning</ArticleTitle>
    <FirstPage>78</FirstPage>
    <LastPage>97</LastPage>
    <ELocationID EIdType="doi">10.71451/ISTAER2616</ELocationID>
    <Language>eng</Language>
    <AuthorList>
      <Author>
        <FirstName>Juanpeng</FirstName>
        <LastName>Zhang</LastName>
        <Affiliation>Department of Electrical Engineering, Cheongju University, Cheongju, Seoul, Republic of Korea</Affiliation>
        <Identifier Source="ORCID">0009-0000-6778-6755</Identifier>
      </Author>
    </AuthorList>
    <History>
      <PubDate PubStatus="received">
        <Year>2026</Year>
        <Month>04</Month>
        <Day>12</Day>
      </PubDate>
      <PubDate PubStatus="accepted">
        <Year>2026</Year>
        <Month>04</Month>
        <Day>12</Day>
      </PubDate>
    </History>
    <Abstract>
To address negative-sample dependence, representation degradation, and insufficient cross-scale modeling in self-supervised image representation learning, this paper proposes a self-supervised learning framework that combines multi-view consistency learning with cross-scale feature fusion. The method builds a multi-branch collaborative structure and introduces a negative-sample-free optimization strategy and a feature distribution constraint mechanism, enabling efficient mining and stable expression of image semantic information. On the ImageNet dataset, linear evaluation accuracy reaches 77.8%, which is 8.5 and 2.5 percentage points higher than SimCLR and SwAV, respectively. In downstream tasks, object detection mAP increases by about 2.5% and semantic segmentation mIoU increases by about 2.5%. Under noise perturbation, accuracy improves by 7.5%, demonstrating stronger robustness. The experimental results show that the method outperforms existing mainstream methods in representation quality, generalization ability, and training stability, and has good application potential.
</Abstract>
  </Article>
</ArticleSet>
