Research on Intelligent Generation Algorithm of Interface Icon Based on Diffusion Model

Lijun Liu1
1 School of Art and Design, Guangzhou Institute of Science and Technology, Guangzhou, Guangdong, China
International Scientific Technical and Economic Research 2026, Vol. 4, No. 1, pp. 149-167
DOI: 10.71451/ISTAER2607
Received: 31 December 2025; Revised: 14 February 2026; Accepted: 13 March 2026; Published: 21 March 2026
Abstract

To address key problems in interface icon generation, namely weak structural expression, difficulty in maintaining style consistency, and limited multi-condition control, this paper proposes IconDiff, a structure-aware intelligent icon generation method based on a diffusion model. Building on the classical diffusion framework, the method introduces a structure-guided branch and a multimodal condition fusion mechanism to jointly model text semantics, style features, and attribute information, and it enhances boundary clarity and semantic identifiability through an icon-specific loss function. A multidimensionally annotated dataset of 268,000 icon samples is constructed, together with a dedicated evaluation metric system for icon tasks. Under a unified experimental setup, compared with mainstream generation methods, the proposed method reduces FID by approximately 25.2%, improves structural clarity by about 6.0%, enhances identifiability by about 6.8%, and increases style consistency by about 7.8%. Ablation experiments verify the effectiveness of the key modules, and generalization and robustness analyses show that the model remains stable even when semantic or style conditions are absent. The results demonstrate that the proposed method substantially improves generation quality and controllability, providing an effective solution for the automatic design of interface icons.

Keywords
Diffusion model; Interface icon generation; Multimodal conditional control; Structure perception; Style consistency
Funding

This research was funded by Guangzhou Institute of Science and Technology, Project No.: 2025gip010.
