Research on Intelligent Generation Algorithm of Interface Icon Based on Diffusion Model
DOI: https://doi.org/10.71451/ISTAER2607

Keywords: Diffusion model; Interface icon generation; Multimodal conditional control; Structure perception; Style consistency

Abstract
To address the problems of weak structural expression, poor style consistency, and limited multi-condition generation in interface icon generation, this paper proposes IconDiff, a structure-aware intelligent icon generation method based on a diffusion model. Building on the classical diffusion framework, the method introduces a structure-guided branch and a multimodal condition fusion mechanism to jointly model text semantics, style features, and attribute information, and it improves boundary clarity and semantic identifiability through an icon-specific loss function. In addition, a multidimensionally annotated dataset of 268,000 icon samples is constructed, and an evaluation index system dedicated to icon tasks is designed. Under a unified experimental setup, compared with mainstream generation methods, the proposed method reduces FID by approximately 25.2% and improves structural clarity by about 6.0%, identifiability by about 6.8%, and style consistency by about 7.8%. Ablation experiments verify the effectiveness of the key modules, and generalization and robustness analyses show that the model remains stable even when semantic and style conditions are absent. The results demonstrate that the method significantly improves generation quality and controllability, providing an effective solution for the automatic design of interface icons.
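To make the abstract's pipeline concrete, the following is a minimal illustrative sketch (not the paper's actual IconDiff code) of one DDPM-style training step with multimodal condition fusion and a combined denoising-plus-boundary loss. All specifics here — the linear noise schedule, concatenation-based fusion, the finite-difference edge proxy, and the `lambda_edge` weight — are assumptions chosen for illustration only.

```python
# Hypothetical sketch of a conditional diffusion training step with an
# icon-style combined loss; names and choices are illustrative, not IconDiff's.
import numpy as np

rng = np.random.default_rng(0)

# Standard DDPM-style linear noise schedule over T steps.
T = 100
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, eps):
    """Forward diffusion: x_t = sqrt(a_bar_t)*x0 + sqrt(1 - a_bar_t)*eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def fuse_conditions(text_emb, style_emb, attr_emb):
    """Toy multimodal fusion: concatenate text, style, attribute embeddings."""
    return np.concatenate([text_emb, style_emb, attr_emb])

def edge_map(img):
    """Finite-difference gradient magnitude as a crude boundary proxy."""
    gx = np.diff(img, axis=1, prepend=img[:, :1])
    gy = np.diff(img, axis=0, prepend=img[:1, :])
    return np.abs(gx) + np.abs(gy)

def combined_loss(eps_pred, eps_true, x0_pred, x0_true, lambda_edge=0.1):
    """Denoising MSE plus a boundary-sharpness term on the reconstruction."""
    mse = np.mean((eps_pred - eps_true) ** 2)
    edge = np.mean((edge_map(x0_pred) - edge_map(x0_true)) ** 2)
    return mse + lambda_edge * edge

# One toy step on an 8x8 "icon" with random condition embeddings.
x0 = rng.standard_normal((8, 8))
eps = rng.standard_normal((8, 8))
t = 50
xt = q_sample(x0, t, eps)
cond = fuse_conditions(rng.standard_normal(4),
                       rng.standard_normal(4),
                       rng.standard_normal(2))

# Stand-in "network" prediction (a real model would be a conditioned U-Net).
eps_pred = 0.5 * xt
x0_pred = (xt - np.sqrt(1.0 - alpha_bar[t]) * eps_pred) / np.sqrt(alpha_bar[t])
loss = combined_loss(eps_pred, eps, x0_pred, x0)
print(float(loss))
```

In a real system the stand-in prediction would come from a structure-conditioned denoising network, and the edge proxy would be replaced by the paper's icon-specific loss; the sketch only shows how a boundary term can be weighted into the standard denoising objective.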
Data Availability Statement
The data that support the findings of this study are available upon request from the corresponding author, L.L.
License
Copyright (c) 2026 International Scientific Technical and Economic Research

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.