Application of YOLOv10 Integrated with Attention Mechanism in the Senseless Monitoring of Students' Classroom Psychological State
DOI: https://doi.org/10.71451/ISTAER2621

Keywords: YOLOv10; Attention mechanism; Classroom psychological state; Senseless monitoring; Real-time target detection

Abstract
This paper proposes a YOLOv10 model integrated with an attention mechanism for the senseless monitoring of students' psychological states in class, aiming at high-precision, real-time, and non-invasive psychological state recognition. The method introduces a multi-layer attention module in both the channel and spatial dimensions to strengthen the representation of key features, and combines lightweight feature enhancement with an end-to-end psychological state classification network to jointly optimize detection and state recognition. The model is validated on a large-scale real classroom dataset (561,200 images covering multiple disciplines and varied lighting and occlusion conditions). It achieves an mAP@0.5 of 0.873, a psychological state classification accuracy of 0.835, and an F1-score of 0.812, while sustaining real-time performance of 69 FPS. Ablation experiments show that the attention module and the feature enhancement module improve mAP by 4.4% and 5.3%, respectively, demonstrating the model's robustness in complex scenes. Deployment experiments in 50 real classrooms further verify the system's stability and long-term monitoring capability. The results show that this method delivers high-precision, real-time, and deployable monitoring of students' psychological states in intelligent education scenarios, providing quantifiable data support for classroom management and teaching optimization.
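The abstract does not give implementation details of the channel- and spatial-attention module. As a rough illustration of the general idea (a CBAM-style design: a channel gate from pooled spatial statistics, followed by a spatial gate from pooled channel statistics), here is a minimal NumPy sketch. The weights `w1`, `w2`, `w_avg`, and `w_max` are randomly initialized placeholders, not the paper's trained parameters, and the spatial branch is simplified to a per-pixel linear gate rather than a learned convolution:

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Gate each channel of a (C, H, W) feature map (CBAM-style sketch)."""
    avg = x.mean(axis=(1, 2))                 # global average pooling -> (C,)
    mx = x.max(axis=(1, 2))                   # global max pooling -> (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)  # shared 2-layer MLP with ReLU
    gate = 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))  # sigmoid gate, (C,)
    return x * gate[:, None, None]

def spatial_attention(x, w_avg, w_max):
    """Gate each spatial position from channel-pooled maps (simplified)."""
    avg_map = x.mean(axis=0)                  # (H, W) channel-average map
    max_map = x.max(axis=0)                   # (H, W) channel-max map
    gate = 1.0 / (1.0 + np.exp(-(w_avg * avg_map + w_max * max_map)))
    return x * gate[None, :, :]

# Toy feature map and hypothetical weights (reduction ratio r = 4).
rng = np.random.default_rng(0)
c, h, w, r = 8, 4, 4, 4
x = rng.standard_normal((c, h, w))
w1 = rng.standard_normal((c // r, c)) * 0.1   # squeeze: C -> C/r
w2 = rng.standard_normal((c, c // r)) * 0.1   # excite: C/r -> C
y = spatial_attention(channel_attention(x, w1, w2), 0.5, 0.5)
print(y.shape)  # shape is preserved: (8, 4, 4)
```

Because both gates are sigmoids, the module only rescales features (every output magnitude is bounded by the input magnitude), which is what lets such a block be inserted into a YOLO backbone without changing tensor shapes.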
Data Availability Statement
The data that support the findings of this study are available upon request from the corresponding author, S.Y.
License
Copyright (c) 2026 Shaochong Yao (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.