Research on Long Sequence Learning Behavior Modeling Based on Transformer-XL

Yuxiao Qin1
1 Education College, Seoul School of Integrated Sciences and Technologies, Seoul, Republic of Korea
International Scientific Technical and Economic Research 2026, Vol. 4, No. 2, pp. 1-20
DOI: 10.71451/ISTAER2613
Received: 19 January 2026; Revised: 28 February 2026; Accepted: 27 March 2026; Published: 2 April 2026
Abstract

Online learning behavior data poses two difficulties for sequence modeling: long-range dependencies are hard to capture, and full attention over long histories is computationally expensive. To address both, this paper proposes a long-sequence learning behavior modeling method based on Transformer-XL. The method improves the model at both the structural and the information-modeling level by constructing a multidimensional behavior feature representation and integrating a dynamic memory enhancement mechanism, behavior-semantics-aware attention, and a sparse long-sequence modeling strategy. Experiments on real educational datasets, including ASSISTments and EdNet, show that the proposed model outperforms mainstream methods on AUC, ACC, and RMSE: AUC increases by about 4.2% and RMSE decreases by about 8.1%. Ablation experiments and parameter analysis verify the effectiveness of each module, and cross-dataset experiments and noise tests demonstrate good generalization ability and robustness. Interpretability analysis further shows that the model attends to key learning behaviors. These results indicate that the method offers significant advantages for long-sequence learning behavior modeling and can effectively support personalized recommendation and learning-state evaluation in intelligent education systems.
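The mechanism at the heart of this approach is Transformer-XL's segment-level recurrence: hidden states from the previous segment are cached and reused as read-only memory, so attention in the current segment can reach back beyond the segment boundary at no extra training cost. The sketch below is an illustrative assumption, not the paper's implementation: the class name `SegmentRecurrentAttention`, the dimensions, and the segment length are hypothetical, and the relative positional encoding used by the original Transformer-XL is omitted for brevity.

```python
from typing import Optional

import torch
import torch.nn as nn


class SegmentRecurrentAttention(nn.Module):
    """Attention over [cached memory ; current segment], Transformer-XL style."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor, memory: Optional[torch.Tensor]):
        # Keys/values span the cached memory plus the current segment;
        # queries come from the current segment only.
        context = x if memory is None else torch.cat([memory, x], dim=1)
        out, _ = self.attn(query=x, key=context, value=context)
        # Detach before caching: gradients never flow across segment boundaries.
        return out, x.detach()


# Process a long behavior sequence segment by segment, carrying the memory.
layer = SegmentRecurrentAttention(d_model=64, n_heads=4)
sequence = torch.randn(1, 400, 64)  # e.g., 400 embedded learning interactions
memory = None
for segment in sequence.split(100, dim=1):  # hypothetical segment length 100
    output, memory = layer(segment, memory)
```

Because the cached states are detached, backpropagation cost stays bounded by the segment length while the effective receptive field grows with every processed segment, which is what makes the scheme attractive for long learner histories.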

Keywords
Transformer-XL; long sequence modeling; learning behavior analysis; attention mechanism; dynamic memory