Research on Long Sequence Learning Behavior Modeling Based on Transformer-XL
Education College, Seoul School of Integrated Sciences and Technologies, Seoul, Republic of Korea
Abstract
Online learning behavior data pose two difficulties for sequence models: long-range dependencies are hard to capture, and capturing them is computationally expensive. This paper proposes a long-sequence learning behavior modeling method based on Transformer-XL that improves performance at both the structural and the information-modeling level by constructing multidimensional behavior feature representations and integrating a dynamic memory enhancement mechanism, behavior-semantics-aware attention, and a sparse long-sequence modeling strategy. On real educational datasets, including ASSISTments and EdNet, the proposed model outperforms mainstream methods on AUC, ACC, and RMSE, improving AUC by about 4.2% and reducing RMSE by about 8.1%. Ablation experiments and parameter analysis verify the contribution of each module, and cross-dataset experiments and noise tests show that the model generalizes well and is robust to perturbed input. Interpretability analysis further shows that the model attends to key learning behaviors. Together, these results indicate that the method offers clear advantages for long-sequence learning behavior modeling and can support personalized recommendation and learner-state assessment in intelligent education systems.
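To make the core mechanism concrete, the sketch below (plain PyTorch; all class, function, and parameter names such as `MemoryAugmentedSelfAttention` and `mem_len` are illustrative assumptions, not the authors' implementation) shows the segment-level recurrence that Transformer-XL contributes to this setting: hidden states from earlier behavior segments are cached as a fixed-length memory that attention over the current segment can read, so effective context grows with sequence length while per-segment cost stays bounded. Relative positional encoding and causal masking, which the full architecture also uses, are omitted for brevity.

```python
# Minimal sketch (illustrative only, not the paper's code) of segment-level
# recurrence: attention over the current segment also reads a cached memory
# of hidden states from earlier segments.
import torch
import torch.nn as nn


class MemoryAugmentedSelfAttention(nn.Module):
    """Single-head self-attention over [memory ; current segment]."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor, mem: torch.Tensor) -> torch.Tensor:
        # x:   (batch, seg_len, d_model) -- current behavior segment
        # mem: (batch, mem_len, d_model) -- cached states of earlier segments
        ctx = torch.cat([mem, x], dim=1)  # extend the context with memory
        attn = torch.softmax(
            self.q(x) @ self.k(ctx).transpose(1, 2) * self.scale, dim=-1
        )
        return attn @ self.v(ctx)


def process_long_sequence(segments, layer, mem_len=32):
    """Run segments through the layer, carrying a fixed-length memory cache."""
    batch, d_model = segments[0].shape[0], segments[0].shape[-1]
    mem = torch.zeros(batch, 0, d_model)  # empty memory before the first segment
    outputs = []
    for seg in segments:
        out = layer(seg, mem)
        outputs.append(out)
        # Cache detached states as memory for the next segment, truncated to
        # mem_len so per-segment cost stays bounded for arbitrarily long input.
        mem = torch.cat([mem, out.detach()], dim=1)[:, -mem_len:]
    return torch.cat(outputs, dim=1)


if __name__ == "__main__":
    layer = MemoryAugmentedSelfAttention(d_model=16)
    segs = [torch.randn(2, 8, 16) for _ in range(4)]  # 4 segments of 8 steps
    print(process_long_sequence(segs, layer).shape)   # torch.Size([2, 32, 16])
```

Detaching the cached states before reuse mirrors Transformer-XL's stop-gradient on memory, so backpropagation stays confined to the current segment while earlier context still informs attention.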
Keywords
Transformer-XL
Long sequence modeling
Learning behavior analysis
Attention mechanism
Dynamic memory