Research on Long Sequence Learning Behavior Modeling Based on Transformer-XL

Yuxiao Qin1
1 Education College, Seoul School of Integrated Sciences and Technologies, Seoul, Republic of Korea
International Scientific Technical and Economic Research 2026, Vol. 4, No. 2, pp. 1-20
DOI: 10.71451/ISTAER2613
Received: 19 January 2026; Revised: 28 February 2026; Accepted: 27 March 2026; Published: 2 April 2026
Abstract

Online learning behavior data poses two difficulties for sequence modeling: long-range dependencies are hard to capture, and full attention over long histories is computationally expensive. To address both, this paper proposes a long-sequence learning behavior modeling method based on Transformer-XL. The method improves the model at both the structural and the information-modeling level by constructing a multidimensional behavior feature representation and integrating a dynamic memory enhancement mechanism, behavior-semantics-aware attention, and a sparse long-sequence modeling strategy. Experiments on real educational datasets, including ASSISTments and EdNet, show that the proposed model outperforms mainstream methods on AUC, ACC, and RMSE: AUC increases by about 4.2% and RMSE decreases by about 8.1%. Ablation experiments and parameter analysis verify the effectiveness of each module, and cross-dataset experiments and noise tests demonstrate good generalization ability and robustness. Interpretability analysis further shows that the model attends to key learning behaviors. These results indicate that the method offers significant advantages for long-sequence learning behavior modeling and can effectively support personalized recommendation and learning-state evaluation in intelligent education systems.
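The mechanism at the heart of this approach is Transformer-XL's segment-level recurrence: hidden states from the previous segment are cached and reused as read-only memory, so attention in the current segment can reach back beyond the segment boundary at no extra training cost. The sketch below is an illustrative assumption, not the paper's implementation: the class name `SegmentRecurrentAttention`, the dimensions, and the segment length are hypothetical, and the relative positional encoding used by the original Transformer-XL is omitted for brevity.

```python
from typing import Optional

import torch
import torch.nn as nn


class SegmentRecurrentAttention(nn.Module):
    """Attention over [cached memory ; current segment], Transformer-XL style."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor, memory: Optional[torch.Tensor]):
        # Keys/values span the cached memory plus the current segment;
        # queries come from the current segment only.
        context = x if memory is None else torch.cat([memory, x], dim=1)
        out, _ = self.attn(query=x, key=context, value=context)
        # Detach before caching: gradients never flow across segment boundaries.
        return out, x.detach()


# Process a long behavior sequence segment by segment, carrying the memory.
layer = SegmentRecurrentAttention(d_model=64, n_heads=4)
sequence = torch.randn(1, 400, 64)  # e.g., 400 embedded learning interactions
memory = None
for segment in sequence.split(100, dim=1):  # hypothetical segment length 100
    output, memory = layer(segment, memory)
```

Because the cached states are detached, backpropagation cost stays bounded by the segment length while the effective receptive field grows with every processed segment, which is what makes the scheme attractive for long learner histories.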

Keywords
Transformer-XL; long sequence modeling; learning behavior analysis; attention mechanism; dynamic memory