From LLM to Agent: A large-language-model-driven machine learning framework for catalyst design of MgH2 dehydrogenation

2026-01-02

Tongao Yao, Yang Yang, Jianghao Cai, Rui Liu, Zhaoyan Dong, Xiaotian Tang, Xuqiang Shao, Zhengyang Gao, Guangyao An, Weijie Yang,
From LLM to Agent: A large-language-model-driven machine learning framework for catalyst design of MgH2 dehydrogenation,
Journal of Magnesium and Alloys,
2025,
ISSN 2213-9567,
https://doi.org/10.1016/j.jma.2025.08.021.
(https://www.sciencedirect.com/science/article/pii/S2213956725002853)
Abstract: Magnesium hydride (MgH2), a promising high-capacity hydrogen storage material, is hindered by slow dehydrogenation kinetics. AI-driven catalyst discovery, which could address this, is often hampered by the laborious extraction of data from unstructured literature. To overcome this, we introduce a transformative “LLM to Agent” framework that synergistically integrates Large Language Models (LLMs) for automated data curation with Machine Learning (ML) for predictive design. We automatically constructed a comprehensive database of 809 MgH2 catalysts (6555 data rows) with high fidelity and an ∼40-fold acceleration over manual methods. The resulting ML models achieved high accuracy (average R² > 0.91) in predicting dehydrogenation temperature and activation energy, and subsequently guided a Genetic Algorithm (GA) in an exploratory inverse design that autonomously uncovered key design principles for high-performance catalysts. Encouragingly, a strong alignment was found between these AI-discovered principles and the design strategies of recently reported, state-of-the-art experimental systems, providing substantial evidence for the validity of our approach. The framework culminates in Cat-Advisor, a novel, domain-adapted multi-agent system. Cat-Advisor translates ML predictions and retrieval-augmented knowledge into actionable design guidance, demonstrating capabilities that surpass those of general-purpose LLMs in this specialized domain. This work delivers a practical AI toolkit for accelerated materials discovery and advances the emerging Agent-based paradigm for designing next-generation energy technologies.
Keywords: MgH2 dehydrogenation; Large language model; Machine learning; Genetic algorithm; Catalyst design; Hydrogen storage
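The abstract states only that trained ML models guided a Genetic Algorithm in an inverse design; it does not give the catalyst encoding, the GA operators, or the models themselves. The sketch below shows a generic surrogate-guided GA under stated assumptions: each catalyst is a hypothetical three-gene descriptor vector scaled to [0, 1], and `toy_surrogate` is a hypothetical stand-in for a trained regressor of dehydrogenation temperature. Tournament selection, uniform crossover, Gaussian mutation and elitism are common textbook choices, not necessarily the authors'.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical encoding: each catalyst is a descriptor vector with every gene
# scaled to [0, 1] (e.g. normalized composition or loading features).
Genome = List[float]


def toy_surrogate(x: Genome) -> float:
    """Hypothetical stand-in for a trained ML regressor that predicts the
    dehydrogenation temperature (deg C). Not the paper's model: a smooth toy
    function whose minimum (250.0) lies at x = [0.3, 0.7, 0.5]."""
    target = [0.3, 0.7, 0.5]
    return 250.0 + 400.0 * sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def tournament(pop: List[Genome], scores: List[float], rng: random.Random, k: int = 3) -> Genome:
    # Sample k individuals and keep the one with the lowest predicted temperature.
    contenders = rng.sample(range(len(pop)), k)
    return pop[min(contenders, key=lambda i: scores[i])]


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    # Uniform crossover: each gene comes from either parent with probability 0.5.
    return [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]


def mutate(x: Genome, rng: random.Random, rate: float = 0.2, sigma: float = 0.1) -> Genome:
    # Gaussian mutation applied gene-wise, clipped back into the [0, 1] design box.
    return [
        min(1.0, max(0.0, xi + rng.gauss(0.0, sigma))) if rng.random() < rate else xi
        for xi in x
    ]


def run_ga(
    fitness: Callable[[Genome], float],
    n_genes: int = 3,
    pop_size: int = 40,
    generations: int = 60,
    seed: int = 0,
) -> Tuple[Genome, float]:
    """Minimize the surrogate-predicted property with a simple elitist GA."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(x) for x in pop]
        # Elitism: the best individual survives unchanged, so the best score never worsens.
        children = [pop[scores.index(min(scores))]]
        while len(children) < pop_size:
            parent_a = tournament(pop, scores, rng)
            parent_b = tournament(pop, scores, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = children
    best = min(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    design, predicted_t = run_ga(toy_surrogate)
    print([round(g, 2) for g in design], round(predicted_t, 1))
```

Replacing `toy_surrogate` with a function that wraps a fitted model's prediction for a single descriptor vector would turn this into the kind of surrogate-driven search the abstract describes; the 250 °C optimum and the three-gene encoding are purely illustrative. A multi-objective variant (temperature and activation energy together) would need a different selection scheme, such as Pareto ranking.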