From LLM to Agent: A large-language-model-driven machine learning framework for catalyst design of MgH2 dehydrogenation
Tongao Yao, Yang Yang, Jianghao Cai, Rui Liu, Zhaoyan Dong, Xiaotian Tang, Xuqiang Shao, Zhengyang Gao, Guangyao An, Weijie Yang,
Journal of Magnesium and Alloys,
2025,
ISSN 2213-9567,
https://doi.org/10.1016/j.jma.2025.08.021.
(https://www.sciencedirect.com/science/article/pii/S2213956725002853)
Abstract: Magnesium hydride (MgH2), a promising high-capacity hydrogen storage material, is hindered by slow dehydrogenation kinetics. AI-driven catalyst discovery to address this is often hampered by the laborious extraction of data from unstructured literature. To overcome this, we introduce a transformative “LLM to Agent” framework that synergistically integrates Large Language Models (LLMs) for automated data curation with Machine Learning (ML) for predictive design. We automatically constructed a comprehensive database of 809 MgH2 catalysts (6555 data rows) with high fidelity and an ∼40-fold acceleration over manual methods. The resulting ML models achieved high accuracy (average R² > 0.91) in predicting dehydrogenation temperature and activation energy, subsequently guiding a Genetic Algorithm (GA) in an exploratory inverse design that autonomously uncovered key design principles for high-performance catalysts. Encouragingly, a strong alignment was found between these AI-discovered principles and the design strategies of recently reported, state-of-the-art experimental systems, providing substantial evidence for the validity of our approach. The framework culminates in Cat-Advisor, a novel, domain-adapted multi-agent system. Cat-Advisor translates ML predictions and retrieval-augmented knowledge into actionable design guidance, demonstrating capabilities that surpass those of general-purpose LLMs in this specialized domain. This work delivers a practical AI toolkit for accelerated materials discovery and advances the emerging Agent-based paradigm for designing next-generation energy technologies.
Keywords: MgH2 dehydrogenation; Large language model; Machine learning; Genetic algorithm; Catalyst design; Hydrogen storage
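Illustrative note: the abstract describes ML models guiding a Genetic Algorithm (GA) in an exploratory inverse design. The sketch below shows, under stated assumptions, what such a loop can look like in general. It is not the authors' implementation. The surrogate function `surrogate_temperature`, its quadratic form, the `TARGET` vector, the three-gene descriptor encoding, and all GA hyperparameters are hypothetical placeholders standing in for a trained model and a real catalyst representation.

```python
import random

# Hypothetical optimum of the stand-in surrogate (an assumption, not source data).
TARGET = [0.2, 0.7, 0.5]


def surrogate_temperature(x):
    """Stand-in for a trained ML model (assumption).

    Maps a descriptor vector with values in [0, 1] to a predicted
    dehydrogenation temperature; lower is better.
    """
    return 200.0 + 100.0 * sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))


def run_ga(fitness, n_genes=3, pop_size=30, generations=40,
           mutation_rate=0.2, seed=0):
    """Minimize `fitness` with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation

    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))

        # Elitism: the best 20% survive unchanged, so the best score never worsens.
        elite = population[: pop_size // 5]

        children = []
        while len(elite) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(elite, 2)
            # Uniform crossover: each gene comes from either parent.
            child = [a if rng.random() < 0.5 else b
                     for a, b in zip(parent_a, parent_b)]
            # Gaussian mutation, clipped to the valid descriptor range [0, 1].
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, 0.1)))
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)

        population = elite + children

    best = min(population, key=fitness)
    history.append(fitness(best))
    return best, history


if __name__ == "__main__":
    best, history = run_ga(surrogate_temperature)
    print("best descriptor vector:", [round(g, 3) for g in best])
    print("predicted temperature:", round(history[-1], 2))
```

In a real workflow, the surrogate would be replaced by the fitted regression model, and the gene encoding by actual composition or processing descriptors. Any candidate proposed this way is only a model prediction and would still need experimental confirmation.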