Game theory-based electricity pricing and microgrids management using online deep reinforcement learning

2026-03-09


Mahdi Shademan, Ali Azizi, Shahram Jadid,
Game theory-based electricity pricing and microgrids management using online deep reinforcement learning,
Applied Soft Computing,
Volume 182,
2025,
113621,
ISSN 1568-4946,
https://doi.org/10.1016/j.asoc.2025.113621.
(https://www.sciencedirect.com/science/article/pii/S1568494625009329)
Abstract: This study addresses a bi-level problem involving a retailer and multiple residential microgrids. At the upper level, the retailer broadcasts selling and buying electricity price signals to maximize its profit; at the lower level, microgrid agents manage their resources in response to these signals to minimize their costs. A distribution system operator additionally enforces network constraints. The interaction between the microgrids and the retailer is modeled as a Stackelberg game that allows double-sided trading. To handle uncertainties in sustainable resources, loads, and wholesale market prices, a hybrid fuzzy/stochastic optimization (HFSO) approach is employed, which combines fuzzy chance-constrained programming at the upper level with risk-neutral programming at the lower level. Because of privacy concerns, a deep reinforcement learning approach is used to solve the problem. This approach is extended to online learning to counter data drift, especially when the load profile changes, and to reach an acceptable solution quickly. To support this claim, the ability to predict profit over a relatively long period is compared for the offline learning method and the proposed online learning method. The results show that the offline learning method has a prediction error of 15.54 %, while the online learning method has an error of only 1.8 %. Specifically, the online learning method predicts the retailer's profit with 96.75 % accuracy, whereas the offline learning method's prediction fails, with an accuracy of −150.64 %. The online learning method also predicts each microgrid's power transactions with more than 89.6 % accuracy.
Keywords: Microgrids; Resource management; Deep reinforcement learning; Fuzzy chance constrained programming; Model-drift; Demand response
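
The leader-follower structure of the Stackelberg game described in the abstract can be illustrated with a deliberately small sketch. This is not the paper's method: it has no deep reinforcement learning, no uncertainty handling, no network constraints, and only one-sided selling to a single microgrid. The quadratic utility parameters `a` and `b`, the fixed `wholesale` price, and all function names are assumptions introduced purely for illustration.

```python
def microgrid_best_response(price, a=10.0, b=2.0):
    """Follower: choose the purchase q >= 0 that minimizes
    price*q - (a*q - 0.5*b*q**2), i.e. payment minus an assumed
    quadratic consumption utility. Closed form: q = (a - price) / b."""
    return max(0.0, (a - price) / b)


def retailer_profit(price, wholesale=4.0, a=10.0, b=2.0):
    """Leader: profit at a given price, anticipating the follower's response."""
    quantity = microgrid_best_response(price, a, b)
    return (price - wholesale) * quantity


def solve_stackelberg(wholesale=4.0, a=10.0, b=2.0, steps=1001):
    """Grid-search the leader's price over [wholesale, a]."""
    best_price, best_profit = wholesale, 0.0
    for i in range(steps):
        price = wholesale + (a - wholesale) * i / (steps - 1)
        profit = retailer_profit(price, wholesale, a, b)
        if profit > best_profit:
            best_price, best_profit = price, profit
    return best_price, best_profit


if __name__ == "__main__":
    price, profit = solve_stackelberg()
    print(f"leader price = {price:.3f}, leader profit = {profit:.3f}")
```

In this toy setting the leader's optimum can be checked analytically as (a + wholesale) / 2, so the grid search serves only to show the order of play: the retailer commits to a price first, and the microgrid then responds optimally to it.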