A new perspective on estimating Chlorophyll-a concentrations using machine learning and remote sensing: a case study of New York state lakes

2025-12-21

Jillian A. Greene, Lenny Metlitsky, Andrew Levine, Erin Foley, Melody Henry, Marzieh Azarderakhsh, Reginald A. Blake, Hamidreza Norouzi,
A new perspective on estimating Chlorophyll-a concentrations using machine learning and remote sensing: a case study of New York state lakes,
Ecological Indicators,
Volume 180,
2025,
114316,
ISSN 1470-160X,
https://doi.org/10.1016/j.ecolind.2025.114316.
(https://www.sciencedirect.com/science/article/pii/S1470160X25012488)
Abstract: Algal blooms have been proliferating across the United States in step with several anthropogenically influenced processes. Monitoring bloom growth is challenging because quantifying algal presence demands substantial labor and equipment, which limits how widely data are available. In New York, USA, where numerous inland lakes experience severe anthropogenic impacts, only a fraction of the more than 7,000 lakes have reliable quantitative algal bloom data. Operational remote sensing can fill the gap for unmonitored lakes, but previous techniques do not accommodate small lakes because of limitations in spatial resolution and atmospheric correction algorithms and the lack of a comprehensive empirical formula. In this study, we used Landsat-8 and Landsat-9 (2013–2023) and Sentinel-2 (2019–2023) imagery, processed with atmospheric correction techniques designed for inland waters, and joined it with geospatial watershed characteristics to model algal presence in New York lakes. We implemented several machine learning models and found Extra Trees Regression to be the best performing (R² = 0.72, RMSE = 8.19 μg/L). From these results, we can obtain a comprehensive view of algal blooms across New York from 2013 to the present, which can inform stakeholders of trends in bloom presence and concentration in both large and small lakes. Results from this study will be made available on a public interface that will contain in situ data, key lake characteristics, model time series, and model raster predictions. The modeling procedure is readily replicable and can be extended to lakes in regions outside New York State.
Keywords: Algal blooms; Landsat-8; Landsat-9; Machine learning; Remote sensing; Sentinel-2
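
Note on the reported metrics: the abstract summarizes model skill with R² and RMSE. The following is a minimal sketch of how these two quantities are conventionally computed from paired observed and predicted values; it is not the authors' code. The function names and the sample chlorophyll-a values are hypothetical and chosen only for illustration.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the inputs (e.g. ug/L)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical chlorophyll-a values in ug/L, invented for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 3.0, 7.0, 7.0]

print(f"RMSE = {rmse(observed, predicted):.2f} ug/L")  # RMSE = 1.00 ug/L
print(f"R2 = {r_squared(observed, predicted):.2f}")    # R2 = 0.80
```

RMSE is expressed in the units of the target variable, so the reported 8.19 μg/L can be read directly against in situ chlorophyll-a measurements, while R² is unitless and describes the share of observed variance the model explains.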