Decoding oxygen preference: Machine learning discovers functional genes in Bacteria
Siqi Wan, Haida Liu, Geyi Zhu, Yuanming Geng, Wenhao Li, Lijuan Chen, Yunhua Zhang, Guomin Han,
Decoding oxygen preference: Machine learning discovers functional genes in Bacteria,
Genomics,
Volume 117, Issue 5,
2025,
111095,
ISSN 0888-7543,
https://doi.org/10.1016/j.ygeno.2025.111095.
(https://www.sciencedirect.com/science/article/pii/S0888754325001119)
Abstract: Predicting bacterial oxygen preference and identifying the genes associated with it are critical tasks in microbiology. This study developed a machine learning model that uses genomic features to predict bacterial oxygen preference and to discover potential functional genes. Trained on a dataset of 1813 bacterial genomes, a Random Forest model achieved 90.62% accuracy in predicting oxygen preference, outperforming prior methods. Feature analysis pinpointed key protein domains and candidate genes. Experimental overexpression of model-identified genes (encoding SOD, radical SAM enzyme, GCV-T, and FDH domains) in Escherichia coli enhanced growth under aerobic conditions, validating their role in oxygen adaptation. Applying the model to rumen metagenomes revealed a predominantly anaerobic community. This work establishes machine learning as an effective strategy for predicting bacterial oxygen preference and identifying functional genes, offering a new tool for understanding bacterial oxygen adaptation mechanisms, discovering key functional genes, and efficiently exploring uncultured microbial resources.
Keywords: Machine learning; Bacterial oxygen requirement; Protein domain; Gene function; Application
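
Illustrative sketch: the abstract describes a Random Forest trained on genomic features, followed by feature analysis to pinpoint protein domains. The toy example below shows that general idea only and is not the authors' implementation. It assumes a binary presence/absence matrix of protein domains per genome, uses synthetic data in which a hypothetical domain at index 0 drives the label, and builds a small pure-Python forest so it runs without external libraries. All names (gini, build_tree, fit_forest, N_GENOMES, N_DOMAINS) and all parameter values are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, n_try, rng, importance, depth=0, max_depth=5):
    """Grow one tree on 0/1 features; internal nodes are (feature, absent, present)."""
    if depth >= max_depth or len(set(y)) == 1:
        return majority(y)
    parent = gini(y)
    best_gain, best_j = 0.0, None
    # Random feature subset at each node, as in a random forest.
    for j in rng.sample(range(len(X[0])), n_try):
        absent = [lab for row, lab in zip(X, y) if row[j] == 0]
        present = [lab for row, lab in zip(X, y) if row[j] == 1]
        if not absent or not present:
            continue
        child = (len(absent) * gini(absent) + len(present) * gini(present)) / len(y)
        if parent - child > best_gain:
            best_gain, best_j = parent - child, j
    if best_j is None:
        return majority(y)
    # Accumulate sample-weighted impurity decrease as a simple importance score.
    importance[best_j] += best_gain * len(y)
    split = {0: ([], []), 1: ([], [])}
    for row, lab in zip(X, y):
        split[row[best_j]][0].append(row)
        split[row[best_j]][1].append(lab)
    return (best_j,
            build_tree(*split[0], n_try, rng, importance, depth + 1, max_depth),
            build_tree(*split[1], n_try, rng, importance, depth + 1, max_depth))


def predict_tree(node, row):
    """Follow presence/absence splits down to a leaf label."""
    while isinstance(node, tuple):
        feature, absent, present = node
        node = present if row[feature] == 1 else absent
    return node


def fit_forest(X, y, n_trees=25, seed=0):
    """Train trees on bootstrap samples; return the trees and importance scores."""
    rng = random.Random(seed)
    n_try = max(1, int(math.sqrt(len(X[0]))))
    importance = [0.0] * len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                n_try, rng, importance))
    return trees, importance


def predict_forest(trees, row):
    """Majority vote across trees."""
    return majority([predict_tree(t, row) for t in trees])


# Synthetic data (assumption): 200 genomes x 12 domains, label driven by domain 0
# with 5% label noise. Real inputs would come from domain annotation of genomes.
data_rng = random.Random(42)
N_GENOMES, N_DOMAINS = 200, 12
X = [[data_rng.randint(0, 1) for _ in range(N_DOMAINS)] for _ in range(N_GENOMES)]
y = []
for row in X:
    label = "aerobe" if row[0] == 1 else "anaerobe"
    if data_rng.random() < 0.05:
        label = "anaerobe" if label == "aerobe" else "aerobe"
    y.append(label)

X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]
trees, importance = fit_forest(X_train, y_train)
accuracy = sum(predict_forest(trees, r) == t for r, t in zip(X_test, y_test)) / len(y_test)
ranking = sorted(range(N_DOMAINS), key=lambda j: importance[j], reverse=True)

print(f"held-out accuracy: {accuracy:.2f}")
print("domain indices ranked by importance:", ranking[:3])
```

In this toy setting the top-ranked domain indices play the role of the candidate features that, in the study, were followed up experimentally. With real data, an established library implementation and proper cross-validation would normally replace this hand-written forest.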