Drone-guard: A self-supervised deep learning framework for real-time spatiotemporal anomaly detection in UAV surveillance systems
Wassim Sliti, Olfa Besbes,
Drone-guard: A self-supervised deep learning framework for real-time spatiotemporal anomaly detection in UAV surveillance systems,
Neurocomputing,
Volume 653,
2025,
131168,
ISSN 0925-2312,
https://doi.org/10.1016/j.neucom.2025.131168.
(https://www.sciencedirect.com/science/article/pii/S0925231225018405)
Abstract: Anomaly detection is a cornerstone of intelligent video surveillance, facilitating the early identification of irregular or potentially dangerous events. While deep learning has significantly advanced this domain, current approaches often face a fundamental trade-off between detection accuracy and computational efficiency. High-accuracy models are typically computationally intensive and unsuitable for real-time deployment on resource-constrained platforms such as Unmanned Aerial Vehicles (UAVs). Conversely, lightweight alternatives often lack the representational capacity to model complex spatiotemporal patterns in dynamic environments. To address this challenge, we present Drone-Guard, a self-supervised deep learning framework designed for real-time spatiotemporal anomaly detection in UAV surveillance systems. Drone-Guard introduces three core contributions: (1) a lightweight encoder-decoder architecture with multi-stage feature extraction to jointly capture fine-grained and high-level spatial structures; (2) a novel Multi-Scale Grouped Query Attention (MS-GQA) mechanism for efficient fusion of hierarchical spatial and temporal features, enabling context-aware anomaly modeling; and (3) a Residual Vector Quantization (RVQ) module that enhances latent representation compactness and reconstruction fidelity, crucial for discriminative anomaly detection. To overcome the lack of labeled anomalies, a central limitation in self-supervised learning, we further propose a latent-space pseudo-anomaly synthesizer. This component perturbs the learned representations of normal samples to generate synthetic anomalies, thereby facilitating effective decision boundary learning without requiring manual annotations. Extensive experiments and ablation studies on multiple benchmark datasets demonstrate that Drone-Guard outperforms state-of-the-art methods in both accuracy and efficiency. Its low computational footprint and robust anomaly localization capabilities make it well-suited for real-time deployment in edge-aware, IoT-enabled UAV surveillance applications. The source code and pre-trained models of the proposed framework are publicly available at: https://github.com/slitiWassim/Drone-Guard.
Keywords: Self-supervised learning; Spatiotemporal anomaly detection; UAV-based video surveillance; Multi-scale attention mechanism; Residual vector quantization (RVQ); Hierarchical feature fusion; Edge-aware deep learning
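Illustrative note: the abstract names Residual Vector Quantization (RVQ) as one of the three contributions but does not describe its internals. The sketch below shows only the generic RVQ idea (each stage quantizes the residual left by the previous stage) and is not taken from the Drone-Guard code. The function name `rvq_encode`, the fixed NumPy codebooks, and the toy values are all assumptions made for illustration; in a trained model the codebooks would be learned.

```python
import numpy as np


def rvq_encode(x, codebooks):
    """Generic residual vector quantization (hypothetical helper).

    x         : array of shape (N, D), vectors to quantize
    codebooks : list of arrays, each of shape (K, D), one per stage
    Returns (quantized, indices, residual) where
    quantized + residual reproduces x.
    """
    residual = x.astype(np.float64)       # astype returns a copy
    quantized = np.zeros_like(residual)
    indices = []
    for codebook in codebooks:
        # Squared Euclidean distance from each residual to each code: (N, K)
        diffs = residual[:, None, :] - codebook[None, :, :]
        dists = (diffs ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)        # nearest code per vector
        chosen = codebook[idx]            # (N, D)
        quantized += chosen               # accumulate the approximation
        residual -= chosen                # pass what is left to the next stage
        indices.append(idx)
    return quantized, np.stack(indices, axis=1), residual


# Toy example with two hand-written stages (assumed values, not learned).
stage1 = np.array([[0.0, 0.0], [1.0, 1.0]])
stage2 = np.array([[0.0, 0.0], [0.5, 0.5]])
x = np.array([[1.5, 1.5]])
quantized, indices, residual = rvq_encode(x, [stage1, stage2])
```

In this toy case the first stage picks the code `[1, 1]` and the second stage picks `[0.5, 0.5]` for the remaining residual, so the input is reconstructed exactly. With real data, each additional stage typically refines the approximation while the latent stays a short sequence of code indices.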