This project develops adaptive AI methods for understanding human activities in dynamic, real-world video. Most activity-recognition systems operate under a closed-world assumption and can therefore recognise only the actions seen during training. In applications such as public safety, healthcare monitoring, and smart environments, new behaviours emerge continuously and often need to be recognised without rebuilding the system from scratch. This research focuses on developing a Continual Open-Vocabulary Action Detection framework that can learn new activity concepts over time, localise them in video, and retain previously learned knowledge. The aim is to make video understanding algorithms more flexible and useful in changing environments where safety, wellbeing, and timely response matter.
Human activity analysis is a key capability for intelligent systems that need to interpret what is happening in a scene, when it is happening, and why it may matter. Current methods have made strong progress in recognising predefined activities, and recent vision-language models have opened the door to open-vocabulary recognition, where actions can be described as concepts rather than fixed classes. These advances improve generalisation, but they do not fully solve the problem of long-term adaptation: a deployed system may still struggle when environments, routines, risks, or user needs evolve.
This research brings together open-vocabulary learning, continual learning, and spatiotemporal video understanding. The framework explores how models can detect activities from semantic descriptions, learn incrementally from new examples, and avoid catastrophic forgetting of previously learned behaviours. This is especially important in real deployments, where retraining from scratch is expensive, data access may be limited, and the system must remain reliable as new scenarios appear.
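To make the open-vocabulary idea concrete, the sketch below shows one common way such matching can work: a clip embedding is compared with label embeddings by cosine similarity, and a new activity concept is added simply by storing another embedding. It illustrates the general principle only and is not the project's framework. The class `OpenVocabularyMatcher`, its methods, and the hand-written three-dimensional vectors are all hypothetical; in practice the embeddings would come from a vision-language model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class OpenVocabularyMatcher:
    """Toy matcher (hypothetical): each activity label is stored as an embedding."""

    def __init__(self):
        self.concepts = {}  # label -> embedding (assumed, hand-written vectors)

    def add_concept(self, label, embedding):
        # Adding a concept leaves existing entries untouched,
        # so previously stored labels are retained.
        self.concepts[label] = list(embedding)

    def classify(self, clip_embedding):
        # Return the label whose embedding is most similar to the clip.
        return max(
            self.concepts,
            key=lambda label: cosine(self.concepts[label], clip_embedding),
        )


matcher = OpenVocabularyMatcher()
matcher.add_concept("walking", [1.0, 0.0, 0.0])
matcher.add_concept("sitting", [0.0, 1.0, 0.0])

unseen_clip = [0.1, 0.2, 0.9]  # assumed embedding of a clip showing a new behaviour
before = matcher.classify(unseen_clip)  # forced onto the closest known label

matcher.add_concept("falling", [0.0, 0.0, 1.0])  # new concept, no retraining
after = matcher.classify(unseen_clip)
still_walking = matcher.classify([0.9, 0.1, 0.0])  # an earlier concept still resolves

print(before, after, still_walking)
```

This toy sidesteps catastrophic forgetting because no model weights are updated. A system that fine-tunes its encoders on new examples has no such guarantee, which is why the retention of earlier behaviours has to be addressed explicitly.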
The wider impact lies in making activity-aware AI more practical for society-facing applications. In assisted living, it could support safer monitoring while adapting to individual routines. In healthcare and rehabilitation, it could help track clinically relevant movements and behaviours over time. In public safety and smart-city contexts, it could support more responsive systems that recognise emerging events without requiring every possible scenario to be specified in advance. The project therefore offers a strong opportunity for collaboration between researchers and practitioners in computer vision, healthcare technology, robotics, human behaviour analysis, security, and responsible AI.