The client develops a new generation of home surveillance and assistance ecosystems, including smart doorbells, home surveillance systems, and AI-powered voice-activated home assistants.
The client offers a range of home surveillance devices and wants to use their video and audio streams to improve security in the home and neighborhood by automatically analyzing and processing the incoming data and notifying users of any suspicious activity.
The developed solution processes video and audio feeds from the recording devices. The video processing module detects people in the video feed and determines whether each detected person belongs to the group of authorized users. The audio processing module listens for custom voice activation commands (e.g., “Okay Google”, “Hey Siri”) to be activated. Audio processing also includes target sound detection: baby crying, gunshots, glass shattering, and other sounds of interest. The modules send alerts to notify the user of any disturbance.
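The authorization check described above can be sketched as a small decision step: a detected person is matched against the set of authorized users, and an alert is generated only for unknown people. The names and payload shape below are illustrative assumptions, not the client's actual API.

```python
# Hypothetical sketch of the alert logic: a detected person identifier is
# checked against the authorized set, and an alert payload is built only
# for unknown people. All identifiers here are illustrative.

AUTHORIZED_IDS = {"alice", "bob"}

def classify_person(person_id):
    """Return 'authorized' or 'unknown' for a detected person id."""
    return "authorized" if person_id in AUTHORIZED_IDS else "unknown"

def make_alert(person_id):
    """Build a notification payload for an unauthorized detection, else None."""
    if classify_person(person_id) == "unknown":
        return {"type": "intruder_alert", "person": person_id}
    return None

print(make_alert("alice"))     # authorized user -> None, no alert sent
print(make_alert("stranger"))  # unknown person -> alert payload
```

In the deployed system the `person_id` would come from the face/person recognition model rather than being passed in directly.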
The Quantum team created a pipeline that takes video and audio frames from the data streams and processes them. Video processing includes target object detection (people, pets, etc.). Audio processing consists of two separate modules: wake word detection and target sound detection. Wake word detection spots the keyword that activates the user’s command. Target sound detection distinguishes sounds of interest (for example, baby crying, gunshots, glass shattering) in the audio stream and sends an alert to the system on a positive classification.
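The audio modules above both run on a sliding window over the incoming stream. A minimal sketch of that loop, with an RMS-energy threshold standing in for the trained TensorFlow classifier (the window size, hop, and threshold are assumptions for illustration):

```python
import numpy as np

# Sliding-window detection loop. In the real pipeline each window would be
# converted to features and scored by the trained model; here a simple RMS
# energy threshold stands in for inference so the control flow is runnable.

SAMPLE_RATE = 16000       # assumed sampling rate
WINDOW = SAMPLE_RATE      # 1-second analysis windows
HOP = SAMPLE_RATE // 2    # 50% overlap between windows

def detect_target_sound(audio, threshold=0.1):
    """Return start indices of windows where the stand-in classifier fires."""
    hits = []
    for start in range(0, len(audio) - WINDOW + 1, HOP):
        window = audio[start:start + WINDOW]
        # Stand-in for model inference: RMS energy vs. a fixed threshold.
        if np.sqrt(np.mean(window ** 2)) > threshold:
            hits.append(start)
    return hits

# Quiet background with a loud one-second burst in the third second.
stream = np.zeros(SAMPLE_RATE * 4)
stream[SAMPLE_RATE * 2:SAMPLE_RATE * 3] = 0.5
print(detect_target_sound(stream))  # -> [24000, 32000, 40000]
```

Overlapping windows ensure an event straddling a window boundary is still seen in full by at least one window, at the cost of roughly doubling the inference rate.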
The dataset for custom wake word detection was collected through a crowdsourcing platform, Amazon Mechanical Turk, which made it possible to train the model on a variety of accents and voice timbres. A collection interface was set up, and the audio recordings were filtered and assessed for quality before being used in preprocessing, modeling, and evaluation.
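Crowdsourced recordings vary widely in quality, so an automated first pass can reject obviously unusable clips before human review. The checks and thresholds below are illustrative assumptions about such a filter, not the project's actual criteria:

```python
import numpy as np

# Hypothetical quality gate for crowdsourced wake-word clips: reject clips
# that are too short, nearly silent, or heavily clipped. Thresholds are
# illustrative assumptions, not the project's actual criteria.

def passes_quality_checks(samples, sample_rate=16000,
                          min_seconds=0.5, min_rms=0.01, clip_level=0.99):
    """Return True if a mono float clip (values in [-1, 1]) looks usable."""
    if len(samples) < min_seconds * sample_rate:
        return False  # too short to contain the wake word
    if np.sqrt(np.mean(samples ** 2)) < min_rms:
        return False  # near silence: likely a failed recording
    if np.mean(np.abs(samples) >= clip_level) > 0.01:
        return False  # more than 1% of samples clipped: distorted audio
    return True

good = 0.2 * np.sin(np.linspace(0, 100, 16000))  # audible 1-second tone
silent = np.zeros(16000)                          # failed recording
print(passes_quality_checks(good), passes_quality_checks(silent))  # -> True False
```

Clips passing this gate would then be loaded with SoundFile/Librosa for feature extraction and manual spot checks.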
The project was developed in Python. OpenCV and TensorFlow were used mainly in the video processing module, while the audio processing modules incorporated Librosa, SoundFile, TensorFlow, and scikit-learn. Postprocessing relied on Pandas and NumPy.