OpenYAMNet/YAMNet+

This repository contains the models for OpenYAMNet/YAMNet+, introduced in the paper A dataset and model for auditory scene recognition for hearing devices: AHEAD-DS and OpenYAMNet.

OpenYAMNet is a sound recognition model designed for deployment on edge devices like smartphones connected to hearing devices (e.g., hearing aids and wireless earphones). It serves as a baseline model for sound-based scene recognition.

Model Performance

OpenYAMNet achieved the following results on the AHEAD-DS test set:

  • Mean Average Precision (mAP): 0.86
  • Accuracy: 0.93
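The exact evaluation protocol is described in the paper; as an illustration only, here is a minimal numpy sketch of how accuracy and mean average precision can be computed from one-hot scene labels and per-class model scores. The toy labels and scores below are invented for demonstration, not real model output.

```python
import numpy as np

def average_precision(y_true, scores):
    """AP for one class: precision at each true-positive rank, averaged."""
    order = np.argsort(-scores)          # rank clips by descending score
    y = y_true[order]
    hits = np.cumsum(y)                  # true positives seen so far
    precision_at_k = hits / (np.arange(len(y)) + 1)
    return float(np.sum(precision_at_k * y) / max(y.sum(), 1))

def evaluate(y_true, scores):
    """Return (argmax accuracy, mean average precision over classes)."""
    acc = float(np.mean(scores.argmax(axis=1) == y_true.argmax(axis=1)))
    ap = [average_precision(y_true[:, c], scores[:, c])
          for c in range(y_true.shape[1])]
    return acc, float(np.mean(ap))

# toy data: 4 clips, 3 classes, one-hot labels and model scores
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)
scores = np.array([[0.9, 0.05, 0.05], [0.2, 0.7, 0.1],
                   [0.1, 0.2, 0.7], [0.3, 0.6, 0.1]])
acc, map_ = evaluate(y_true, scores)
```

On this toy data the last clip is misclassified by argmax (accuracy 0.75) while the per-class rankings remain perfect (mAP 1.0), which shows why the two metrics can diverge.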

The model is optimized for real-time use: on a Google Pixel 3 (released 2018), model loading takes approximately 50 ms and processing takes roughly 30 ms per 1 second of audio.
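Latency figures like those above can be reproduced with a simple wall-clock benchmark. The sketch below uses a stand-in function in place of real model inference (the actual model call is not shown in this card), so the harness itself is the point, not the numbers it produces here.

```python
import time

def benchmark_ms(fn, n_runs=20):
    """Median wall-clock time of fn() in milliseconds over n_runs runs."""
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1000.0)
    times.sort()
    return times[len(times) // 2]

def fake_inference():
    # stand-in workload; replace with the real model call on 1 s of audio
    sum(i * i for i in range(10_000))

ms = benchmark_ms(fake_inference)
```

Taking the median rather than the mean makes the measurement robust to occasional scheduler hiccups on a phone-class device.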

Supported Scene Classes

  • cocktail_party
  • interfering_speakers
  • in_traffic
  • in_vehicle
  • music
  • quiet_indoors
  • reverberant_environment
  • wind_turbulence
  • speech_in_traffic
  • speech_in_vehicle
  • speech_in_music
  • speech_in_quiet_indoors
  • speech_in_reverberant_environment
  • speech_in_wind_turbulence
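In practice the model's output vector must be mapped back to these class names. The sketch below assumes the classes are indexed in the order listed above; the authoritative index-to-name mapping ships with the model, so treat this ordering as an assumption. The demo score vector is invented for illustration.

```python
import numpy as np

# The 14 scene classes listed above, in an assumed index order.
CLASSES = [
    "cocktail_party", "interfering_speakers", "in_traffic", "in_vehicle",
    "music", "quiet_indoors", "reverberant_environment", "wind_turbulence",
    "speech_in_traffic", "speech_in_vehicle", "speech_in_music",
    "speech_in_quiet_indoors", "speech_in_reverberant_environment",
    "speech_in_wind_turbulence",
]

def top_scenes(scores, k=3):
    """Return the k highest-scoring (class_name, score) pairs."""
    scores = np.asarray(scores, dtype=float)
    top = np.argsort(-scores)[:k]
    return [(CLASSES[i], float(scores[i])) for i in top]

# toy score vector (e.g. per-class probabilities), not real model output
demo = np.zeros(14)
demo[4] = 0.9    # music
demo[10] = 0.6   # speech_in_music
top2 = top_scenes(demo, k=2)   # → [('music', 0.9), ('speech_in_music', 0.6)]
```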

Licence

Licenced under CC BY-SA 4.0; see LICENCE.txt. The original YAMNet weights are subject to their own licence.

Citation

@misc{zhong2026datasetmodelauditoryscene,
      title={A dataset and model for auditory scene recognition for hearing devices: AHEAD-DS and OpenYAMNet}, 
      author={Henry Zhong and Jörg M. Buchholz and Julian Maclaren and Simon Carlile and Richard Lyon},
      year={2026},
      eprint={2508.10360},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2508.10360}, 
}