OpenYAMNet/YAMNet+

This repository contains the models for OpenYAMNet/YAMNet+, introduced in the paper A dataset and model for auditory scene recognition for hearing devices: AHEAD-DS and OpenYAMNet.

OpenYAMNet is a sound recognition model designed for deployment on edge devices like smartphones connected to hearing devices (e.g., hearing aids and wireless earphones). It serves as a baseline model for sound-based scene recognition.

Model Performance

OpenYAMNet achieved the following results on the AHEAD-DS test set:

  • Mean Average Precision (mAP): 0.86
  • Accuracy: 0.93
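The exact evaluation protocol is described in the paper; as an illustration only, here is a minimal numpy sketch of how accuracy and mean average precision can be computed from one-hot scene labels and per-class model scores. The toy labels and scores below are invented for demonstration, not real model output.

```python
import numpy as np

def average_precision(y_true, scores):
    """AP for one class: precision at each true-positive rank, averaged."""
    order = np.argsort(-scores)          # rank clips by descending score
    y = y_true[order]
    hits = np.cumsum(y)                  # true positives seen so far
    precision_at_k = hits / (np.arange(len(y)) + 1)
    return float(np.sum(precision_at_k * y) / max(y.sum(), 1))

def evaluate(y_true, scores):
    """Return (argmax accuracy, mean average precision over classes)."""
    acc = float(np.mean(scores.argmax(axis=1) == y_true.argmax(axis=1)))
    ap = [average_precision(y_true[:, c], scores[:, c])
          for c in range(y_true.shape[1])]
    return acc, float(np.mean(ap))

# toy data: 4 clips, 3 classes, one-hot labels and model scores
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)
scores = np.array([[0.9, 0.05, 0.05], [0.2, 0.7, 0.1],
                   [0.1, 0.2, 0.7], [0.3, 0.6, 0.1]])
acc, map_ = evaluate(y_true, scores)
```

On this toy data the last clip is misclassified by argmax (accuracy 0.75) while the per-class rankings remain perfect (mAP 1.0), which shows why the two metrics can diverge.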

The model is optimized for real-time use: on a Google Pixel 3 (released 2018), model loading takes approximately 50 ms and processing takes roughly 30 ms per 1 second of audio.
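Latency figures like those above can be reproduced with a simple wall-clock benchmark. The sketch below uses a stand-in function in place of real model inference (the actual model call is not shown in this card), so the harness itself is the point, not the numbers it produces here.

```python
import time

def benchmark_ms(fn, n_runs=20):
    """Median wall-clock time of fn() in milliseconds over n_runs runs."""
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1000.0)
    times.sort()
    return times[len(times) // 2]

def fake_inference():
    # stand-in workload; replace with the real model call on 1 s of audio
    sum(i * i for i in range(10_000))

ms = benchmark_ms(fake_inference)
```

Taking the median rather than the mean makes the measurement robust to occasional scheduler hiccups on a phone-class device.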

Supported Scene Classes

  • cocktail_party
  • interfering_speakers
  • in_traffic
  • in_vehicle
  • music
  • quiet_indoors
  • reverberant_environment
  • wind_turbulence
  • speech_in_traffic
  • speech_in_vehicle
  • speech_in_music
  • speech_in_quiet_indoors
  • speech_in_reverberant_environment
  • speech_in_wind_turbulence
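In practice the model's output vector must be mapped back to these class names. The sketch below assumes the classes are indexed in the order listed above; the authoritative index-to-name mapping ships with the model, so treat this ordering as an assumption. The demo score vector is invented for illustration.

```python
import numpy as np

# The 14 scene classes listed above, in an assumed index order.
CLASSES = [
    "cocktail_party", "interfering_speakers", "in_traffic", "in_vehicle",
    "music", "quiet_indoors", "reverberant_environment", "wind_turbulence",
    "speech_in_traffic", "speech_in_vehicle", "speech_in_music",
    "speech_in_quiet_indoors", "speech_in_reverberant_environment",
    "speech_in_wind_turbulence",
]

def top_scenes(scores, k=3):
    """Return the k highest-scoring (class_name, score) pairs."""
    scores = np.asarray(scores, dtype=float)
    top = np.argsort(-scores)[:k]
    return [(CLASSES[i], float(scores[i])) for i in top]

# toy score vector (e.g. per-class probabilities), not real model output
demo = np.zeros(14)
demo[4] = 0.9    # music
demo[10] = 0.6   # speech_in_music
top2 = top_scenes(demo, k=2)   # → [('music', 0.9), ('speech_in_music', 0.6)]
```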

Licence

Licenced under CC BY-SA 4.0; see LICENCE.txt. The original YAMNet weights are subject to their own licence.

Citation

@misc{zhong2026datasetmodelauditoryscene,
      title={A dataset and model for auditory scene recognition for hearing devices: AHEAD-DS and OpenYAMNet}, 
      author={Henry Zhong and Jörg M. Buchholz and Julian Maclaren and Simon Carlile and Richard Lyon},
      year={2026},
      eprint={2508.10360},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2508.10360}, 
}