
Google Research has developed an AI system called Health Acoustic Representations (HeAR) that analyzes coughs and breathing sounds to assess health.


HeAR uses self-supervised learning and was trained on over 300 million short audio clips from non-copyrighted YouTube videos. The neural network is based on the Transformer architecture.

During training, parts of audio spectrograms were hidden, and the network learned to reconstruct these missing sections. This allowed HeAR to create compact representations of audio data containing relevant health information.
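This training setup resembles a masked autoencoder. The snippet below is a minimal PyTorch sketch of that general idea, with all shapes, layer sizes, and the masking ratio invented for illustration; it is not Google's published HeAR code:

```python
# Illustrative masked spectrogram modeling: hide random patches of a
# spectrogram and train a Transformer to reconstruct the hidden parts.
# All sizes here are hypothetical, not HeAR's actual configuration.
import torch
import torch.nn as nn

PATCHES, PATCH_DIM, EMBED_DIM = 100, 256, 512  # assumed sizes

class MaskedSpectrogramModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(PATCH_DIM, EMBED_DIM)            # patch -> token
        self.mask_token = nn.Parameter(torch.zeros(EMBED_DIM))  # stand-in for hidden patches
        self.pos = nn.Parameter(torch.zeros(PATCHES, EMBED_DIM))
        layer = nn.TransformerEncoderLayer(EMBED_DIM, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.decode = nn.Linear(EMBED_DIM, PATCH_DIM)            # token -> patch

    def forward(self, patches, mask):
        # patches: (B, PATCHES, PATCH_DIM); mask: (B, PATCHES) bool, True = hidden
        tokens = self.embed(patches)
        tokens[mask] = self.mask_token          # replace hidden patches with mask token
        tokens = self.encoder(tokens + self.pos)
        return self.decode(tokens)

model = MaskedSpectrogramModel()
spec = torch.randn(8, PATCHES, PATCH_DIM)       # fake spectrogram patches
mask = torch.rand(8, PATCHES) < 0.75            # hide ~75% of patches
recon = model(spec, mask)
loss = ((recon - spec) ** 2)[mask].mean()       # reconstruction loss on hidden parts only
loss.backward()
```

After pretraining, the encoder's output tokens serve as the compact audio representations; the reconstruction head is only needed during training.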

Image: Baur et al.

Google published its findings in March 2024 and has now released the code for other researchers to use.


Improved tuberculosis detection

Researchers tested HeAR on 33 tasks from 6 datasets, including recognizing health-related sounds, classifying cough recordings, and estimating lung function values. HeAR outperformed existing audio AI models in most benchmarks.

It achieved an area under the ROC curve (AUROC) of 0.739 in detecting tuberculosis from cough sounds, surpassing the second-best model, TRILL, at 0.652. The authors see potential for using AI cough analysis to identify people in resource-poor areas who need further testing.
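To illustrate how such a benchmark score is typically computed for a frozen embedding model, here is a hedged sketch of a linear probe evaluated with scikit-learn's AUROC metric. The random data stands in for real cough embeddings and labels; this is not the actual HeAR evaluation pipeline:

```python
# Hypothetical linear-probe evaluation: fit a simple classifier on frozen
# audio embeddings and score it with AUROC (area under the ROC curve).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Placeholder data: 512-dim embedding vectors with binary TB labels.
X_train, y_train = rng.normal(size=(400, 512)), rng.integers(0, 2, 400)
X_test, y_test = rng.normal(size=(100, 512)), rng.integers(0, 2, 100)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = probe.predict_proba(X_test)[:, 1]   # probability of the positive class
print("AUROC:", roc_auc_score(y_test, scores))
```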

HeAR also showed promise in estimating lung function parameters such as FEV1 (forced expiratory volume in one second) and FVC (forced vital capacity) from smartphone recordings. With an average error of 0.418 liters for FEV1, it was more accurate than the best comparison method (0.479 liters). This could lead to new, accessible screening tools for lung diseases such as COPD.
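Estimating a continuous value like FEV1 is a regression task, so the probe and metric change accordingly. A hedged sketch with placeholder data, analogous to the classification probe above:

```python
# Hypothetical regression probe: predict a lung function value (e.g. FEV1
# in liters) from frozen embeddings and report the mean absolute error.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
X_train, y_train = rng.normal(size=(400, 512)), rng.uniform(1.0, 5.0, 400)
X_test, y_test = rng.normal(size=(100, 512)), rng.uniform(1.0, 5.0, 100)

reg = Ridge().fit(X_train, y_train)
print("MAE (liters):", mean_absolute_error(y_test, reg.predict(X_test)))
```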

The researchers stress that HeAR is still a research tool. Any diagnostic applications would require clinical validation. There are also technical limitations: HeAR currently processes only two-second audio clips. Google plans to use techniques like model distillation and quantization to enable more efficient use of HeAR directly on mobile devices.
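Quantization replaces a model's floating-point weights with lower-precision integers to cut memory and compute. As a hedged illustration, here is PyTorch's post-training dynamic quantization applied to a small stand-in network, not to HeAR itself:

```python
# Sketch of post-training dynamic quantization: Linear layers are swapped
# for int8 versions at load time. The model below is a placeholder.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 2))
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8  # quantize only the Linear layers
)
print(quantized)  # Linear modules now appear as DynamicQuantizedLinear
```

Distillation, the other technique mentioned, instead trains a smaller student network to match the outputs of the large model, trading some accuracy for a much smaller footprint.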

The Stop TB Partnership, a UN-backed organization working to end tuberculosis by 2030, supports this approach. Researchers can now request the trained HeAR model and an anonymized version of the CIDRZ dataset (cough audio data) from Google. More information is available on GitHub.

Summary
  • Google Research has developed an AI system called HeAR that can infer a person's health status from sounds like coughing or breathing. It has been trained on more than 300 million audio clips.
  • In tests to detect tuberculosis from cough sounds and to estimate lung function parameters from smartphone recordings, HeAR outperformed previous top models. This could open up new opportunities for AI-based screening tools.
  • HeAR is currently a research artifact with limitations such as being limited to two-second audio clips. Further optimization and clinical validation are needed before it can be used in practice. Google is making the code and data available to other researchers.
Jonathan works as a freelance tech journalist for THE DECODER, focusing on AI tools and how GenAI can be used in everyday work.