Audio Mining

Audio Mining makes your audio-visual content as accessible as text.


Fraunhofer IAIS

Contact person(s)

For technical information:

  • Heike Horstmann
  • Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

For licensing information:

What it does

Audio Mining analyses German- or English-language audio/video files (e.g. content from a TV news show) and returns textual information suitable for indexing (e.g. by search engines). It performs speech and speaker segmentation as well as speech recognition to render speech into text. It delivers segments, speaker identification, characteristic keywords and additional metadata in XML and JSON. Finally, it builds an index for multimedia search.
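As an illustration of how such output might be consumed for indexing, the sketch below parses a JSON result and builds a simple keyword-to-segment lookup. The JSON layout (`segments`, `speaker`, `start`, `end`, `text`, `keywords`) is an assumption made for this example; it is not the documented Audio Mining schema.

```python
import json
from collections import defaultdict

# Hypothetical result document; the field names are assumptions for illustration only.
SAMPLE_RESULT = """
{
  "segments": [
    {"speaker": "S1", "start": 0.0, "end": 4.2,
     "text": "Good evening and welcome to the news.",
     "keywords": ["news"]},
    {"speaker": "S2", "start": 4.2, "end": 9.8,
     "text": "The election results are in.",
     "keywords": ["election", "results"]},
    {"speaker": "S1", "start": 9.8, "end": 12.0,
     "text": "More election news after the break.",
     "keywords": ["election", "news"]}
  ]
}
"""


def build_keyword_index(result_json: str) -> dict:
    """Map each keyword to the (start, end, speaker) of every segment tagged with it."""
    result = json.loads(result_json)
    index = defaultdict(list)
    for seg in result["segments"]:
        for keyword in seg["keywords"]:
            index[keyword.lower()].append((seg["start"], seg["end"], seg["speaker"]))
    return dict(index)


if __name__ == "__main__":
    for keyword, hits in sorted(build_keyword_index(SAMPLE_RESULT).items()):
        print(keyword, hits)
```

A lookup like this lets a search front end jump straight to the time range in which a term was spoken, and by whom.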

How it works

Audio Mining combines state-of-the-art multimedia pattern recognition techniques: speech detection, speaker diarisation, speaker recognition and speech recognition. By cascading these steps, it automatically obtains a broad spectrum of metadata for media files. This enables users to search for terms, quotations or specific speakers, to browse archives using content-based recommendation, or to obtain media information such as keywords or SRT-compatible subtitles. Audio Mining includes an Apache Solr search engine that stores all metadata and makes it available via the provided SOAP/REST interface.
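To make the subtitle output concrete, the following minimal sketch turns timed speech segments into SRT text. The input shape (a list of `(start_seconds, end_seconds, text)` tuples) and both function names are assumptions for illustration; this is not Audio Mining's own code, only a demonstration of the SRT layout (running number, `HH:MM:SS,mmm --> HH:MM:SS,mmm` time line, text, blank line).

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render (start, end, text) tuples as numbered SRT cues separated by blank lines."""
    blocks = []
    for number, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [(0.0, 4.2, "Good evening."), (4.2, 9.8, "The results are in.")]
    print(segments_to_srt(demo))
```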

What you get

Audio Mining offers a RESTful API that converts audiovisual content in German or English into machine-readable text using automatic speech recognition (ASR). Via the API, the technology can be integrated into existing architectures such as content management systems (CMS).
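A CMS integration would typically submit a media file to the API and later fetch the transcript. The sketch below only constructs such a request with Python's standard library and does not send it. The base URL, the `/recognitions` path and the payload fields `mediaUrl` and `language` are placeholders invented for this example; the real endpoints must be taken from the Audio Mining API documentation.

```python
import json
import urllib.request

# Placeholder host: NOT the real Audio Mining endpoint.
BASE_URL = "https://audiomining.example.org/api"


def build_transcription_request(media_url: str, language: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for a transcript of media_url."""
    # The page describes support for German and English content only.
    if language not in ("de", "en"):
        raise ValueError("language must be 'de' or 'en'")
    # Hypothetical payload fields, chosen for illustration.
    payload = json.dumps({"mediaUrl": media_url, "language": language}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/recognitions",  # hypothetical path
        data=payload,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_transcription_request("https://example.org/news.mp4", "de")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Keeping request construction separate from sending makes the integration easy to test without network access; the actual call would pass the request to `urllib.request.urlopen`.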

Delivery model

  • SaaS
