MPF audio component?

I'm in the process of "componentifying" the speech/non-speech system, and was wondering how you would like us to handle audio input. One option is to use standard gstreamer audio input and mpf-toi output. Another option is to write an audio data type library for mpf (if you haven't already, that is). Is there a standard way to do the former? How much effort would the latter take?

   Thanks!

    Adam Janin

mpf-pocketsphinx location

Adam,

You can get the code from https://svn.appscio.com/svn/MPF/trunk/data-types/mpf-pocketsphinx - you don't need any credentials to get read-only access. Let us know if you run into any problems.

Gareth

gstreamer pocketsphinx link at CMU

Side note: something you might want to look at. Our mpf-pocketsphinx is one step toward getting pocketsphinx into the pipeline, but the pocketsphinx author undertook the same thing (not the entire metadata framework idea, just 'get pocketsphinx into gstreamer') and does a bang-up job of explaining it at http://www.speech.cs.cmu.edu/cmusphinx/moinmoin/GStreamer. He also has the 'vader' Element for utterance start/end detection.

He outputs results using signals (as opposed to every-buffer RDF graphs).
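To make "utterance start/end detection" concrete, here is a minimal C sketch of a generic energy-threshold detector with a hangover counter. It is not vader's actual algorithm; the names (`utt_detector`, `utt_init`, `utt_process`), the mean-square energy measure, and the 16-bit PCM frame format are assumptions for illustration, and the returned event stands in for the signals a GStreamer element would emit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical utterance detector: a generic energy-threshold sketch,
 * NOT the algorithm used by pocketsphinx's vader element. */

typedef enum { UTT_NONE, UTT_START, UTT_END } utt_event;

typedef struct {
    double threshold;  /* mean-square energy above which a frame counts as active */
    int hangover;      /* consecutive quiet frames required to end an utterance */
    int in_speech;     /* 1 while inside an utterance */
    int quiet_frames;  /* quiet frames seen since the last active frame */
} utt_detector;

static void utt_init(utt_detector *d, double threshold, int hangover) {
    d->threshold = threshold;
    d->hangover = hangover;
    d->in_speech = 0;
    d->quiet_frames = 0;
}

/* Mean-square energy of one frame of 16-bit PCM samples. */
static double frame_energy(const short *samples, size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)samples[i] * samples[i];
    return sum / (double)n;
}

/* Feed one frame; returns UTT_START or UTT_END on a transition, else UTT_NONE.
 * A GStreamer element would emit a signal here instead of returning a value. */
static utt_event utt_process(utt_detector *d, const short *samples, size_t n) {
    int active = frame_energy(samples, n) > d->threshold;
    if (!d->in_speech) {
        if (active) {
            d->in_speech = 1;
            d->quiet_frames = 0;
            return UTT_START;
        }
        return UTT_NONE;
    }
    if (active) {
        d->quiet_frames = 0;  /* still speaking: reset the hangover count */
        return UTT_NONE;
    }
    if (++d->quiet_frames >= d->hangover) {
        d->in_speech = 0;
        d->quiet_frames = 0;
        return UTT_END;
    }
    return UTT_NONE;
}
```

The hangover keeps short pauses inside an utterance from being reported as an end, which is the main difference between event-style output like this and annotating every buffer.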

use gst audio -> oasr metadata -> mpf:interest

We haven't talked about an MPF-specific 'audio' library yet, but I can see we'll need one for the basic mappings to audio events-of-interest (speech events, non-speech audio recognitions, etc) or maybe even audio context-evaluation (evaluations of environmental characteristics such as reverberation). So I guess this is the start ;)

IMO, 'audio' is a basic gstreamer type, and MPF should honor that rather than try to recreate gstreamer's capabilities in a pure 'MPF' library. That includes audio stream analysis of the 'signal processing' variety (measuring volume, frequency distributions, etc.). I wouldn't try to turn that sort of characterization data into MPF-level metadata; instead, we should accept gstreamer audio data types and enhance gstreamer audio-processing Elements if we need to improve the lower-level 'signal processing' aspect.

Anyway, back to your specific point: your idea of generating appscio:interest (as the mpf-toi component does) is good and compact, but I would think about whether the "isSpeech" estimation is reusable as something other than an "interest". If so, consider breaking it into two Components:

  1. estimating the "speechiness" of the audio (audio -> oasr:speech_quality, where the output pad is a metadata mime type)
  2. indicating "interest" as a function of the speechiness (oasr:speech_quality -> mpf:interest)
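A minimal C sketch of that two-Component split, assuming 16-bit PCM input. The function names, the `interest_t` struct, and the energy-based placeholder estimator are hypothetical; a real speech/non-speech classifier would replace the estimator, and in MPF the two stages would be separate pipeline Elements joined by a metadata pad rather than two function calls.

```c
#include <assert.h>
#include <stddef.h>

/* Component 1 (hypothetical): audio -> speech_quality in [0, 1].
 * Placeholder estimator: fraction of frames whose mean-square energy
 * exceeds a threshold. */
static double estimate_speech_quality(const short *samples, size_t n_samples,
                                      size_t frame_len, double energy_threshold) {
    assert(frame_len > 0);
    size_t n_frames = n_samples / frame_len;  /* trailing partial frame is ignored */
    if (n_frames == 0)
        return 0.0;
    size_t active = 0;
    for (size_t f = 0; f < n_frames; f++) {
        double sum = 0.0;
        for (size_t i = 0; i < frame_len; i++) {
            double s = samples[f * frame_len + i];
            sum += s * s;
        }
        if (sum / (double)frame_len > energy_threshold)
            active++;
    }
    return (double)active / (double)n_frames;
}

/* Component 2 (hypothetical): speech_quality -> interest.
 * Kept separate so other consumers can reuse speech_quality directly. */
typedef struct {
    int interesting;  /* 1 if the segment should be flagged as interesting */
    double score;     /* the speech_quality value the decision was based on */
} interest_t;

static interest_t speech_quality_to_interest(double speech_quality, double min_quality) {
    interest_t out;
    out.score = speech_quality;
    out.interesting = speech_quality >= min_quality;
    return out;
}
```

Because `speech_quality` is a plain value passed between the two stages, a different application can consume it directly or swap in a different interest mapping without touching the estimator.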

(Yes, I am planning to rework the appscio: namespace for segmenter and toi into an "mpf:" namespace.)

Just to explain why I make the suggestion: I think MPF's efforts to integrate a real query language like RDQL will let us write a general transform Component that takes a selection query string as a parameter, runs it against metadata graphs to pull out specific attributes, and provides some simple "toInterest" transform functions. That way we can change which aspect of a graph is "interesting" in different applications (pipelines). When we get that, it would replace your second Component above.
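A rough C sketch of such a query-parameterized transform. It assumes a flattened list of (predicate, value) pairs in place of a real RDF graph and a plain predicate-name match in place of an RDQL query; the transform names ("threshold", "scale") and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened metadata: (predicate, numeric value) pairs.
 * A real implementation would run an RDQL-style query over an RDF graph. */
typedef struct {
    const char *predicate;
    double value;
} attr_t;

/* A "toInterest" transform maps a selected value (plus a parameter) to interest. */
typedef double (*to_interest_fn)(double value, double param);

static double threshold_interest(double v, double p) { return v >= p ? 1.0 : 0.0; }
static double scaled_interest(double v, double p) { return v * p; }

typedef struct {
    const char *name;
    to_interest_fn fn;
} transform_entry;

/* Registry of available transforms, selected by name at pipeline setup. */
static const transform_entry TRANSFORMS[] = {
    { "threshold", threshold_interest },
    { "scale", scaled_interest },
};

/* Select `predicate` from the graph and apply the named transform.
 * Returns 0 and writes *interest on success; -1 if either name is unknown. */
static int select_to_interest(const attr_t *graph, size_t n, const char *predicate,
                              const char *transform, double param, double *interest) {
    to_interest_fn fn = NULL;
    for (size_t i = 0; i < sizeof TRANSFORMS / sizeof TRANSFORMS[0]; i++)
        if (strcmp(TRANSFORMS[i].name, transform) == 0)
            fn = TRANSFORMS[i].fn;
    if (fn == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(graph[i].predicate, predicate) == 0) {
            *interest = fn(graph[i].value, param);
            return 0;
        }
    }
    return -1;
}
```

Changing the predicate or transform name (both plain parameters here) changes what counts as "interesting" without writing a new Component, which is the role the second Component in the list above would otherwise play.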

For an MPF component mapping a media type to an mpf-toi output, you might look at the work done in data-types/mpf-pocketsphinx/mpfpocketsphinx.c. However, I don't think that was kept up to date as the component template evolved, and it's more gstreamer-native than I would like. Still, it might be interesting, and I do want to revive that effort for MPF.

mpfpocketsphinx location?

I assume mpfpocketsphinx is on Appscio's svn, since it isn't in the rpm repository. Can I get access?

    Thanks,

    Adam