Configurations

Config files in MMSA-FET are JSON files containing three sections: audio, video and text. An overall template is shown here. This page introduces the detailed configuration for each tool.
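If you generate config files from a script, a quick structural check can catch a missing section before feature extraction starts. The sketch below is a minimal illustration, not part of MMSA-FET. It assumes that every section follows the same shape as the audio examples on this page (a "tool" name plus tool-specific settings). The helper name `missing_sections` is hypothetical.

```python
import json

# The three top-level sections described above.
REQUIRED_SECTIONS = ("audio", "video", "text")


def missing_sections(config: dict) -> list:
    """Return the names of sections that are absent or have no "tool" key.

    Hypothetical helper: assumes each section is a dict with a "tool" entry,
    like the audio examples on this page.
    """
    problems = []
    for name in REQUIRED_SECTIONS:
        section = config.get(name)
        if not isinstance(section, dict) or "tool" not in section:
            problems.append(name)
    return problems


# Only the audio section is filled in here, so video and text are reported.
partial = {"audio": {"tool": "librosa", "sample_rate": None, "args": {}}}
print(missing_sections(partial))   # ['video', 'text']
print(json.dumps(partial, indent=2))
```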

1. Audio Tools

1.1 Librosa

{
  "audio": {
    "tool": "librosa",
    "sample_rate": null,          // null means auto detect
    "args": {
      "mfcc": {
        "n_mfcc": 20,
        "htk": true
      },
      "rms": {},                  // remove this line if you don't need the rms feature
      "zero_crossing_rate": {},   // remove this line if you don't need the zero_crossing_rate feature
      "spectral_rolloff": {},     // remove this line if you don't need the spectral_rolloff feature
      "spectral_centroid": {}     // remove this line if you don't need the spectral_centroid feature (then also delete the trailing comma on the line above)

      // add more features as needed here.
      // supported features are listed on this page: https://librosa.org/doc/latest/feature.html
    }
  }
}
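The `//` comments in the examples on this page are explanations only: standard JSON does not allow comments, and Python's `json.loads` rejects them. The minimal sketch below, assuming you keep a commented copy of a config, strips `//` comments that appear outside string literals (so URLs inside string values survive) and then parses the result. It also shows removing a feature on the parsed dict instead of deleting a text line, which avoids leaving a dangling trailing comma. The helper names `strip_line_comments` and `load_jsonc` are hypothetical, not MMSA-FET functions.

```python
import json


def strip_line_comments(text: str) -> str:
    """Remove // comments that appear outside JSON string literals."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False          # this char was escaped, keep going
            elif ch == "\\":
                escaped = True           # next char is escaped
            elif ch == '"':
                in_string = False        # closing quote
            i += 1
        elif ch == '"':
            in_string = True             # opening quote
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip the comment but keep the newline that ends it.
            newline = text.find("\n", i)
            if newline == -1:
                break
            i = newline
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def load_jsonc(text: str) -> dict:
    """Parse JSON text that may contain // line comments."""
    return json.loads(strip_line_comments(text))


LIBROSA_EXAMPLE = """
{
  "audio": {
    "tool": "librosa",
    "sample_rate": null,          // null means auto detect
    "args": {
      "mfcc": {"n_mfcc": 20, "htk": true},
      "rms": {},                  // optional feature
      "zero_crossing_rate": {},
      "spectral_rolloff": {},
      "spectral_centroid": {}
    }
  }
}
"""

config = load_jsonc(LIBROSA_EXAMPLE)

# Drop an unwanted feature on the dict; json.dumps never emits trailing commas.
config["audio"]["args"].pop("rms", None)
print(json.dumps(config, indent=2))
```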

1.2 openSMILE

{
  "audio": {
    "tool": "opensmile",
    "sample_rate": 16000,               // sample rate in Hz; openSMILE is run at 16000 Hz
    "args": {
      "feature_set": "eGeMAPS",         // opensmile feature sets: https://audeering.github.io/opensmile-python/api-smile.html#featureset
      "feature_level": "Functionals",   // opensmile feature levels: https://audeering.github.io/opensmile-python/api-smile.html#featurelevel
      "start": null,                    // start and end are passed to opensmile.Smile.process_signal:
      "end": null                       // https://audeering.github.io/opensmile-python/api-smile.html#opensmile.Smile.process_signal
    }
  }
}
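To get a feel for what "start" and "end" select, the sketch below crops a signal the same way, assuming (per the process_signal documentation linked above) that both are positions in seconds and that null means "from the beginning" or "to the end". It is an illustration of those semantics at the 16000 Hz sample rate used above, not openSMILE's implementation. The helper name `crop_seconds` is hypothetical.

```python
from typing import Optional, Sequence


def crop_seconds(signal: Sequence[float], sampling_rate: int,
                 start: Optional[float] = None,
                 end: Optional[float] = None) -> Sequence[float]:
    """Return the part of `signal` between `start` and `end` (in seconds).

    None means the signal boundary, matching a null start/end in the config.
    Hypothetical helper for illustration only.
    """
    first = 0 if start is None else int(round(start * sampling_rate))
    last = len(signal) if end is None else int(round(end * sampling_rate))
    return signal[first:last]


sr = 16000
signal = [0.0] * (2 * sr)        # two seconds of silence as a stand-in
clip = crop_seconds(signal, sr, start=0.5, end=1.0)
print(len(clip))                 # 8000 samples, i.e. 0.5 s at 16 kHz
```

With both values left as null, the whole signal is processed.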
