Configurations

FlameSky edited this page Jan 14, 2022 · 9 revisions

Config files in MMSA-FET are JSON files containing three sections: audio, video, and text. An overall template is shown here. This page describes the detailed configuration for each tool.

1. Audio Tools

1.1 Librosa

{
  "audio": {
    "tool": "librosa",
    "sample_rate": null,          // null means auto-detect
    "args": {
      "mfcc": {
        "n_mfcc": 20,
        "htk": true
      },
      "rms": {},                  // remove this line if you don't need rms feature
      "zero_crossing_rate": {},   // remove this line if you don't need zero_crossing_rate feature
      "spectral_rolloff": {},     // remove this line if you don't need spectral_rolloff feature
      "spectral_centroid": {}     // remove this line if you don't need spectral_centroid feature

      // add more features here as needed.
      // supported features are listed on this page: https://librosa.org/doc/latest/feature.html
    }
  }
}
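
The `//` annotations in the snippets on this page are not part of standard JSON, so a config copied verbatim will be rejected by Python's built-in `json` module. Whether MMSA-FET itself tolerates such comments is not stated here. The following is a minimal sketch, using only the standard library, of stripping the annotations before parsing; the helper name `strip_line_comments` is hypothetical and not part of MMSA-FET.

```python
import json


def strip_line_comments(text: str) -> str:
    """Remove // comments that appear outside JSON string literals (hypothetical helper)."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False          # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False        # closing quote
            i += 1
        elif ch == '"':
            in_string = True             # opening quote
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip the comment up to (not including) the end of the line.
            while i < len(text) and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# A shortened version of the librosa config above, with annotations left in.
annotated = """
{
  "audio": {
    "tool": "librosa",
    "sample_rate": null,   // null means auto detect
    "args": {
      "mfcc": {"n_mfcc": 20, "htk": true},
      "rms": {}            // see https://librosa.org/doc/latest/feature.html
    }
  }
}
"""

config = json.loads(strip_line_comments(annotated))
print(config["audio"]["args"]["mfcc"])
```

Tracking string state matters because a URL inside a string value also contains `//` and must not be truncated.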

1.2 openSMILE

{
  "audio": {
    "tool": "opensmile",
    "sample_rate": 16000,               // opensmile uses a 16000 Hz sample rate
    "args": {
      "feature_set": "eGeMAPS",         // opensmile feature sets: https://audeering.github.io/opensmile-python/api-smile.html#featureset
      "feature_level": "Functionals",   // opensmile config: https://audeering.github.io/opensmile-python/api-smile.html#featurelevel
      "start": null,                    // start and end are passed to opensmile.Smile.process_signal:
      "end": null                       // https://audeering.github.io/opensmile-python/api-smile.html#opensmile.Smile.process_signal
    }
  }
}
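
Generating the file from code avoids the comment issue entirely and makes it easy to vary one field at a time. The sketch below builds the openSMILE config as a Python dict and writes it out as plain JSON. The function `make_opensmile_config` and the output file name are hypothetical. `"LowLevelDescriptors"` is a value name from the opensmile-python `FeatureLevel` enum; that MMSA-FET accepts it in the same way as `"Functionals"` is an assumption.

```python
import json
import os
import tempfile


def make_opensmile_config(feature_set="eGeMAPS", feature_level="Functionals",
                          start=None, end=None):
    """Build an openSMILE audio config with the same keys as the template above."""
    return {
        "audio": {
            "tool": "opensmile",
            "sample_rate": 16000,
            "args": {
                "feature_set": feature_set,
                "feature_level": feature_level,
                "start": start,   # None is written as JSON null
                "end": end,
            },
        }
    }


# Assumption: frame-level features are requested via "LowLevelDescriptors".
config = make_opensmile_config(feature_level="LowLevelDescriptors")

# Write to a temporary directory; use any path you like in practice.
path = os.path.join(tempfile.mkdtemp(), "opensmile_config.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)

# Read it back to confirm that the file is valid JSON.
with open(path, encoding="utf-8") as f:
    reloaded = json.load(f)
```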

1.3 Wav2vec2

2. Video Tools

2.1 OpenFace

2.2 Mediapipe

3. Text Tools

3.1 BERT

3.2 XLNet
