APIs
The get_default_config function returns configs as Python dictionaries. It takes a single argument: the name of the extraction method.
Supported methods are: 'bert', 'glove', 'librosa', 'mediapipe', 'openface', 'opensmile', 'roberta', 'wav2vec', 'aligned'.
Example code:
from MSA_FET import get_default_config, FeatureExtractionTool
# Get default config for OpenFace & alter
config_v = get_default_config('openface')
# Enable Active Speaker Detection
config_v['video']['multiFace']['enable'] = True
# Get default config for openSMILE & alter
config_a = get_default_config('opensmile')
# Use LLD features
config_a['audio']['args']['feature_level'] = 'LowLevelDescriptors'
# Get default config for bert & alter
config_t = get_default_config('bert')
# Switch to Chinese
config_t['text']['pretrained'] = 'bert-base-chinese'
# Combine the three modalities
config = {**config_a, **config_v, **config_t}
# Initialize main class
fet = FeatureExtractionTool(config)
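The merge in the example above relies on each default config being keyed by its own modality ('audio', 'video', 'text'), so the three dictionaries do not collide. A minimal sketch of that behaviour with stand-in dictionaries follows; the 'tool' and 'model' keys and all values are illustrative assumptions, not the toolkit's real defaults.

```python
# Stand-in configs (hypothetical contents); real ones come from get_default_config().
config_a = {"audio": {"tool": "opensmile", "args": {"feature_level": "LowLevelDescriptors"}}}
config_v = {"video": {"tool": "openface", "multiFace": {"enable": True}}}
config_t = {"text": {"model": "bert", "pretrained": "bert-base-chinese"}}

# Distinct top-level keys: the unpacking merge keeps all three sections.
config = {**config_a, **config_v, **config_t}
print(sorted(config))  # ['audio', 'text', 'video']

# The merge is shallow: if two configs share a top-level key,
# the later one replaces the earlier section entirely.
config_a2 = {"audio": {"tool": "librosa"}}
merged = {**config_a, **config_a2}
print(merged["audio"])  # {'tool': 'librosa'}
```

Because the merge is shallow, combining two configs for the same modality this way would silently drop the first one.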
The FeatureExtractionTool class is the main class of this toolkit. Its initialization function takes 5 arguments:
- config (Required): Python dictionary, path to a JSON file, or name of an example config.
- dataset_root_dir: Path to the datasets' parent directory. Used when extracting dataset features with dataset_name.
- tmp_dir: Temporary directory path. Default: '~/.MSA-FET/tmp'.
- log_dir: Log directory path. Default: '~/.MSA-FET/log'.
- verbose: Verbosity level of stdout: 0 for error, 1 for info, 2 for debug. Default: 1.
Example code:
from MSA_FET import FeatureExtractionTool
# Initialize with example config & change temp dir
fet = FeatureExtractionTool(config="librosa", tmp_dir="/tmp")
# Initialize with custom_config.json & suppress output
fet = FeatureExtractionTool(config="custom_config.json", verbose=0)
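Since config may be given as a path to a JSON file, an altered config dictionary can be saved once and reused. The sketch below uses only the standard library and a stand-in dictionary; a real config would normally start from get_default_config, and the file location is an arbitrary temporary directory chosen for illustration.

```python
import json
import os
import tempfile

# Stand-in config (hypothetical, deliberately partial).
config = {"audio": {"args": {"feature_level": "LowLevelDescriptors"}}}

# Write the dictionary to custom_config.json in a temporary directory.
config_path = os.path.join(tempfile.mkdtemp(), "custom_config.json")
with open(config_path, "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)

# Reading it back gives an equal dictionary.
with open(config_path, encoding="utf-8") as f:
    reloaded = json.load(f)
print(reloaded == config)  # True

# config_path could then be passed as FeatureExtractionTool(config=config_path).
```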
The FeatureExtractionTool.run_single() function extracts features from a single video file. It takes 4 arguments:
- in_file (Required): Path to the input video file.
- out_file: Path to the output file. If omitted, no output file will be created.
- text_file: Path to the text transcript file. Required when extracting text features.
- return_type: 'pt' for PyTorch tensor, 'np' for NumPy array. Default: 'np'.
Example code:
from MSA_FET import FeatureExtractionTool
# Extract visual feature with default openface config from input.mp4
fet = FeatureExtractionTool("openface")
feature = fet.run_single("input.mp4")
print(feature)
# Extract multimodal features with a custom config file and save them to feature.pkl
# the parameter 'text_file' is required if text features are to be extracted
fet = FeatureExtractionTool("custom_config.json")
fet.run_single(in_file="input.mp4", out_file="feature.pkl", text_file="input.txt")
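The text_file requirement can be checked before starting a long extraction. The helper below is a hypothetical sketch, not part of MSA_FET; it assumes, based on the config examples above, that text extraction is requested through a top-level 'text' section in the config dictionary.

```python
from typing import Optional


def check_text_file(config: dict, text_file: Optional[str]) -> None:
    """Hypothetical pre-flight check: fail early if text features are
    requested (a 'text' section exists) but no transcript path is given."""
    if "text" in config and text_file is None:
        raise ValueError("config has a 'text' section; pass text_file=<transcript path>")


# Video-only config: no transcript needed, so nothing is raised.
check_text_file({"video": {}}, None)

# Config with a text section but no transcript: raises ValueError.
try:
    check_text_file({"video": {}, "text": {}}, None)
except ValueError as err:
    print("caught:", err)
```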
Note: To extract features for datasets, the datasets need to be organized in a specific file structure, and a label.csv file is needed. See Dataset and Structure for details. Raw video files and label files for MOSI, MOSEI and CH-SIMS can be downloaded here.
Note: From version v_0.4.0, the run_dataset function has been rewritten to support multiprocessing. This required restructuring the code, so the function is no longer a method of the FeatureExtractionTool class. It is a standalone function that must be imported directly. See the examples below for reference.
The run_dataset() function extracts features from a specifically arranged dataset folder. The function accepts the following arguments:
- config: Python dictionary of config, or path to a JSON file, or name of an example config.
- dataset_dir: Path to dataset directory. If specified, will override 'dataset_name'.
- out_file: Output feature file. If not specified, features will be saved under the dataset directory with the name 'feature.pkl'.
- return_type: 'pt' for PyTorch tensor, 'np' for NumPy array. Default: 'np'.
- num_workers: Number of workers for parallel processing. Default: 4.
- padding_value: Padding value for sequence padding. 'zero' or 'norm'. Default: 'zero'.
- padding_location: Padding location for sequence padding. 'end' or 'start'. Default: 'end'.
- face_detection_failure: Action to take when face detection fails. 'skip' the frame or 'pad' with zeros. Default: 'skip'.
- tmp_dir: Directory for temporary files. Default: '~/.MSA-FET/tmp'.
- log_dir: Log directory. Default: '~/.MSA-FET/log'.
- log_level: Verbosity level of stdout. Default: logging.INFO.
- progress_q: Reserved for M-SENA platform. Multiprocessing queue used for reporting progress.
- task_id: Reserved for M-SENA platform. Task ID.
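The effect of padding_location can be pictured with a small pure-Python sketch. This is not the toolkit's implementation: pad_batch is a hypothetical helper that pads every sequence to the longest one in the batch using zeros (the 'zero' setting); the 'norm' setting is not covered here.

```python
from typing import List

Frames = List[List[float]]  # one sequence: a list of per-frame feature vectors


def pad_batch(batch: List[Frames], padding_location: str = "end") -> List[Frames]:
    """Zero-pad each non-empty sequence to the length of the longest one."""
    if padding_location not in ("end", "start"):
        raise ValueError("padding_location must be 'end' or 'start'")
    max_len = max(len(seq) for seq in batch)
    feat_dim = len(batch[0][0])
    padded = []
    for seq in batch:
        filler = [[0.0] * feat_dim for _ in range(max_len - len(seq))]
        frames = [list(frame) for frame in seq]
        padded.append(frames + filler if padding_location == "end" else filler + frames)
    return padded


batch = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [[7.0, 8.0]]]
print(pad_batch(batch, "end")[1])    # [[7.0, 8.0], [0.0, 0.0], [0.0, 0.0]]
print(pad_batch(batch, "start")[1])  # [[0.0, 0.0], [0.0, 0.0], [7.0, 8.0]]
```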
Example Code:
from MSA_FET import run_dataset
# Extract audio features for MOSI using default aligned feature config
run_dataset(
    config='aligned',
    dataset_dir='MSA-Datasets/MOSI',
    out_file='./feature_aligned.pkl',
    num_workers=8
)
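The resulting feature file can be inspected before it is used downstream. The sketch below assumes the .pkl output is a standard Python pickle, which the file extension suggests but this page does not state. To stay self-contained it writes and reads a stand-in file whose layout is hypothetical; the real file's structure may differ.

```python
import os
import pickle
import tempfile

# Stand-in feature file with a hypothetical layout, for illustration only.
stand_in = {"id": ["video_0001"], "audio": [[0.1, 0.2]]}
path = os.path.join(tempfile.mkdtemp(), "feature_aligned.pkl")
with open(path, "wb") as f:
    pickle.dump(stand_in, f)

# Load the file and look at its top-level structure.
with open(path, "rb") as f:
    data = pickle.load(f)

print(type(data).__name__)
if isinstance(data, dict):
    print(sorted(data))
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.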