APIs

1. Customize Configs

The get_default_config function returns a default config as a Python dictionary. It takes a single argument: the name of the extraction method. Supported methods are: 'bert', 'glove', 'librosa', 'mediapipe', 'openface', 'opensmile', 'roberta', 'wav2vec', 'aligned'.

Example code:

from MSA_FET import get_default_config, FeatureExtractionTool

# Get default config for OpenFace & alter
config_v = get_default_config('openface')
# Enable Active Speaker Detection
config_v['video']['multiFace']['enable'] = True 

# Get default config for openSMILE & alter
config_a = get_default_config('opensmile')
# Use LLD features
config_a['audio']['args']['feature_level'] = 'LowLevelDescriptors'

# Get default config for bert & alter
config_t = get_default_config('bert')
# Switch to Chinese
config_t['text']['pretrained'] = 'bert-base-chinese'

# Combine the three modalities
config = {**config_a, **config_v, **config_t}

# Initialize main class
fet = FeatureExtractionTool(config)
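
In the example above, each config is accessed under its own top-level key ('audio', 'video', 'text'), which is why the three can be combined with dictionary unpacking. The following is a minimal sketch of that merge using stand-in dictionaries. They contain only the keys used above and are not the real defaults returned by get_default_config; the sketch also assumes each single-modality config holds only its own top-level key.

```python
# Stand-in configs: only the keys used in the example above, NOT the real defaults.
config_a = {"audio": {"args": {"feature_level": "LowLevelDescriptors"}}}
config_v = {"video": {"multiFace": {"enable": True}}}
config_t = {"text": {"pretrained": "bert-base-chinese"}}

# Dictionary unpacking copies top-level keys from left to right.
config = {**config_a, **config_v, **config_t}
print(sorted(config))  # ['audio', 'text', 'video']

# If two dicts share a top-level key, the later one replaces the earlier one
# entirely; nested settings are not merged.
override = {"text": {"pretrained": "bert-base-uncased"}}
merged = {**config, **override}
print(merged["text"])  # {'pretrained': 'bert-base-uncased'}
```

Because the merge is shallow, combining two configs for the same modality this way would keep only the later one.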

2. Initialize Main Class

The FeatureExtractionTool class is the main class of this toolkit. The initialization function takes in 5 arguments:

config (Required): Python dictionary, path to a JSON file, or name of an example config.
dataset_root_dir: Path to datasets parent directory. Used when extracting dataset features with dataset_name.
tmp_dir: Temporary directory path. Default: '~/.MSA-FET/tmp'.
log_dir: Log directory path. Default: '~/.MSA-FET/log'.
verbose: Verbosity level of stdout output. 0 for error, 1 for info, 2 for debug. Default: 1.

Example code:

from MSA_FET import FeatureExtractionTool

# Initialize with example config & change temp dir
fet = FeatureExtractionTool(config="librosa", tmp_dir="/tmp")

# Initialize with custom_config.json & suppress output
fet = FeatureExtractionTool(config="custom_config.json", verbose=0)
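
Since config may also be a path to a JSON file, a customized dictionary can be written to disk and reused. The sketch below shows one way to produce such a file with the standard json module. The dictionary is a stand-in rather than a real default config, and the sketch assumes the JSON file should hold the same structure as the dictionary form.

```python
import json
import os
import tempfile

# Stand-in config; a real one would come from get_default_config().
config = {"text": {"pretrained": "bert-base-chinese"}}

# Write the config to a JSON file in a fresh temporary directory.
config_path = os.path.join(tempfile.mkdtemp(), "custom_config.json")
with open(config_path, "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)

# Read it back to confirm the file round-trips to the same dictionary.
with open(config_path, encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded == config)  # True
```

The resulting path could then be passed as the config argument, as in the custom_config.json example above.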

3. Extract Features for Video Files

The FeatureExtractionTool.run_single() function extracts features from a single video file. It takes in 4 arguments:

in_file (Required): Path to input video file.
out_file: Path to output file. If omitted, no output file will be created.
text_file: Path to text transcript file. Required when extracting text features.
return_type: 'pt' for PyTorch tensor, 'np' for NumPy array. Default: 'np'.

Example code:

from MSA_FET import FeatureExtractionTool

# Extract visual feature with default openface config from input.mp4
fet = FeatureExtractionTool("openface")
feature = fet.run_single("input.mp4")
print(feature)

# Extract multimodal features with a custom config file and save them to feature.pkl
# The 'text_file' parameter is required if text features are to be extracted
fet = FeatureExtractionTool("custom_config.json")
fet.run_single(in_file="input.mp4", out_file="feature.pkl", text_file="input.txt")
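
The layout of the saved file is not described here, so it can help to inspect it before use. The following is a minimal sketch, assuming the output is a standard pickle file. The summarize helper and the stand-in data (including the 'vision' key) are hypothetical and only illustrate the inspection step; the stand-in file is written first so the snippet runs on its own.

```python
import os
import pickle
import tempfile

def summarize(obj):
    """Describe an object by type, plus shape or length when available."""
    if isinstance(obj, dict):
        return {key: summarize(value) for key, value in obj.items()}
    shape = getattr(obj, "shape", None)  # present on NumPy arrays and tensors
    if shape is not None:
        return f"{type(obj).__name__} with shape {tuple(shape)}"
    if hasattr(obj, "__len__"):
        return f"{type(obj).__name__} of length {len(obj)}"
    return type(obj).__name__

# Hypothetical stand-in for a saved feature file; the real layout may differ.
path = os.path.join(tempfile.mkdtemp(), "feature.pkl")
with open(path, "wb") as f:
    pickle.dump({"vision": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]}, f)

# Load the file and print a structural summary instead of the raw values.
with open(path, "rb") as f:
    summary = summarize(pickle.load(f))
print(summary)  # {'vision': 'list of length 2'}
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.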

4. Extract Features for Datasets

Note: To extract features for datasets, the datasets need to be organized in a specific file structure, and a label.csv file is needed. See Dataset and Structure for details. Raw video files and label files for MOSI, MOSEI and CH-SIMS can be downloaded here.

Note: Since version v_0.4.0, the run_dataset function has been rewritten to support multiprocessing. This required restructuring the code, so run_dataset is no longer a method of the FeatureExtractionTool class. It is now a standalone function that must be imported directly. See the example below for reference.

The run_dataset() function extracts features from a dataset folder arranged in the required structure. The function takes in the following arguments:

config: Python dictionary of config, or path to a JSON file, or name of an example config.
dataset_dir: Path to dataset directory. If specified, will override 'dataset_name'.
out_file: Output feature file. If not specified, features will be saved under the dataset directory with the name 'feature.pkl'.
return_type: 'pt' for PyTorch tensor, 'np' for NumPy array. Default: 'np'.
num_workers: Number of workers for parallel processing. Default: 4.
padding_value: Padding value for sequence padding. 'zero' or 'norm'. Default: 'zero'.
padding_location: Padding location for sequence padding. 'end' or 'start'. Default: 'end'.
face_detection_failure: Action to take when face detection fails. 'skip' the frame or 'pad' with zeros. Default: 'skip'.
tmp_dir: Directory for temporary files. Default: '~/.MSA-FET/tmp'.
log_dir: Log directory. Default: '~/.MSA-FET/log'.
log_level: Verbose level of stdout. Default: logging.INFO
progress_q: Reserved for M-SENA platform. Multiprocessing queue used for reporting progress.
task_id: Reserved for M-SENA platform. Task ID.

Example code:

from MSA_FET import run_dataset

# Extract audio features for MOSI using default aligned feature config
run_dataset(
    config = 'aligned', 
    dataset_dir = 'MSA-Datasets/MOSI', 
    out_file = './feature_aligned.pkl',
    num_workers = 8
)
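
The padding_value and padding_location arguments concern sequences of different lengths being brought to a common length. The sketch below illustrates the idea for zero padding at the end or at the start. It is a plain-Python illustration, not the toolkit's own implementation; the pad_sequences helper is hypothetical, and the 'norm' option is not covered.

```python
def pad_sequences(sequences, location="end", fill=0.0):
    """Pad variable-length sequences of feature vectors to a common length."""
    if location not in ("end", "start"):
        raise ValueError("location must be 'end' or 'start'")
    max_len = max(len(seq) for seq in sequences)
    dim = len(sequences[0][0])  # feature dimension, assumed equal for all frames
    padded = []
    for seq in sequences:
        rows = [list(row) for row in seq]
        filler = [[fill] * dim for _ in range(max_len - len(seq))]
        padded.append(rows + filler if location == "end" else filler + rows)
    return padded

short_seq = [[1.0, 2.0]]                         # 1 frame, 2 features
long_seq = [[3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # 3 frames, 2 features

print(pad_sequences([short_seq, long_seq], "end")[0])
# [[1.0, 2.0], [0.0, 0.0], [0.0, 0.0]]
print(pad_sequences([short_seq, long_seq], "start")[0])
# [[0.0, 0.0], [0.0, 0.0], [1.0, 2.0]]
```

After padding, every sequence has the same number of frames, so the results can be stacked into a single array or tensor.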
