Hello, I would like to ask: the current code seems to support only one non-text modality plus text at a time during inference. Is it possible to input multiple modalities (such as audio, video, and text) in a single inference pass?
The current model is not trained on joint multimodal data, so it may not perform well at test time.
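For context, one common way such models could accept several modalities in one pass is to project each modality's encoder output into the LLM's embedding space and concatenate the resulting token sequences in front of the text tokens. The sketch below is a hypothetical illustration of that fusion step only; `fuse_modalities` and the toy embeddings are not part of this repository's actual API.

```python
def fuse_modalities(modality_embeds, text_embeds):
    """Concatenate projected modality embedding sequences (e.g. audio,
    then video) in front of the text embeddings, forming one joint
    input sequence for the language model.

    modality_embeds: list of sequences, one per modality; each sequence
                     is a list of fixed-dimension embedding vectors.
    text_embeds:     list of embedding vectors for the text prompt.
    """
    joint = []
    for embeds in modality_embeds:
        joint.extend(embeds)       # modality tokens come first
    joint.extend(text_embeds)      # text tokens follow
    return joint

# Toy example: 2 "audio" tokens, 1 "video" token, 2 "text" tokens
audio = [[0.1, 0.2], [0.3, 0.4]]
video = [[0.5, 0.6]]
text = [[0.7, 0.8], [0.9, 1.0]]
seq = fuse_modalities([audio, video], text)
```

Note that, as stated above, simply concatenating modalities at inference time may not work well if the model was never trained on jointly multimodal inputs.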
But I see you ran a test on Music-AVQA in the paper. Could you tell me how you managed to use three modalities to generate answers? Thank you very much!