the Channel dim of a color picture #5

Open
Angry-Echo opened this issue Jun 28, 2023 · 2 comments

Comments

@Angry-Echo

Hello! I am studying your code and have a question about how the model handles color images: I can't find the RGB channel dimension when the frame sequence is fed into the model.
In multi_head_attention.py, at the beginning of the call method (after self.wq(q); I understand that self.wq is a conv layer), your comment says: # (batch_size, num_heads, seq_len_q, rows, cols, depth). Where is the channel dimension? My understanding of the dimensions is: seq_len_q is the length of the frame sequence; num_heads × depth = d_model; rows is the image height H; cols is the image width W.

I sincerely hope you can clear up my doubts. Also, if you do not mind, could I ask you about the field of video prediction? I am trying to do some research on predicting image sequences with Transformers.

@Angry-Echo
Author

Oh! I guess depth is the channel dimension, is that right?
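For reference, here is a minimal NumPy sketch of the shape bookkeeping behind that guess. It is not the repository's code. It assumes channels-last frames and stands in for the conv projection with a 1x1 kernel (a per-pixel linear map); the names frames, w_q and split_heads are hypothetical. Under these assumptions the RGB axis is consumed by the projection to d_model features, and depth is the per-head slice of those features rather than the raw RGB channels.

```python
import numpy as np

batch_size, seq_len, rows, cols, rgb = 2, 5, 8, 8, 3
d_model, num_heads = 16, 4
depth = d_model // num_heads  # per-head feature channels

rng = np.random.default_rng(0)
# Raw video: the RGB channels live on the last axis.
frames = rng.standard_normal((batch_size, seq_len, rows, cols, rgb))

# Assumed 1x1 projection: maps the 3 RGB channels to d_model features per pixel.
w_q = rng.standard_normal((rgb, d_model))
q = frames @ w_q  # (batch_size, seq_len, rows, cols, d_model)


def split_heads(x, num_heads):
    """Split the last axis into (num_heads, depth) and move heads forward."""
    b, t, h, w, d = x.shape
    x = x.reshape(b, t, h, w, num_heads, d // num_heads)
    # -> (batch_size, num_heads, seq_len, rows, cols, depth)
    return x.transpose(0, 4, 1, 2, 3, 5)


q_heads = split_heads(q, num_heads)
print(q_heads.shape)  # (2, 4, 5, 8, 8, 4)
```

With a larger kernel the spatial mixing would differ, but the axis layout would stay the same as long as the padding preserves rows and cols.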

@Angry-Echo
Author

But I still have the same question as in the other issue: the conv layer requires input of shape [batch_size, rows, cols, depth], so how can the additional seq_len dimension be fed into the conv layer?
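One common way to feed a 5-D video tensor through a layer that only accepts 4-D input is to fold seq_len into the batch axis, apply the layer to every frame, and unfold again. The sketch below illustrates that idea in NumPy. It is not taken from the repository, and conv1x1_4d and apply_per_frame are hypothetical helpers, with a 1x1 kernel standing in for a real convolution. In Keras, the TimeDistributed wrapper does this kind of per-frame application, and some TF 2.x versions of Conv2D document a "4+D" input with extra leading batch dimensions; which of these, if any, the repository relies on is not stated in this thread.

```python
import numpy as np


def conv1x1_4d(x, kernel):
    """Toy stand-in for a conv layer that only accepts 4-D input."""
    if x.ndim != 4:
        raise ValueError("expected (batch, rows, cols, channels)")
    # Per-pixel linear map over the channel axis, i.e. a 1x1 convolution.
    return np.einsum("brcd,de->brce", x, kernel)


def apply_per_frame(x, kernel):
    """Fold seq_len into the batch axis, apply the 4-D layer, unfold."""
    b, t, h, w, c = x.shape
    folded = x.reshape(b * t, h, w, c)  # (batch*seq_len, rows, cols, ch)
    out = conv1x1_4d(folded, kernel)
    return out.reshape(b, t, h, w, out.shape[-1])


rng = np.random.default_rng(0)
video = rng.standard_normal((2, 5, 8, 8, 3))  # (batch, seq_len, rows, cols, RGB)
kernel = rng.standard_normal((3, 16))         # 3 input channels -> 16 filters
features = apply_per_frame(video, kernel)
print(features.shape)  # (2, 5, 8, 8, 16)
```

Because the same kernel is applied to every frame independently, the result for each frame matches what the 4-D layer would produce on that frame alone.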
