
Why is this permutation necessary here (classification task)? #205

Joao-L-S-Almeida opened this issue Oct 28, 2024 · 7 comments

@Joao-L-S-Almeida

x = x.reshape(x.shape[0], x.shape[1], -1).permute(0, 2, 1)

Any comments, @CarlosGomes98?
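
For reference, a minimal sketch (not from the repository) of what the quoted line does, assuming the head receives backbone features in NCHW layout; the sizes are arbitrary illustrations:

```python
import torch

x = torch.randn(2, 768, 14, 14)            # (N, C, H, W)
x = x.reshape(x.shape[0], x.shape[1], -1)  # (N, C, H*W)  -> torch.Size([2, 768, 196])
x = x.permute(0, 2, 1)                     # (N, H*W, C)  -> torch.Size([2, 196, 768])
```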

Joao-L-S-Almeida self-assigned this Oct 28, 2024
@romeokienzler

@paolo-fraccaro any idea?

Joao-L-S-Almeida changed the title from "Whys is this permutation necessary here (classification task) ?" to "Why is this permutation necessary here (classification task)?" Oct 28, 2024
@paolofraccaro

I cannot figure it out either, but I have a vague memory that @CarlosGomes98 told me some factory needed the data flipped. I cannot really remember the specifics, though.

@romeokienzler

@paolo-fraccaro from your findings it seems that both NHWC and NCHW are present in TT, depending on the model/factory, correct?

Should we set a project-wide standard? We could, since we control all the factories. And if we don't, what would happen?

I've read somewhere that NCHW is the standard and performs better on NVIDIA hardware.

If we don't set a standard, there is no risk of getting things wrong, since the shapes won't match and we will find out through an error, correct?

We could introduce this as a parameter (with NCHW as the default) to the factory (or factories).

@paolofraccaro

Yes, I think sticking to one convention would be helpful. I am happy with NCHW (NLC for transformers). Our classification head at the moment expects NHWC/NCL instead.

@romeokienzler

OK @paolo-fraccaro, what would you suggest? Change the CH (classification head), or add a transform neck and have the factories add the neck to the model by default?

@paolofraccaro

That is one way! We have the out_channels attribute on the backbones (which I believe is mandatory), so we should be able to permute to whatever default layout we want with a neck. Either way, this could be turned on/off with an attribute at the factory level and clearly stated in the documentation.
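
A rough sketch of what such a neck could look like (the class name is illustrative, not an existing terratorch neck); the factory could then append it or not based on an attribute:

```python
import torch
from torch import nn

class PermuteNeck(nn.Module):
    """Tiny neck that permutes backbone output to whatever layout the head expects."""

    def __init__(self, dims: tuple[int, ...]):
        super().__init__()
        self.dims = dims  # e.g. (0, 2, 3, 1) for NCHW -> NHWC

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x.permute(*self.dims)

neck = PermuteNeck((0, 2, 3, 1))                # NCHW -> NHWC
print(neck(torch.randn(2, 768, 14, 14)).shape)  # torch.Size([2, 14, 14, 768])
```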

@CarlosGomes98

This code in the head strikes me as probably transformer-specific... this transpose bit should be done by a neck instead. I think we can remove this line and try with a transform.

As for the convention, NCHW is the norm, but transformers use N T C (T for tokens) internally. This is needed so that they can efficiently perform attention (e.g. applying the same MLP to all tokens), and it is why we have the necks that perform permutation when needed (see the permute neck and the neck that goes from tokens to image).
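
A rough sketch of the tokens-to-image idea (the function name is illustrative, not terratorch's actual neck), assuming a ViT-style backbone that outputs (N, T, C) with T = H_patches * W_patches and no class token:

```python
import torch

def tokens_to_image(x: torch.Tensor, h: int, w: int) -> torch.Tensor:
    # (N, T, C) -> (N, C, T) -> (N, C, H, W)
    n, t, c = x.shape
    assert t == h * w, "token count must match the patch grid"
    return x.permute(0, 2, 1).reshape(n, c, h, w)

print(tokens_to_image(torch.randn(2, 196, 768), 14, 14).shape)  # torch.Size([2, 768, 14, 14])
```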

romeokienzler self-assigned this Oct 31, 2024