
Questions about your code #43

Open
luciaL opened this issue Mar 25, 2020 · 2 comments
luciaL commented Mar 25, 2020

Hello,
Why do the models have two fc layers and two outputs? I don't think it's necessary.
The consistency loss could also be calculated from class_logit and ema_logit.
What's the difference between class_logit and cons_logit?

dbjhbyun commented Apr 3, 2020

I also had a similar question about this. I'm not sure if this is the actual intention of the authors, but I think class_logit is for the "correct classification constraint" and cons_logit is for the "consistency constraint". Making a single fc layer satisfy both constraints could be challenging, so I think the authors use separate fc layers for the separate constraints.
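For what it's worth, here is a minimal PyTorch sketch of that two-head layout, assuming a toy backbone and CIFAR-sized inputs; only the class_logit/cons_logit split follows the code being discussed, everything else is a placeholder:

```python
import torch.nn as nn

class TwoHeadModel(nn.Module):
    """Shared backbone with two fc heads: fc1 feeds the classification
    cost, fc2 feeds the consistency cost (layer names are illustrative)."""
    def __init__(self, feature_dim=128, num_classes=10):
        super().__init__()
        # Hypothetical stand-in backbone; the actual repo uses a deeper CNN.
        self.backbone = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * 32 * 32, feature_dim),
            nn.ReLU(),
        )
        self.fc1 = nn.Linear(feature_dim, num_classes)  # -> class_logit
        self.fc2 = nn.Linear(feature_dim, num_classes)  # -> cons_logit

    def forward(self, x):
        feat = self.backbone(x)
        class_logit = self.fc1(feat)  # trained against the labels
        cons_logit = self.fc2(feat)   # trained against the teacher (EMA) output
        return class_logit, cons_logit
```

Since both heads share the same features, each fc layer can specialize in its own constraint instead of one output having to satisfy both gradients at once.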

developer0hye commented Aug 10, 2020

@luciaL @dbjhbyun

I found a passage in the paper that addresses your question:

The consistency to teacher predictions may not necessarily be a good proxy for the classification task, especially early in the training. So far our model has strongly coupled these two tasks by using the same output for both. How would decoupling the tasks change the performance of the algorithm? To investigate, we changed the model to have two top layers and produce two outputs. We then trained one of the outputs for classification and the other for consistency. We also added a mean squared error cost between the output logits, and then varied the weight of this cost, allowing us to control the strength of the coupling. Looking at the results (reported using the EMA version of the classification output), we can see that the strongly coupled version performs well and the too loosely coupled versions do not. On the other hand, a moderate decoupling seems to have the benefit of making the consistency ramp-up redundant.
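To make the coupling concrete, here is a sketch of how the three cost terms described in that passage could fit together. The function name and the weight values are my own illustrations, not the paper's settings, and the real training loop additionally masks unlabeled examples out of the classification cost:

```python
import torch.nn.functional as F

def combined_loss(class_logit, cons_logit, ema_logit, labels,
                  consistency_weight=1.0, logit_distance_weight=0.01):
    # Classification cost on the first output (labeled examples).
    class_loss = F.cross_entropy(class_logit, labels)

    # Consistency cost: second output vs. the teacher's (EMA) prediction,
    # compared as softmax distributions; no gradient flows to the teacher.
    cons_loss = F.mse_loss(F.softmax(cons_logit, dim=1),
                           F.softmax(ema_logit.detach(), dim=1))

    # The extra MSE between the two student logits; its weight controls
    # how strongly the classification and consistency tasks are coupled.
    res_loss = F.mse_loss(class_logit, cons_logit)

    return (class_loss
            + consistency_weight * cons_loss
            + logit_distance_weight * res_loss)
```

With logit_distance_weight = 0 the heads are fully decoupled; a very large weight pins them together, recovering the single-output behavior. A moderate value is the regime the quoted passage says can make the consistency ramp-up redundant.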
