Training added on top of flux_impl #147
base: main
Conversation
… Start creating generation code for flux.
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Force-pushed from ba2d028 to f56234e
Background in #148. @entrpn, I've rebased on flux_lora now and aligned the pipeline with the changes you made to generate_flux.py. Inference is working as expected, but I'm a bit suspicious about the training. Please try it out and let's discuss how to move forward with this. In the meantime, I will prepare another PR where I will add FA for GPUs, similar to how it is done in maxtext.
Hi @entrpn, have you had a chance to test this PR? I think we can try to merge this soon if you think it looks alright.
I started to take a look at it. The pipeline fails for me during the data pipeline due to memory constraints in my environment: during text encoding, I get an OOM. The code will need to be refactored so that this can run on CPU, or at least within 32 GB of accelerator memory (preferably 16), since the t5 encoder cannot be sharded at the moment. I remember doing something similar before by batching the captions in the data pipeline. I can take a look at it next week and try to get that part working.
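The batching idea mentioned above could be sketched roughly as below. This is a minimal, hypothetical illustration, not the actual maxdiffusion code: `encode_fn` stands in for whatever call runs the t5 encoder, and the function and parameter names are assumptions. The point is simply that encoding captions in fixed-size chunks bounds peak memory instead of materializing all activations at once.

```python
def encode_in_batches(captions, encode_fn, batch_size=32):
    """Encode captions in fixed-size chunks so peak memory stays bounded.

    `encode_fn` is a placeholder for the text-encoder call (e.g. the t5
    forward pass); it takes a list of captions and returns a list of
    embeddings. All names here are illustrative.
    """
    embeddings = []
    for start in range(0, len(captions), batch_size):
        chunk = captions[start:start + batch_size]
        # Only `batch_size` captions are resident in the encoder at a time.
        embeddings.extend(encode_fn(chunk))
    return embeddings
```

In the data pipeline this would run once per shard of captions, trading a little wall-clock time for a much smaller memory footprint on the text-encoding step.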
Linked to #146
I've added the training code on top of https://github.com/AI-Hypercomputer/maxdiffusion/tree/flux_impl. This PR is meant to be merged after #146.
With the training code, I have also added a pipeline for flux, which can be used for inference as well.