Incorrect LoRA inference results after fine-tuning a vision-understanding model #3000
zhuchen1109 started this conversation in General
Replies: 3 comments 1 reply
- mlp.0 and mlp.1: these two should not need TP (tensor parallelism).
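For context on that suggestion, here is a toy single-process simulation (plain torch tensors, not lmdeploy's actual layers) of what goes wrong when a per-rank term is added before the tensor-parallel all_reduce: anything each rank adds locally, such as a bias or a LoRA delta, ends up summed once per rank.

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 8)      # 4 tokens, in_features = 8
w = torch.randn(16, 8)     # out_features = 16
b = torch.randn(16)

y_ref = x @ w.t() + b      # non-TP reference output

# Simulate 2-way row-parallel TP: each "rank" holds half of the input
# features and the matching half of the weight columns.
x1, x2 = x.chunk(2, dim=1)
w1, w2 = w.chunk(2, dim=1)

# Correct: sum the partial matmuls (the all_reduce), add the bias once.
y_ok = (x1 @ w1.t() + x2 @ w2.t()) + b
assert torch.allclose(y_ok, y_ref, atol=1e-5)

# Broken: each rank adds the bias before the reduce, so the output
# carries one extra copy of it after summation.
y_bad = (x1 @ w1.t() + b) + (x2 @ w2.t() + b)
print((y_bad - y_ref).abs().max())  # equals b.abs().max(), not ~0
```

The same double counting would hit a LoRA delta applied identically on every rank, which is presumably the reason to keep these small merger layers out of TP.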
- My debugging found that in the vision tower's mlp.fc1 layer, the inference result tensor contains a large number of NaN values. May I ask what could cause this? I don't see this problem when running inference with transformers. The corresponding location in the inference code: [screenshot not captured in this export]
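One way to narrow this down is to register forward hooks that fail loudly on the first module producing a non-finite output, which tells you whether fc1 itself overflows or already receives NaNs from upstream. A minimal sketch in plain PyTorch (module naming is generic, nothing lmdeploy-specific):

```python
import torch

def attach_nan_checks(model: torch.nn.Module) -> None:
    """Raise on the first module whose output contains NaN or Inf."""
    def make_hook(name):
        def hook(module, inputs, output):
            outs = output if isinstance(output, tuple) else (output,)
            for t in outs:
                if (torch.is_tensor(t) and t.is_floating_point()
                        and not torch.isfinite(t).all()):
                    raise RuntimeError(f"non-finite output first seen in: {name}")
        return hook

    for name, module in model.named_modules():
        module.register_forward_hook(make_hook(name))
```

If fc1's inputs are finite but its output is not, a classic culprit is fp16 overflow in the vision MLP when the adapter was trained in bf16 or fp32; comparing the dtypes used by the transformers run and by the deployment engine is worth a check.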
- I went through every layer that inherits from BaseLinear and changed is_tp and all_reduce to False for all of them. There are still NaN values. May I ask what could be causing this?
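A sketch of that kind of sweep, together with the caveat that may explain why it does not help: flipping the flags after the fact does not un-shard weights that were already partitioned across ranks at load time, so under tp > 1 the shapes and the math stay wrong. (is_tp and all_reduce are the attribute names from this thread; treat this as a debugging aid, not a fix.)

```python
import torch

def disable_tp_flags(model: torch.nn.Module) -> None:
    # Flip the flags discussed above wherever a module defines them.
    # Caveat: if the engine already split the weights when loading,
    # the sharded shapes remain, so outputs can still be wrong or NaN.
    # Running the whole engine with tp=1 is the cleaner experiment.
    for name, module in model.named_modules():
        for flag in ("is_tp", "all_reduce"):
            if hasattr(module, flag):
                setattr(module, flag, False)
                print(f"{name}.{flag} = False")
```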
- I fine-tuned Qwen2-VL-7B-Instruct with swift. The fine-tuning used "target_modules": [ "up_proj", "attn.proj", "qkv", "down_proj", "mlp.0", "gate_proj", "k_proj", "o_proj", "fc2", "q_proj", "mlp.2", "v_proj", "fc1" ], which covers the vision tower's attn.proj, mlp.0, and mlp.2.
The first problem I hit: in patch.py's add_adapters, mod.lora_adapters[target_name] = lora fails because target_name must not contain "." (which mlp.0 and mlp.2 do). I changed the code logic to work around this, and with that change load_lora_weights later loads the weights correctly. The modification is shown in this screenshot:

[Screenshot: modified add_adapters logic in patch.py]
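The "." restriction comes from PyTorch itself: nn.Module.add_module, which nn.ModuleDict item assignment goes through, rejects names containing a dot, and that is exactly what mlp.0 / mlp.2 trigger if lora_adapters is a ModuleDict. A small repro plus the usual sanitize-the-key workaround (the helper name is illustrative, not lmdeploy's actual patch):

```python
from torch import nn

adapters = nn.ModuleDict()
try:
    adapters["mlp.0"] = nn.Linear(4, 4)   # what add_adapters effectively does
except KeyError as e:
    print(e)                              # module name can't contain "."

# Workaround: register under a sanitized key, and keep the real target
# name in a side table so weight loading can still find it.
def safe_key(target_name: str) -> str:
    return target_name.replace(".", "_")

adapters[safe_key("mlp.0")] = nn.Linear(4, 4)   # key "mlp_0" is accepted
```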
The second problem: visual.merger.mlp is not implemented on top of BaseLinear, so its mlp.0 and mlp.2 layers cannot load LoRA weights. I reimplemented the original nn.Linear layers with BaseLinear, as shown in this screenshot:

[Screenshot: visual.merger.mlp linears reimplemented with BaseLinear]
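An alternative to reimplementing these layers with BaseLinear is to keep nn.Linear and wrap it in a small LoRA module with no TP semantics at all, then copy the fine-tuned A/B weights into the wrapper. A generic sketch (rank, alpha, and the wrapping lines are illustrative; lmdeploy's weight-loading hooks would still need to route the adapter tensors here):

```python
import torch
from torch import nn

class LoRALinear(nn.Module):
    """A plain (non-TP) linear with an additive LoRA branch."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base                     # frozen pretrained linear
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # start as an exact no-op
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.lora_b(self.lora_a(x)) * self.scaling

# e.g. wrap the merger MLP in place (indices 0 and 2 are the linears,
# index 1 is the GELU):
#   visual.merger.mlp[0] = LoRALinear(visual.merger.mlp[0])
#   visual.merger.mlp[2] = LoRALinear(visual.merger.mlp[2])
```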
After the modifications above, the model initializes and serves normally, but when I ran my validation set, all the results were wrong.
May I ask what is wrong with my changes, and what else do I need to do to make this work?
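One way to sidestep the runtime-LoRA path (and all of the TP interactions above) is to merge the adapter into the base weights offline and deploy the merged checkpoint as a plain model. A hedged sketch with peft and a recent transformers; the adapter path is a placeholder, and swift also ships its own export/merge utility:

```python
from peft import PeftModel
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

base = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto")
model = PeftModel.from_pretrained(base, "path/to/lora_checkpoint")  # placeholder
model = model.merge_and_unload()   # folds scaling * B @ A into the base weights

model.save_pretrained("qwen2vl-7b-merged")
AutoProcessor.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct").save_pretrained("qwen2vl-7b-merged")
```

Since the merged checkpoint has ordinary weights everywhere, including visual.merger.mlp, it needs none of the BaseLinear changes to serve correctly, and it also gives you a clean A/B reference for debugging the patched runtime-LoRA path.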