
Error: llama runner process has terminated: GGML_ASSERT(src1t == GGML_TYPE_F32) failed #1043

Open
lhwong opened this issue Oct 13, 2024 · 4 comments


lhwong commented Oct 13, 2024

I got the following error when running a model imported from a GGUF file, which was exported after fine-tuning with LoRA.

Error: llama runner process has terminated: GGML_ASSERT(src1t == GGML_TYPE_F32) failed

These are the commands I used:

mlx_lm.lora --train --model meta-llama/Llama-3.2-1B --data ~/Projects/AI/data --iters 1000

mlx_lm.generate --model meta-llama/Llama-3.2-1B --adapter-path ./adapters --prompt "What is biomolecule?"

mlx_lm.fuse --model meta-llama/Llama-3.2-1B --adapter-path ./adapters --export-gguf

Create a Modelfile:
FROM ./fused_model/ggml-model-f16.gguf

ollama create example -f Modelfile

ollama run example

Error: llama runner process has terminated: GGML_ASSERT(src1t == GGML_TYPE_F32) failed
/Users/runner/work/ollama/ollama/llm/llama.cpp/ggml/src/ggml-metal.m:1080: GGML_ASSERT(src1t == GGML_TYPE_F32) failed
/Users/runner/work/ollama/ollama/llm/llama.cpp/ggml/src/ggml-metal.m:1080: GGML_ASSERT(src1t == GGML_TYPE_F32) failed
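
To help narrow this down, here is a quick way to inspect which tensor types actually ended up in the exported GGUF, using the gguf Python package from llama.cpp (pip install gguf). This is a diagnostic sketch; reading the assert as a tensor-type mismatch is my assumption:

from gguf import GGUFReader

reader = GGUFReader("./fused_model/ggml-model-f16.gguf")
for tensor in reader.tensors:
    # Print each tensor's name and GGML type. Anything other than
    # F16/F32 here (e.g. BF16) is a candidate for tripping
    # GGML_ASSERT(src1t == GGML_TYPE_F32) on the Metal backend.
    print(tensor.name, tensor.tensor_type.name)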


awni commented Oct 14, 2024

I would file an issue with the https://github.com/ollama/ollama folks. It's not clear to me that this is an issue with MLX.


lhwong commented Oct 15, 2024

@awni Could it be that the GGUF exported by mlx_lm is F16 and the command I used to create the model (ollama create example -f Modelfile) is wrong, or that some setting is required?

"Export the fused model to GGUF. Note GGUF support is limited to Mistral, Mixtral, and Llama style models in fp16 precision." Reference: https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/LORA.md

@hansvdam

I have the same issue. Making the GGUF with llama.cpp after the fuse does work when running it in Ollama:
https://github.com/ggerganov/llama.cpp

python convert_hf_to_gguf.py <path_to>/fused_model --outfile output_file.gguf
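
If the output type matters for your setup, convert_hf_to_gguf.py also takes an --outtype flag (f32/f16/bf16/q8_0 in recent llama.cpp versions), e.g.:

python convert_hf_to_gguf.py <path_to>/fused_model --outfile output_file.gguf --outtype f16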

then in the Ollama Modelfile, put (along with your parameters and template):
FROM output_file.gguf
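
For completeness, a sketch of what a full Modelfile can look like for a Llama 3 style model. The template below is adapted from memory of Ollama's stock llama3 template; verify it against ollama show llama3 --modelfile for your version:

FROM output_file.gguf

TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}"""

PARAMETER stop "<|eot_id|>"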


hschaeufler commented Oct 28, 2024

> Error: llama runner process has terminated: GGML_ASSERT(src1t == GGML_TYPE_F32) failed

@lhwong @hansvdam
You can also fuse the model without the GGUF export and import the fused model into Ollama directly. Ollama currently has a problem with the format written by newer transformers versions, which is why you have to downgrade the transformers library first.
See also: ollama/ollama#7167 (comment)

pipenv install transformers==4.44.2 or pip install transformers==4.44.2 (depending on your package manager)

Fuse the model without the GGUF export:

mlx_lm.fuse --model "meta-llama/Meta-Llama-3.1-8B-Instruct" \
    --adapter-path "results/llama3_1_8B_instruct_lora/tuning_11/adapters" \
    --save-path "results/llama3_1_8B_instruct_lora/tuning_11/lora_fused_model/"

Modelfile:

FROM "/Volumes/Extreme SSD/dartgen/results/llama3_1_8B_instruct_lora/tuning_11/lora_fused_model"

PARAMETER temperature 0.6
PARAMETER top_p 0.9

And import it:
ollama create hschaeufler/dartgen-llama-3.1:8b-instruct-bf16-v11 -f Modelfile
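
After that it shows up like any other local model and can be run with:

ollama run hschaeufler/dartgen-llama-3.1:8b-instruct-bf16-v11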
