When running similar models, for example llama.cpp with Metal running DeepSeek R1 Distill Llama 70B (Q4_K_M) GGUF versus LM Studio's MLX engine running DeepSeek R1 Distill Llama 70B (4bit) MLX, the latter (~8-9 tokens/s) is roughly 2x faster than the former (~3-4 tokens/s). I wonder why that is?
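For context on whether either number is near the hardware limit: single-stream decoding is typically memory-bandwidth bound, since every generated token must stream all model weights from memory. A rough ceiling is bandwidth divided by model size. The figures below are illustrative assumptions (a ~42.5 GB 4-bit 70B model, ~400 GB/s unified-memory bandwidth on an M-series-class chip), not measurements from either runtime:

```python
# Rough upper bound for decode speed on a memory-bandwidth-bound system.
# Both numbers are assumptions for illustration, not measured values.

model_size_gb = 42.5    # ~70B params at ~4.85 bits/weight (Q4_K_M-class quant)
bandwidth_gb_s = 400.0  # assumed unified-memory bandwidth

# Each token requires reading all weights once, so tokens/s is capped
# by how many times per second the weights can be streamed from memory.
theoretical_tok_s = bandwidth_gb_s / model_size_gb
print(f"theoretical ceiling: {theoretical_tok_s:.1f} tokens/s")  # ~9.4
```

Under these assumptions the MLX figure (~8-9 tokens/s) is close to the bandwidth ceiling, while ~3-4 tokens/s suggests the other path is not saturating memory bandwidth, pointing at kernel or dequantization overhead rather than model size.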