When running similar models, for example llama.cpp with Metal running DeepSeek R1 Distill Llama 70B (Q4_K_M) GGUF versus LM Studio's MLX engine running DeepSeek R1 Distill Llama 70B (4bit) MLX, the latter (~8-9 tokens/s) is roughly 2x faster than the former (~3-4 tokens/s). I wonder why that is?
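For context on whether either number is near the hardware limit: single-stream decoding is typically memory-bandwidth bound, since every generated token must stream all model weights from memory. A rough ceiling is bandwidth divided by model size. The figures below are illustrative assumptions (a ~42.5 GB 4-bit 70B model, ~400 GB/s unified-memory bandwidth on an M-series-class chip), not measurements from either runtime:

```python
# Rough upper bound for decode speed on a memory-bandwidth-bound system.
# Both numbers are assumptions for illustration, not measured values.

model_size_gb = 42.5    # ~70B params at ~4.85 bits/weight (Q4_K_M-class quant)
bandwidth_gb_s = 400.0  # assumed unified-memory bandwidth

# Each token requires reading all weights once, so tokens/s is capped
# by how many times per second the weights can be streamed from memory.
theoretical_tok_s = bandwidth_gb_s / model_size_gb
print(f"theoretical ceiling: {theoretical_tok_s:.1f} tokens/s")  # ~9.4
```

Under these assumptions the MLX figure (~8-9 tokens/s) is close to the bandwidth ceiling, while ~3-4 tokens/s suggests the other path is not saturating memory bandwidth, pointing at kernel or dequantization overhead rather than model size.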