Releases · kvcache-ai/ktransformers
v0.2.1.post1
- Fix a precision bug introduced in 0.2.1, add MMLU/MMLU-Pro tests, and fix the server. #413
v0.2.1
v0.2.0
- Support DeepSeek-R1 and V3 on a single GPU (24GB VRAM) or multiple GPUs, with 382GB DRAM
- Support dual-socket CPUs
v0.1.4
v0.1.3
v0.1.2
- Support Windows natively. #4
- Support multiple GPUs. #8
- Support llamafile as a linear backend.
- Support new models: Mixtral 8x7B and 8x22B
- Support q2k, q3k, and q5k dequantization on GPU. #16
- Add a GitHub Actions workflow to build precompiled packages
- Support shared memory across different operators
- Fix some bugs when building from source. #23
v0.1.1
- Support precompiled wheel packages for multiple CPU architectures
- Precompiled wheel packages support multiple TORCH_CUDA_ARCH_LIST targets, e.g. "8.0;8.6;8.7;8.9" (see the sketch after this list)
- Test and support Python 3.10
- Add a Dockerfile to build the Docker image
- Update README.md with Docker usage (in progress: upload the Docker image)
- Update version to 0.1.1
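
TORCH_CUDA_ARCH_LIST above is PyTorch's standard environment variable for selecting target GPU compute capabilities. Below is a minimal Python sketch of how a build script might parse it into compiler flags; the default list and the helper name `cuda_arch_flags` are illustrative assumptions, not the project's actual build code.

```python
# Illustrative only: honoring TORCH_CUDA_ARCH_LIST in a build script.
# The ";"-separated format follows PyTorch's convention; the fallback
# default list below is an assumption, not the project's actual default.
import os

def cuda_arch_flags():
    archs = os.environ.get("TORCH_CUDA_ARCH_LIST", "8.0;8.6;8.7;8.9")
    flags = []
    for arch in archs.split(";"):
        num = arch.strip().replace(".", "")  # e.g. "8.6" -> "86"
        if num:
            flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
    return flags

if __name__ == "__main__":
    print(cuda_arch_flags())
```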
v0.1.0
- Complete the submission information for PyPI.
- Dynamically detect the client's environment at install time; if a matching precompiled package is available, download and install it instead of building from source. (Adapted from flash-attn; see the sketch below.)
- Modified the installation process in the README.
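
The environment-detection item above follows the flash-attn pattern: probe the client's Python version, CUDA version, and platform during installation, and prefer a matching prebuilt wheel over a source build. The sketch below shows that idea only; the URL template, wheel naming, and helper names are hypothetical and do not reflect ktransformers' actual release layout.

```python
# Minimal sketch of a flash-attn-style install flow: probe the environment,
# prefer a matching precompiled wheel, otherwise fall back to a source build.
# WHEEL_URL_TEMPLATE and prebuilt_wheel_url are hypothetical names.
import platform
import sys

import torch

WHEEL_URL_TEMPLATE = (  # hypothetical release-asset URL pattern
    "https://github.com/kvcache-ai/ktransformers/releases/download/"
    "v0.1.0/ktransformers-0.1.0+cu{cuda}-cp{py}-cp{py}-{plat}.whl"
)

def prebuilt_wheel_url():
    """Return a candidate wheel URL for this environment, or None."""
    if not torch.cuda.is_available():
        return None
    cuda = torch.version.cuda.replace(".", "")                 # e.g. "121"
    py = f"{sys.version_info.major}{sys.version_info.minor}"   # e.g. "310"
    plat = "win_amd64" if platform.system() == "Windows" else "linux_x86_64"
    return WHEEL_URL_TEMPLATE.format(cuda=cuda, py=py, plat=plat)

if __name__ == "__main__":
    url = prebuilt_wheel_url()
    print(url or "no matching precompiled wheel; falling back to source build")
```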