TorchSharp.BitsAndBytes

TorchSharp.BitsAndBytes is a C# binding library for the bitsandbytes library from Hugging Face. It provides 4-bit and 8-bit quantization for TorchSharp models.

Usage

4Bit Quantization && Dequantization

Note

4Bit quantization is only available for CUDA devices.

int dim = 1024; // example size, chosen here only for illustration
var input = torch.rand([dim * 4, dim], dtype: ScalarType.Float32).cuda(); // FP32 tensor; must be on a CUDA device
string quantizedDType = "fp4"; // Available options: "fp4", "nf4"
int blockSize = 64; // can be [64, 128, 256, 512, 1024]

// Quantize to 4Bit
(var quantizedTensor, var absMax, blockSize, var n) = BitsAndByteUtils.Quantize4Bit(input, quantizedDType, blockSize);

// Dequantize to FP32
var dequantizedTensor = BitsAndByteUtils.Dequantize4Bit(quantizedTensor, absMax, input.dtype, quantizedDType, n, input.shape, blockSize);
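To give some intuition for the blockSize and absMax values above, here is a minimal sketch of block-wise absolute-maximum scaling, the general idea behind block-wise quantization in bitsandbytes. It is plain C# with no dependency on TorchSharp, and the helper BlockwiseAbsMax is hypothetical: it is not part of TorchSharp.BitsAndBytes and only illustrates the concept, under the assumption that absMax holds one scale per block of the flattened input.

```csharp
using System;

// Hypothetical helper (not part of TorchSharp.BitsAndBytes): computes one
// absolute-maximum scale for each block of blockSize consecutive values.
static float[] BlockwiseAbsMax(float[] values, int blockSize)
{
    // Ceiling division: the last block may be shorter than blockSize.
    int blocks = (values.Length + blockSize - 1) / blockSize;
    var absMax = new float[blocks];
    for (int i = 0; i < values.Length; i++)
    {
        int block = i / blockSize;
        absMax[block] = Math.Max(absMax[block], Math.Abs(values[i]));
    }
    return absMax;
}

// Six values with a block size of 4 give two blocks: four values, then two.
float[] data = { 0.5f, -2.0f, 1.0f, 0.25f, -0.1f, 3.0f };
float[] scales = BlockwiseAbsMax(data, blockSize: 4);

// Dividing each value by its block's scale maps it into [-1, 1], the range
// in which a 4-bit code would then be chosen.
for (int i = 0; i < data.Length; i++)
{
    float normalized = data[i] / scales[i / 4];
    Console.WriteLine($"{data[i]} -> {normalized}");
}
```

A smaller block size means more scales to store but a tighter range per block, which is the usual trade-off behind making the block size configurable.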

For more examples, please refer to the Benchmark section.

Benchmark


BenchmarkDotNet v0.14.0, Windows 11 (10.0.26100.3037)
Intel Core i9-14900K, 1 CPU, 32 logical and 24 physical cores
Memory: 64GB
GPU: RTX4090
.NET SDK 9.0.102
  [Host]     : .NET 8.0.12 (8.0.1224.60305), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.12 (8.0.1224.60305), X64 RyuJIT AVX2

Method	Mean	Error	StdDev
Quantize4Bit	536.35 μs	12.164 μs	35.290 μs
Dequantize4Bit	2,257.89 μs	44.542 μs	51.294 μs
GEMV_4Bit_FP4	84.16 μs	1.673 μs	3.223 μs
GEMV_4Bit_NF4	82.69 μs	4.329 μs	12.629 μs
GEMV_FP32	49.59 μs	0.975 μs	2.035 μs
GEMM_INT8	2,994.86 μs	12.144 μs	11.360 μs
GEMM_FP32	4,495.49 μs	35.264 μs	32.986 μs
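The relative costs in the table can be read off with simple arithmetic. The sketch below uses only the mean timings reported above; the variable names are chosen here for illustration, and the ratios apply only to this machine and these benchmark shapes.

```csharp
using System;
using System.Globalization;

// Mean timings in microseconds, copied from the benchmark table.
double gemmInt8 = 2994.86;
double gemmFp32 = 4495.49;
double gemvFp4 = 84.16;
double gemvFp32 = 49.59;

// How many times faster INT8 GEMM is than FP32 GEMM in this run.
double int8Speedup = gemmFp32 / gemmInt8;

// How many times slower 4-bit (FP4) GEMV is than FP32 GEMV in this run.
double gemvSlowdown = gemvFp4 / gemvFp32;

Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
    "GEMM_INT8 vs GEMM_FP32: {0:F2}x faster", int8Speedup));
Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
    "GEMV_4Bit_FP4 vs GEMV_FP32: {0:F2}x slower", gemvSlowdown));
```

In this run, INT8 GEMM comes out at roughly 1.5 times the speed of FP32 GEMM, while the FP4 GEMV takes roughly 1.7 times as long as the FP32 GEMV.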

LittleLittleCloud/TorchSharp.BitsAndBytes