All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
- Structured LLM output with JSON Schema and GBNF grammar support for the `GGUF_CPU` model type.
- Introduced new model type `GGUF_CPU` to support running GGUF-compiled models on CPU with LLM.js.
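
  For readers unfamiliar with GBNF: it is llama.cpp's grammar format for constraining token sampling so the model can only emit strings the grammar accepts. A minimal grammar that restricts output to `yes` or `no` looks like this (an illustrative sketch of the format itself, not taken from this project's code):

  ```gbnf
  # root is the entry rule; the model may only produce one of these literals
  root ::= "yes" | "no"
  ```
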
- Bumped the llama.cpp worker to build commit dd047b4.
- Model caching: model files are now persisted in the Web Worker's cache to avoid re-downloading the model on every load. Thanks to @felladrin for the PR.
- Deprecated model types: `LLAMA2`, `DOLLY_V2`, `GPT_2`, `GPT_J`, `GPT_NEO_X`, `MPT`, `REPLIT`, `STARCODER`.
- Added support for the latest GGUF format via the updated llama2-cpp module.
- Added `context_size` parameter for llama models.
- Rebranded the project to "LLM.js".
- Removed the `STACK_SIZE` flag from build scripts.
- Added support for llama2.c models.
- Added `tokenizer_url` parameter to the GGML wrapper to support LLaMA 2.
- Upgraded ggml to commit 244776a.
- Added the ggml.js package.
- Added model support for Dolly v2, GPT-2, GPT-J, GPT-NeoX, MPT, Replit, StarCoder.
- Added docsify documentation.