# vllm

Here are 87 public repositories matching this topic...

meta-llama / llama-recipes

Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supports several inference solutions, such as HF TGI and vLLM, for local or cloud deployment. Includes demo apps that showcase Meta Llama for WhatsApp & Messenger.

python machine-learning ai pytorch llama finetuning llm langchain vllm llama2

Updated Nov 13, 2024
Jupyter Notebook

xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you can run inference with any open-source language model, speech recognition model, or multimodal model, whether in the cloud, on-premises, or even on your laptop.

Updated Nov 14, 2024
Python

katanaml / sparrow

Data processing with ML, LLM and Vision LLM

computer-vision machinelearning gpt nlp-machine-learning rag huggingface-transformers llm vllm

Updated Nov 12, 2024
Python

OpenRLHF / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention)

reinforcement-learning raylib transformers deepspeed large-language-models reinforcement-learning-from-human-feedback vllm

Updated Nov 13, 2024
Python

DefTruth / Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

sora llm llms vllm llm-inference awesome-llm flash-attention flash-attention-2 tensorrt-llm paged-attention deepseek open-sora flash-attention-3

Updated Nov 13, 2024

runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

language-model llm runpod vllm

Updated Oct 31, 2024
Python

bricks-cloud / BricksLLM

🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI, Azure OpenAI, Anthropic, vLLM, and open-source LLMs.

api docker golang open-source security privacy ai azure rest-api postgresql self-hosted artificial-intelligence ycombinator openai gpt llm generative-ai anthropic vllm

Updated Nov 10, 2024
Go

prometheus-eval / prometheus-eval

Evaluate your LLM's response with Prometheus and GPT4 💯

python evaluation gpt4 llm llmops vllm litellm llm-as-a-judge llm-as-evaluator

Updated Sep 9, 2024
Python

microsoft / vidur

A large-scale simulation framework for LLM inference

simulation inference transformer llm vllm

Updated Oct 10, 2024
Python

substratusai / kubeai

Private Open AI on Kubernetes

kubernetes ai k8s whisper autoscaler openai-api llm vllm faster-whisper ollama vllm-operator ollama-operator inference-operator

Updated Nov 14, 2024
Go

containers / ramalama

The goal of RamaLama is to make working with AI boring.

ai local containers inference-server podman llms llamacpp vllm

Updated Nov 14, 2024
Shell

ModelTC / llmc

[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".

Updated Nov 13, 2024
Python

chtmp223 / topicGPT

Official Implementation of TopicGPT: A Prompt-Based Framework for Topic Modeling (NAACL '24)

python nlp openai topic-modeling llm vllm

Updated Nov 11, 2024
Python

OpenCSGs / llm-inference

llm-inference is a platform for publishing and managing LLM inference. It provides a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, computing resource management, monitoring, and more.

transformer ray deepspeed llama-cpp vllm llm-inference

Updated May 17, 2024
Python

jasonacox / TinyLLM

Set up and run a local LLM and chatbot using consumer-grade hardware.

chatbot artificial-intelligence openai rag large-language-models llm vllm retrieval-augmented-generation llama-cpp-python

Updated Nov 10, 2024
JavaScript

varunvasudeva1 / llm-server-docs

Documentation on setting up an LLM server on Debian from scratch, using Ollama/vLLM, Open WebUI, OpenedAI Speech, and ComfyUI.

linux debian server llm comfyui vllm ollama open-webui openedai-speech

Updated Oct 26, 2024

varunshenoy / super-json-mode

Low latency JSON generation using LLMs ⚡️

openai huggingface-transformers llm vllm

Updated Mar 10, 2024
Jupyter Notebook

NetEase-Media / grps

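[Deep learning model deployment framework] Supports tf/torch/trt/trtllm/vllm and other NN frameworks, supports dynamic batching and streaming modes, and supports both Python and C++. Limitable, extensible, and high-performance. Helps users quickly deploy models online and serve them through HTTP/RPC interfaces.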

tensorflow torch tensorrt serving triton-inference-server dynamic-batching vllm tensorrt-llm

Updated Nov 8, 2024
C++

yoziru / nextjs-vllm-ui

Fully-featured, beautiful web interface for vLLM - built with NextJS.

typescript ui ai nextjs self-hosted webui tailwindcss openai-api vllm llm-ui llm-webui vllm-ui

Updated Jul 28, 2024
TypeScript

dnth / x.infer

Framework agnostic computer vision inference. Run 1000+ models by changing only one line of code. Supports models from transformers, timm, ultralytics, vllm, ollama and your custom model.

computer-vision transformers inference-api ultralytics pytorch-image-models vllm ollama

Updated Nov 14, 2024
Jupyter Notebook
