Awesome Inference
- ollama
- torchrun
- vLLM (Virtual Large Language Model)
- PagedAttention
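PagedAttention, the core idea behind vLLM, manages the KV cache like virtual memory: it is split into fixed-size blocks, and each sequence keeps a block table mapping logical positions to physical blocks, so memory is allocated on demand instead of reserved contiguously. A toy pure-Python sketch of that bookkeeping (names are illustrative, not vLLM's API):

```python
# Toy sketch of PagedAttention-style KV-cache management: fixed-size
# blocks plus a per-sequence block table. Illustrative only.

BLOCK_SIZE = 16  # tokens stored per KV-cache block

class BlockManager:
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))  # pool of free physical blocks
        self.tables = {}                     # seq_id -> list of physical block ids

    def append_token(self, seq_id, position):
        table = self.tables.setdefault(seq_id, [])
        if position % BLOCK_SIZE == 0:       # current block full: allocate a new one
            table.append(self.free.pop())
        return table[-1], position % BLOCK_SIZE  # (physical block, offset within it)

mgr = BlockManager(num_blocks=8)
for pos in range(20):                        # a 20-token sequence
    block, offset = mgr.append_token("seq-0", pos)
print(len(mgr.tables["seq-0"]))  # 2 blocks cover 20 tokens
```

Because blocks are allocated lazily, a sequence only ever wastes at most one partially filled block, instead of a worst-case contiguous reservation.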
- SGLang
- LocalAI
- NVIDIA/TensorRT-LLM
- Apache-2.0, C++, Python
- trt
- huggingface/text-generation-inference
- Apache-2.0, Python, Rust
- HF TGI
- triton-inference-server/server
- BSD-3, Python, C++
- NVIDIA Triton
- bentoml/BentoML
- Apache-2.0, Python
- Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines
- bentoml/BentoDiffusion
- bentoml/OpenLLM
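One of BentoML's selling points above is multi-model pipelines: several model stages composed behind one API. A plain-Python sketch of that composition pattern (stage names and logic are made up; a real BentoML service wraps stages in a service class with API endpoints):

```python
# Toy sketch of the multi-model pipeline pattern that BentoML
# productionizes: independent stages composed into one callable.
# Both stages below are hypothetical stand-ins for model calls.

def detect_language(text: str) -> str:
    # hypothetical stage 1: route the request by language
    return "en" if text.isascii() else "other"

def summarize(text: str, lang: str) -> str:
    # hypothetical stage 2: a summarization model call would go here
    return f"[{lang}] {text[:20]}"

def pipeline(text: str) -> str:
    # the composed service: output of stage 1 feeds stage 2
    return summarize(text, detect_language(text))

print(pipeline("Inference servers batch requests for throughput."))
```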
- Image
- ComfyUI
- AUTOMATIC1111/stable-diffusion-webui
- A1111
- SD
- Audio
- Whisper
- Embeddings
- michaelfeil/infinity
- MIT, Python
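An embedding server like infinity returns dense vectors that are typically compared downstream with cosine similarity. A self-contained sketch of that comparison (the vectors here are made up; a real deployment would fetch them from the server's OpenAI-compatible embeddings endpoint):

```python
import math

# Cosine similarity between two embedding vectors, as you would apply
# it to vectors returned by an embedding server. Vectors are invented
# for illustration.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

doc_vec = [0.2, 0.8, 0.1]      # hypothetical document embedding
query_vec = [0.25, 0.75, 0.0]  # hypothetical query embedding
print(cosine(doc_vec, query_vec))
```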
- exo-explore/exo
- GPLv3, Python
- Tencent/ncnn
- BSD-3, C/C++
- neural network inference framework optimized for the mobile platform
- InternLM/lmdeploy
- mit-han-lab/nunchaku
- Apache-2.0, Python, C++
- Nunchaku is a high-performance inference engine optimized for 4-bit neural networks
- SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
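The SVDQuant paper title above describes the trick: before 4-bit quantization, a small high-precision low-rank component absorbs the weight outliers, so the residual quantizes with a much smaller scale. A toy numpy sketch of that idea (illustrative only, not nunchaku's kernels):

```python
import numpy as np

# Toy sketch of the SVDQuant idea: W ~= L + dequant(quant4(W - L)),
# where L is a truncated-SVD low-rank part kept in high precision that
# absorbs outliers. Rank and sizes here are arbitrary choices.

def quantize_4bit(x):
    scale = np.abs(x).max() / 7.0            # symmetric int4 grid -7..7
    q = np.clip(np.round(x / scale), -8, 7)
    return q, scale

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
W[0, 0] = 50.0                                # one outlier blows up the scale

# Low-rank component via truncated SVD soaks up the outlier energy.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
rank = 4
L = (U[:, :rank] * S[:rank]) @ Vt[:rank]      # kept in high precision
q, scale = quantize_4bit(W - L)               # residual is outlier-free
W_hat = L + q * scale

plain_q, plain_scale = quantize_4bit(W)       # naive 4-bit baseline
err_svdq = np.abs(W - W_hat).mean()
err_plain = np.abs(W - plain_q * plain_scale).mean()
print(err_svdq < err_plain)                   # low-rank absorption shrinks the error
```

Without the low-rank part, the single outlier forces a quantization step of about 50/7, wasting almost the entire int4 grid on one entry.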
- theroyallab/tabbyAPI
- AGPLv3, Python
- turboderp-org/exllamav2
- MIT, Python
- turboderp-org/exllamav3
- MIT, Python
- Reading