llama-swap

  • mostlygeek/llama-swap
    • MIT license, written in Golang
    • Provides a reverse proxy in front of inference servers
    • Starts upstream processes on demand and swaps them out when a different model is requested
    • Each model entry uses proxy to point at the upstream service; cmd can be anything, including a docker launch (see the config sketch after the compose example below)
docker pull ghcr.io/mostlygeek/llama-swap:cuda

llama:
  image: ghcr.io/mostlygeek/llama-swap:v172-cuda-b7062
  container_name: llama
  command:
    - -watch-config
    - -config
    - /llama/llama-swap.yaml
  volumes:
    - ./models:/models
    - ./models:/root/.cache/llama.cpp
    # the user is app, home is /app, uid=10001
    - ./models:/app/.cache/llama.cpp
    - ./llama:/llama
  ports:
    - 11435:8080
  environment:
    HF_ENDPOINT: https://modelscope.cn
  sysctls:
    net.ipv6.conf.all.disable_ipv6: 1
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
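
A minimal llama-swap.yaml sketch to pair with the compose file above; the model names, GGUF paths, and the fixed port are illustrative assumptions. cmd is what llama-swap launches on the first request for a model, proxy is the upstream URL it forwards to, and ${PORT} is a macro llama-swap fills with a free port:

models:
  "qwen2.5-7b":
    # started on demand; assumes llama-server is on PATH inside the container
    cmd: llama-server --port ${PORT} -m /models/qwen2.5-7b.gguf
    proxy: http://127.0.0.1:${PORT}
  "dockerized-model":
    # cmd can also be a docker run; the fixed port and volume path are assumptions
    cmd: docker run --rm --name dockerized-model -p 9510:8080 -v /srv/models:/models ghcr.io/ggml-org/llama.cpp:server -m /models/model.gguf
    proxy: http://127.0.0.1:9510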

API

  • OpenAI API (see the first Python sketch after this list)
    • /v1/completions
    • /v1/chat/completions
    • /v1/embeddings
    • /v1/audio/speech
    • /v1/audio/transcriptions
  • llama-server
    • /v1/rerank, /v1/reranking, /rerank
    • /infill
    • /completion
  • llama-swap
    • /ui
    • /upstream/:model_id
      • Direct access to that model's upstream server (used in the second sketch after this list)
    • /models/unload
    • /running
    • /log
    • /health
    • /api/events
    • /logs/stream
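
The OpenAI-compatible routes above can be driven by any OpenAI client pointed at llama-swap. A minimal Python sketch, assuming the 11435 host port from the compose file and the hypothetical "qwen2.5-7b" model from the config sketch:

from openai import OpenAI

# base_url and model name come from the (assumed) examples above
client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="qwen2.5-7b",  # llama-swap starts or swaps to this model on demand
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)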
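
The llama-swap management endpoints are plain HTTP and script just as easily; a sketch with requests, again assuming port 11435 (the POST method for /models/unload is an assumption, as is reaching llama-server's native /completion through /upstream):

import requests

BASE = "http://localhost:11435"

print(requests.get(f"{BASE}/health").text)     # liveness check
print(requests.get(f"{BASE}/running").json())  # which upstreams are loaded

# Ask llama-swap to stop the running upstream (HTTP method assumed)
requests.post(f"{BASE}/models/unload")

# /upstream/:model_id forwards straight to that model's server,
# e.g. llama-server's native /completion endpoint
r = requests.post(
    f"{BASE}/upstream/qwen2.5-7b/completion",
    json={"prompt": "Hello", "n_predict": 16},
)
print(r.json())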