llama
- LLaMA-7B: 3.5GB on disk (4-bit), ~6GB RAM
- LLaMA-13B: 6.5GB on disk (4-bit), ~10GB RAM
- LLaMA-30B: 15.8GB on disk (4-bit), ~20GB RAM
- LLaMA-65B: 31.2GB on disk (4-bit), ~40GB RAM
- https://news.ycombinator.com/item?id=35107058
- https://github.com/ZrrSkywalker/LLaMA-Adapter
- https://huggingface.co/blog/stackllama
# Alpine build prerequisites (toolchain + Python tooling used below)
apk add \
gcc g++ python3 py3-pip musl-dev cmake make pkgconf build-base \
git openssh-client binutils coreutils util-linux findutils sed grep tar wget curl neofetch \
rust cargo python3-dev openssl-dev linux-headers
# llama.cpp
# =========
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make -j
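# (missing step) produce the q4_0 file used below; a minimal sketch assuming
# the 2023-era llama.cpp tooling (convert-pth-to-ggml.py and ./quantize),
# with the raw weights already placed under ./models/7B:
python3 convert-pth-to-ggml.py models/7B/ 1          # 1 = fp16 output
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2   # 2 = q4_0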
# one-shot generation: -p prompt, -n max tokens to generate
./main -m ./models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512
# Alpaca-style instruct mode; --keep -1 retains the whole prompt when the context slides
./main -m ./models/7B/ggml-model-q4_0.bin --file prompts/alpaca.txt --instruct --ctx_size 2048 --keep -1
# interactive Alpaca chat with custom sampling; --repeat_penalty 1 disables the penalty, -t 7 = seven threads
./main -m ./models/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7
# https://github.com/ymcui/Chinese-LLaMA-Alpaca
# =========
apk add rust cargo python3-dev openssl-dev cmake linux-headers
pip install git+https://github.com/huggingface/transformers
pip install sentencepiece
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install peft
git clone https://github.com/huggingface/transformers
# note: musl libc lacks the glibc extension pthread_attr_setaffinity_np,
# which can break building/running some Python wheels (e.g. torch) on Alpine
python ./transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py \
--input_dir /ml/models/LLaMA \
--model_size 7B \
--output_dir /ml/models/LLaMA-hf
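# optional smoke test of the converted HF weights; a minimal sketch assuming
# the output directory above and enough RAM for the unquantized model:
python3 - <<'EOF'
from transformers import LlamaForCausalLM, LlamaTokenizer
tok = LlamaTokenizer.from_pretrained("/ml/models/LLaMA-hf")
model = LlamaForCausalLM.from_pretrained("/ml/models/LLaMA-hf")
ids = tok("The capital of France is", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=8)[0]))
EOF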
ggml
FAQ
Ref
- https://rentry.org/lmg_models
- https://rentry.org/lmg-resources
- https://rentry.org/ayumi_erp_rating
- https://rentry.co/ALLMRR
- http://ayumi.m8geil.de/
- ERP - erotic role playing
#->/lmg/ Model Links and Torrents <-
[TOC2]
Changelog (MDY)
[05-10-2023] - Added WizardLM 13B Uncensored
[05-07-2023] - Added Vicuna 13B Cocktail, bluemoonrp-13b & AlpacaDente2
[05-05-2023] - Added CPU quantization variation links
[05-02-2023] - Initial Rentry
4-bit GPU Model Requirements
!!! note VRAM Required takes full context (2048 tokens) into account. You may be able to load the model on GPUs with slightly less VRAM, but you will not be able to run at full context. If you do not have enough RAM to load the model, it will load into swap. Groupsize models will increase VRAM usage, as will running a LoRA alongside the model. A rough arithmetic sanity check follows the table.
Model Parameters | VRAM Required | GPU Examples | RAM to Load |
---|---|---|---|
7B | 8GB | GTX 1660, RTX 2060, AMD RX 5700 XT, RTX 3050, RTX 3060, RTX 3070 | 6GB |
13B | 12GB | AMD RX 6900 XT, RTX 2060 12GB, RTX 3060 12GB, RTX 3080 12GB, A2000 | 12GB |
30B | 24GB | RTX 3090, RTX 4090, A4500, A5000, 6000, Tesla V100 | 32GB |
65B | 42GB | A100 80GB, Quadro RTX 8000, RTX A6000 | 64GB |
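As a rough back-of-the-envelope check (my own arithmetic, not from the original table): 4-bit weights cost about half a byte per parameter, with the context's KV cache and runtime overhead on top.

python3 -c "print(f'{13e9*0.5/2**30:.1f} GiB')"  # ~6.1 GiB of weights alone for 13B; context + overhead explains the 12GB VRAM figure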
4-bit CPU/llama.cpp RAM Requirements
!!! note 5-bit to 8-bit quantized models are becoming more common, and will obviously require more RAM. Will update these with the numbers when I have them.
Model | 4-bit | 5-bit | 8-bit |
---|---|---|---|
7B | 3.9 GB | | |
13B | 7.8 GB | | |
30B | 19.5 GB | | |
65B | 38.5 GB | | |
Original Weights
LLaMA 16-bit Weights
!!! info
The original LLaMA weights converted to the Transformers format at 16-bit precision. A torrent is available as well, but it uses outdated configuration files that will need to be updated. Note that these aren't for general use, as the VRAM requirements are beyond consumer scope.
>Filtering : None
Model | Type | Download |
---|---|---|
7B 16bit | HF Format | HuggingFace |
13B 16bit | HF Format | HuggingFace |
30B 16bit | HF Format | HuggingFace |
65B 16bit | HF Format | HuggingFace |
All the above | HF Format | Torrent Magnet |
LLaMA 4-bit Weights
!!! info
The original LLaMA weights quantized to 4-bit. The GPU CUDA versions have outdated tokenizer and configuration files. It is recommended to either update them with [this](https://rentry.org/544p2) or use the [universal LLaMA tokenizer.](https://github.com/oobabooga/text-generation-webui/blob/main/docs/LLaMA-model.md#option-1-pre-converted-weights) A copy-in sketch follows the table.
>Filtering : None
Model | Type | Download |
---|---|---|
7B, 13B, 30B, 65B | CPU | Torrent Magnet |
7B, 13B, 30B, 65B | GPU CUDA (no groupsize) | Torrent Magnet |
7B, 13B, 30B, 65B | GPU CUDA (128gs) | Torrent Magnet |
7B, 13B, 30B, 65B | GPU Triton | Neko Institute of Science HF page |
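A hedged sketch of the config update mentioned above - the directory and file names are illustrative and assume you have already fetched the fixed tokenizer/config JSONs from the linked rentry:

cp updated-configs/tokenizer.model updated-configs/tokenizer_config.json updated-configs/config.json ./llama-13b-4bit/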
Models/Finetunes/LoRAs
WizardLM 13B Uncensored (05/10/2023)
!!! info
This is WizardLM trained on a subset of the dataset - responses that contained alignment/moralizing were removed. The intent is to train a WizardLM that doesn't have alignment built in, so that alignment of any sort can be added separately, for example with an RLHF LoRA.
Note that despite being an "uncensored" model, several tests have demonstrated that the model will still refuse to comply with certain requests.
>Filtering : Light
Model | Type | Download |
---|---|---|
13B GGML | CPU | Q5 |
13B | GPU | Q4 CUDA 128gs |
BluemoonRP 13B (05/07/2023)
!!! info
An RP/ERP-focused finetune of LLaMA 13B trained on BluemoonRP logs. It is designed to simulate a 2-person RP session. Two versions are provided: a standard 13B with 2K context and an experimental 13B with 4K context. It has a non-standard format (LEAD/ASSOCIATE), so ensure that you read the model card and use the correct syntax.
>Filtering : None
Model | Type | Download |
---|---|---|
13B | GPU & CPU | https://huggingface.co/reeducator/bluemoonrp-13b |
Vicuna 13B Cocktail (05/07/2023)
!!! info
A Vicuna 1.1 13B finetune incorporating various datasets in addition to the unfiltered ShareGPT. This is an experiment attempting to enhance Vicuna 1.1's creativity while reducing censorship as much as possible. All datasets have been cleaned, and only the "instruct" portion of GPTeacher has been used. It has a non-standard format (USER/ASSOCIATE), so ensure that you read the model card and use the correct syntax.
>Filtering : Light
Model | Type | Download |
---|---|---|
13B | GPU & CPU | https://huggingface.co/reeducator/vicuna-13b-cocktail |
GPT4-x-AlpacaDente2-30B (05/05/2023)
!!! info
ChanSung's Alpaca-LoRA-30B-elina merged with Open Assistant's second finetune. Testing in progress.
>Filtering : Medium
Model | Type | Download |
---|---|---|
30B GGML | CPU | Q5 |
30B | GPU | Q4 CUDA |
https://huggingface.co/askmyteapot/GPT4-x-AlpacaDente2-30b-4bit
Vicuna 13B Free v1.1 (05/01/2023)
!!! info
A work-in-progress, community-driven attempt to make an unfiltered version of Vicuna. It currently has an early-stopping bug; a partial workaround has been posted on the repo's model card.
>Filtering : Light
Model | Type | Download |
---|---|---|
13B | GPU & CPU | https://huggingface.co/reeducator/vicuna-13b-free |
Pygmalion/Metharme 7B (04/30/2023)
!!! info
Pygmalion 7B is a dialogue model that uses LLaMA-7B as a base. The dataset includes RP/ERP content. Metharme 7B is an experimental instruct-tuned variation, which can be guided using natural language like other instruct models.
PygmalionAI intends to use the same dataset on the higher-parameter LLaMA models; no ETA as of yet. A sketch for reconstructing the XOR release follows the table.
>Filtering : None
Model | Type | Download |
---|---|---|
7B Pygmalion/Metharme | XOR | https://huggingface.co/PygmalionAI/ |
7B Pygmalion GGML | CPU | Q4, Q5, Q8 |
7B Metharme GGML | CPU | Q4, Q5 |
7B Pygmalion | GPU | Q4 Triton, Q4 CUDA 128gs |
7B Metharme | GPU | Q4 Triton, Q4 CUDA |
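XOR releases contain only a diff against the base model, so the weights must be reconstructed locally. A minimal sketch, assuming the xor_codec.py script from the PygmalionAI repo and LLaMA-7B already converted to HF format (check the model card for the exact script name and argument order):

python3 xor_codec.py ./pygmalion-7b ./pygmalion-7b-xor ./llama-7b-hf   # output dir, XOR files, HF base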
GPT4-X-Alpasta 30B (04/29/2023)
!!! info
An attempt at improving Open Assistant's performance as an instruct while retaining its excellent prose. The merge consists of Chansung's GPT4-Alpaca Lora and Open Assistant's native fine-tune.
It is an extremely coherent model for logic-based instruct outputs. And while the prose is generally very good, it does suffer from the "Assistant" personality bleed-through that plagues the OpenAssistant dataset, which can give you dry dialogue for creative writing/chatbot purposes. However, several accounts claim it's nowhere near as bad as OA's finetunes, and that the prose and coherence gains make up for it.
>Filtering : Medium
Model | Type | Download |
---|---|---|
30B 4bit | CPU & GPU CUDA | https://huggingface.co/MetaIX/GPT4-X-Alpasta-30b-4bit |
OpenAssistant LLaMa 30B SFT 6 (04/23/2023)
!!! info
An open-source alternative to OpenAI's ChatGPT/GPT-3.5 Turbo. However, it seems to suffer from [overfitting](https://www.datarobot.com/wiki/overfitting/) and is heavily filtered. Not recommended for creative writing or chatbots, given that the "assistant" personality constantly bleeds through, giving you dry dialogue.
>Filtering : Heavy
Model | Type | Download |
---|---|---|
30B | XOR | https://huggingface.co/OpenAssistant/oasst-sft-6-llama-30b-xor |
30B GGML | CPU | Q4 |
30B | GPU | Q4 CUDA, Q4 CUDA 128gs |
SuperCOT (04/22/2023)
!!! info
SuperCOT is a LoRA trained with the aim of making LLaMA follow prompts for LangChain better, by infusing chain-of-thought datasets, code explanations and instructions, snippets, logical deductions and Alpaca GPT-4 prompts.
Though designed to improve LangChain use, it's quite versatile and works very well for other tasks like creative writing and chatbots. The author also pruned a number of filters from the datasets. As of early May 2023, it's the most recommended model on /lmg/. A load-time LoRA sketch follows the table.
>Filtering : Light
Model | Type | Download |
---|---|---|
Original LoRA | LoRA | https://huggingface.co/kaiokendev/SuperCOT-LoRA |
13B GGML | CPU | Q4, Q8 |
30B GGML | CPU | Q4, Q5, Q8 |
13B | GPU | Q4 CUDA 128gs |
30B | GPU | Q4 CUDA, Q4 CUDA 128gs |
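Because the original LoRA is published separately, it can also be applied at load time rather than pre-merged. A hedged sketch for text-generation-webui (2023-era flags; assumes the 4-bit base is under models/ and the LoRA under loras/):

python server.py --model llama-30b-4bit-128g --wbits 4 --groupsize 128 --model_type llama --lora SuperCOT-LoRA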
Previous Model List
!!! info
The old rentry, retained for archiving purposes. Contains older and outdated models.
https://rentry.org/backupmdlist
Models for llama.cpp (ggml format)
LLaMA quantized 4-bit weights (ggml q4_0)
2023-03-31 torrent magnet
!!! info Tutorial link for llama.cpp
!!! info Tutorial link for koboldcpp
SHA256 checksums:
2dad53e70ca521fedcf9f9be5c26c15df602487a9c008bdafbb2bf8f946b6bf0 llama-7b-ggml-q4_0/ggml-model-q4_0.bin
9cd4d6c1f5f42d5abf529c51bde3303991fba912ab8ed452adfd7c97a4be77d7 llama-13b-ggml-q4_0/ggml-model-q4_0.bin
daefbc6b1b644a75be0286ef865253ab3786e96a2c1bca8b71216b1751eee63e llama-33b-ggml-q4_0/ggml-model-q4_0.bin
d58a29c8403ecbd14258bbce07d90894fc5a8be25b9d359463c18f9f2ef96eb6 llama-65b-ggml-q4_0/ggml-model-q4_0.bin
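To verify a download, feed the published digest to sha256sum (note the two spaces between digest and path):

echo "2dad53e70ca521fedcf9f9be5c26c15df602487a9c008bdafbb2bf8f946b6bf0  llama-7b-ggml-q4_0/ggml-model-q4_0.bin" | sha256sum -c -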
ggml model file magic: 0x67676a74 ("ggjt" in hex)
ggml model file version: 1
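To check which ggml container a file uses, read the uint32 at offset 0 (a sketch; assumes the magic is stored little-endian, as llama.cpp writes it in native byte order):

python3 -c "import struct,sys; print(hex(struct.unpack('<I', open(sys.argv[1],'rb').read(4))[0]))" ggml-model-q4_0.bin  # 0x67676a74 = ggjt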
Alpaca quantized 4-bit weights (ggml q4_0)
Model | Download |
---|---|
LLaMA 7B fine-tune from chavinlo/alpaca-native | 2023-03-31 torrent magnet |
LLaMA 7B merged with tloen/alpaca-lora-7b LoRA | 2023-03-31 torrent magnet |
LLaMA 13B merged with chansung/alpaca-lora-13b LoRA | 2023-03-31 torrent magnet |
LLaMA 33B merged with chansung/alpaca-lora-30b LoRA | 2023-03-31 torrent magnet |
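The "merged with ... LoRA" entries are base weights with a LoRA folded in before quantization. A minimal sketch of such a merge with peft (repo names taken from the table; merge_and_unload requires a recent peft):

python3 - <<'EOF'
from transformers import LlamaForCausalLM
from peft import PeftModel
base = LlamaForCausalLM.from_pretrained("/ml/models/LLaMA-hf")  # HF-format base weights
merged = PeftModel.from_pretrained(base, "tloen/alpaca-lora-7b").merge_and_unload()
merged.save_pretrained("/ml/models/alpaca-7b-merged")           # then convert + quantize to ggml
EOF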
!!! info Tutorial link for llama.cpp
Example:
./main --model ggml-model-q4_0.bin --file prompts/alpaca.txt --instruct --ctx_size 2048 --keep -1
!!! info Tutorial link for koboldcpp
SHA256 checksums:
f5e264b10944c55a84810e8073dfdcd653fa8e47ff50ea043ec071051ac7821d alpaca-7b-ggml-q4_0-native-finetune/ggml-model-q4_0.bin
d9777baad5cf6a5d196e70867338d8cc3c7af68c7744e68de839a522983860d7 alpaca-7b-ggml-q4_0-lora-merged/ggml-model-q4_0.bin
3838aa32651c65948e289374abd71f6feab1a62a4921a648e30d979df86a4af3 alpaca-13b-ggml-q4_0-lora-merged/ggml-model-q4_0.bin
2267ed1dc0bf0d6d300ba292c25083c7fa5395f3726c7c68a49b2be19a64b349 alpaca-33b-ggml-q4_0-lora-merged/ggml-model-q4_0.bin
ggml model file magic: 0x67676a74 ("ggjt" in hex)
ggml model file version: 1
GPT4All 7B quantized 4-bit weights (ggml q4_0)
2023-03-31 torrent magnet
!!! info Tutorial link for llama.cpp
GPT4All can be used with llama.cpp in the same way as the other ggml models.
!!! info Tutorial link for koboldcpp
SHA256 checksums:
9f6cd4830a3c45a86147c80a32888e7be8f8a489284c87cdb882a7cfe40940c1 gpt4all-unfiltered-7b-ggml-q4_0-lora-merged/ggml-model-q4_0.bin
de314c5ee155ac40a03ca3b3be85ba2b02aef9e9f083c411c0b4490689dd047e gpt4all-7b-ggml-q4_0-lora-merged/ggml-model-q4_0.bin
ggml model file magic: 0x67676a74 ("ggjt" in hex)
ggml model file version: 1
GPT4 x Alpaca 13B quantized 4-bit weights (ggml q4_0)
2023-04-01 torrent magnet
!!! info Tutorial link for llama.cpp
GPT4 x Alpaca can be used with llama.cpp in the same way as the other ggml models.
Text generation with this version is faster compared to the GPTQ-quantized one.
!!! info Tutorial link for koboldcpp
SHA256 checksum:
e6b77ebf297946949b25b3c4b870f10cdc98fb9fcaa6d19cef4dda9021031580 gpt4-x-alpaca-13b-ggml-q4_0/ggml-model-q4_0.bin
ggml model file magic: 0x67676a74 ("ggjt" in hex)
ggml model file version: 1
GPT4 x Alpaca 13B quantized 4-bit weights (ggml q4_1 from GPTQ with groupsize 128)
2023-04-01 torrent magnet
!!! info Tutorial link for llama.cpp
GPT4 x Alpaca can be used with llama.cpp in the same way as the other ggml models.
!!! info Tutorial link for koboldcpp
SHA256 checksum:
d4a640a1ce33009c244a361c6f87733aacbc2bea90e84d3c304a4c8be2bdf22d gpt4-x-alpaca-13b-ggml-q4_1-from-gptq-4bit-128g/ggml-model-q4_1.bin
ggml model file magic: 0x67676a74 ("ggjt" in hex)
ggml model file version: 1
Vicuna 13B quantized 4-bit weights (ggml q4_0)
2023-04-03 torrent magnet
!!! info Tutorial link for llama.cpp
Vicuna can be used with llama.cpp in the same way as the other ggml models.
!!! info Tutorial link for koboldcpp
SHA256 checksum:
f96689a13c581f53b616887b2efe82bbfbc5321258dbcfdbe69a22076a7da461 vicuna-13b-ggml-q4_0-delta-merged/ggml-model-q4_0.bin
ggml model file magic: 0x67676a74 ("ggjt" in hex)
ggml model file version: 1
OpenAssistant LLaMA 13B quantized 4-bit weights (ggml q4_0 & q4_1)
!!! warning Note that this model is a work in progress.
2023-04-07 torrent magnet | HuggingFace Hub direct download
!!! info Tutorial link for llama.cpp
!!! info Tutorial link for koboldcpp
SHA256 checksums:
fe77206c7890ecd0824c7b6b6a6deab92e471366b2e4271c05ece9a686474ef6 ggml-model-q4_0.bin
412da683b6ab0f710ce0adc8bc36db52bb92df96698558c5f2a1399af9bd0a78 ggml-model-q4_1.bin
ggml model file magic: 0x67676a74 ("ggjt" in hex)
ggml model file version: 1
Original model source | GPTQ-quantized model source | Torrent source
Models for HuggingFace 🤗
!!! danger Updated tokenizer and model configuration files can be found here.
Ensure that your models have the appropriate JSON files within the same directory as the weights, otherwise text generation might be impacted by tokenization problems. The issues were addressed here and here, but a manual update of both the transformers library and your model configuration files is required.
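The library half of that fix is a straightforward upgrade (the configuration files still have to be replaced by hand with the versions linked above):

pip install --upgrade git+https://github.com/huggingface/transformers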
LLaMA float16 weights
2023-03-26 torrent magnet | HuggingFace Hub direct downloads
!!! info Tutorial link for Text generation web UI
Torrent source and SHA256 checksums
Vicuna 13B float16 weights
2023-04-03 torrent magnet
!!! info Tutorial link for Text generation web UI
LLaMA quantized 4-bit weights (GPTQ format without groupsize)
2023-03-26 torrent magnet
!!! info Tutorial link for Text generation web UI
SHA256 checksums:
09841a1c4895e1da3b05c1bdbfb8271c6d43812661e4348c862ff2ab1e6ff5b3 llama-7b-4bit/llama-7b-4bit.safetensors
edfa0b4060aae392b1e9df21fb60a97d78c9268ac6972e3888f6dc955ba0377b llama-13b-4bit/llama-13b-4bit.safetensors
4cb560746fe58796233159612d8d3c9dbdebdf6f0443b47be71643f2f91b8541 llama-30b-4bit/llama-30b-4bit.safetensors
886ce814ed54c4bd6850e2216d5f198c49475210f8690f45dc63365d9aff3177 llama-65b-4bit/llama-65b-4bit.safetensors
Torrent source and more information
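These safetensors load through GPTQ-for-LLaMa in text-generation-webui; a hedged sketch of the launch flags (2023-era CLI; add --groupsize 128 for the 128g weights in the next section):

python server.py --model llama-13b-4bit --wbits 4 --model_type llama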
LLaMA quantized 4-bit weights (GPTQ format with groupsize 128)
2023-03-26 torrent magnet
!!! info Tutorial link for Text generation web UI
Groupsize 128 is a better choice for the 13B, 33B and 65B models, according to this.
SHA256 checksums:
ed8ec9c9f0ebb83210157ad0e3c5148760a4e9fd2acfb02cf00f8f2054d2743b llama-7b-4bit-128g/llama-7b-4bit-128g.safetensors
d3073ef1a2c0b441f95a5d4f8a5aa3b82884eef45d8997270619cb29bcc994b8 llama-13b-4bit-128g/llama-13b-4bit-128g.safetensors
8b7d75d562938823c4503b956cb4b8af6ac0a5afbce2278566cc787da0f8f682 llama-30b-4bit-128g/llama-30b-4bit-128g.safetensors
f1418091e3307611fb0a213e50a0f52c80841b9c4bcba67abc1f6c64c357c850 llama-65b-4bit-128g/llama-65b-4bit-128g.safetensors
Torrent source and more information
Alpaca quantized 4-bit weights (GPTQ format with groupsize 128)
Model | Download |
---|---|
LLaMA 7B fine-tune from ozcur/alpaca-native-4bit as safetensors | 2023-03-29 torrent magnet |
LLaMA 33B merged with baseten/alpaca-30b LoRA by an anon | 2023-03-26 torrent magnet (extra config files) |
!!! info Tutorial link for Text generation web UI
SHA256 checksums:
17d6ba8f83be89f8dfa05cd4720cdd06b4d32c3baed79986e3ba1501b2305530 Alpaca-7B-GPTQ-4bit-128g-native-finetune_2023-03-29/alpaca-7b-4bit-128g-native-finetune.safetensors
a2f8d202ce61b1b612afe08c11f97133c1d56076d65391e738b1ab57c854ee05 Alpaca-30B-4bit-128g/alpaca-30b-hf-4bit.safetensors