LLM Models
Open-Source / Open-Weight Models
model | date | ctx | notes |
---|---|---|---|
Qwen2 | 2024-06-07 | 32k,64k,128k | 0.5, 1.5, 7, 57, 72 B by Alibaba |
LLAMA3 | 2024-04-18 | 8K | by Meta |
phi3 | 2024 | | by Microsoft |
gemma | 2024 | | by Google DeepMind |
mistral | 2024 | | by Mistral AI |
LLAMA2 | 2023 | 4K | by Meta |
GPT-3 | 2020 | 2k | 175B |
GPT-2 | 2019 | 1k | 1.5B |
GPT-1 | 2018 | 0.5k | 0.12B |
Proprietary Models
model | date | notes |
---|---|---|
GPT-3.5-turbo | 2022 | 4K |
GPT-3.5-16k | 2022 | 16K |
GPT-3.5 | 2022 | ChatGPT,570GB Text |
GPT-4 | 2023 | |
GPT-4-32k | 2023 | |
GPT-4V | 2023 | |
GPT-4o | 2024 | |
- https://ollama.com/library
- How to estimate memory footprint (see the sketch below)
- parameters × precision
- the ideal precision currently is float16 / bfloat16 - each parameter takes 16 bits
- 1B -> 2GB
- quantized parameters - int4 quantization is common
- 1B -> 0.5GB
- https://huggingface.co/datasets/christopherthompson81/quant_exploration
- Q4_0 - worse accuracy but higher speed
- Q4_1 - more accurate but slower
- q4_2, q4_3 - new generations of q4_0 and q4_1, more accurate
- https://github.com/ggerganov/llama.cpp/discussions/406
- 7B - 8GB RAM
- 13B - 16GB RAM
- 70B - 32GB/48GB RAM
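A minimal Python sketch of the parameters × precision rule above. The function name is made up, and the result covers the weights only; activations and the KV cache come on top, which is roughly why a 4-bit 7B model still wants ~8GB of RAM.

```python
# Sketch of the "parameters × precision" estimate; covers weights only.
# bits_per_param: 16 for fp16/bf16, 4 for int4-quantized weights.
def weight_size_gb(n_params_billion: float, bits_per_param: int) -> float:
    return n_params_billion * 1e9 * bits_per_param / 8 / 1e9

for label, params_b in [("7B", 7), ("13B", 13), ("70B", 70)]:
    print(f"{label:>3}: fp16 ~ {weight_size_gb(params_b, 16):.1f} GB, "
          f"int4 ~ {weight_size_gb(params_b, 4):.1f} GB")

# Output:
#  7B: fp16 ~ 14.0 GB, int4 ~ 3.5 GB
# 13B: fp16 ~ 26.0 GB, int4 ~ 6.5 GB
# 70B: fp16 ~ 140.0 GB, int4 ~ 35.0 GB
```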
- A small context window is sufficient for RAG
- Context Window
- Llama-3 8B, 8K-1M context https://ollama.com/library/llama3-gradient
- 256k context window requires at least 64GB of memory
- 1M+ context window requires significantly more (100GB+); see the KV-cache estimate below
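The big-context memory numbers are dominated by the KV cache. A rough estimate sketch, with Llama-3-8B-like dimensions assumed here for illustration (not read from any config):

```python
# Rough KV-cache size estimate. The defaults are assumed
# Llama-3-8B-style numbers (32 layers, 8 KV heads, head dim 128),
# not values taken from a model card.
def kv_cache_gb(ctx_len, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    # one K and one V vector per layer per token
    return 2 * n_layers * ctx_len * n_kv_heads * head_dim * bytes_per_elem / 1e9

for ctx in (8_192, 262_144, 1_048_576):
    print(f"ctx {ctx:>9}: KV cache ~ {kv_cache_gb(ctx):.1f} GB")

# ctx      8192: KV cache ~ 1.1 GB
# ctx    262144: KV cache ~ 34.4 GB  (+ ~16 GB fp16 weights and overhead -> the 64GB figure)
# ctx   1048576: KV cache ~ 137.4 GB (hence 100GB+ for 1M context)
```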
- Leaderboard
- Visual
- microsoft/Florence-2-large
- MIT
- base 0.23B, large 0.77B
- Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
- google-deepmind/gemma
- Apache-2.0, Flax, JAX
- by Google DeepMind
- Ultra, Pro, Flash, Nano
- 2B, 7B
- llama2
- 7B, 13B, 70B
- uncensored
- llama2-uncensored
- 7B, 70B
- wizard-vicuna-uncensored
- 7B, 13B, 70B
- wizardlm-uncensored
- 13B
- https://erichartford.com/uncensored-models
- https://www.pixiv.net/novel/show.php?id=21039830
- microsoft/BitNet
- MIT, C++, Python
- by Microsoft
- HN
- vicuna
- mistral
- mixtral
- Flan
- Alpaca
- GPT4All
- Chinese LLaMA
- Vigogne (French)
- LLaMA
- Databricks Dolly 2.0
- https://huggingface.co/stabilityai/stable-diffusion-2
- togethercomputer/OpenChatKit
- Alpaca
- Based on LLaMA + instruction tuning
- FlagAI-Open/FlagAI
- hpcaitech/ColossalAI
- BlinkDL/ChatRWKV
- ChatGPT-like
- RWKV (100% RNN)
- nebuly-ai/nebullvm
- FMInference/FlexGen
- EssayKillerBrain/WriteGPT
- GPT-2
- ymcui/Chinese-LLaMA-Alpaca
- https://www.promptingguide.ai/zh/models/collection
- Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models
- RedPajama-Data-v2
- hysts/ControlNet-v1-1
- ggml
- ggerganov/ggml
- MIT, C
- .pth - PyTorch
- checklist.chk - MD5
- params.json - model hyperparameters, e.g. {"dim": 4096, "multiple_of": 256, "n_heads": 32, "n_layers": 32, "norm_eps": 1e-06, "vocab_size": -1}
- Saving & Loading Models
- https://medium.com/geekculture/list-of-open-sourced-fine-tuned-large-language-models-llm-8d95a2e0dc76
- https://erichartford.com/uncensored-models
- https://huggingface.co/spaces/facebook/seamless_m4t
- https://github.com/LinkSoul-AI/Chinese-Llama-2-7b
- Jina AI 8k text embedding
- ggml/llama.cpp system_info CPU feature flags, e.g.: AVX = 1 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
- grep avx /proc/cpuinfo --color # check CPU flags on x86_64
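A Python equivalent of the grep check above (Linux only; the flag names queried are the lowercase /proc/cpuinfo spellings):

```python
# List which SIMD features the CPU reports, relevant to ggml builds.
from pathlib import Path

flags = set()
for line in Path("/proc/cpuinfo").read_text().splitlines():
    if line.startswith("flags"):
        flags.update(line.split(":", 1)[1].split())
        break

for feat in ("avx", "avx2", "avx512f", "fma", "f16c"):
    print(f"{feat}: {'yes' if feat in flags else 'no'}")
```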