
Models

Open Weights/Transformer

| Date | Model | Size | Context Window | Creator | Tags |
| --- | --- | --- | --- | --- | --- |
| 2025-06-11 | Magistral | small 24B | 39K | Mistral AI | Reasoning, Multilingual |
| 2025-06-07 | Comma v0.1 | 7B | | EleutherAI | Full OSS, English |
| 2025-06-05 | Qwen3-Embedding | 0.6b, 4b, 8b | 32K | Alibaba | Embedding, Reranking, Multilingual (100+), Instruction Aware, MRL (1024, 2560, 4096) |
| 2025-05-28 | DeepSeek R1 0528 | | | DeepSeek AI | Update |
| 2025-05-26 | QwenLong-L1 | 32b | 120K | Alibaba | text |
| 2025-05-20 | Gemma3n | 5b-e2b, 8b-e4b | | Google | Edge, PLE |
| 2025-04-29 | Qwen3 | 0.6b, 1.7b, 4b, 8b, 14b, 30b, 32b, 235b, 30b-a3b, 235b-a22b | 40K | Alibaba | MoE, Reasoning |
| 2025-04-05 | Llama 4 | scout 109b-a17b, maverick 400b-a17b, 2T | 1M, 10M | Meta | MoE, Vision |
| 2025-03-26 | Qwen2.5-Omni | 3B, 7B | | Alibaba | text, audio, image, video, speech |
| 2025-03-12 | Gemma3 | 1b, 4b, 12b, 27b | 128K | Google DeepMind | Vision |
| 2025-02-25 | Wan 2.1 | 1.3b, 14b | | Alibaba | t2v, 480P, 720p |
| 2025-02-24 | smollm2 | 135m, 360m, 1.7b | 8K | HuggingFaceTB | |
| 2025-01-28 | Qwen2.5-VL | 3b, 7b, 32b, 72b | 125K | Alibaba | Vision |
| 2025-01-28 | Qwen2.5 | 0.5b, 1.5b, 3b, 7b, 14b, 32b, 72b | 32K, 1M | Alibaba | |
| 2025-01-20 | DeepSeek R1 | 1.5b, 7b, 8b, 14b, 32b, 70b, 671b | 128K | DeepSeek AI | Reasoning |
| 2024-12-07 | Llama 3.3 | 70B | 128K | Meta | |
| 2024-12 | Phi-4 | 14b | 128K | Microsoft | mini-reasoning, reasoning, multimodal |
| 2024-11-21 | LTX-Video | 2b, 13b | | Lightricks | T2V |
| 2024-10-05 | LLaVA | 7b, 13b, 34b | 4K, 32K | | Vision |
| 2024-09-25 | Llama 3.2 | 1B, 3B, 11B, 90B | 128K | Meta | |
| 2024-07-23 | Llama 3.1 | 8B, 70.6B, 405B | 128K | Meta | |
| 2024-06-27 | Gemma 2 | 9b, 27.2b | 8K | Google DeepMind | |
| 2024-06-07 | Qwen2 | 0.5b, 1.5b, 7b, 57b (A14b), 72b | 32K, 64K, 128K | Alibaba | |
| 2024-04-23 | Phi-3 | 3.8b, 7b, 14b | 4K, 128K | Microsoft | |
| 2024-04-18 | Llama 3 | 8b, 70.6b | 8K, 128K | Meta | |
| 2024-02-21 | Gemma | 2b, 7b | 8K | Google DeepMind | |
| 2023-12-11 | Mistral | 7b, 46.7b (8x7B MoE) | 33K | Mistral AI | |
| 2023-07-18 | Llama 2 | 6.7b, 13b, 69b | 4K | Meta | |
| 2023-02-24 | LLaMA | 6.7B, 13B, 32.5B, 65.2B | 2K | Meta | |
| 2020-06-11 | GPT-3 | 175b | 2K | OpenAI | |
| 2019-02-14 | GPT-2 | 1.5b | 1K | OpenAI | |
| 2018-06-11 | GPT-1 | 117m | 512 | OpenAI | |
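
| date | model | author | notes |
| --- | --- | --- | --- |
| 2025-06 | MobileNet V5 | Google | 256x256, 512x512, 768x768, CNN, Gemma 3n |
| 2025-02-19 | YOLOv12 | | |
| 2024-08 | SAM v2 | Meta | |
| 2024-04 | MobileNet V4 | Google | |
| 2023-04 | SAM | Meta | |
| 2019-05 | MobileNet V3 | Google | |
| 2019-03 | MobileNet V2 | Google | |
| 2017-04 | MobileNet | Google | |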

Proprietary Models

| release | model | output price | input price | author | notes |
| --- | --- | --- | --- | --- | --- |
| 2025-05-20 | Imagen 4 | | | Google | t2i |
| 2025-06 | Kling 2.1 | $0.28/s | | | |
| 2025-05 | Veo 3 | $0.50/s, audio $0.75/s | | Google | t2v |
| 2025-06 | Gemini 2.5 Pro | $10.00/1M | $1.25/1M | | |
| 2025-06 | Gemini 2.5 Flash | $2.50/1M | $0.30/1M, audio $1.00/1M | Google | |
| 2025-06 | Gemini 2.5 Flash-Lite | $0.40/1M | $0.10/1M, audio $0.50/1M | Google | |
| 2025-05 | FLUX.1 Kontext max/pro | | | Black Forest Labs | t2i |
| 2025-04-17 | Gemini 2.5 Flash | | | Google | |
| 2025-04-16 | Seedream 3.0 | | | Bytedance | t2i |
| 2025-04-14 | GPT-4.1, mini, nano | | | OpenAI | |
| 2025-03-25 | Gemini 2.5 Pro | | | Google | 2M |
| 2025-02-05 | Gemini 2.0 Flash | | | Google | audio, video |
| 2025-02-01 | Gemini 2.0 Flash-Lite | | | Google | |
| 2025-01-10 | o3, o3-mini | | | OpenAI | Reasoning |
| 2024-12-17 | o1 | | | OpenAI | |
| 2024-12 | Veo 2 | $0.35/s | | Google | t2v |
| 2024-10 | Recraft V3 | | | Recraft | |
| 2024-09-12 | o1-preview | | | OpenAI | Reasoning |
| 2024-08 | Imagen 3 | | | Google | t2i |
| 2024-07-18 | GPT-4o mini | | | OpenAI | |
| 2024-05-13 | GPT-4o | | | OpenAI | text, audio, image |
| 2024-03-04 | Claude 3 Haiku, Sonnet, Opus | | | Anthropic | 200K |
| 2024-02-15 | Gemini 1.5 Pro | | | Google | Breakthrough 1M-token long context window |
| 2023-12-06 | Gemini 1.0 Pro | | | Google | Natively multimodal model family |
| 2023-11-21 | Claude 2.1 | | | Anthropic | 200K |
| 2023-11-06 | GPT-4V | | | OpenAI | 128K, Vision |
| 2023-11-06 | GPT-4 Turbo | | | OpenAI | 128K |
| 2023-07-11 | Claude 2 | | | Anthropic | 100K |
| 2023-06-27 | GPT-3.5-16k | | | OpenAI | 16K |
| 2023-03-14 | GPT-4 | | | OpenAI | 8K, 32K, image |
| 2023-03-01 | GPT-3.5-turbo | | | OpenAI | 4K |
| 2022-11-30 | GPT-3.5 | | | OpenAI | 4K |
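
The token prices in the table are quoted per 1M tokens, so a per-request estimate is a weighted sum of input and output tokens. A minimal sketch, using the Gemini 2.5 Flash row ($0.30/1M input, $2.50/1M output); the function name and the token counts are hypothetical, and real billing may differ (audio input, caching, tiered pricing):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the USD cost of one request from per-1M-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Gemini 2.5 Flash row: $0.30/1M input, $2.50/1M output (text only).
cost = request_cost(10_000, 2_000, input_price_per_m=0.30, output_price_per_m=2.50)
print(f"${cost:.4f}")
```

Output tokens dominate here even though there are five times fewer of them, which is worth keeping in mind when comparing rows.
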
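
| abbr. | stand for | meaning |
| --- | --- | --- |
| MRL | Matryoshka Representation Learning | |
| R2V | reference-to-video | |
| MV2V | masked video-to-video | |
| V2V | video-to-video | |
| MoE | Mixture of Experts | mixture-of-experts model |
| VACE | Video Animation, Composition, and Editing | |
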
  • MRL - Matryoshka Representation Learning
  • VACE: All-in-One Video Creation and Editing
  • R2V - reference-to-video
  • MV2V - masked video-to-video
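  • *-pt - Pre-Training - pre-trained model
    • Initially trained on large-scale datasets to learn language patterns and structure.
    • Suited as a base model that developers fine-tune further on specific tasks.
  • *-ft
    • Fine-tuned
  • *-it - Instruction Tuning - instruction-tuned model
    • Further fine-tuned on top of the pre-trained model for specific tasks or instructions.
    • Better suited to direct use on real tasks, because it has already been optimized for specific uses.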
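
The suffix convention above can be checked mechanically. A minimal sketch, assuming the stage is encoded as the last hyphen-separated token of the model name; the function name and the example model names are illustrative, and many real model names do not follow this convention:

```python
# Suffix -> training stage, following the list above.
SUFFIX_STAGES = {
    "pt": "pre-trained",
    "ft": "fine-tuned",
    "it": "instruction-tuned",
}


def training_stage(model_name: str) -> str:
    """Guess the training stage from a trailing -pt / -ft / -it suffix."""
    suffix = model_name.lower().rsplit("-", 1)[-1]
    return SUFFIX_STAGES.get(suffix, "unknown")


for name in ("gemma-3-4b-pt", "gemma-3-4b-it", "llama-3-8b"):
    print(name, "->", training_stage(name))
```
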


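Grouping models by company works well: models from the same company are closely related and show continuity. Capabilities are extended and adjusted over time, but the development of the base model and the techniques it uses stay relatively continuous.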

# AVX = 1 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
grep avx /proc/cpuinfo --color # x86_64
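
The same check can be done in Python by parsing the `flags` line of `/proc/cpuinfo`. A minimal sketch for x86_64 Linux only; the mapping from the summary names to kernel flag names is an assumption of this example (Linux reports SSE3 as `pni`, and AVX512 is approximated here by the `avx512f` foundation flag):

```python
import re

# Maps the names used in the summary line to Linux /proc/cpuinfo flag names.
# Note that Linux reports SSE3 as "pni"; AVX512 is approximated by "avx512f".
FLAG_NAMES = {
    "AVX": "avx",
    "AVX2": "avx2",
    "AVX512": "avx512f",
    "FMA": "fma",
    "F16C": "f16c",
    "SSE3": "pni",
}


def simd_support(cpuinfo_text: str) -> dict:
    """Return {feature: bool} based on the first 'flags' line (x86_64 only)."""
    match = re.search(r"^flags\s*:\s*(.*)$", cpuinfo_text, re.MULTILINE)
    flags = set(match.group(1).split()) if match else set()
    return {name: flag in flags for name, flag in FLAG_NAMES.items()}


# Hypothetical excerpt; on a real machine pass open("/proc/cpuinfo").read().
sample = "model name\t: Example CPU\nflags\t\t: fpu sse sse2 pni avx f16c\n"
print(simd_support(sample))
```

On ARM the kernel uses a `Features` line instead, so this sketch would report everything as unsupported there.
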

VLM

Computer Agent

Chinese

Fine-tuning

TTS

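| case | desc | notes | models |
| --- | --- | --- | --- |
| Virtual assistant | AI assistants give natural spoken responses for smoother interaction | High naturalness, low latency, multiple emotions | XTTS-v2, MeloTTS, F5-TTS, ChatTTS |
| Accessibility | Spoken content for people with visual or learning disabilities | High clarity, easy to understand, stable | MeloTTS, Bark |
| Content creation | Professional voice-over for podcasts, audiobooks, and similar | Varied voices, rich emotion, natural prosody | XTTS-v2, F5-TTS, GPT-SoVITS-v2 |
| Automated customer service | IVR systems for efficient automated support | Clear and stable, highly customizable | Piper, ParlerTTS, XTTS-v2 |
| Voice kiosks | Interactive voice response on self-service terminals | Fast response, clear and easy to understand | Piper, MeloTTS |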

STT

  • STT - Speech to Text
  • ASR - Automatic Speech Recognition
  • modelscope/FunASR

Music

MLLM

  • Multimodal Large Language Model
  • Structure: vision encoder + projector + language model
  • Vision Model
    • ViT
  • Language Model
  • Projector / Vision-Language Adapter
    • Aligns the image features extracted by the vision model with the language model's representation space
    • Cross-Attention Module
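
The encoder + projector + language model structure can be illustrated with a toy example. This is a minimal sketch, assuming the simplest kind of projector (a single linear map applied to each patch feature, with the projected image tokens prepended to the text embeddings); real models use learned weights, far larger dimensions, and sometimes an MLP or cross-attention module instead. All names and numbers here are hypothetical:

```python
def project(features, weight):
    """Map each vision feature vector (dim d_v) to the LM embedding size (d_lm).

    features: list of N vectors of length d_v
    weight:   d_v x d_lm matrix given as a list of rows
    """
    columns = list(zip(*weight))  # d_lm columns, each of length d_v
    return [[sum(f * w for f, w in zip(vec, col)) for col in columns]
            for vec in features]


def build_llm_input(image_features, text_embeddings, weight):
    """Prepend projected image tokens to the text token embeddings."""
    return project(image_features, weight) + text_embeddings


# Two image patches with d_v = 2, projected into an LM space with d_lm = 3.
image_features = [[1.0, 2.0], [0.0, 1.0]]
weight = [[1.0, 0.0, 1.0],
          [0.0, 1.0, 1.0]]
text_embeddings = [[0.5, 0.5, 0.5]]  # one text token, already in LM space

sequence = build_llm_input(image_features, text_embeddings, weight)
print(sequence)
```

After projection every element of the sequence has the same width, so the language model can treat image tokens and text tokens uniformly.
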

Vision

  • Document OCR
  • Handwriting OCR
  • Visual QA / Image QA
  • Visual Reasoning
  • Image Classification
  • Document Understanding
  • Video Understanding
  • Object Detection
  • Object Counting
  • Agent - screen understanding and operation
  • Object Grounding - object localization

Coding

Video

Image Generation

Problem areas

  • Prompt adherence
  • Generation quality
  • Instructiveness
  • Consistency of styles, characters, settings, etc.
  • Deliberate and exact posing of characters and set pieces
  • Compositing different images or layers together
  • Relighting
  • Posing built into the model. No ControlNet hacks.
  • References built into the model. No IPAdapter, no required character/style LoRAs, etc.
  • Ability to address objects, characters, mannequins, etc. for deletion / insertion.
  • Ability to pull sources from across multiple images with or without "innovation" / change to their pixels.
  • Fine-tunable (so we can get higher quality and precision)

Media Generation

Generative Marketing

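Media

  • Kling (可灵) 2.1:
    • Image-to-video generation with good results
    • Dynamic facial expressions, high-motion camera moves, precise gesture control, singing lip sync
    • Orbiting camera moves and lip-synced performance
  • Veo 3:
    • Generates video from text prompts
    • Simulates live-action footage
  • Sora:
    • Restyles existing video
  • Pika:
    • Swaps or adds content in a scene
  • Runway:
    • References characters, locations, or styles (Gen-3)
  • Luma:
    • Reframes video to a new aspect ratio
  • Hedra:
    • Makes characters talk (lip sync)
  • Jimeng (即梦, Instant Dream):
    • Many videos circulating online were made with it
    • Jimeng Omnihuman: good at static lip sync
  • Vidu:
    • Anime-style performances
  • Viggle:
    • Adds characters into video memes (character motion transfer)
  • Higgsfield
    • Hollywood-grade visual effects
  • JianYing Pro (剪映专业版):
    • Powerful, with a rich library of assets and effects; a must-install for video editing
  • Krea
    • Uses open-source models such as Wan or Hunyuan
  • Meitu Xiuxiu (美图秀秀):
    • Draw directly on the image; familiar to most users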

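Text

  • Doubao (豆包):
    • Focused on emotional conversation; handy for everyday scenarios
  • Kimi:
    • Handles long professional documents; large inputs are not a problem
  • Deepseek:
    • Very strong at writing code
  • Zhihu (知乎):
    • Essential for readers who like Zhihu articles
  • gamma:
    • Top-tier slide tool; generates a customized deck directly from your article
  • MindShow:
    • Turns a text outline into a mind map automatically, and converts it into a presentation in one click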

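Design

  • Gaoding Design (稿定设计):
    • Covers graphic design, e-commerce design, and more, with many editable templates for a range of design needs
  • Eqxiu (易企秀):
    • Builds H5 pages quickly; many template types; suited to event promotion and product marketing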

Search

  • https://felo.ai/
    • A very handy search tool for Xiaohongshu (小红书) content; well worth knowing about

Avatar

  • Digital humans

  • HunyuanVideo-Avatar

  • tencent-ailab/IP-Adapter

    • Text-to-Image
  • Pony

    • fine-tuned on SDXL
    • trained on 2.5 million furry/anthro/cartoon/anime images
    • recognizes many anime characters directly, no LoRA needed

Diffusion

  • I2I Image to Image
  • Image Edit
  • Image Generation
  • Image Inpainting
  • Image Upscale
  • Image Variation
  • T2I Text to Image
  • T2V Text to Video
  • Video Generation
  • In/Out-Painting
  • Structural Conditioning
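
| date | model | size | author | notes |
| --- | --- | --- | --- | --- |
| 2025-05-29 | FLUX.1 Kontext | dev, max, pro | | |
| 2024-10-22 | SD 3.5 | turbo, large, medium, 2.5B, 8B | Stability AI | |
| 2024-08-01 | FLUX.1 | dev, schnell, 12B, pro | Black Forest Labs | |
| 2024-02 | SD 3.0 | 800M, 8B | Stability AI | |
| 2023-11 | SDXL Turbo | | Stability AI | |
| 2023-07 | SDXL 1.0 | 3.5B | | |
| 2022-12 | SD v2.1 | | | |
| 2022-11 | SD v2.0 | | | |
| 2022-10 | SD 1.5 | 983M | RunwayML | |
| 2022-08 | SD 1.1, 1.2, 1.3, 1.4 | | CompVis | |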
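
| base model | date |
| --- | --- |
| Aura Flow | |
| CogVideoX | |
| Flux .1 D, flux.1-dev | 2024-08-01 |
| Flux .1 Kontext | 2025-05-29 |
| Flux .1 S, flux.1-schnell | 2024-08-01 |
| HiDream | |
| Hunyuan 1 | |
| Hunyuan Video | |
| Illustrious | |
| Imagen 4 | 2025-05-20 |
| Kolors | |
| LTXV | |
| Lumina | |
| Mochi | |
| NoobAI | |
| ODOR | |
| Open AI | |
| Other | |
| PixArt Σ | |
| PixArt Α | |
| Playground V2 | |
| Pony | |
| SD 1.4 | |
| SD 1.5 | |
| SD 1.5 Hyper | |
| SD 1.5 LCM | |
| SD 2.0 | |
| SD 2.0 768 | |
| SD 2.1 | |
| SD 2.1 768 | |
| SD 2.1 Unclip | |
| SD 3 | |
| SD 3.5 | |
| SD 3.5 Large | |
| SD 3.5 Large Turbo | |
| SD 3.5 Medium | |
| SDXL 0.9 | |
| SDXL 1.0 | |
| SDXL 1.0 LCM | |
| SDXL Distilled | |
| SDXL Hyper | |
| SDXL Lightning | |
| SDXL Turbo | |
| SVD | |
| SVD XT | |
| Stable Cascade | |
| WAN Video | |
| Wan Video 1.3B T2v | |
| Wan Video 14B I2v 480p | |
| Wan Video 14B I2v 720p | |
| Wan Video 14B T2v | |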
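
| Model Type | Chinese |
| --- | --- |
| Aesthetic Gradient | 美学渐变 |
| Checkpoint | 检查点 |
| Controlnet | 控制网 |
| Detection | 检测 |
| DoRA | DoRA |
| Hypernetwork | 超网络 |
| LoRA | LoRA |
| LyCORIS | LyCORIS |
| Motion | 动态 |
| Other | 其他 |
| Poses | 姿势 |
| Embedding | 嵌入 |
| Upscaler | 超分辨率 |
| VAE | 变分自编码器 |
| Wildcards | 通配符 |
| Workflows | 工作流 |
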
Checkpoint Type
Merge
Trained
File Type
Core ML
Diffusers
GGUF
ONNX
Other
Pickle Tensor
Safe Tensor
Pt
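
| Category | Chinese |
| --- | --- |
| Action | 动作 |
| Aesthetic | 美学 |
| Architecture | 建筑 |
| Animal | 动物 |
| Assets | 资产 |
| Background | 背景 |
| Base Model | 基础模型 |
| Buildings | 建筑物 |
| Celebrity | 名人 |
| Character | 角色 |
| Clothing | 服装 |
| Concept | 概念 |
| Objects | 物体 |
| Poses | 姿势 |
| Style | 风格 |
| Tool | 工具 |
| Vehicle | 交通工具 |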

  • OpenAI CLIP
    • Contrastive Language-Image Pre-training
    • -L: Large Vision Transformer (ViT)
    • Zero-shot image classification
    • ViT-L/14
  • OpenCLIP
    • Open-source reproduction and extension of OpenAI CLIP, improving transparency and performance
    • LAION datasets
    • bigG - ViT-bigG/14
  • Google T5-XXL
    • Text-to-Text Transfer Transformer
    • Casts every natural language processing (NLP) task into a single text-to-text format
    • A task prefix is prepended to the input text to tell the model what to do
      • translate English to German: That is good.
      • summarize: [a long article]
      • cola sentence: The course is jumping well.
    • FLAN-T5
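
The T5 task-prefix format above is plain string construction. A minimal sketch that only builds the input text; the helper name is hypothetical, and tokenization and model inference are left out:

```python
def t5_input(task_prefix: str, text: str) -> str:
    """Build a T5-style text-to-text input: '<task prefix>: <text>'."""
    return f"{task_prefix.strip()}: {text.strip()}"


examples = [
    ("translate English to German", "That is good."),
    ("cola sentence", "The course is jumping well."),
]
for prefix, text in examples:
    print(t5_input(prefix, text))
```

Because every task shares this one format, switching tasks only changes the prefix, not the model interface.
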

Prompting

[Subject description] [Scene setup] [Camera parameters] [Mood enhancement] [Extra details]
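
The five-slot template above can be assembled programmatically. A minimal sketch, assuming the slots are joined with commas and empty slots are skipped; the function and parameter names are hypothetical, and the example values are taken from elsewhere in these notes:

```python
def build_prompt(subject: str, scene: str = "", camera: str = "",
                 mood: str = "", details: str = "") -> str:
    """Join the template slots in order, skipping any that are empty."""
    parts = [subject, scene, camera, mood, details]
    return ", ".join(p.strip() for p in parts if p.strip())


prompt = build_prompt(
    subject="a woman in an oversized cozy sweater",
    camera="(shot on Sony A7 IV, 50mm f/1.8 lens)",
    mood="serene, tranquil",
    details="natural skin texture, soft film grain",
)
print(prompt)
```
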

Resolution

  • 1:1
    • 512x512
    • 768x768
    • 1024x1024
  • 4:3
  • 16:9
    • 1216x704
  • 9:16
    • 704x1216
  • Portrait
    • 832x1216
  • Landscape
    • 1216x832
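
The listed sizes are convenient approximations of the named ratios rather than exact matches. A minimal sketch that reduces a resolution to its exact ratio and checks alignment; the 64-pixel alignment is an assumption of this example (it happens to hold for every size listed above), not a rule stated in these notes:

```python
from math import gcd


def describe(width: int, height: int, multiple: int = 64) -> dict:
    """Return the exact reduced aspect ratio, pixel count, and alignment."""
    g = gcd(width, height)
    return {
        "ratio": f"{width // g}:{height // g}",
        "pixels": width * height,
        "aligned": width % multiple == 0 and height % multiple == 0,
    }


for w, h in [(1024, 1024), (1216, 704), (832, 1216)]:
    print(f"{w}x{h}", describe(w, h))
```

For example, 1216x704 reduces to 19:11 (about 1.73), close to but not exactly 16:9 (about 1.78).
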

Negative

text
watermark
camera
out of frame, lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck, username, watermark, signature,
(worst quality, low quality:1.4), (ugly:1.2), (stitching:1.2),
bad anatomy, deformed, disfigured, malformed limbs, extra limbs, fused limbs,
poorly drawn face, distorted face, malformed face, asymmetric eyes,
poorly drawn hands, extra fingers, fused fingers, malformed hands,
text, error, signature, watermark, username
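
Entries such as `(worst quality, low quality:1.4)` use the `(text:weight)` emphasis syntax found in several Stable Diffusion front ends. A minimal sketch of a parser for that form only, assuming the weight applies to every comma-separated tag inside the parentheses; it does not handle nested parentheses, bare `(tag)` emphasis, or square brackets, and the function name is hypothetical:

```python
import re

# Matches "(some text:1.4)" with no nested parentheses.
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def split_tags(text: str, weight: float) -> list:
    """Split comma-separated tags and attach the same weight to each."""
    return [(t.strip(), weight) for t in text.split(",") if t.strip()]


def parse_weights(prompt: str) -> list:
    """Return (tag, weight) pairs; tags outside parentheses get weight 1.0."""
    pairs, pos = [], 0
    for m in WEIGHTED.finditer(prompt):
        pairs += split_tags(prompt[pos:m.start()], 1.0)
        pairs += split_tags(m.group(1), float(m.group(2)))
        pos = m.end()
    return pairs + split_tags(prompt[pos:], 1.0)


print(parse_weights("(worst quality, low quality:1.4), (ugly:1.2), bad anatomy"))
```
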

Dynamic poses

Clothing details

trendy off-shoulder top
oversized cozy sweater
t-shirt with a cute cat print

Quality

(shot on Sony A7 IV, 50mm f/1.8 lens)
photorealistic
ultra detailed
natural skin texture
soft film grain
8k uhd

Mood enhancement

  • Mood
serene, tranquil, intimate, modern elegance
  • Color
Warm neutrals with pops of soft pastels in window light
  • Motion
Subtle light reflection on sweater fabric, gentle light diffusion across the room

Camera parameters

  • Visual style
  • Lighting

Chroma

Realisian

  • SD 1.5
  • steps 12 (range 8-16)
  • DPM++ SDE Karras
  • Hires Fix Required
    • On
    • Upscaler: Latent (bicubic antialiased)
    • Hires Steps: 5 (range 4-10)
    • Denoising Strength: 0.55 (range 0.4-0.7)
    • Upscale by: 2
  • CFG Scale 3 (range 2-5)
  • Clip Skip 1 (range 1-2)

Negative

embedding:realisianNeg.Z5yh, Realisian-Neg

Juggernaut XL

  • SDXL 1.0

  • Juggernaut Ragnarok
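    • Focuses on improving photorealism, digital painting, character poses, hands, and feet.
    • The model builds on Jug XII. It was first trained on a photography dataset that was re-captioned with Booru tags, using SDXL as the base. The author then trained on the same dataset again with Lustify by Coyotte as the base and merged that result in at a certain ratio to stabilize the output. Because the dataset is captioned with Booru tags, both Booru-style prompts and the prompting style of versions X–XII work well on Ragnarok.
    • Suited to image generation projects that want high-quality photorealism, but as an SDXL model it still has limitations such as distant faces and text rendering. It is recommended as one stage of a generation pipeline (e.g. FluxDev / Pixelwave / Jug Flux Pro → Juggernaut Ragnarok) for better results. The model is fully open source and allows free merging, fine-tuning, and commercial use.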

| Setting | Value |
| --- | --- |
| Base Model | SDXL 1.0 |
| Resolution | 832x1216 for Portrait |
| Sampler | DPM++ 2M SDE |
| Steps | 30-40 |
| CFG | 3-6 (less is a bit more realistic) |
| VAE | |
| HiRes | 4xNMKD-Siax_200k with 15 Steps and 0.3 Denoise + 1.5 Upscale |

CyberRealistic Pony

CyberRealistic Pony combines the cute style of Pony Diffusion with the realistic texture of CyberRealistic.

| Setting | Value |
| --- | --- |
| Base Model | Pony |
| Resolution | 896x1152 / 832x1216 |
| Sampler | DPM++ SDE Karras / DPM++ 2M Karras / Euler a |
| Steps | 30+ Steps |
| CFG | 5 |
| Clip Skip | 2 |

Positive

score_9, score_8_up, score_7_up, (SUBJECT),

Negative

score_6, score_5, score_4, (worst quality:1.2), (low quality:1.2), (normal quality:1.2), lowres, bad anatomy, bad hands, signature, watermarks, ugly, imperfect eyes, skewed eyes, unnatural face, unnatural body, error, extra limb, missing limbs
score_6, score_5, score_4, simplified, abstract, unrealistic, impressionistic, low resolution, lowres, bad anatomy, bad hands, missing fingers, worst quality, low quality, normal quality, cartoon, anime, drawing, sketch, illustration, artificial, poor quality

ADetailer

Adetailer model: face_yolov9c.pt
To refine only the main face, set 'Mask only the top k largest' to 1.

Metric

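| abbr. | stand for | better | meaning | notes |
| --- | --- | --- | --- | --- |
| WER | Word Error Rate | ⬇️ L | word-level edit distance divided by the number of reference words | STT |
| RTFx | Inverse Real-Time Factor | ⬆️ H | audio duration divided by processing time | STT |
| CER | Character Error Rate | ⬇️ L | character-level edit distance divided by the number of reference characters | STT |
| PER | Phoneme Error Rate | ⬇️ L | phoneme-level edit distance divided by the number of reference phonemes | STT |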
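
The error-rate metrics above share one computation: Levenshtein edit distance between reference and hypothesis, divided by the reference length. A minimal sketch with hypothetical function names; it applies no text normalization (case, punctuation), which real evaluations usually do first, and it assumes a non-empty reference:

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (substitute/insert/delete)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits / number of reference words."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: higher means faster than real time."""
    return audio_seconds / processing_seconds


print(wer("the cat sat on the mat", "the cat sat on mat"))
print(cer("abc", "abd"))
print(rtfx(60.0, 2.0))
```

PER follows the same pattern with phoneme sequences passed to `edit_distance` in place of words or characters.
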

Datasets