
Models

| Date | Model | Size | Context Window | Creator | Tags |
| --- | --- | --- | --- | --- | --- |
| 2025-06-11 | Magistral | Small 24B | 39K | Mistral AI | Reasoning, Multilingual |
| 2025-06-07 | Comma v0.1 | 7B | | EleutherAI | Full OSS, English |
| 2025-06-05 | Qwen3-Embedding | 0.6b, 4b, 8b | 32K | Alibaba | Embedding, Reranking, Multilingual (100+), Instruction Aware, MRL (1024, 2560, 4096) |
| | Phi-4 | 14b | 128K | Microsoft | mini-reasoning, reasoning, multimodal |
| 2025-05-28 | DeepSeek R1 0528 | | | DeepSeek AI | Update |
| 2025-05-26 | QwenLong-L1 | 32b | 120K | Alibaba | text |
| 2025-05-20 | Gemma 3n | 8b-e2b, 8b-e4b | | Google | Edge, PLE |
| 2025-04-29 | Qwen3 | 0.6b, 1.7b, 4b, 8b, 14b, 30b, 32b, 235b, 30b-a3b, 235b-a22b | 40K | Alibaba | MoE, Reasoning |
| 2025-04-05 | Llama 4 | Scout 109b-a17b, Maverick 400b-a17b, Behemoth 2T | 1M, 10M | Meta | MoE, Vision |
| 2025-03-26 | Qwen2.5-Omni | 3B, 7B | | Alibaba | text, audio, image, video, speech |
| 2025-03-12 | Gemma 3 | 1b, 4b, 12b, 27b | 128K | Google DeepMind | Vision |
| 2025-02-26 | Wan 2.1 | 1.3b, 14b | | Alibaba | t2v, 480p, 720p |
| 2025-02-24 | SmolLM2 | 135m, 360m, 1.7b | 8K | HuggingFaceTB | |
| 2025-01-28 | Qwen2.5-VL | 3b, 7b, 32b, 72b | 125K | Alibaba | Vision |
| 2025-01-28 | Qwen2.5 | 0.5b, 1.5b, 3b, 7b, 14b, 32b, 72b | 32K, 1M | Alibaba | |
| 2025-01-20 | DeepSeek R1 | 1.5b, 7b, 8b, 14b, 32b, 70b, 671b | 128K | DeepSeek AI | Reasoning |
| 2024-12-07 | Llama 3.3 | 70B | 128K | Meta | |
| 2024-10-05 | LLaVA | 7b, 13b, 34b | 4K, 32K | | Vision |
| 2024-09-25 | Llama 3.2 | 1B, 3B, 11B, 90B | 128K | Meta | |
| 2024-07-23 | Llama 3.1 | 8B, 70.6B, 405B | 128K | Meta | |
| 2024-06-27 | Gemma 2 | 9b, 27.2b | 8K | Google DeepMind | |
| 2024-06-07 | Qwen2 | 0.5b, 1.5b, 7b, 57b (A14b), 72b | 32K, 64K, 128K | Alibaba | |
| 2024-04-23 | Phi-3 | 3.8b, 7b, 14b | 4K, 128K | Microsoft | |
| 2024-04-18 | Llama 3 | 8b, 70.6b | 8K, 128K | Meta | |
| 2024-02-21 | Gemma | 2b, 7b | 8K | Google DeepMind | |
| 2023-12-11 | Mistral | 7b, 46.7b (8x7B MoE) | 33K | Mistral AI | |
| 2023-07-18 | Llama 2 | 6.7b, 13b, 69b | 4K | Meta | |
| 2023-02-24 | LLaMA | 6.7B, 13B, 32.5B, 65.2B | 2K | Meta | |
| 2020-06-11 | GPT-3 | 175b | 2K | OpenAI | |
| 2019-02-14 | GPT-2 | 1.5b | 1K | OpenAI | |
| 2018-06-11 | GPT-1 | 117m | 512 | OpenAI | |
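Several rows are tagged MoE. Sizes such as `30b-a3b` or `109b-a17b` give total parameters versus parameters active per token. Only a few experts run for each token, and a learned router decides which ones.

The sketch below shows that gating step without any framework. It uses top-k softmax routing: a softmax over the k highest router logits, as in Mixtral. Real models add load-balancing losses, sometimes shared experts, and batched tensor ops. The scalar "experts" are hypothetical toys.

```python
import math
from typing import Callable

def top_k_route(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Return (expert_index, gate_weight) for the k highest-scoring experts.

    Gate weights are a softmax over the selected logits only, so they sum to 1.
    """
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(router_logits[i] - m) for i in top}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in top]

def moe_forward(x: float, experts: list[Callable[[float], float]],
                router_logits: list[float], k: int = 2) -> float:
    """Weighted sum of the selected experts' outputs; the other experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Hypothetical toy setup: 4 "experts" that just scale their input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
print(top_k_route([1.0, 3.0, 2.0, 0.0], k=2))  # experts 1 and 2 are selected
print(moe_forward(1.0, experts, [1.0, 3.0, 2.0, 0.0]))
```

Compute per token scales with k, not with the total number of experts. That is why a 30B-total model can have only about 3B parameters active.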

Proprietary Models

| Release | Model | Author | Notes |
| --- | --- | --- | --- |
| 2025-04-17 | Gemini 2.5 Flash | Google | |
| 2025-04-14 | GPT-4.1, mini, nano | OpenAI | |
| 2025-03-25 | Gemini 2.5 Pro | Google | 2M |
| 2025-02-05 | Gemini 2.0 Flash | Google | audio, video |
| 2025-02-01 | Gemini 2.0 Flash-Lite | Google | |
| 2025-01-10 | o3, o3-mini | OpenAI | Reasoning |
| 2024-12-17 | o1 | OpenAI | |
| 2024-09-12 | o1-preview | OpenAI | Reasoning |
| 2024-07-18 | GPT-4o mini | OpenAI | |
| 2024-05-13 | GPT-4o | OpenAI | text, audio, image |
| 2024-03-04 | Claude 3 Haiku, Sonnet, Opus | Anthropic | 200K |
| 2024-02-15 | Gemini 1.5 Pro | Google | Breakthrough 1M-token long context window |
| 2023-12-06 | Gemini 1.0 Pro | Google | Natively multimodal model family |
| 2023-11-21 | Claude 2.1 | Anthropic | 200K |
| 2023-11-06 | GPT-4V | OpenAI | 128K, Vision |
| 2023-11-06 | GPT-4 Turbo | OpenAI | 128K |
| 2023-07-11 | Claude 2 | Anthropic | 100K |
| 2023-06-27 | GPT-3.5-16k | OpenAI | 16K |
| 2023-03-14 | GPT-4 | OpenAI | 8K, 32K, image |
| 2023-03-01 | GPT-3.5-turbo | OpenAI | 4K |
| 2022-11-30 | GPT-3.5 | OpenAI | 4K |
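
| Abbr. | Stands for | Meaning |
| --- | --- | --- |
| MRL | Matryoshka Representation Learning | Embedding training in which the leading dimensions also work as a smaller embedding |
| R2V | reference-to-video | Generate video from reference images |
| MV2V | masked video-to-video | Regenerate only the masked region of a video |
| V2V | video-to-video | Transform an input video |
| MoE | Mixture of Experts | Each token is routed to a small subset of expert subnetworks |
| VACE | Video Animation, Composition, and Editing | |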
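With MRL, the first coordinates of an embedding are trained to work as a smaller embedding on their own. This is why the Qwen3-Embedding row lists dimensions (1024, 2560, 4096) rather than one fixed size.

The sketch below uses a shorter prefix to save storage and compute. It assumes the model was trained with MRL; for a model that wasn't, truncation simply degrades quality. The vectors are made up.

```python
import math

def mrl_truncate(embedding: list[float], dim: int) -> list[float]:
    """Keep the first `dim` coordinates of a Matryoshka embedding and L2-normalize them."""
    if not 0 < dim <= len(embedding):
        raise ValueError(f"dim must be between 1 and {len(embedding)}")
    head = embedding[:dim]
    norm = math.sqrt(sum(v * v for v in head))
    return [v / norm for v in head] if norm > 0 else head

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; for unit-length inputs this equals the dot product."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Made-up 6-dim "full" embeddings, compared at full size and truncated to 2 dims.
query = [0.3, 0.4, 0.1, -0.2, 0.05, 0.0]
doc = [0.28, 0.41, -0.3, 0.1, 0.0, 0.2]
print(cosine(query, doc), cosine(mrl_truncate(query, 2), mrl_truncate(doc, 2)))
```

Re-normalizing after truncation keeps similarity scores comparable across the sizes you choose.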
  • VACE: All-in-One Video Creation and Editing
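  • *-pt - Pre-Training - pre-trained model
    • Initial training on a large-scale dataset to learn language patterns and structure.
    • Suitable as a base model that developers can further fine-tune for specific tasks.
  • *-ft - Fine-Tuned - fine-tuned model
  • *-it - Instruction Tuning - instruction-tuned model
    • Built on the pre-trained model and further fine-tuned on specific tasks or instructions.
    • Better suited to direct use in real-world tasks, since it has already been optimized for specific uses.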


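Grouping models by the company that makes them works well: a company's models are closely related and show continuity. Although each release extends and adjusts various capabilities, the development of the base models and the techniques they use stay relatively continuous.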

# AVX = 1 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
grep avx /proc/cpuinfo --color # x86_64
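The `AVX = 1 | AVX2 = 0 | ...` line appears to be the system-info summary that llama.cpp prints at startup. The `grep` command shows the raw kernel flags it is based on.

The Python sketch below builds a similar summary from `/proc/cpuinfo` (Linux only). The mapping from summary names to kernel flag names is an assumption for illustration:

- `avx512f` stands in for AVX512.
- The kernel lists SSE3 as `pni`.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect CPU feature flags from the text of /proc/cpuinfo (x86_64 Linux)."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":  # skips e.g. "vmx flags" lines
            flags.update(value.split())
    return flags

# Summary name -> kernel flag name (assumed mapping; SSE3 is listed as "pni").
FEATURES = {"AVX": "avx", "AVX2": "avx2", "AVX512": "avx512f",
            "FMA": "fma", "F16C": "f16c", "SSE3": "pni"}

def feature_report(flags: set[str]) -> str:
    """Format a 'NAME = 0/1 | ...' line for the features above."""
    return " | ".join(f"{name} = {int(flag in flags)}" for name, flag in FEATURES.items())
```

On a Linux machine, use it like this:

```python
with open("/proc/cpuinfo") as f:
    print(feature_report(cpu_flags(f.read())))
```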

Computer Agent

Chinese

Fine-tuning

Audio

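| Case | Description | Key qualities | Models |
| --- | --- | --- | --- |
| Virtual assistants | AI assistants respond with natural speech to improve interaction | High naturalness, low latency, multiple emotions | XTTS-v2, MeloTTS, F5-TTS, ChatTTS |
| Accessibility solutions | Spoken content for people with visual or learning impairments | High clarity, easy to understand, stable | MeloTTS, Bark |
| Content creation | Professional voice-over for podcasts, audiobooks, etc. | Diverse voices, rich emotion, natural prosody | XTTS-v2, F5-TTS, GPT-SoVITS-v2 |
| Automated customer service | IVR systems for efficient automated customer service | Clear and stable, highly customizable | Piper, Parler-TTS, XTTS-v2 |
| Voice self-service kiosks | Interactive voice response for self-service terminals | Fast response, clear and easy to understand | Piper, MeloTTS |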

STT

  • STT - Speech to Text
  • ASR - Automatic Speech Recognition
  • modelscope/FunASR

MLLM

  • Multimodal Large Language Model
  • Architecture: vision encoder + projector + language model
  • Vision Model
    • ViT
  • Language Model
  • Projector / Vision-Language Adapter
    • Aligns the image features extracted by the vision model with the language model's representation space
    • Cross-Attention Module
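The sketch below follows the projector path in the LLaVA style:

- A learned linear layer is applied to each image-patch feature. LLaVA-1.5 uses a small MLP instead.
- The resulting "image tokens" are placed in front of the text embeddings, forming one input sequence.

Sizes and weights are hypothetical toy values, and real models use tensors. Models built on a cross-attention module work differently: they feed the image features into extra attention layers instead of the input sequence.

```python
import random

def linear(x: list[list[float]], w: list[list[float]]) -> list[list[float]]:
    """Project each row of x (tokens x d_in) with weight matrix w (d_in x d_out)."""
    d_out = len(w[0])
    return [[sum(xi * w[i][j] for i, xi in enumerate(row)) for j in range(d_out)]
            for row in x]

def build_lm_input(patch_features: list[list[float]],
                   text_embeddings: list[list[float]],
                   w_proj: list[list[float]]) -> list[list[float]]:
    """Vision-encoder features -> projector -> prepend to the text token embeddings."""
    image_tokens = linear(patch_features, w_proj)  # now in the LM's embedding space
    return image_tokens + text_embeddings          # one sequence for the language model

# Hypothetical toy sizes: 4 patches with 3-dim ViT features, language-model width 5.
random.seed(0)
patches = [[random.gauss(0, 1) for _ in range(3)] for _ in range(4)]
w_proj = [[random.gauss(0, 0.1) for _ in range(5)] for _ in range(3)]
text = [[0.0] * 5 for _ in range(2)]               # 2 already-embedded text tokens
seq = build_lm_input(patches, text, w_proj)
print(len(seq), len(seq[0]))                       # 6 5
```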

Vision

  • Document OCR
  • Handwriting OCR
  • Visual QA / Image QA
  • Visual Reasoning
  • Image Classification
  • Document Understanding
  • Video Understanding
  • Object Detection
  • Object Counting
  • Agent - screen understanding and operation
  • Object Grounding

Coding

Video

Generation Media

Problem areas

  • Prompt adherence
  • Generation quality
  • Instructability
  • Consistency of styles, characters, settings, etc.
  • Deliberate, exact posing and placement of characters and set pieces
  • Compositing different images or layers together
  • Relighting
  • Posing built into the model; no ControlNet hacks
  • References built into the model; no IPAdapter, no required character/style LoRAs, etc.
  • Ability to address objects, characters, mannequins, etc. for deletion / insertion
  • Ability to pull sources from multiple images, with or without "innovation" (changes) to their pixels
  • Fine-tunable (for higher quality and precision)

Generative Marketing


Media

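  • 可灵 (Kling) 2.1:
    • Image-to-video generation with good results
    • Dynamic facial expressions, large dynamic camera moves, precise gesture control, singing lip-sync
    • Orbiting camera moves and lip-synced performances
  • Veo 3:
    • Video generation from text prompts
    • Simulates the look of live-action footage
  • Sora:
    • Converts existing videos into a new style
  • Pika:
    • Swaps or adds content in a scene
  • Runway:
    • References characters, locations, or styles (Gen-3)
  • Luma:
    • Reframes videos to a new aspect ratio
  • Hedra:
    • Makes characters talk (lip-sync)
  • 即梦 (Instant Dream):
    • Many videos online are made with it
    • 即梦 Omnihuman: good at lip-sync in static shots
  • Vidu:
    • Anime-style animation performances
  • Viggle:
    • Puts characters into video memes (character motion transfer)
  • Higgsfield:
    • Hollywood-grade visual effects
  • 剪映专业版 (JianYing Pro):
    • Powerful, with rich assets and effects; a must-install for video editing
  • Krea:
    • Uses open-source models such as Wan or Hunyuan
  • 美图秀秀 (Meitu):
    • Built-in image generation; most people are already used to the app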

Text

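  • 豆包 (Doubao):
    • Focused on emotional conversation; well suited to everyday scenarios
  • Kimi:
    • Specializes in long-form text; handles large amounts of content well
  • DeepSeek:
    • Exceptionally strong at writing code
  • 知乎 (Zhihu):
    • A must-have for anyone who enjoys Zhihu articles
  • Gamma:
    • Top-tier slide (PPT) generator that builds a customized deck directly from your article
  • MindShow:
    • Enter a text outline and it is automatically organized into a mind map, which can be turned into a slide deck with one click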

Design

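  • 稿定设计 (Gaoding Design):
    • Covers graphic design, e-commerce design, and more, with a large library of editable templates for all kinds of design needs
  • 易企秀:
    • Quickly builds H5 (mobile web) pages with many template types; good for event promotion and product marketing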

Search

  • https://felo.ai/
    • Arguably the most useful search tool for Xiaohongshu (小红书) content; well worth knowing about

Avatar

  • Digital humans

  • HunyuanVideo-Avatar

  • tencent-ailab/IP-Adapter

    • Text-to-Image
  • Pony

    • Fine-tuned on SDXL
    • Trained on 2.5 million furry/anthro/cartoon/anime images
    • Recognizes many anime characters directly, without needing a LoRA

Diffusion Models

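[Character description] [Scene construction] [Photography parameters] [Atmosphere enhancement] [Detail additions]
  • Scene construction
    • Clothing details
    • Dynamic poses
    • Lighting and atmosphere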
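The five-slot template can be treated as an ordered list of optional fields that are joined with commas. In the sketch below, the slot keys are hypothetical names for the bracketed sections, and the example phrases come from the lists further down this page.

```python
# Slot order follows the template:
# [character] [scene] [photography] [atmosphere] [details]
PROMPT_SLOTS = ("character", "scene", "photography", "atmosphere", "details")

def build_prompt(parts: dict[str, str]) -> str:
    """Join the filled-in slots in template order, skipping empty or missing ones."""
    unknown = set(parts) - set(PROMPT_SLOTS)
    if unknown:
        raise KeyError(f"unknown slots: {sorted(unknown)}")
    return ", ".join(parts[s].strip() for s in PROMPT_SLOTS if parts.get(s, "").strip())

prompt = build_prompt({
    "scene": "oversized cozy sweater",
    "photography": "(shot on Sony A7 IV, 50mm f/1.8 lens)",
    "atmosphere": "serene, tranquil, intimate, modern elegance",
    "details": "photorealistic, natural skin texture, soft film grain",
})
print(prompt)
```

Keeping the order fixed makes prompts easier to compare and tweak one section at a time.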

Resolution

  • 1:1
    • 512x512
    • 768x768
    • 1024x1024
  • 4:3
  • 16:9
    • 1216x704
  • Portrait
    • 832x1216
  • Landscape
    • 1216x832
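Stable Diffusion-family VAEs downsample images by 8x, so width and height must be multiples of 8. SDXL sizes are also usually kept at multiples of 64 and near one megapixel; treat that as a common convention rather than a hard rule.

The sketch below checks the listed sizes. It shows, for example, that 1216x704 is 19:11 (about 1.73): close to, but not exactly, 16:9 (about 1.78).

```python
from math import gcd
from typing import NamedTuple

class ResolutionInfo(NamedTuple):
    ratio: str             # reduced integer ratio, e.g. "13:19"
    aspect: float          # width / height, rounded to 2 decimals
    megapixels: float
    multiple_of_64: bool

def describe(width: int, height: int) -> ResolutionInfo:
    """Summarize a generation size: exact ratio, aspect, megapixels, 64-alignment."""
    g = gcd(width, height)
    return ResolutionInfo(
        ratio=f"{width // g}:{height // g}",
        aspect=round(width / height, 2),
        megapixels=round(width * height / 1_000_000, 2),
        multiple_of_64=(width % 64 == 0 and height % 64 == 0),
    )

for w, h in [(1024, 1024), (1216, 704), (832, 1216), (1216, 832)]:
    print(f"{w}x{h}", describe(w, h))
```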

Negative

text
watermark
camera
out of frame, lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck, username, watermark, signature,
(worst quality, low quality:1.4), (ugly:1.2), (stitching:1.2),
bad anatomy, deformed, disfigured, malformed limbs, extra limbs, fused limbs,
poorly drawn face, distorted face, malformed face, asymmetric eyes,
poorly drawn hands, extra fingers, fused fingers, malformed hands,
text, error, signature, watermark, username
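Groups like `(worst quality, low quality:1.4)` use the AUTOMATIC1111-style emphasis syntax: `(text:w)` multiplies the attention weight of `text` by `w`. ComfyUI accepts the same form.

The simplified parser sketch below shows how such a prompt splits into weighted chunks. It only handles explicit `(text:weight)` groups. It deliberately ignores nesting, bare `(text)` / `[text]` emphasis, and escaped parentheses, all of which real front ends also support.

```python
import re

# An explicit "(text:weight)" group, e.g. "(worst quality, low quality:1.4)".
WEIGHTED = re.compile(r"\(([^()]+?):\s*(\d*\.?\d+)\)")

def parse_weights(prompt: str) -> list[tuple[str, float]]:
    """Split a prompt into (text, weight) chunks; unweighted text gets weight 1.0."""
    chunks: list[tuple[str, float]] = []
    pos = 0
    for m in WEIGHTED.finditer(prompt):
        plain = prompt[pos:m.start()].strip(" ,\n")
        if plain:
            chunks.append((plain, 1.0))
        chunks.append((m.group(1).strip(), float(m.group(2))))
        pos = m.end()
    tail = prompt[pos:].strip(" ,\n")
    if tail:
        chunks.append((tail, 1.0))
    return chunks

print(parse_weights("(worst quality, low quality:1.4), (ugly:1.2), (stitching:1.2),\nbad anatomy, deformed"))
# [('worst quality, low quality', 1.4), ('ugly', 1.2), ('stitching', 1.2), ('bad anatomy, deformed', 1.0)]
```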

Dynamic poses

Clothing details

trendy off-shoulder top
oversized cozy sweater
t-shirt with a cute cat print

Quality

(shot on Sony A7 IV, 50mm f/1.8 lens)
photorealistic
ultra detailed
natural skin texture
soft film grain
8k uhd

Atmosphere enhancement

  • Mood
serene, tranquil, intimate, modern elegance
  • Color
Warm neutrals with pops of soft pastels in window light
  • Motion
Subtle light reflection on sweater fabric, gentle light diffusion across the room

Photography parameters

  • Visual style
  • Lighting treatment

Juggernaut XL

  • SDXL 1.0

  • Juggernaut Ragnarok
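    • Focuses on improving photorealism, digital painting, character poses, and the rendering of hands and feet.
    • Built on Jug XII, with SDXL as the base. It was first trained on a photography dataset that had been re-captioned with Booru tags. The author then trained the same dataset again on top of Lustify by Coyotte and merged the result in at a set ratio as a stabilizer for the output. Because the dataset is captioned with Booru tags, both Booru-style prompts and the descriptive prompting style of versions X to XII work well with Ragnarok.
    • Well suited to image-generation projects that aim for high-quality photorealism. As an SDXL model, however, it still has limitations, such as faces at a distance and text rendering. For better results, use it as one stage of a pipeline (e.g., FluxDev / Pixelwave / Jug Flux Pro → Juggernaut Ragnarok). The model is fully open source and can be freely merged, fine-tuned, and used commercially.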

| Setting | Value |
| --- | --- |
| Base Model | SDXL 1.0 |
| Resolution | 832x1216 for portrait |
| Sampler | DPM++ 2M SDE |
| Steps | 30-40 |
| CFG | 3-6 (lower is a bit more realistic) |
| VAE | |
| HiRes | 4xNMKD-Siax_200k, 15 steps, 0.3 denoise, 1.5x upscale |
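The settings table can be captured as a small config object, which also makes the hires output size explicit. This sketch rests on several assumptions:

- The field names are hypothetical, not any UI's or library's API.
- 35 steps and CFG 4.5 are simply midpoints of the recommended ranges.
- Rounding the hires size down to a multiple of 8 (the latent stride) is also an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class JuggernautSettings:
    """Recommended Juggernaut Ragnarok settings from the table (hypothetical field names)."""
    width: int = 832                 # portrait
    height: int = 1216
    sampler: str = "DPM++ 2M SDE"
    steps: int = 35                  # recommended 30-40
    cfg: float = 4.5                 # recommended 3-6; lower looks a bit more realistic
    upscaler: str = "4xNMKD-Siax_200k"
    hires_steps: int = 15
    hires_denoise: float = 0.3
    hires_scale: float = 1.5

    def __post_init__(self) -> None:
        # Reject values outside the ranges the table recommends.
        if not 30 <= self.steps <= 40:
            raise ValueError("steps should be 30-40")
        if not 3 <= self.cfg <= 6:
            raise ValueError("CFG should be 3-6")

    def hires_size(self) -> tuple[int, int]:
        """Output size after the hires pass, rounded down to a multiple of 8."""
        def snap(v: float) -> int:
            return int(v) // 8 * 8
        return snap(self.width * self.hires_scale), snap(self.height * self.hires_scale)

print(JuggernautSettings().hires_size())  # (1248, 1824)
```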

CyberRealistic Pony

CyberRealistic Pony combines the cute style of Pony Diffusion with the realistic texture of CyberRealistic.

| Setting | Value |
| --- | --- |
| Base Model | Pony |
| Resolution | 896x1152 / 832x1216 |
| Sampler | DPM++ SDE Karras / DPM++ 2M Karras / Euler a |
| Steps | 30+ |
| CFG | 5 |
| Clip Skip | 2 |

Positive

score_9, score_8_up, score_7_up, (SUBJECT),

Negative

score_6, score_5, score_4, (worst quality:1.2), (low quality:1.2), (normal quality:1.2), lowres, bad anatomy, bad hands, signature, watermarks, ugly, imperfect eyes, skewed eyes, unnatural face, unnatural body, error, extra limb, missing limbs
score_6, score_5, score_4, simplified, abstract, unrealistic, impressionistic, low resolution, lowres, bad anatomy, bad hands, missing fingers, worst quality, low quality, normal quality, cartoon, anime, drawing, sketch, illustration, artificial, poor quality
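As in the prompts above, Pony-based checkpoints take quality score tags at the start of the positive prompt, with the low scores in the negative. The tiny helper sketch below adds those tags; the function name and the example subject are hypothetical.

```python
POSITIVE_SCORES = "score_9, score_8_up, score_7_up"
NEGATIVE_SCORES = "score_6, score_5, score_4"

def pony_prompts(subject: str, negative_extra: str = "") -> tuple[str, str]:
    """Return (positive, negative) prompts with the Pony score tags in front."""
    positive = f"{POSITIVE_SCORES}, {subject.strip()}"
    extra = negative_extra.strip()
    negative = f"{NEGATIVE_SCORES}, {extra}" if extra else NEGATIVE_SCORES
    return positive, negative

pos, neg = pony_prompts("a cat sleeping on a sofa", "lowres, bad anatomy, bad hands")
print(pos)  # score_9, score_8_up, score_7_up, a cat sleeping on a sofa
print(neg)  # score_6, score_5, score_4, lowres, bad anatomy, bad hands
```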

ADetailer

ADetailer model: face_yolov9c.pt
If you only want the main face refined, set 'Mask only the top k largest' to 1.
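'Mask only the top k largest' keeps the k largest detections and refines only those, so k = 1 targets the main face. The sketch below shows that selection rule, ranking detections by box area. It mirrors the described behavior and is not ADetailer's source code. The `(x1, y1, x2, y2)` box format and treating k <= 0 as "keep all" are assumptions.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels (assumed format)

def box_area(box: Box) -> float:
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def top_k_largest(boxes: list[Box], k: int) -> list[Box]:
    """Keep the k detections with the largest area; k <= 0 keeps all of them."""
    if k <= 0:
        return list(boxes)
    return sorted(boxes, key=box_area, reverse=True)[:k]

# Hypothetical detections: a background face, the main face, a mid-size face.
faces = [(10, 10, 40, 45), (200, 80, 420, 360), (500, 120, 600, 240)]
print(top_k_largest(faces, 1))  # [(200, 80, 420, 360)]
```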

Metric

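| Abbr. | Stands for | Better | Meaning | Notes |
| --- | --- | --- | --- | --- |
| WER | Word Error Rate | ⬇️ lower | (substitutions + deletions + insertions) / number of reference words | STT |
| RTFx | Inverse Real-Time Factor | ⬆️ higher | audio duration / processing time | STT |
| CER | Character Error Rate | ⬇️ lower | Same as WER, computed over characters | STT |
| PER | Phoneme Error Rate | ⬇️ lower | Same as WER, computed over phonemes | STT |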
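The three error rates share one definition: the Levenshtein edit distance (substitutions + deletions + insertions) between reference and hypothesis, divided by the reference length. They differ only in the unit: words (WER), characters (CER), or phonemes (PER). RTFx is audio duration divided by processing time.

The plain-Python sketch below assumes a non-empty reference. It skips the text normalization that real benchmarks apply first, such as casing, punctuation, and word segmentation for Chinese.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))             # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,            # deletion (ref unit missing)
                         cur[j - 1] + 1,         # insertion (extra hyp unit)
                         prev[j - 1] + (r != h)) # substitution, free on a match
        prev = cur
    return prev[-1]

def error_rate(ref_units: list[str], hyp_units: list[str]) -> float:
    """Generic rate; pass phoneme lists to get PER."""
    return edit_distance(ref_units, hyp_units) / len(ref_units)

def wer(reference: str, hypothesis: str) -> float:
    return error_rate(reference.split(), hypothesis.split())

def cer(reference: str, hypothesis: str) -> float:
    return error_rate(list(reference), list(hypothesis))

def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio transcribed per second of compute."""
    return audio_seconds / processing_seconds

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 errors / 6 words
print(cer("abcd", "abed"))                                  # 0.25
print(rtfx(3600, 36))                                       # 100.0
```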