QwenLM
- QwenLM (GitHub org)
  - QwenLM/Qwen2.5-VL
    - Collection Qwen2.5-VL: 3B, 7B, 32B, 72B
  - QvQ: visual reasoning
  - Qwen: large models
  - Omni: text, audio, image, video, natural speech interaction
VL
- Bounding-box markup <box></box> (see the parsing sketch after this list)
  - <|box_start|>
  - <|box_end|>
- Object references: <|object_ref_start|> <|object_ref_end|>
- min_pixels = 256*28*28
- max_pixels = 1280*28*28 (i.e. 256–1280 visual tokens, one token per 28×28-pixel patch)
- FineTune
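A minimal parsing sketch, assuming grounding output of the form <|object_ref_start|>label<|object_ref_end|><|box_start|>(x1,y1),(x2,y2)<|box_end|>; the regex and the parse_boxes helper below are illustrative, not part of the Qwen tooling:

import re

# Hypothetical helper: pull (label, box) pairs out of decoded model output,
# assuming the token format shown in the list above.
BOX_RE = re.compile(
    r"<\|object_ref_start\|>(.*?)<\|object_ref_end\|>\s*"
    r"<\|box_start\|>\((\d+),\s*(\d+)\),\s*\((\d+),\s*(\d+)\)<\|box_end\|>"
)

def parse_boxes(text):
    """Return a list of (label, (x1, y1, x2, y2)) tuples."""
    return [(m[0], tuple(int(v) for v in m[1:])) for m in BOX_RE.findall(text)]

print(parse_boxes(
    "<|object_ref_start|>dog<|object_ref_end|>"
    "<|box_start|>(12,34),(200,220)<|box_end|>"
))
# -> [('dog', (12, 34, 200, 220))]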
FAQ
macOS (MPS backend): "Dimension out of range" error — load the model with attn_implementation="eager":
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor

model_path = "Qwen/Qwen2.5-VL-3B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",  # change this: use eager attention on MPS to avoid the error
    device_map="mps",
)
min_pixels = 256*28*28
max_pixels = 1280*28*28
processor = AutoProcessor.from_pretrained(model_path, min_pixels=min_pixels, max_pixels=max_pixels)
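For completeness, a generation sketch following the inference pattern from the Qwen2.5-VL README (requires the qwen_vl_utils package; the image path and prompt are placeholders), reusing the model and processor loaded above:

from qwen_vl_utils import process_vision_info

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "demo.jpg"},  # placeholder image path
        {"type": "text", "text": "Describe this image."},
    ],
}]

# Build the chat prompt and collect the vision inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

# Generate, then strip the prompt tokens before decoding.
generated_ids = model.generate(**inputs, max_new_tokens=128)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])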