Qwen2.5-VL-72B-Instruct

Model Description

Qwen2.5-VL-72B-Instruct is a major upgrade in the Qwen family of vision-language models. Informed by developer feedback, it recognizes objects, text, charts, and layouts; acts as a visual agent that reasons about a task and directs tools; and understands videos longer than one hour, pinpointing the segments in which events occur. It localizes objects with bounding boxes or points and produces structured outputs from documents such as invoices and tables, which is useful in finance and commerce. Architectural improvements include dynamic-FPS training for video understanding, a streamlined ViT encoder with window attention and SwiGLU, and mRoPE extended along the temporal dimension. The family is available in 3B, 7B, and 72B sizes; this page covers the 72B instruction-tuned variant, the largest of the three.

How to Use
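The description mentions object localization with bounding boxes and structured outputs. As a rough illustration of consuming such output, the sketch below parses a text reply that contains a JSON array of detections. It is not an official client: the field names `bbox_2d` and `label` are an assumption about the reply format, and `Detection`, `parse_detections`, and `SAMPLE_REPLY` are hypothetical names introduced only for this example.

```python
import json
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    box: tuple  # (x1, y1, x2, y2) in pixel coordinates


def parse_detections(reply: str) -> list:
    """Extract detections from a reply that embeds a JSON array.

    Assumes (not confirmed by this page) that each item looks like
    {"bbox_2d": [x1, y1, x2, y2], "label": "..."}.
    """
    # The array may be surrounded by prose, so slice from the first "["
    # to the last "]" before handing the text to the JSON parser.
    start, end = reply.find("["), reply.rfind("]")
    if start == -1 or end < start:
        return []
    items = json.loads(reply[start:end + 1])

    detections = []
    for item in items:
        x1, y1, x2, y2 = item["bbox_2d"]
        # Skip degenerate boxes whose corners are not ordered.
        if x2 <= x1 or y2 <= y1:
            continue
        detections.append(Detection(item["label"], (x1, y1, x2, y2)))
    return detections


# Hypothetical reply text, written by hand for this example.
SAMPLE_REPLY = (
    "Here are the regions I found:\n"
    '[{"bbox_2d": [10, 20, 110, 220], "label": "invoice total"},'
    ' {"bbox_2d": [50, 50, 40, 90], "label": "bad box"}]'
)

for det in parse_detections(SAMPLE_REPLY):
    print(det.label, det.box)
```

Slicing between the outermost brackets keeps the parser tolerant of surrounding prose; a production client would also want to handle malformed JSON and missing keys.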


Recommended Models

o4-mini-2025-04-16

Our faster, cost-efficient reasoning model, which excels at math, coding, and vision tasks.

o3

Our most powerful reasoning model, which excels at coding, math, science, and vision tasks.

gemini-2.0-flash

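Gemini 2.0 Flash delivers next-generation features and improved capabilities, including faster speed, native tool use, multimodal generation, and a 1M-token context window.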