Qwen2.5-VL-72B-Instruct

2025-01-28
Chat, Vision
By Qwen

Input: ￥6.00 / M tokens Output: ￥12.00 / M tokens
Features: Image Input, Streaming, Text Input, Text Output
Context Window: 128K

Input: ￥6.00 / M tokens Output: ￥12.00 / M tokens
Features: Image Input, Streaming, Text Input, Text Output
Context Window: 128K

Model Description

Qwen2.5-VL-72B-Instruct represents a significant upgrade in the Qwen family of vision-language models. Building on feedback from developers, it excels in visual recognition (objects, text, charts, layouts), acts as a visual agent for tool-based reasoning, and processes long videos (1+ hours) with precise event localization. It supports object detection via bounding boxes/points and generates structured outputs (e.g., invoices, tables) for finance/commerce. Architectural improvements include dynamic FPS training for video understanding, optimized ViT with window attention/SwiGLU, and temporal mRoPE enhancements. Available in 3B/7B/72B variants, this 72B instruction-tuned model balances speed and performance.

Recommend Models

o3-mini

Chat, Reasoning
OpenAI

o3-mini is our newest small reasoning model, providing high intelligence at the same cost and latency targets of o1-mini. o3-mini supports key developer features, like Structured Outputs, function calling, and Batch API.

2025-01-31

DeepSeek-V3-0324

Chat
DeepSeek

DeepSeek-V3-0324 is an upgraded AI model with enhanced reasoning, coding, Chinese writing, and web search capabilities, outperforming GPT-4.5 in certain tasks while maintaining 128K context support and open-source MIT licensing.

2025-03-24

gemini-2.0-flash

Chat, Vision
Gemini

Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, native tool use, multimodal generation, and a 1M token context window.

2025-02-14