Qwen3 235B VL Instruct
Multi-modal reasoning with text, images, and video snippets powered by Qwen3 235B Vision-Language intelligence
Backend online
0 characters
•
Input 0.50 /M tokens • Output 2.5 /M tokens
🔌 API Access
Stream multi-modal responses from Qwen3 235B VL Instruct using our REST endpoint.
🔑 API Keys
Include an active API key with every request. Manage your API keys →
POST
/api/v1/generate
Supply text, images, or videos in the messages array. The studio will automatically extract representative frames from uploaded videos before dispatching the request.
💰 Cost
Input 0.50 credits / 1M tokens • Output 2.50 credits / 1M tokens
Request (cURL)
Request (JavaScript / Node)
Request (Python)
Parameters
model |
"Qwen/Qwen3-VL-235B-A22B-Instruct" (default) |
messages |
Array of chat objects. Each content item may include { type: "text" }, { type: "image_url" }, or extracted video frames. |
temperature |
0 – 2.0 (default 0.7) |
max_tokens |
1 – 8192 (default 2048) |
stream |
true for Server-Sent Events streaming responses. |
attachments |
Optional S3 URLs when uploading via the studio UI. For direct API calls, provide HTTPS or base64 media links in-line. |