Qwen3 235B VL Instruct

Multi-modal reasoning with text, images, and video snippets powered by Qwen3 235B Vision-Language intelligence

Backend online

Parameters

Controls creativity vs determinism
Maximum length of the generated answer

Vision Inputs

  • Images: PNG, JPG, WEBP (≤ 10MB)
  • Videos: MP4, MOV, WEBM, OGV (≤ 50MB)
0 characters Input 0.50 /M tokens • Output 2.5 /M tokens

🔌 API Access

Integrate Qwen3 235B VL Instruct into your multi-modal workflows.

🔑 API Keys

Include an active API key with every request. Manage your API keys →

POST /api/v1/generate; /api/v1/chat/completions

Chat with Qwen3 235B VL

Send text, image, or short video snippets (as extracted frames) inside the messages array. The studio automatically captures representative frames when you upload video clips.

Cost: 0.50 credits / 1M input tokens • 2.50 credits / 1M output tokens

Request (cURL)

Request (Python)

Request (JavaScript/Node.js)

Parameters

Parameter Type Required Description
model string ✅ Yes Use qwen3-235b-vl (default)
messages array ✅ Yes Conversation turns where each content item can include {"type":"text"} or {"type":"image_url"} entries (video frames can be supplied the same way).
temperature number Optional Controls creativity (0–2). Default 0.7.
max_tokens number Optional Maximum response tokens (1–8192, default 2048).
stream boolean Optional Default false. Set to true for SSE streaming responses.
attachments array Optional When using the studio UI, the backend stores S3 references here. Direct API calls can inline HTTPS/base64 media instead.