Qwen3 235B VL Instruct

Multi-modal reasoning with text, images, and video snippets powered by Qwen3 235B Vision-Language intelligence

Backend online

Parameters

Controls creativity vs determinism
Maximum length of the generated answer

Vision Inputs

  • Images: PNG, JPG, WEBP (≤ 10MB)
  • Videos: MP4, MOV, WEBM, OGV (≤ 50MB)
0 characters Input 0.50 /M tokens • Output 2.5 /M tokens

🔌 API Access

Stream multi-modal responses from Qwen3 235B VL Instruct using our REST endpoint.

🔑 API Keys

Include an active API key with every request. Manage your API keys →

POST /api/v1/generate

Supply text, images, or videos in the messages array. The studio will automatically extract representative frames from uploaded videos before dispatching the request.

💰 Cost

Input 0.50 credits / 1M tokens • Output 2.50 credits / 1M tokens

Request (cURL)

Request (JavaScript / Node)

Request (Python)

Parameters

model "Qwen/Qwen3-VL-235B-A22B-Instruct" (default)
messages Array of chat objects. Each content item may include { type: "text" }, { type: "image_url" }, or extracted video frames.
temperature 0 – 2.0 (default 0.7)
max_tokens 1 – 8192 (default 2048)
stream true for Server-Sent Events streaming responses.
attachments Optional S3 URLs when uploading via the studio UI. For direct API calls, provide HTTPS or base64 media links in-line.