Qwen3 235B VL Instruct

Multi-modal reasoning with text, images, and video snippets powered by Qwen3 235B Vision-Language intelligence

Parameters

Temperature: controls creativity vs determinism
Max tokens: maximum length of the generated answer

Vision Inputs

  • Images: PNG, JPG, WEBP (≤ 10MB)
  • Videos: MP4, MOV, WEBM, OGV (≤ 50MB)
Pricing: input 0.50 credits / 1M tokens • output 2.50 credits / 1M tokens

🔌 API Access

Integrate Qwen3 235B VL Instruct into your multi-modal workflows.

🔑 API Keys

Include an active API key with every request.

POST /api/v1/generate; /api/v1/chat/completions

Chat with Qwen3 235B VL

Send text, images, or short video snippets (as extracted frames) inside the messages array. When you upload a video clip in the studio, it automatically captures representative frames for you.

Cost: 0.50 credits / 1M input tokens • 2.50 credits / 1M output tokens

Request (cURL)

Request (Python)
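
The request snippet for this tab is not reproduced here, so the following is only a minimal sketch of what a chat call might look like. The host in BASE_URL, the Bearer authorization header, and the nested {"image_url": {"url": ...}} shape (borrowed from the OpenAI-style convention) are all assumptions; only the endpoint path, the model name, and the parameter names come from this page.

```python
import json
import urllib.request

# Hypothetical values: this page does not state the host or the auth header format.
BASE_URL = "https://api.example.com"
API_KEY = "YOUR_API_KEY"


def build_chat_payload(text, image_urls=(), temperature=0.7, max_tokens=2048, stream=False):
    """Build a chat request body with a single user turn of text plus images."""
    content = [{"type": "text", "text": text}]
    for url in image_urls:
        # Assumed OpenAI-style entry shape; extracted video frames are added the same way.
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {
        "model": "qwen3-235b-vl",
        "messages": [{"role": "user", "content": content}],
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": stream,
    }


def send_chat(payload):
    """POST the payload to the chat endpoint and return the decoded JSON body."""
    request = urllib.request.Request(
        BASE_URL + "/api/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_KEY,  # assumed header scheme
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling build_chat_payload("Describe this image", ["https://example.com/cat.png"]) returns a dictionary that can be inspected before it is passed to send_chat, which keeps payload construction testable without network access.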

Request (JavaScript/Node.js)

Parameters

Parameter (type, required): description

  • model (string, required): Use qwen3-235b-vl (default).
  • messages (array, required): Conversation turns where each content item can include {"type":"text"} or {"type":"image_url"} entries (video frames can be supplied the same way).
  • temperature (number, optional): Controls creativity (0–2). Default 0.7.
  • max_tokens (number, optional): Maximum response tokens (1–8192, default 2048).
  • stream (boolean, optional): Default false. Set to true for SSE streaming responses.
  • attachments (array, optional): When using the studio UI, the backend stores S3 references here. Direct API calls can inline HTTPS/base64 media instead.

POST /api/v1/completions

Completions API (Raw Prompt)

Use the completions endpoint when you need full control over the prompt format. This is useful for custom templating or when working with special vision tokens like <|vision_start|><|image_pad|><|vision_end|>.

Cost: 0.50 credits / 1M input tokens • 2.50 credits / 1M output tokens

Request (cURL)

Request (Python)
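
The request snippet for this tab is not reproduced here. As a rough sketch, a raw-prompt payload could be assembled as below. The ChatML-style framing (<|im_start|> and <|im_end|> around the user turn) is an assumption about what the endpoint expects; the vision token sequence, the images array, and the stop strings are taken from this page. The function name build_completion_payload is hypothetical.

```python
VISION_TOKEN = "<|vision_start|><|image_pad|><|vision_end|>"


def build_completion_payload(question, images, temperature=0.7, max_tokens=2048):
    """Build a raw-prompt request body with exactly one vision token per image."""
    # Assumed ChatML-style framing; adjust to whatever template your prompts use.
    prompt = (
        "<|im_start|>user\n"
        + VISION_TOKEN * len(images)
        + question
        + "<|im_end|>\n<|im_start|>assistant\n"
    )
    return {
        "model": "qwen3-235b-vl",
        "prompt": prompt,
        "images": list(images),
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stop": ["<|im_end|>", "<|endoftext|>"],
    }
```

Because the token count is derived from len(images), the prompt and the images array stay in step by construction.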

Request (JavaScript/Node.js)

Parameters

Parameter (type, required): description

  • model (string, required): Use qwen3-235b-vl.
  • prompt (string, required): Raw prompt string with vision tokens. Each <|vision_start|><|image_pad|><|vision_end|> corresponds to one image in the images array.
  • images (array, optional): Array of image URLs (HTTP/HTTPS or data URIs). Must match the number of vision tokens in the prompt.
  • temperature (number, optional): Controls creativity (0–2). Default 0.7.
  • max_tokens (number, optional): Maximum response tokens (1–8192, default 2048).
  • stop (array or string, optional): Stop sequences like ["<|im_end|>", "<|endoftext|>"].
  • stream (boolean, optional): Default false. Set to true for SSE streaming responses.

Important Notes

  • The number of <|vision_start|><|image_pad|><|vision_end|> tokens must match the length of the images array.
  • Images can be provided as HTTPS URLs or base64 data URIs.
  • Duplicate image URLs are allowed if you want to reference the same image multiple times.
  • Set "stream": true to receive Server-Sent Events (SSE) for real-time token streaming.
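
The first two notes above can be checked on the client before a request is sent. The sketch below is one possible approach, with hypothetical helper names (to_data_uri, check_vision_tokens); it encodes local image bytes as a base64 data URI and verifies that the prompt contains one vision token sequence per image.

```python
import base64

VISION_TOKEN = "<|vision_start|><|image_pad|><|vision_end|>"


def to_data_uri(image_bytes, mime_type="image/png"):
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def check_vision_tokens(prompt, images):
    """Raise ValueError if the vision token count differs from the image count."""
    found = prompt.count(VISION_TOKEN)
    if found != len(images):
        raise ValueError(
            f"prompt has {found} vision token(s) but {len(images)} image(s) were given"
        )
```

Running check_vision_tokens just before sending turns a mismatch into a local error rather than a failed request.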

Streaming Response Format

When stream: true, responses are sent as Server-Sent Events (SSE):

data: {"id":"abc123","object":"text_completion","created":1234567890,"model":"qwen3-235b-vl","choices":[{"index":0,"text":" white","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}

data: {"id":"abc123","object":"text_completion","created":1234567890,"model":"qwen3-235b-vl","choices":[{"index":0,"text":" dragon","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}

data: {"id":"abc123","object":"text_completion","created":1234567890,"model":"qwen3-235b-vl","choices":[{"index":0,"text":"","logprobs":null,"finish_reason":"stop","matched_stop":151645}],"usage":null}

data: [DONE]

Each data: line contains a JSON object with incremental text in choices[0].text. The stream ends with data: [DONE].
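
As an illustration of consuming this format, the sketch below reassembles the generated text from an iterable of already-decoded lines. The function name collect_stream_text is hypothetical, and the sketch assumes each event fits on a single data: line, as in the sample above.

```python
import json


def collect_stream_text(lines):
    """Concatenate choices[0].text from SSE data lines until [DONE]."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        parts.append(chunk["choices"][0]["text"])
    return "".join(parts)
```

Fed the three sample events shown above followed by data: [DONE], this returns " white dragon". In a real client the lines would come from the HTTP response body, read and decoded line by line.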