Vision Models
Analyze images, extract text, and answer visual questions with Assisters Vision, our multimodal AI model.Assisters Vision v1
Our advanced vision model that understands images and can answer questions about visual content with high accuracy.| Specification | Value |
|---|---|
| Model ID | assisters-vision-v1 |
| Context Window | 128,000 tokens |
| Max Output | 8,192 tokens |
| Input Price | $0.05 / million tokens |
| Output Price | $0.10 / million tokens |
| Latency | ~300ms first token |
Capabilities
- Image Understanding: Describe, analyze, and interpret images
- OCR: Extract text from images, documents, and screenshots
- Visual Q&A: Answer questions about image content
- Object Detection: Identify and locate objects in images
- Chart Analysis: Understand charts, graphs, and diagrams
- Multiple Images: Analyze multiple images in a single request
Example Usage
With Base64 Image
Multiple Images
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
messages | array | required | Conversation with image content |
temperature | float | 0.7 | Randomness (0-2) |
max_tokens | int | 1024 | Maximum output length |
stream | bool | false | Enable streaming |
Use Cases
Document OCR
Document OCR
Extract text from documents, receipts, and screenshots:
Product Analysis
Product Analysis
Analyze product images for e-commerce:
Chart Understanding
Chart Understanding
Analyze charts and extract data:
Accessibility
Accessibility
Generate alt text for images:
Best Practices
Use Specific Prompts
Be specific about what you want to know about the image
Optimize Image Size
Resize large images to reduce latency and cost
Lower Temperature for OCR
Use temperature 0.1-0.3 for text extraction tasks
Multiple Images
Compare images or provide context with multiple images
Supported Image Formats
| Format | Support |
|---|---|
| JPEG/JPG | ✅ Full support |
| PNG | ✅ Full support |
| GIF | ✅ First frame only |
| WebP | ✅ Full support |
| BMP | ✅ Full support |