Vision Models

Analyze images, extract text, and answer visual questions with Assisters Vision, our multimodal AI model.

Assisters Vision v1

Our advanced vision model understands images and answers questions about their visual content with high accuracy.

Specification	Value
Model ID	`assisters-vision-v1`
Context Window	128,000 tokens
Max Output	8,192 tokens
Input Price	$0.05 / million tokens
Output Price	$0.10 / million tokens
Latency	~300 ms to first token

Capabilities

Image Understanding: Describe, analyze, and interpret images
OCR: Extract text from images, documents, and screenshots
Visual Q&A: Answer questions about image content
Object Detection: Identify and locate objects in images
Chart Analysis: Understand charts, graphs, and diagrams
Multiple Images: Analyze multiple images in a single request

Example Usage

from openai import OpenAI

client = OpenAI(
    base_url="https://api.assisters.dev/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="assisters-vision-v1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image.jpg"}
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
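The OpenAI SDK is only a convenience here. Because the client points at an OpenAI-compatible `base_url`, the same call should correspond to a plain HTTPS POST. The sketch below builds that request with the standard library alone. It assumes the endpoint follows the OpenAI convention of `POST {base_url}/chat/completions` with a Bearer token. `build_chat_request` is a hypothetical helper, not part of any SDK.

```python
import json
import urllib.request

BASE_URL = "https://api.assisters.dev/v1"


def build_chat_request(api_key: str, prompt: str, image_url: str) -> urllib.request.Request:
    """Build (but do not send) a vision chat request.

    Assumes an OpenAI-compatible endpoint at {BASE_URL}/chat/completions
    that accepts a Bearer token. build_chat_request is a hypothetical helper.
    """
    payload = {
        "model": "assisters-vision-v1",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request("your-api-key", "What's in this image?",
                             "https://example.com/image.jpg")
    # Sending it requires a valid key and network access:
    # with urllib.request.urlopen(req) as resp:
    #     body = json.load(resp)
    #     print(body["choices"][0]["message"]["content"])
    print(req.get_method(), req.full_url)
```

Seeing the raw request can help when you debug from a language without an OpenAI-compatible SDK. For Python code, the SDK call above remains the simpler option.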

With Base64 Image

import base64

# Read a local image and encode it as base64 text
with open("image.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="assisters-vision-v1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{image_data}"
                    }
                }
            ]
        }
    ]
)
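The example above hard-codes `image/png` in the data URL. A small helper can derive the MIME type from the file extension instead, so JPEG files and other formats are labelled correctly. `image_file_to_data_url` is a hypothetical name, and this sketch trusts the extension without inspecting the file contents.

```python
import base64
import mimetypes
from pathlib import Path


def image_file_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL.

    The MIME type is guessed from the file extension, so photo.jpg becomes
    data:image/jpeg;base64,... rather than a hard-coded image/png.
    """
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Use it in place of the f-string: `{"type": "image_url", "image_url": {"url": image_file_to_data_url("photo.jpg")}}`. Base64 encoding makes the payload about a third larger than the raw file.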

Multiple Images

response = client.chat.completions.create(
    model="assisters-vision-v1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these two images"},
                {"type": "image_url", "image_url": {"url": "https://example.com/image1.jpg"}},
                {"type": "image_url", "image_url": {"url": "https://example.com/image2.jpg"}}
            ]
        }
    ]
)
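When the number of images varies, you can build the content list in a loop rather than writing each part by hand. This is a minimal sketch. `multi_image_message` is a hypothetical helper, and because this page does not state a maximum number of images per request, the helper enforces no upper limit.

```python
from typing import Iterable


def multi_image_message(prompt: str, image_urls: Iterable[str]) -> dict:
    """Build one user message: a text part followed by one part per image.

    Each URL may be an https:// link or a data: URL. Parts keep the order
    in which the URLs are given.
    """
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    if len(content) < 2:
        raise ValueError("At least one image URL is required")
    return {"role": "user", "content": content}
```

Pass the result as the only message: `client.chat.completions.create(model="assisters-vision-v1", messages=[multi_image_message("Compare these two images", urls)])`.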

Parameters

Parameter	Type	Default	Description
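`messages`	array	required	Conversation messages; user content can mix text and image parts
`temperature`	float	0.7	Sampling temperature (0 to 2); higher values give more varied output
`max_tokens`	int	1024	Maximum number of tokens to generate (up to 8,192)
`stream`	bool	false	Stream tokens back as they are generated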
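To catch out-of-range values before a request is sent, you can check them against the documented ranges on the client side. This sketch uses the defaults from the table and the 8,192-token output limit from the specification. `vision_params` is a hypothetical helper, and the server may apply its own stricter limits.

```python
MAX_OUTPUT_TOKENS = 8192  # "Max Output" from the specification table


def vision_params(temperature: float = 0.7, max_tokens: int = 1024,
                  stream: bool = False) -> dict:
    """Return keyword arguments for chat.completions.create.

    Values are checked against the documented ranges: temperature in
    [0, 2] and max_tokens between 1 and MAX_OUTPUT_TOKENS.
    """
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"max_tokens must be between 1 and {MAX_OUTPUT_TOKENS}")
    return {
        "model": "assisters-vision-v1",
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": stream,
    }
```

Usage: `client.chat.completions.create(messages=[...], **vision_params(temperature=0.2))`.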

Use Cases

Document OCR

Extract text from documents, receipts, and screenshots:

document_url = "https://example.com/receipt.jpg"  # placeholder: any image URL or base64 data URL

response = client.chat.completions.create(
    model="assisters-vision-v1",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract all text from this document"},
            {"type": "image_url", "image_url": {"url": document_url}}
        ]
    }],
    temperature=0.1  # Low temperature keeps extraction more deterministic
)

Product Analysis

Analyze product images for e-commerce:

product_image_url = "https://example.com/product.jpg"  # placeholder

response = client.chat.completions.create(
    model="assisters-vision-v1",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this product and identify its key features"},
            {"type": "image_url", "image_url": {"url": product_image_url}}
        ]
    }]
)

Chart Understanding

Analyze charts and extract data:

chart_url = "https://example.com/chart.png"  # placeholder

response = client.chat.completions.create(
    model="assisters-vision-v1",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this chart. What are the key trends?"},
            {"type": "image_url", "image_url": {"url": chart_url}}
        ]
    }]
)
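To get the underlying numbers from a chart rather than a prose summary, one option is to request the data in a fixed format and parse the reply. This sketch assumes you prompt with something like "Return the data points as CSV with a header row and nothing else". Models sometimes wrap such output in a Markdown code fence anyway, so the parser strips one if it is present. `parse_csv_reply` is a hypothetical helper. Values read from a chart image are approximate and should be spot-checked against the source.

```python
import csv
import io

FENCE = "`" * 3  # a Markdown code fence marker


def parse_csv_reply(reply: str) -> list[dict[str, str]]:
    """Parse a CSV table from a model reply into a list of row dicts.

    If the reply is wrapped in a Markdown code fence (optionally tagged,
    e.g. csv), the fence lines are removed before parsing.
    """
    text = reply.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip().startswith(FENCE):
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]
```

Usage: `rows = parse_csv_reply(response.choices[0].message.content)`. All values come back as strings, so convert numeric columns yourself.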

Accessibility

Generate alt text for images:

image_url = "https://example.com/photo.jpg"  # placeholder

response = client.chat.completions.create(
    model="assisters-vision-v1",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Write a concise alt text for this image for accessibility"},
            {"type": "image_url", "image_url": {"url": image_url}}
        ]
    }],
    max_tokens=100
)

Best Practices

Use Specific Prompts

Be specific about what you want to know about the image

Optimize Image Size

Resize large images to reduce latency and cost

Lower Temperature for OCR

Use temperature 0.1-0.3 for text extraction tasks

Multiple Images

Send several images in one request to compare them or to give the model extra context
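For the "Optimize Image Size" tip, the key step is choosing new dimensions that preserve the aspect ratio. This sketch computes them with plain arithmetic. The 1,024-pixel cap is an arbitrary example, because this page states neither a recommended size nor how image size maps to token cost. `fit_within` is a hypothetical helper, and the resampling itself would be done by an imaging library such as Pillow.

```python
def fit_within(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side.

    The aspect ratio is preserved, and images that are already small enough
    are returned unchanged (never upscaled). max_side=1024 is illustrative.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Round to the nearest pixel, but never collapse a side to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With Pillow, this could be applied as `img = img.resize(fit_within(*img.size))` before encoding the image.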

Supported Image Formats

Format	Support
JPEG/JPG	✅ Full support
PNG	✅ Full support
GIF	✅ First frame only
WebP	✅ Full support
BMP	✅ Full support
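The MIME type in a data URL should match the actual file, so checking the leading bytes can be safer than trusting the extension. This sketch recognizes the five formats in the table by their standard file signatures. `detect_image_format` is a hypothetical helper. The two-byte BMP check is loose enough to misfire on arbitrary data, so treat this as a first-pass filter.

```python
def detect_image_format(data: bytes) -> str:
    """Identify an image format from its leading signature bytes.

    Returns a MIME type for the supported formats (JPEG, PNG, GIF, WebP,
    BMP) and raises ValueError for anything else.
    """
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"  # per the table, only the first frame is analyzed
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    if data.startswith(b"BM"):
        return "image/bmp"
    raise ValueError("unsupported or unrecognized image format")
```

The returned string can be used directly in `f"data:{mime};base64,{image_data}"`.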

Related Models

Assisters Chat v1

Text-only conversational AI

Assisters Image v1

Generate images from text prompts
