Vision Models

Analyze images, extract text, and answer visual questions with Assisters Vision, our multimodal AI model.

Assisters Vision v1

Assisters Vision v1 is our advanced vision model. It understands images and answers questions about visual content with high accuracy.

| Specification | Value |
| --- | --- |
| Model ID | assisters-vision-v1 |
| Context Window | 128,000 tokens |
| Max Output | 8,192 tokens |
| Input Price | $0.05 / million tokens |
| Output Price | $0.10 / million tokens |
| Latency | ~300 ms to first token |
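
The per-token prices above can be turned into a rough per-request estimate. The sketch below is illustrative only: `estimate_cost_usd` is a hypothetical helper, and it assumes the token counts you pass in already include whatever tokens an image contributes (for example, as reported in the response's usage data). This page does not state how images are counted.

```python
# Prices from the specification table, in US dollars per million tokens.
INPUT_PRICE_PER_MILLION = 0.05
OUTPUT_PRICE_PER_MILLION = 0.10


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts.

    Assumption: input_tokens already includes any tokens charged for images.
    """
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Example: 2,000 input tokens and 500 output tokens.
print(f"${estimate_cost_usd(2_000, 500):.6f}")  # prints $0.000150
```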

Capabilities

  • Image Understanding: Describe, analyze, and interpret images
  • OCR: Extract text from images, documents, and screenshots
  • Visual Q&A: Answer questions about image content
  • Object Detection: Identify and locate objects in images
  • Chart Analysis: Understand charts, graphs, and diagrams
  • Multiple Images: Analyze multiple images in a single request

Example Usage

from openai import OpenAI

client = OpenAI(
    base_url="https://api.assisters.dev/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="assisters-vision-v1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image.jpg"}
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)

With Base64 Image

import base64

# Read a local image and base64-encode it (reuses `client` from the example above)
with open("image.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="assisters-vision-v1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{image_data}"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
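
The example above hardcodes `image/png` in the data URL. If you send files of different types, a small helper can pick the MIME type from the file name. This is a minimal sketch using only the Python standard library; `image_to_data_url` is a hypothetical name, not part of the Assisters API.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL.

    The MIME type is guessed from the file extension, not from the file contents.
    """
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be passed as the `url` value of an `image_url` content part, in place of the f-string in the example above.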

Multiple Images

response = client.chat.completions.create(
    model="assisters-vision-v1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these two images"},
                {"type": "image_url", "image_url": {"url": "https://example.com/image1.jpg"}},
                {"type": "image_url", "image_url": {"url": "https://example.com/image2.jpg"}}
            ]
        }
    ]
)

print(response.choices[0].message.content)
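
When the number of images varies, the content list can be built programmatically. The following sketch assumes the same message shape as the example above; `build_image_message` is a hypothetical helper name.

```python
def build_image_message(prompt: str, image_urls: list[str]) -> dict:
    """Build one user message containing a text prompt followed by each image."""
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}


message = build_image_message(
    "Compare these two images",
    ["https://example.com/image1.jpg", "https://example.com/image2.jpg"],
)
print(len(message["content"]))  # prints 3: one text part plus two image parts
```

The resulting dictionary can be passed as the single element of `messages`.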

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| messages | array | required | Conversation with image content |
| temperature | float | 0.7 | Randomness (0-2) |
| max_tokens | int | 1024 | Maximum output length |
| stream | bool | false | Enable streaming |
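
With `stream` enabled, the response arrives as a sequence of chunks rather than a single message. The sketch below assumes the chunks follow the OpenAI SDK's streaming shape (`chunk.choices[0].delta.content`), which seems likely because the examples on this page use the OpenAI client, but this page does not confirm it. `collect_streamed_text` is a hypothetical helper.

```python
def collect_streamed_text(chunks) -> str:
    """Join the text deltas from a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # Some chunks may carry no choices; skip them.
        delta = chunk.choices[0].delta.content
        if delta:  # The delta content can be None or empty.
            parts.append(delta)
    return "".join(parts)


# Usage with the client from the examples (needs network access and a valid key):
# stream = client.chat.completions.create(
#     model="assisters-vision-v1",
#     messages=messages,  # your list of messages, including image content
#     stream=True,
# )
# print(collect_streamed_text(stream))
```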

Use Cases

Extract text from documents, receipts, and screenshots:
response = client.chat.completions.create(
    model="assisters-vision-v1",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract all text from this document"},
            {"type": "image_url", "image_url": {"url": document_url}}
        ]
    }],
    temperature=0.1  # Low temperature makes extraction more deterministic
)
Analyze product images for e-commerce:
response = client.chat.completions.create(
    model="assisters-vision-v1",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this product and identify its key features"},
            {"type": "image_url", "image_url": {"url": product_image_url}}
        ]
    }]
)
Analyze charts and extract data:
response = client.chat.completions.create(
    model="assisters-vision-v1",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this chart. What are the key trends?"},
            {"type": "image_url", "image_url": {"url": chart_url}}
        ]
    }]
)
Generate alt text for images:
response = client.chat.completions.create(
    model="assisters-vision-v1",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Write a concise alt text for this image for accessibility"},
            {"type": "image_url", "image_url": {"url": image_url}}
        ]
    }],
    max_tokens=100
)

Best Practices

Use Specific Prompts

State exactly what you want to know about the image.

Optimize Image Size

Resize large images before sending them to reduce latency and cost.
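
As a rough illustration, the target dimensions for a downscaled image can be computed while preserving the aspect ratio. The 1024-pixel limit below is an arbitrary value chosen for the example, not a documented recommendation, and `fit_within` is a hypothetical helper. The actual resampling would be done with an imaging library of your choice.

```python
def fit_within(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Return dimensions scaled down so the longer side is at most max_side.

    max_side=1024 is an illustrative default, not a limit stated by the API.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # Already small enough; never upscale.
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height


print(fit_within(4000, 3000))  # prints (1024, 768)
```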

Lower Temperature for OCR

Use a temperature of 0.1-0.3 for text extraction tasks.

Multiple Images

Send several images in one request to compare them or to provide additional context.

Supported Image Formats

| Format | Support |
| --- | --- |
| JPEG/JPG | ✅ Full support |
| PNG | ✅ Full support |
| GIF | ✅ First frame only |
| WebP | ✅ Full support |
| BMP | ✅ Full support |
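
A quick client-side check against the formats listed above can catch unsupported files before a request is sent. This sketch only inspects the file extension, so it is a heuristic rather than real validation of the file contents. `is_supported_image` is a hypothetical helper.

```python
from pathlib import Path

# File extensions corresponding to the formats in the table above.
SUPPORTED_EXTENSIONS = {".jpeg", ".jpg", ".png", ".gif", ".webp", ".bmp"}


def is_supported_image(path: str) -> bool:
    """Return True if the file extension matches a supported image format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_image("photo.JPG"))  # prints True
print(is_supported_image("scan.tiff"))  # prints False
```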