Speech-to-Text Models

Transcribe audio files with high accuracy using Assisters Whisper, our advanced speech recognition model.

Assisters Whisper v1

Our state-of-the-art speech recognition model that transcribes audio in 100+ languages with exceptional accuracy.
Specification      Value
Model ID           assisters-whisper-v1
Languages          100+
Max Audio Length   25 minutes
Price              $0.01 / minute
Latency            ~1x real-time

Capabilities

  • Multilingual: Transcribe 100+ languages automatically
  • High Accuracy: State-of-the-art word error rate
  • Speaker Diarization: Identify different speakers (coming soon)
  • Timestamps: Word and segment-level timestamps
  • Translation: Translate audio to English

Supported Formats

MP3, MP4, M4A, WAV, WEBM, FLAC, OGG, and more.
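
If you validate uploads client-side, here is a minimal sketch that checks a file's extension against the formats named above (the set below covers only the listed formats and is illustrative, not the server's authoritative list):

from pathlib import Path

# Extensions corresponding to the formats listed above (illustrative set)
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".m4a", ".wav", ".webm", ".flac", ".ogg"}

def is_supported(path: str) -> bool:
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS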

Example Usage

from openai import OpenAI

client = OpenAI(
    base_url="https://api.assisters.dev/v1",
    api_key="your-api-key"
)

# Transcribe audio file
with open("audio.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="assisters-whisper-v1",
        file=audio_file
    )

print(response.text)

With Timestamps

response = client.audio.transcriptions.create(
    model="assisters-whisper-v1",
    file=audio_file,
    response_format="verbose_json",
    timestamp_granularities=["word", "segment"]
)

# Access timestamps
for segment in response.segments:
    print(f"[{segment.start:.2f} - {segment.end:.2f}] {segment.text}")

Translation to English

# Translate non-English audio to English
response = client.audio.translations.create(
    model="assisters-whisper-v1",
    file=audio_file
)

print(response.text)  # English translation

With Language Hint

# Specify the language for better accuracy
response = client.audio.transcriptions.create(
    model="assisters-whisper-v1",
    file=audio_file,
    language="es"  # Spanish
)

Parameters

Parameter                 Type     Default    Description
file                      file     required   Audio file to transcribe
model                     string   required   Model ID (assisters-whisper-v1)
language                  string   auto       ISO-639-1 language code
prompt                    string   null       Guide the model's style
response_format           string   "json"     Output format
temperature               float    0          Sampling temperature
timestamp_granularities   array    null       Timestamp detail level
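
The prompt parameter is listed above but not shown in use. A minimal sketch, assuming it acts as a spelling and terminology hint as in other Whisper-style APIs (the file name and terms are illustrative):

# Bias the model toward domain-specific names and jargon
with open("earnings_call.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="assisters-whisper-v1",
        file=audio_file,
        prompt="Acme Corp, EBITDA, Q3 guidance, CFO Jane Smith",
        temperature=0  # keep output deterministic
    )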

Response Formats

Format         Description
json           Simple JSON with text
text           Plain text only
srt            SubRip subtitle format
verbose_json   Detailed JSON with timestamps
vtt            WebVTT subtitle format

Use Cases

Transcribe meetings and calls:
response = client.audio.transcriptions.create(
    model="assisters-whisper-v1",
    file=meeting_audio,
    response_format="verbose_json"
)

# Format as meeting notes
for segment in response.segments:
    print(f"[{format_time(segment.start)}] {segment.text}")

Create subtitles for videos:
# Get SRT format directly
response = client.audio.transcriptions.create(
    model="assisters-whisper-v1",
    file=video_audio,
    response_format="srt"
)

# Save as subtitle file
with open("subtitles.srt", "w") as f:
    f.write(response)

Transcribe podcasts for search and accessibility:
response = client.audio.transcriptions.create(
    model="assisters-whisper-v1",
    file=podcast_file,
    response_format="verbose_json",
    timestamp_granularities=["segment"]
)

# Build a searchable index (index_content is your own indexing function)
for segment in response.segments:
    index_content(segment.text, segment.start, segment.end)

Convert voice memos to text:
response = client.audio.transcriptions.create(
    model="assisters-whisper-v1",
    file=voice_note,
    response_format="text"
)

# Simple text output (save_note is your own persistence function)
save_note(response)

Supported Languages

Assisters Whisper v1 supports 100+ languages, including:

Major Languages: English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Chinese (Simplified & Traditional), Japanese, Korean, Arabic, Hindi, and more.

Regional Languages: Catalan, Welsh, Icelandic, Latvian, Lithuanian, Slovenian, and many others.

Best Practices

Use Language Hints

Specify the language when known for better accuracy

Clean Audio

Higher quality audio produces better transcriptions

Chunk Long Audio

Split files longer than 25 minutes into chunks
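
A minimal sketch using the pydub library (an assumption; any tool that can split audio by duration works), cutting a long recording into 20-minute pieces that stay under the 25-minute limit:

from pydub import AudioSegment  # pip install pydub; requires ffmpeg

CHUNK_MS = 20 * 60 * 1000  # 20-minute chunks, safely under the limit

audio = AudioSegment.from_file("long_recording.mp3")
for i in range(0, len(audio), CHUNK_MS):
    chunk = audio[i:i + CHUNK_MS]
    chunk.export(f"chunk_{i // CHUNK_MS}.mp3", format="mp3")

Transcribe each chunk separately and join the results in order.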

Use Prompts

Guide the model with context-specific terminology (see the prompt example under Parameters)