Speech-to-Text Models

Transcribe audio files with high accuracy using Assisters Whisper, our advanced speech recognition model.

Assisters Whisper v1

Our state-of-the-art speech recognition model that transcribes audio in 100+ languages with exceptional accuracy.
Specification      Value
Model ID           assisters-whisper-v1
Languages          100+
Max Audio Length   25 minutes
Price              $0.01 / minute
Latency            ~1x real-time

Capabilities

  • Multilingual: Transcribe 100+ languages automatically
  • High Accuracy: State-of-the-art word error rate
  • Speaker Diarization: Identify different speakers (coming soon)
  • Timestamps: Word and segment-level timestamps
  • Translation: Translate audio to English

Supported Formats

MP3, MP4, M4A, WAV, WEBM, FLAC, OGG, and more.
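
If you validate uploads client-side, here is a minimal sketch that checks a file's extension against the formats named above (the set below covers only the listed formats and is illustrative, not the server's authoritative list):

from pathlib import Path

# Extensions corresponding to the formats listed above (illustrative set)
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".m4a", ".wav", ".webm", ".flac", ".ogg"}

def is_supported(path: str) -> bool:
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS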

Example Usage

from openai import OpenAI

client = OpenAI(
    base_url="https://api.assisters.dev/v1",
    api_key="your-api-key"
)

# Transcribe audio file
with open("audio.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="assisters-whisper-v1",
        file=audio_file
    )

print(response.text)

With Timestamps

response = client.audio.transcriptions.create(
    model="assisters-whisper-v1",
    file=audio_file,
    response_format="verbose_json",
    timestamp_granularities=["word", "segment"]
)

# Access timestamps
for segment in response.segments:
    print(f"[{segment.start:.2f} - {segment.end:.2f}] {segment.text}")

Translation to English

# Translate non-English audio to English
response = client.audio.translations.create(
    model="assisters-whisper-v1",
    file=audio_file
)

print(response.text)  # English translation

With Language Hint

# Specify the language for better accuracy
response = client.audio.transcriptions.create(
    model="assisters-whisper-v1",
    file=audio_file,
    language="es"  # Spanish
)

Parameters

Parameter                 Type     Default    Description
file                      file     required   Audio file to transcribe
model                     string   required   Model ID (assisters-whisper-v1)
language                  string   auto       ISO-639-1 language code
prompt                    string   null       Guide the model's style
response_format           string   "json"     Output format
temperature               float    0          Sampling temperature
timestamp_granularities   array    null       Timestamp detail level
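
The prompt parameter is listed above but not shown in use. A minimal sketch, assuming it acts as a spelling and terminology hint as in other Whisper-style APIs (the file name and terms are illustrative):

# Bias the model toward domain-specific names and jargon
with open("earnings_call.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="assisters-whisper-v1",
        file=audio_file,
        prompt="Acme Corp, EBITDA, Q3 guidance, CFO Jane Smith",
        temperature=0  # keep output deterministic
    )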

Response Formats

Format         Description
json           Simple JSON with text
text           Plain text only
srt            SubRip subtitle format
verbose_json   Detailed JSON with timestamps
vtt            WebVTT subtitle format

Use Cases

Transcribe meetings and calls:
response = client.audio.transcriptions.create(
    model="assisters-whisper-v1",
    file=meeting_audio,
    response_format="verbose_json"
)

# Format as meeting notes
for segment in response.segments:
    print(f"[{format_time(segment.start)}] {segment.text}")

Create subtitles for videos:
# Get SRT format directly
response = client.audio.transcriptions.create(
    model="assisters-whisper-v1",
    file=video_audio,
    response_format="srt"
)

# Save as subtitle file
with open("subtitles.srt", "w") as f:
    f.write(response)

Transcribe podcasts for search and accessibility:
response = client.audio.transcriptions.create(
    model="assisters-whisper-v1",
    file=podcast_file,
    response_format="verbose_json",
    timestamp_granularities=["segment"]
)

# Build a searchable index (index_content is your own indexing function)
for segment in response.segments:
    index_content(segment.text, segment.start, segment.end)

Convert voice memos to text:
response = client.audio.transcriptions.create(
    model="assisters-whisper-v1",
    file=voice_note,
    response_format="text"
)

# Simple text output (save_note is your own persistence function)
save_note(response)

Supported Languages

Assisters Whisper v1 supports 100+ languages, including:

Major Languages: English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Chinese (Simplified & Traditional), Japanese, Korean, Arabic, Hindi, and more.

Regional Languages: Catalan, Welsh, Icelandic, Latvian, Lithuanian, Slovenian, and many others.

Best Practices

Use Language Hints

Specify the language when known for better accuracy

Clean Audio

Higher quality audio produces better transcriptions

Chunk Long Audio

Split files longer than 25 minutes into chunks
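
A minimal sketch using the pydub library (an assumption; any tool that can split audio by duration works), cutting a long recording into 20-minute pieces that stay under the 25-minute limit:

from pydub import AudioSegment  # pip install pydub; requires ffmpeg

CHUNK_MS = 20 * 60 * 1000  # 20-minute chunks, safely under the limit

audio = AudioSegment.from_file("long_recording.mp3")
for i in range(0, len(audio), CHUNK_MS):
    chunk = audio[i:i + CHUNK_MS]
    chunk.export(f"chunk_{i // CHUNK_MS}.mp3", format="mp3")

Transcribe each chunk separately and join the results in order.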

Use Prompts

Guide the model with context-specific terminology (see the prompt example under Parameters)