vget v0.12.0 Released: AI Speech-to-Text Feature

Tags: vget, ai, speech-to-text, subtitles, transcribe
vget v0.12.0 introduces AI capabilities with high-quality speech-to-text, supporting Markdown and SRT subtitle output

AI Features Are Here

vget v0.12.0 introduces a brand-new AI module, with speech-to-text as its first feature. In extensive testing, transcription accuracy was high across podcasts, meeting recordings, and video narration.

Command Line Usage

Use the vget ai transcribe command in the CLI for speech-to-text conversion:

# Basic usage: transcribe audio file, outputs Markdown by default
vget ai transcribe ./recording.mp3

# Specify language: use the -l parameter to set the audio language
vget ai transcribe -l zh ./interview.mp3

# Output subtitle file: use the -o parameter to write the result as an SRT file
vget ai transcribe -l zh ./podcast.mp3 -o podcast.srt

# Transcribe video files: automatically extracts audio before transcription
vget ai transcribe -l en ./lecture.mp4 -o lecture.srt
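To transcribe a whole folder, the commands above can be wrapped in a small loop. This is only a sketch: the helper names srt_name and transcribe_dir are made up for illustration, and it assumes nothing beyond the `vget ai transcribe -l LANG FILE -o OUT.srt` form shown above.

```shell
# Derive the subtitle path from the media path: talk.mp3 -> talk.srt
srt_name() {
  printf '%s.srt\n' "${1%.*}"
}

# transcribe_dir LANG DIR: run one transcription per MP3 file in DIR.
# (Hypothetical helper; it simply repeats the documented command.)
transcribe_dir() {
  lang=$1
  dir=$2
  for f in "$dir"/*.mp3; do
    [ -e "$f" ] || continue   # skip when the glob matched nothing
    vget ai transcribe -l "$lang" "$f" -o "$(srt_name "$f")"
  done
}
```

Calling `transcribe_dir zh ./podcasts` would then write one .srt file next to each MP3 in ./podcasts.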

Docker Web Interface

In the Docker-deployed web interface, click the AI icon in the left navigation bar to access the speech-to-text feature:

  1. Select File - Choose a file from the /home/vget/downloads directory, or upload a local file directly
  2. Set Language - Select the language of the audio for best recognition accuracy
  3. Choose Format - Choose Markdown text or SRT subtitle output
  4. Start Transcription - Click the button to begin, then download the results when transcription completes

Supported File Formats

Type    Supported Formats
Audio   MP3, WAV, M4A, FLAC, OGG, AAC
Video   MP4, MKV, MOV, AVI, WebM

For video files, vget automatically extracts the audio track before transcription, so no manual conversion is needed.
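Before handing a file to vget in a script, it can help to check its extension against the formats listed above. The function below is a hypothetical helper, not part of vget; it only mirrors the table and says nothing about how vget itself detects formats.

```shell
# media_kind FILE: print "audio", "video", or "unsupported"
# based on the file extension (case-insensitive).
media_kind() {
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|wav|m4a|flac|ogg|aac) echo audio ;;
    mp4|mkv|mov|avi|webm)     echo video ;;
    *)                        echo unsupported ;;
  esac
}
```

For example, `media_kind lecture.MP4` reports a video file, while `media_kind notes.txt` is reported as unsupported.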

Supported Languages

vget AI supports speech recognition in multiple languages, specified via the -l parameter:

Code    Language
en      English (default)
zh      Chinese
ja      Japanese
ko      Korean
es      Spanish
fr      French
de      German
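A wrapper script might validate the language code before calling vget. The sketch below uses two hypothetical helpers; the code list is copied from the table above, and vget may well accept codes that are not listed here.

```shell
# is_listed_lang CODE: succeed only for codes that appear in the table.
is_listed_lang() {
  case "$1" in
    en|zh|ja|ko|es|fr|de) return 0 ;;
    *)                    return 1 ;;
  esac
}

# lang_or_default [CODE]: fall back to the documented default (en)
# when no code is given.
lang_or_default() {
  printf '%s\n' "${1:-en}"
}
```

A script could then run `is_listed_lang "$lang" || echo "unlisted language: $lang" >&2` before starting a long transcription.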

Output Format Details

Markdown Format (Default)

Ideal for reading and further editing. Transcription results are organized into paragraphs for easy post-processing.

SRT Subtitle Format

Standard subtitle file format with timestamps, ready for use in video players or editing software:

1
00:00:00,000 --> 00:00:03,500
Hello everyone, welcome to this episode

2
00:00:03,500 --> 00:00:07,200
Today we'll be discussing AI development
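Because each SRT cue is a blank-line-separated block (index, time range, then text), the spoken text can be recovered with a short awk filter. This is a minimal sketch with a hypothetical function name, and it assumes Unix line endings and well-formed cues like the ones above.

```shell
# srt_text [FILE...]: print only the text lines of each cue,
# dropping the cue number (line 1) and the time range (line 2).
srt_text() {
  awk 'BEGIN { RS = ""; FS = "\n" }
       { for (i = 3; i <= NF; i++) print $i }' "$@"
}
```

Running `srt_text podcast.srt` on the sample above would print the two spoken lines without numbering or timestamps.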

Use Cases

What's Next

Feel free to submit feedback and suggestions on GitHub!