vget v0.12.0 Released: AI Speech-to-Text Feature

vget v0.12.0 adds AI capabilities, starting with high-quality speech-to-text that outputs Markdown text or SRT subtitles.

AI Features Are Here

vget v0.12.0 introduces a new AI module, with speech-to-text as its first feature. In our extensive testing, transcription was highly accurate across podcasts, meeting recordings, and video narration.

Command Line Usage

Use the vget ai transcribe command in the CLI for speech-to-text conversion:

# Basic usage: transcribe an audio file (outputs Markdown by default)
vget ai transcribe ./recording.mp3

# Specify the language: use the -l parameter to set the audio language
vget ai transcribe -l zh ./interview.mp3

# Write a subtitle file: use the -o parameter to save the result in SRT format
vget ai transcribe -l zh ./podcast.mp3 -o podcast.srt

# Transcribe a video file: the audio track is extracted automatically first
vget ai transcribe -l en ./lecture.mp4 -o lecture.srt
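To transcribe several files in one go with the parameters shown above, a small shell function can loop over its arguments and write an .srt file next to each input. This is a minimal sketch, not part of vget itself: the function name transcribe_to_srt is hypothetical, and it assumes vget is on PATH and that passing an .srt path to -o produces SRT output, as in the examples above.

```shell
# Hypothetical helper, not part of vget: transcribe each input file to an
# .srt file next to it, using only the -l and -o parameters shown above.
# Usage: transcribe_to_srt LANG FILE...
transcribe_to_srt() {
  lang="$1"
  shift
  rc=0
  for input in "$@"; do
    # Swap the extension: ./podcast.mp3 -> ./podcast.srt
    output="${input%.*}.srt"
    # Assumes vget is on PATH; keep going if one file fails, but remember it
    vget ai transcribe -l "$lang" "$input" -o "$output" || rc=1
  done
  return "$rc"
}
```

For example, transcribe_to_srt zh ./interview.mp3 ./podcast.mp3 would write ./interview.srt and ./podcast.srt. The extension swap assumes every input name actually has an extension.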

Docker Web Interface

In the Docker-deployed web interface, click the AI icon in the left navigation bar to access the speech-to-text feature:

Select File - Choose a file from the /home/vget/downloads directory, or upload a local file directly
Set Language - Select the language spoken in the audio for the best recognition accuracy
Choose Format - Choose Markdown text or SRT subtitle output
Start Transcription - Click the button to begin, then download the results when it finishes

Supported File Formats

Type	Supported Formats
Audio	MP3, WAV, M4A, FLAC, OGG, AAC
Video	MP4, MKV, MOV, AVI, WebM

For video files, vget automatically extracts the audio track before transcription, so no manual conversion is needed.
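When scripting over a folder of mixed files, the table above can be mirrored in a small shell function that sorts files by extension before handing them to vget ai transcribe. This is a sketch that assumes the table is the complete list of accepted formats; the function name media_kind is hypothetical, and vget itself may accept or reject a file on other grounds.

```shell
# Hypothetical helper: classify a file by extension according to the
# supported-formats table. Prints "audio", "video", or "unsupported".
media_kind() {
  # Lowercase the extension so ./Clip.MP4 and ./clip.mp4 are treated alike
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|wav|m4a|flac|ogg|aac) echo audio ;;
    mp4|mkv|mov|avi|webm)     echo video ;;
    *)                        echo unsupported ;;
  esac
}
```

For example, media_kind ./lecture.MP4 prints video, and media_kind ./notes.txt prints unsupported.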

Supported Languages

vget AI supports speech recognition in multiple languages; set the language with the -l parameter:

Code	Language
en	English (default)
zh	Chinese
ja	Japanese
ko	Korean
es	Spanish
fr	French
de	German

Output Format Details

Markdown Format (Default)

Ideal for reading and further editing. Transcription results are organized into paragraphs for easy post-processing.

SRT Subtitle Format

The standard subtitle format, with start and end timestamps for each caption, ready for use in video players and editing software. For example:

1
00:00:00,000 --> 00:00:03,500
Hello everyone, welcome to this episode

2
00:00:03,500 --> 00:00:07,200
Today we'll be discussing AI development
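Each SRT cue is a sequence number, a start --> end line with timestamps written as HH:MM:SS,mmm (note the comma before the milliseconds), the caption text, and a blank line. The sketch below shows how such timestamps are derived from millisecond offsets; the function names are hypothetical, and it illustrates the file format only, not vget's implementation.

```shell
# Hypothetical helpers illustrating the SRT layout (not vget's code).
# srt_timestamp MS -> prints HH:MM:SS,mmm for a millisecond offset
srt_timestamp() {
  ms=$1
  printf '%02d:%02d:%02d,%03d\n' \
    $((ms / 3600000)) $((ms / 60000 % 60)) $((ms / 1000 % 60)) $((ms % 1000))
}

# srt_cue INDEX START_MS END_MS TEXT -> prints one complete cue,
# followed by the blank line that separates cues
srt_cue() {
  printf '%s\n%s --> %s\n%s\n\n' \
    "$1" "$(srt_timestamp "$2")" "$(srt_timestamp "$3")" "$4"
}
```

Running srt_cue 2 3500 7200 "Today we'll be discussing AI development" reproduces the second cue of the sample above.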

Use Cases

Podcast Transcription - Convert podcast content to text for easy searching and citation
Meeting Notes - Quickly generate meeting minutes
Video Subtitles - Automatically generate subtitle files for videos
Study Notes - Convert lecture recordings into editable text notes

What's Next

More AI features (translation, summarization, etc.)
Improved performance for long audio files
Batch transcription support

Feel free to submit feedback and suggestions on GitHub!