vget v0.12.0 Released: AI Speech-to-Text Feature
AI Features Are Here
vget v0.12.0 introduces a new AI module, with Speech-to-Text as its first feature. In extensive testing, transcription accuracy was high across podcasts, meeting recordings, and video narration.
Command Line Usage
Use the vget ai transcribe command in the CLI for speech-to-text conversion:
# Basic usage: transcribe audio file, outputs Markdown by default
vget ai transcribe ./recording.mp3
# Specify language: use the -l parameter to set the audio language
vget ai transcribe -l zh ./interview.mp3
# Output subtitle file: use the -o parameter to write the result as an SRT file
vget ai transcribe -l zh ./podcast.mp3 -o podcast.srt
# Transcribe video files: automatically extracts audio before transcription
vget ai transcribe -l en ./lecture.mp4 -o lecture.srt
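Batch transcription is not built in yet (it is listed under What's Next), so a plain shell loop is one way to process a folder in the meantime. The sketch below is an assumption about how you might script it, not a vget feature: srt_name, transcribe_dir, and the VGET_BIN override are hypothetical names, and only the -l and -o flags shown above are used.

```shell
#!/usr/bin/env bash
# Sketch: transcribe every MP3 in a directory to SRT, one file at a time.
# Only the flags shown in this post (-l and -o) are used.

# srt_name FILE: derive the output name, e.g. ./talks/ep1.mp3 -> ./talks/ep1.srt
srt_name() {
  printf '%s.srt\n' "${1%.*}"
}

# transcribe_dir DIR LANG: call vget once for each DIR/*.mp3.
# VGET_BIN is a hypothetical override so the loop can be dry-run with `echo`.
transcribe_dir() {
  local dir=$1 lang=$2 f
  for f in "$dir"/*.mp3; do
    [ -e "$f" ] || continue   # skip when the glob matches nothing
    "${VGET_BIN:-vget}" ai transcribe -l "$lang" "$f" -o "$(srt_name "$f")"
  done
}

# Example (not run here): transcribe_dir ./podcasts zh
```

Running it with VGET_BIN=echo prints the commands instead of executing them, which is a cheap way to check the file names before starting a long job.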
Docker Web Interface
In the Docker-deployed web interface, click the AI icon in the left navigation bar to access the speech-to-text feature:
- Select File - Choose from files in the /home/vget/downloads directory, or upload local files directly
- Set Language - Select the language of the audio for best recognition accuracy
- Choose Format - Supports Markdown text or SRT subtitle format output
- Start Transcription - Click the button to begin, download results when complete
Supported File Formats
| Type | Supported Formats |
|---|---|
| Audio | MP3, WAV, M4A, FLAC, OGG, AAC |
| Video | MP4, MKV, MOV, AVI, WebM |
For video files, vget automatically extracts the audio track before transcription - no manual conversion needed.
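When scripting around vget, it can help to filter out unsupported files before calling it. The helper below is a sketch that maps a file extension to the table above. media_type is a hypothetical name, and matching by extension is an assumption, since this post does not say how vget itself detects formats.

```shell
# media_type FILE: print "audio", "video", or "unsupported" based on the
# file extension, following the supported-formats table in this post.
# Assumption: the extension reflects the real container format.
media_type() {
  local ext
  # Take the text after the last dot and lower-case it (MP3 -> mp3).
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|wav|m4a|flac|ogg|aac) echo audio ;;
    mp4|mkv|mov|avi|webm)     echo video ;;
    *)                        echo unsupported ;;
  esac
}
```

A script could then skip anything for which media_type prints unsupported rather than passing it to vget ai transcribe.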
Supported Languages
vget AI supports speech recognition in multiple languages, specified via the -l parameter:
| Code | Language |
|---|---|
| en | English (default) |
| zh | Chinese |
| ja | Japanese |
| ko | Korean |
| es | Spanish |
| fr | French |
| de | German |
Output Format Details
Markdown Format (Default)
Ideal for reading and further editing. Transcription results are organized into paragraphs for easy post-processing.
SRT Subtitle Format
Standard subtitle format with start and end timestamps for each line, ready for use in video players or editing software:
1
00:00:00,000 --> 00:00:03,500
Hello everyone, welcome to this episode

2
00:00:03,500 --> 00:00:07,200
Today we'll be discussing AI development
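Because each SRT cue carries a start and end timestamp in HH:MM:SS,mmm form, the output is easy to post-process with standard tools. As a sketch (srt_ms and srt_durations are hypothetical helper names, not part of vget), the following converts timestamps to milliseconds and prints the length of each cue:

```shell
# srt_ms TIMESTAMP: convert an SRT timestamp (HH:MM:SS,mmm) to milliseconds.
srt_ms() {
  printf '%s\n' "$1" |
    awk -F'[:,]' '{ print (($1 * 60 + $2) * 60 + $3) * 1000 + $4 }'
}

# srt_durations FILE: for every timing line ("start --> end"), print a
# running cue counter followed by the cue length in milliseconds.
srt_durations() {
  awk '
    function ms(t,   p) {
      split(t, p, "[:,]")            # p[1]=HH p[2]=MM p[3]=SS p[4]=mmm
      return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
    }
    / --> / { n++; print n, ms($3) - ms($1) }
  ' "$1"
}
```

For the two cues in the sample above, srt_durations reports 3500 ms and 3700 ms.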
Use Cases
- Podcast Transcription - Convert podcast content to text for easy searching and citation
- Meeting Notes - Quickly generate meeting minutes
- Video Subtitles - Automatically generate subtitle files for videos
- Study Notes - Convert lecture recordings into editable text notes
What's Next
- More AI features (translation, summarization, etc.)
- Improved performance for long audio files
- Batch transcription support
Feel free to submit feedback and suggestions on GitHub!