Whisper Speech-to-Text Hosting in Romania
Whisper Large-v3 is the most accurate open-source speech-to-text model available, handling 99 languages and producing word-level timestamps with diarization when paired with WhisperX. Running it well requires real GPU capacity - CPU-only Whisper Large takes 4-6 minutes per minute of audio, which makes any production transcription workflow painful. AnubizHost ships Faster-Whisper plus WhisperX on dedicated RTX 4090 in Romania, with no KYC and crypto payment.
Need this done for your project?
We implement, you ship. Async, documented, done in days.
VRAM and Speed for Whisper Models
Whisper Large-v3 FP16: 6GB VRAM, runs at ~12x realtime on 4090 with Faster-Whisper. Medium FP16: 3GB, 25x realtime. Turbo (Whisper-Turbo): 2GB, 60x realtime. Tiny: 0.5GB, 200x+ realtime.
WhisperX (Whisper + wav2vec2 forced alignment + pyannote diarization): 8-10GB total. Adds word-level timestamps and speaker diarization. ~8x realtime on 4090.
For batch transcription of podcasts or meeting recordings, the 4090 handles a 1-hour file in 5-7 minutes with Whisper Large-v3 + WhisperX.
Faster-Whisper + WhisperX Stack
Pre-installed: Faster-Whisper (CTranslate2 backend, ~4x faster than vanilla Whisper at same accuracy), WhisperX with pyannote 3.x diarization, optional Demucs for vocal separation before transcription, optional Silero VAD for cleaner segmentation.
The 4090's 24GB easily hosts the full WhisperX stack plus a vocal-separation pre-processor without offloading. Multi-speaker podcasts with overlapping speech transcribe accurately.
API Patterns for Transcription Services
Pattern 1: Batch transcription API. POST audio file to /transcribe, get back JSON with text plus word-level timestamps plus speaker labels. We pre-install a FastAPI wrapper around Faster-Whisper.
Pattern 2: Real-time streaming. Whisper-streaming or whisper-live for live captioning. Higher latency than commercial APIs (~1.5s) but no data leaves your box.
Pattern 3: Personal use - upload a meeting recording, get back searchable transcript. Open-source UI options include Whishper or our own minimal Flask interface.
Language and Accuracy Considerations
Whisper Large-v3 handles 99 languages. Word error rate on English is industry-leading among open models. For Russian, Polish, Mandarin, Arabic, and other major non-English languages, accuracy is competitive with paid commercial transcription services.
Hallucination is a known Whisper failure mode on long silences. Faster-Whisper plus Silero VAD pre-segmentation reduces hallucinations significantly.
Order
$179/mo. Pay BTC, XMR, LN, USDT. Provision 15-20 minutes. Faster-Whisper API exposed via TLS on auto-subdomain plus optional .onion.
Related: AI hosting, RVC voice clone alternative, anonymous VPS, live pricing.
Related Services
Why Anubiz Host
Ready to get started?
Skip the research. Tell us what you need, and we'll scope it, implement it, and hand it back — fully documented and production-ready.