Running WhisperX Locally on Windows
Thanks to ChatGPT and a lot of trial and error I got this working...
1) Confirm you have an NVIDIA GPU
- Press the Win key, type Device Manager, and open it.
- Expand Display adapters → you should see NVIDIA … (e.g., “NVIDIA GeForce …”).
- If you don’t have an NVIDIA GPU, you can still use WhisperX on CPU, but this guide is for CUDA (GPU).
2) Install NVIDIA GPU driver (latest)
- Go to NVIDIA GeForce / Studio drivers (or your OEM site) and install the latest driver.
- After the install, restart Windows.
- The official PyTorch GPU wheels bundle their own CUDA runtime and work with any reasonably recent driver; you do not need a matching full CUDA Toolkit just for the PyTorch wheels themselves.
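Once Windows is back up, a quick way to confirm the driver took is to run nvidia-smi, which ships with the driver:

    nvidia-smi          # should list your GPU and the installed driver version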
3) Install the CUDA Toolkit (12.8, as WhisperX asks)
WhisperX’s README explicitly tells Windows users to install the CUDA Toolkit 12.8 before WhisperX if you want GPU acceleration.
- Download CUDA Toolkit 12.8 for Windows from NVIDIA’s CUDA downloads page.
- Run the installer → accept defaults → finish.
- Restart Windows.
If you want the official step-by-step install reference for Windows CUDA: NVIDIA’s CUDA Installation Guide for Microsoft Windows.
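To confirm the toolkit itself installed, you can check the CUDA compiler version from a fresh terminal (nvcc is installed with the toolkit):

    nvcc --version      # should report Cuda compilation tools, release 12.8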
4) Install Python (64-bit) – recommended 3.10 or 3.11
- Go to python.org → Downloads → Windows.
- Download Python 3.11.x (64-bit).
- Run the installer → check “Add Python to PATH” → choose Install Now → finish.
WhisperX supports recent Python versions; its repo includes a .python-version file, and standard wheels support 3.8–3.11. (We’ll use 3.11 for broad library compatibility.)
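A quick check that the installer worked and the right version is on PATH (the py launcher is included with the python.org installer):

    py -3.11 --version      # should print Python 3.11.x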
5) Install FFmpeg (required by Whisper/WhisperX)
Option A (manual, no package managers):
- Download a Windows FFmpeg build (e.g., from ffmpeg.org or a reputable mirror) and unzip it to C:\FFmpeg\.
- Add C:\FFmpeg\bin to your PATH:
  - Press Win → search “Edit the system environment variables” → Environment Variables…
  - Under System variables, select Path → Edit → New → add C:\FFmpeg\bin → OK out of all dialogs.
- Close and reopen Terminal/PowerShell so the PATH update takes effect.
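A quick sanity check in the new window that FFmpeg is now found:

    ffmpeg -version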
6) Create an isolated Python environment
- Open Windows Terminal (or PowerShell).
- Create a new virtual environment folder (any folder is fine). Example commands are shown after this list.
- Activate it (see below).
- Upgrade pip (see below).
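The actual commands didn't survive the copy into this post; a minimal PowerShell sketch, assuming an example location of C:\whisperx\venv (any path works):

    py -3.11 -m venv C:\whisperx\venv           # create the virtual environment
    C:\whisperx\venv\Scripts\Activate.ps1       # activate it; the prompt gains a "(venv)" prefix
    python -m pip install --upgrade pip         # upgrade pip inside the venv

If PowerShell refuses to run the activation script, running Set-ExecutionPolicy -Scope CurrentUser RemoteSigned once usually clears that up.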
7) Install PyTorch with CUDA support (cu121 wheels)
On Windows, the official PyTorch CUDA wheels ship with the needed CUDA runtime (cuDNN etc.). The most common, stable choice today is CUDA 12.1 wheels (labelled cu121).
Important: Installing PyTorch this way ensures torch is GPU-enabled; letting other packages pull in a CPU-only torch is a common mistake.
- In the same activated venv, run the install command shown after this list.
- Verify CUDA is detected (the check is also shown below).
- You should see NMS OK and CUDA available: True, plus your GPU name.
- If CUDA isn’t available here, stop and fix this first (driver, toolkit, or wrong torch wheel). Users often report that installing WhisperX before a proper CUDA-enabled torch causes a CPU-only torch to be installed.
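The exact commands were dropped from the post; a sketch of the usual approach, using the cu121 wheel index plus a one-liner that matches the expected output described above:

    # Install the CUDA 12.1 (cu121) GPU wheels into the activated venv:
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

    # Check torchvision's NMS op and CUDA availability; prints "NMS OK", "CUDA available: True", and your GPU name:
    python -c "import torch; from torchvision.ops import nms; b = torch.tensor([[0., 0., 1., 1.]]); s = torch.tensor([0.9]); nms(b, s, 0.5); print('NMS OK'); print('CUDA available:', torch.cuda.is_available()); print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'no CUDA device')"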
8) Install WhisperX
- Still inside the same venv, run the install command below.
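The install command itself was lost; WhisperX is published on PyPI, so the standard install is:

    pip install whisperx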
9) Quick GPU test (no diarization)
- Put a small audio file next to your terminal (e.g., C:\audio\clip.wav).
- Run the command shown after this list.
- This should create outputs (.srt, .txt, etc.) and use your GPU.
- These usage flags and models are from the WhisperX README examples on GitHub.
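The command from the original post wasn't preserved; a representative invocation in the style of the README examples (the model, compute type, and output folder here are illustrative choices, not necessarily the author's):

    whisperx C:\audio\clip.wav --model large-v2 --device cuda --compute_type float16 --output_dir C:\audio\out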
10) (Optional) Enable Speaker Diarization
WhisperX uses pyannote models that require you to accept licenses on Hugging Face and use a token.
- Create a (free) Hugging Face account → generate a read token.
- On the model pages noted by WhisperX, click “Accept” on the user agreements:
  - Segmentation model
  - Segmentation-3.0 model
  - Speaker-Diarization-3.1 model
- Run diarization with your token (example command below).
(The exact diarization flags and the need to accept the model agreements are stated in WhisperX’s README.)
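The command was lost as well; a sketch that adds the diarization flags to the earlier test command (replace YOUR_HF_TOKEN with your real token):

    whisperx C:\audio\clip.wav --model large-v2 --device cuda --compute_type float16 --diarize --hf_token YOUR_HF_TOKEN

If you know how many people are speaking, you can also pass --min_speakers and --max_speakers.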
11) (Optional) Create a Batch File
- Confirm the location of your virtual environment (venv) directory.
- Confirm your Hugging Face token value.
- Create a batch file along the lines of the one below, substituting your own venv directory and Hugging Face token for the placeholders.
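The batch file itself didn't survive the copy; a minimal sketch, assuming the example venv path from step 6 and YOUR_HF_TOKEN as a placeholder:

    @echo off
    REM transcribe.bat - a sketch, not the original file from the post.
    REM Update the venv path to match your setup and replace YOUR_HF_TOKEN with your own Hugging Face token.
    call "C:\whisperx\venv\Scripts\activate.bat"
    whisperx "%~1" --model large-v2 --device cuda --compute_type float16 --diarize --hf_token YOUR_HF_TOKEN
    pause

With this in place you can drag an audio file onto the batch file, or run it as transcribe.bat C:\audio\clip.wav from a terminal.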