Tuesday, October 14, 2025

Running WhisperX Locally on Windows

Thanks to ChatGPT and a lot of trial and error, I got this working...

1) Confirm you have an NVIDIA GPU

  1. Press Win key, type Device Manager, open it.

  2. Expand Display adapters → you should see NVIDIA … (e.g., “NVIDIA GeForce …”).

    • If you don’t have an NVIDIA GPU, you can still use WhisperX on CPU, but this guide is for CUDA (GPU).
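
If you'd rather check from the command line, here's a small Python sketch (assuming Python is already installed) that looks for the driver's nvidia-smi utility on PATH:

```python
import shutil
import subprocess

def has_nvidia_smi() -> bool:
    """True when the NVIDIA driver's nvidia-smi utility is on PATH."""
    return shutil.which("nvidia-smi") is not None

if has_nvidia_smi():
    # -L lists GPUs, e.g. "GPU 0: NVIDIA GeForce RTX ... (UUID: ...)"
    print(subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout)
else:
    print("No NVIDIA driver on PATH - CPU-only WhisperX is still possible.")
```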

2) Install NVIDIA GPU driver (latest)

  1. Go to NVIDIA GeForce / Studio drivers (or your OEM site) and install the latest driver.

  2. After install, restart Windows.

    • Current PyTorch GPU wheels work with modern drivers and are backward-compatible with their CUDA runtime (you do not need matching full toolkits for the PyTorch wheels themselves). 

3) Install the CUDA Toolkit (version 12.8, as WhisperX requires)

WhisperX’s README explicitly tells Windows users to install the CUDA Toolkit 12.8 before WhisperX if you want GPU acceleration.

  1. Download CUDA Toolkit 12.8 for Windows from NVIDIA’s CUDA downloads page.

  2. Run the installer → accept defaults → finish.

  3. Restart Windows.

If you want the official step-by-step install reference for Windows CUDA: NVIDIA’s CUDA Installation Guide for Microsoft Windows.
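
To confirm the toolkit installed correctly, you can run nvcc --version in a new terminal. A small sketch that extracts the release number from that output (the sample string is illustrative, not captured from your machine):

```python
import re
import shutil
import subprocess

def parse_cuda_release(nvcc_output: str):
    """Extract the (major, minor) CUDA release from `nvcc --version` output."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(m.group(1)), int(m.group(2))) if m else None

# Illustrative sample of the line nvcc prints:
sample = "Cuda compilation tools, release 12.8, V12.8.61"
print(parse_cuda_release(sample))  # (12, 8)

# Check the real installation, if nvcc is on PATH:
if shutil.which("nvcc"):
    out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
    print("Installed toolkit:", parse_cuda_release(out))
```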

4) Install Python (64-bit) – recommended 3.10 or 3.11

  1. Go to python.org → Downloads → Windows.

  2. Download Python 3.11.x (64-bit).

  3. Run installer → check “Add Python to PATH” → choose Install Now → finish.

WhisperX supports recent Python versions; its repo includes a .python-version file, and standard wheels support 3.8–3.11. (We’ll use 3.11 for broad library compatibility.)
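
A quick way to double-check the interpreter you just installed is in the supported range (the 3.8–3.11 window above):

```python
import sys

def python_ok(major: int, minor: int) -> bool:
    """True for the 3.8-3.11 range mentioned above (3.11 recommended)."""
    return (3, 8) <= (major, minor) <= (3, 11)

print(sys.version, "->", "OK" if python_ok(*sys.version_info[:2]) else "use 3.10/3.11")
```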

5) Install FFmpeg (required by Whisper/WhisperX)

Option A (manual, no package managers):

  1. Download a Windows FFmpeg build (e.g., from ffmpeg.org or a reputable mirror), unzip to C:\FFmpeg\.

  2. Add C:\FFmpeg\bin to your PATH:

    • Press Win → search “Edit the system environment variables” → Environment Variables…

    • Under System variables, select Path → Edit → New → add C:\FFmpeg\bin → OK out of all dialogs.

  3. Close and reopen Terminal/PowerShell so PATH updates. 
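
After reopening the terminal, you can verify the PATH change took effect with ffmpeg -version, or with this small Python sketch:

```python
import shutil

def ffmpeg_on_path() -> bool:
    """True once a directory containing ffmpeg (e.g. C:\\FFmpeg\\bin) is on PATH."""
    return shutil.which("ffmpeg") is not None

print("ffmpeg found:", ffmpeg_on_path())
```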

6) Create an isolated Python environment

  1. Open Windows Terminal (or PowerShell).

  2. Create a new virtual environment folder (any folder is fine). Example:

    py -3.11 -m venv C:\whisperx

  3. Activate it:

    C:\whisperx\Scripts\activate

  4. Upgrade pip:

    python -m pip install --upgrade pip
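
If you're ever unsure whether the venv is actually active (a common source of "pip installed it but Python can't find it" confusion), this one checks from inside Python:

```python
import sys

def in_virtualenv() -> bool:
    """Inside a venv, sys.prefix points at the venv dir while base_prefix stays at the system install."""
    return sys.prefix != sys.base_prefix

print("Inside a venv:", in_virtualenv())
print("Python executable:", sys.executable)  # should be under C:\whisperx\ when active
```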

7) Install PyTorch with CUDA support (cu128 wheels)

On Windows, the official PyTorch CUDA wheels ship with the needed CUDA runtime (cuDNN etc.). To match the CUDA Toolkit 12.8 installed above, use the CUDA 12.8 wheels (labelled cu128).
Important: Installing PyTorch this way ensures torch is GPU-enabled; letting other packages pull a CPU-only torch is a common mistake.

  1. In the same activated venv, run:

    pip install "torch==2.7.1" "torchvision==0.22.1" "torchaudio==2.7.1" --index-url https://download.pytorch.org/whl/cu128
  2. Verify CUDA is detected:

    python -c "import torch, torchvision; from torchvision.ops import nms; print('Torch:', torch.__version__, 'CUDA available:', torch.cuda.is_available(), 'CUDA runtime:', torch.version.cuda); print('TorchVision:', torchvision.__version__, 'NMS OK'); print('GPU:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'none')"
    • You should see NMS OK and CUDA available: True and your GPU name.

If CUDA isn’t available here, stop and fix this first (driver, toolkit, or wrong torch wheel). Users often report that installing WhisperX before a proper CUDA-enabled torch causes CPU-only torch to be installed.
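
One quick tell: pip-installed GPU wheels carry a "+cu…" local version tag (e.g. 2.7.1+cu128), while CPU-only wheels carry "+cpu". A sketch that pulls the tag out of a torch version string:

```python
def wheel_variant(version: str) -> str:
    """Return the local build tag from a torch version string, or 'unknown' if none."""
    return version.split("+", 1)[1] if "+" in version else "unknown"

print(wheel_variant("2.7.1+cu128"))  # cu128 -> GPU wheel
print(wheel_variant("2.7.1+cpu"))    # cpu   -> CPU-only wheel; reinstall from the cu128 index
```

To check your own install, pass it torch.__version__ from inside the venv.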

8) Install WhisperX

  1. Still inside the same venv:

    pip install whisperx 
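
To confirm the install landed in the venv (and that it didn't quietly replace your GPU torch), you can query installed versions:

```python
from importlib import metadata

def installed_version(pkg: str):
    """Return the installed version string for pkg, or None if it isn't installed."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None

print("whisperx:", installed_version("whisperx"))
print("torch:", installed_version("torch"))  # should still show a +cu128 build
```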

9) Quick GPU test (no diarization)

  1. Put a small audio file somewhere convenient (e.g., C:\audio\clip.wav).

  2. Run:

    whisperx C:\audio\clip.wav --model small --device cuda --batch_size 4
    • This should create outputs (.srt, .txt, etc.) and use your GPU.

    • These usage flags and models are from the WhisperX README examples.
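
If you want to script a check that the run produced its files, here's a sketch. It assumes the outputs land next to the audio or in the working directory with the same base name, one file per format; adjust the paths and the format list if you pass --output_dir or --output_format:

```python
from pathlib import Path

def expected_outputs(audio: str, formats=("srt", "txt", "vtt", "json")) -> list:
    """Build the output paths WhisperX would write for this audio file (assumed layout)."""
    stem = Path(audio).with_suffix("")
    return [stem.with_suffix("." + f) for f in formats]

for p in expected_outputs("clip.wav"):
    print(p, "exists:", p.exists())
```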

10) (Optional) Enable Speaker Diarization

WhisperX uses pyannote models that require you to accept licenses on Hugging Face and use a token.

  1. Create a (free) Hugging Face account → generate a read token.

  2. On the model pages noted by WhisperX, click “Accept” on the user agreements.

  3. Run diarization with your token:

    whisperx C:\audio\clip.wav --model large-v2 --diarize --hf_token YOUR_HF_TOKEN --device cuda --highlight_words True

    (Exact diarization flags and the need to accept the models are stated in WhisperX’s README.)
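
A common failure mode here is leaving the YOUR_HF_TOKEN placeholder in place. Hugging Face user access tokens start with "hf_", so a trivial format sanity check (this only checks the shape, not whether the token is valid) can catch that:

```python
def looks_like_hf_token(token: str) -> bool:
    """Format-only sanity check: Hugging Face access tokens start with 'hf_'."""
    return token.startswith("hf_") and len(token) > 3

print(looks_like_hf_token("hf_exampleToken123"))  # True
print(looks_like_hf_token("YOUR_HF_TOKEN"))       # False - placeholder was not replaced
```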

11) (Optional) Create a Batch File

To simplify your use of WhisperX, you can create a batch file that lets you process more than one file at a time.
  1. Confirm the location of your virtual environment (venv) directory

  2. Confirm your hugging face token value

  3. Create the following batch file, updating the venv directory (VENVDIRECTORY) and the Hugging Face token (hf_YOURHFTOKEN) placeholders.

@echo off
call "VENVDIRECTORY\Scripts\activate.bat"
echo Processing %~n0...

REM Loop through all dropped files
for %%F in (%*) do (
    echo Transcribing: "%%~nxF"
    whisperx "%%~fF" --model small --device cuda --batch_size 4 --diarize --output_format=txt --language=en --hf_token=hf_YOURHFTOKEN >nul 2>&1
)
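
If you'd rather drive the same batch processing from Python, here's a sketch that builds the identical command line for each input file (the token value is a placeholder you must replace, and whisperx must be on PATH inside the activated venv):

```python
import subprocess
import sys

def build_whisperx_cmd(audio_path: str, hf_token: str) -> list:
    """Mirror the flags used in the batch file above for one input file."""
    return [
        "whisperx", audio_path,
        "--model", "small",
        "--device", "cuda",
        "--batch_size", "4",
        "--diarize",
        "--output_format=txt",
        "--language=en",
        f"--hf_token={hf_token}",
    ]

if __name__ == "__main__":
    # Usage: python transcribe_all.py file1.wav file2.mp3 ...
    for path in sys.argv[1:]:
        print("Transcribing:", path)
        subprocess.run(build_whisperx_cmd(path, "hf_YOURHFTOKEN"))
```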