Introducing
WheelForge

Stop compiling
CUDA wheels by hand

WheelForge builds and tests GPU-heavy Python packages – like FlashAttention, xFormers, bitsandbytes, and Triton – and delivers them as ready-to-install wheels for your stack.

```bash
$ wf install flash-attn==2.5
Detecting environment: CUDA 12.1, PyTorch 2.3, SM_90 (H100)
Found optimized wheel in global cache
Installed flash_attn-2.5.6+cu121torch2.3-cp310-cp310-linux_x86_64.whl (240 MB)
Time: 1.2s
```
Current status: GPU build preview

WheelForge is live as an internal GPU builder and can already build and test a handful of CUDA-heavy packages against Python 3.10 / CUDA 12.1. We’re now hardening the build matrix, adding more packages, and designing the public API.
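The "detect environment, find matching wheel" step from the demo comes down to matching a wheel's local version tag against the running stack. A minimal sketch, assuming a `+cu121torch2.3`-style tag like the one in the demo (the helper names and tag format here are our illustration, not WheelForge's actual implementation):

```python
import re

def parse_build_tag(wheel_name: str) -> dict:
    """Extract CUDA and PyTorch versions from a local version tag
    like '+cu121torch2.3' (illustrative format, as in the demo above)."""
    m = re.search(r"\+cu(\d{2,3})torch(\d+\.\d+)", wheel_name)
    if not m:
        raise ValueError(f"no build tag in {wheel_name}")
    cu = m.group(1)
    return {
        "cuda": f"{cu[:-1]}.{cu[-1]}",  # '121' -> '12.1'
        "torch": m.group(2),
    }

def wheel_matches(wheel_name: str, env: dict) -> bool:
    """True if the wheel's CUDA/PyTorch tag matches the detected environment."""
    tag = parse_build_tag(wheel_name)
    return tag["cuda"] == env["cuda"] and env["torch"].startswith(tag["torch"])

# e.g. a cached wheel checked against a CUDA 12.1 / PyTorch 2.3 environment
env = {"cuda": "12.1", "torch": "2.3.0"}
print(wheel_matches(
    "flash_attn-2.5.6+cu121torch2.3-cp310-cp310-linux_x86_64.whl", env))
```

A real resolver would also check the Python tag, ABI tag, and platform tag from the wheel filename; this sketch only covers the CUDA/PyTorch axis.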

What Works Right Now

WheelForge is running as a single GPU builder (A40) that targets a tight, realistic matrix.

🏗️ Proven Builds

We build FlashAttention 2.5, xFormers 0.0.27, bitsandbytes 0.43.1, and Triton 3.0.0 from source.

🎯 Specific Matrix

Targeting Python 3.10 + CUDA 12.1 + PyTorch 2.4.1 on Ampere-class GPUs (sm_80/86).

Verified Artifacts

Every build runs GPU smoke tests (tensor ops, small kernels) before emitting a wheel + metadata.
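The smoke-test gate described above can be pictured as a small harness: run every named check, and only emit the wheel if all of them pass. A sketch of that shape with stand-in CPU checks (a real harness would import the freshly built extension and run tensor ops and small kernels on the GPU; all names here are illustrative):

```python
from typing import Callable

def run_smoke_tests(checks: dict[str, Callable[[], bool]]) -> tuple[bool, dict[str, bool]]:
    """Run every check and report per-check results; the wheel ships
    only if all pass."""
    results = {name: bool(fn()) for name, fn in checks.items()}
    return all(results.values()), results

# Stand-ins for 'tensor ops' and 'small kernels'.
checks = {
    "import_ok": lambda: True,
    "matmul_ok": lambda: sum(a * b for a, b in zip([1, 2], [3, 4])) == 11,
}
ok, results = run_smoke_tests(checks)
print(ok)  # True -> safe to emit wheel + metadata
```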

More Than Just a Pip Fix

WheelForge is a complete Binary Infrastructure Platform.

Build

On-demand GPU build farm. WheelForge will dispatch GPU runners to compile binary artifacts for the long tail of hardware that PyPI ignores, producing wheels where possible and source-built CUDA extensions where necessary.

  • Wheel builds & source-only CUDA ops
  • Multi-arch compilation (SM_80, SM_90)
  • Custom CUDA + PyTorch versions
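Multi-arch compilation ultimately comes down to the `-gencode` flags handed to nvcc, one per target SM plus a PTX fallback for forward compatibility. The flag syntax is standard nvcc; the helper that derives it is our illustration:

```python
def gencode_flags(archs: list[int], ptx_fallback: bool = True) -> list[str]:
    """Build nvcc -gencode flags: native SASS for each target SM, plus
    PTX for the newest arch so future GPUs can JIT-compile."""
    flags = [f"-gencode=arch=compute_{sm},code=sm_{sm}" for sm in sorted(archs)]
    if ptx_fallback:
        newest = max(archs)
        flags.append(f"-gencode=arch=compute_{newest},code=compute_{newest}")
    return flags

# SM_80 + SM_90, as in the bullet above
print(gencode_flags([80, 90]))
```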

Optimize

Hardware-native optimization. We are building compilers that target the specific Streaming Multiprocessor architecture of your deployment hardware.

  • SM-native wheels
  • PGO optimization
  • Cluster profiling

Secure

Enterprise-grade security. All automated builds will include binary scanning and provenance.

  • Private PyPI mirrors
  • Binary provenance
  • Retention policies
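Binary provenance at its simplest means recording a content digest and the exact build inputs alongside each artifact. A minimal sketch (the record fields are our assumption, not a WheelForge schema):

```python
import hashlib
import json

def provenance_record(wheel_bytes: bytes, matrix: dict) -> dict:
    """Tie an artifact's sha256 digest to the build matrix that produced it."""
    return {
        "sha256": hashlib.sha256(wheel_bytes).hexdigest(),
        "matrix": matrix,  # python / cuda / torch / arch inputs
    }

# Illustrative payload; a real record would hash the actual wheel file.
rec = provenance_record(
    b"fake wheel payload",
    {"python": "3.10", "cuda": "12.1", "torch": "2.4.1", "arch": "sm_80"},
)
print(json.dumps(rec, sort_keys=True))
```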

Who This Is For

If you’ve ever thought “I don’t want to debug another flash-attn build again”, you’re the target user.

GPU Teams

Teams running GPU-heavy Python workloads (training or inference) who need reliable, optimized binaries.

Frustrated Engineers

Engineers sick of NVCC errors, missing headers, and random compile failures on new machines.

Platform Builders

Tooling / platform teams who want to standardize on a known set of GPU wheels across many environments.

Why WheelForge?

| Feature | PyPI / Pip | Docker | WheelForge |
|---|---|---|---|
| GPU Build Farm | ❌ | ❌ | ✅ Native |
| Optimized Binaries | ❌ Generic | ❌ Generic | ✅ Hardware-Specific |
| Storage Limits | Limited | Large Layers | ✅ Petabyte Scale |
| Security Scanning | Basic | Image only | ✅ Deep Binary Scan |
| CI Integration | ⚠️ Flaky | ⚠️ Heavy | ✅ Instant |

What We're Building Towards

WheelForge is becoming a BinOps layer for GPU AI workloads.

Vision

Trusted Catalogue

A small, trusted catalogue of prebuilt CUDA / PyTorch wheels for annoying packages.

Standard

Reproducible Builds

Every wheel tied to an explicit Python / CUDA / Torch / arch matrix. Source → wheel repack for libraries that only ship sdists.

Tooling

API + CLI

So your tooling can run `wf build flash-attn==2.5 --python 3.10 --cuda 12.1 --arch sm_80` and just get back a tested wheel.
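A CLI with that shape is straightforward to sketch with argparse. The flag names follow the command shown above; the parser itself is our illustration, not the shipped tool:

```python
import argparse

def make_parser() -> argparse.ArgumentParser:
    """Sketch of a 'wf build' subcommand mirroring the example invocation."""
    parser = argparse.ArgumentParser(prog="wf")
    sub = parser.add_subparsers(dest="command", required=True)
    build = sub.add_parser("build", help="build and smoke-test a wheel")
    build.add_argument("spec", help="package spec, e.g. flash-attn==2.5")
    build.add_argument("--python", default="3.10")
    build.add_argument("--cuda", default="12.1")
    build.add_argument("--arch", default="sm_80")
    return parser

args = make_parser().parse_args(
    ["build", "flash-attn==2.5", "--python", "3.10", "--cuda", "12.1", "--arch", "sm_80"])
print(args.command, args.spec, args.arch)
```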

Ready to stop fighting dependencies?

Join the waitlist for early access to the build farm.