A GPU Build Farm that compiles what pip can’t.
Stop compiling locally. Stop fighting version conflicts. WheelForge is a managed build system for the most painful GPU libraries.
We run a dedicated GPU build farm that compiles, tests, and packages hardware-native binary artifacts for your specific deployment targets. Whether it's FlashAttention, xFormers, or Triton, we handle the compilation so you don't have to.
Single GPU Builder
We currently run a dedicated A40 build node designed to crush heavy compilations. You request a binary artifact, and if it's not pre-built, we queue it for compilation in a clean, isolated environment.
Current Capabilities:
- Builds flash-attn, xformers, bitsandbytes, triton
- Targets Ampere (sm_80/86) architectures
- Isolated venv per build
- Outputs wheels + metadata
WheelForge handles the nasty compilation steps so your CI doesn't have to.
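The request flow described above (serve a pre-built artifact if one exists, otherwise queue a build) can be sketched roughly as follows. This is a minimal illustration, not WheelForge's actual API: `BuildRequest`, `ArtifactStore`, and all field names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildRequest:
    """Hypothetical key identifying one binary artifact."""
    library: str
    version: str
    python_tag: str   # e.g. "cp310"
    cuda_tag: str     # e.g. "cu121"
    sm_arch: str      # e.g. "sm_86"


class ArtifactStore:
    """Hypothetical in-memory stand-in for the pre-built cache and build queue."""

    def __init__(self):
        self._prebuilt = {}      # BuildRequest -> wheel filename
        self._queue = deque()    # requests waiting for the build node

    def register(self, req, wheel_name):
        # Record a finished build and drop it from the queue if it was pending.
        self._prebuilt[req] = wheel_name
        if req in self._queue:
            self._queue.remove(req)

    def request(self, req):
        # Serve from the cache when possible, otherwise enqueue exactly once.
        wheel = self._prebuilt.get(req)
        if wheel is not None:
            return ("ready", wheel)
        if req not in self._queue:
            self._queue.append(req)
        return ("queued", None)

    def pending(self):
        return list(self._queue)
```

Because the request is a frozen dataclass, identical requests hash to the same key, so asking twice for the same artifact does not queue two builds.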
SM-Native Compilers
Generic wheels leave performance on the table. WheelForge is being built to compile specifically for the Streaming Multiprocessor (SM) architecture of your target GPU.
Why it matters: a FlashAttention binary compiled only for A100 (sm_80) cannot use the Hopper-specific (sm_90) instructions on an H100, so it will never run optimally there.
We manage this complexity for you.
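As a rough illustration of what targeting an SM architecture involves, the sketch below maps GPU models to their CUDA compute capability and formats the result for `TORCH_CUDA_ARCH_LIST`, the environment variable that PyTorch's extension builder reads. The helper names and the lookup table are assumptions made for this example; whether WheelForge drives builds this way is not stated here, and some libraries set their own architecture flags.

```python
# Compute capabilities for a few data-center GPUs (major, minor).
COMPUTE_CAPABILITY = {
    "A100": (8, 0),
    "A40": (8, 6),
    "H100": (9, 0),
}


def sm_tag(gpu):
    """Return the sm_XY tag for a GPU model, e.g. 'sm_86' for an A40."""
    major, minor = COMPUTE_CAPABILITY[gpu]
    return f"sm_{major}{minor}"


def torch_cuda_arch_list(gpus):
    """Build a TORCH_CUDA_ARCH_LIST value covering the given GPU models."""
    caps = sorted({COMPUTE_CAPABILITY[g] for g in gpus})
    return ";".join(f"{major}.{minor}" for major, minor in caps)


if __name__ == "__main__":
    # Example: an Ampere-only build covering both A100 and A40.
    print(torch_cuda_arch_list(["A40", "A100"]))
```

A build environment could export the returned string before invoking the compiler, so that only the requested architectures are compiled.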
Verified Artifacts
We don't just compile; we verify. Every wheel produced by WheelForge undergoes a suite of GPU smoke tests before it's accepted.
- Tensor operations check
- Small forward/backward pass
- CUDA availability check
- Metadata verification
If a build fails validation, it never reaches your machine.
We store every verified build with its full build log and test results.
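A smoke-test gate of this kind might look like the sketch below. It is an assumption-laden illustration rather than the real test suite: the function names are hypothetical, the three checks cover only the CUDA, tensor, and forward/backward items from the list, and they need PyTorch plus a CUDA device to pass.

```python
def cuda_available_check():
    import torch
    assert torch.cuda.is_available(), "no CUDA device visible"


def tensor_ops_check():
    import torch
    a = torch.ones(4, 4, device="cuda")
    expected = torch.full((4, 4), 4.0, device="cuda")
    assert torch.allclose(a @ a, expected)


def forward_backward_check():
    import torch
    layer = torch.nn.Linear(8, 2).cuda()
    loss = layer(torch.randn(3, 8, device="cuda")).sum()
    loss.backward()
    assert layer.weight.grad is not None


def run_smoke_tests(checks):
    """Run each named check; accept the build only if every check passes."""
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "pass"
        except Exception as exc:  # any failure rejects the build
            results[name] = f"fail: {type(exc).__name__}: {exc}"
    accepted = all(status == "pass" for status in results.values())
    return accepted, results


DEFAULT_CHECKS = {
    "cuda_available": cuda_available_check,
    "tensor_ops": tensor_ops_check,
    "forward_backward": forward_backward_check,
}
```

Calling `run_smoke_tests(DEFAULT_CHECKS)` on a build node returns the accept/reject decision together with per-check results, which could be stored alongside the build log.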
torch==2.3.0
flash-attn==2.5.6
# Resolved: cu121, cp310, sm_90
Strict Matrix
Compatibility hell is usually caused by mismatched versions. WheelForge enforces a strict, realistic build matrix to ensure everything works together.
Current Target Matrix:
- Python 3.10
- CUDA 12.1
- PyTorch 2.4.1
- Ampere GPUs (sm_80/86)
We explicitly target this combination because it is the current sweet spot for stability and performance.
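To make the idea of a strict matrix concrete, here is a minimal sketch of a check that rejects any request falling outside the target combination listed above. The dictionary layout and the `check_against_matrix` helper are hypothetical, chosen only for illustration.

```python
# The target matrix from the list above; tuples mean "any of these".
TARGET_MATRIX = {
    "python": "3.10",
    "cuda": "12.1",
    "torch": "2.4.1",
    "sm_arch": ("sm_80", "sm_86"),
}


def check_against_matrix(request, matrix=TARGET_MATRIX):
    """Return a list of mismatches; an empty list means the request is in-matrix."""
    problems = []
    for key, allowed in matrix.items():
        allowed_values = allowed if isinstance(allowed, tuple) else (allowed,)
        value = request.get(key)
        if value not in allowed_values:
            problems.append(f"{key}={value!r} not in {allowed_values}")
    return problems
```

For example, a request for `sm_90` would come back with a single `sm_arch` mismatch, while a request that matches every field returns an empty list.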
Security Layer
Trust is critical for binary distribution.
WheelForge will implement:
- Malware scanning on all builds
- Secret scanning
- SBOM generation (Software Bill of Materials)
- Build provenance from source to silicon
All automated builds will include binary scanning and provenance.
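One simple building block for provenance is a content hash tied to the source revision. The sketch below is an assumption about what such a record could contain, not the planned implementation: the field names are hypothetical, and it does not follow a formal SBOM format such as SPDX or CycloneDX.

```python
import hashlib
import json
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large wheels are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(wheel_path, source_url, source_commit, build_log_path):
    """Assemble a hypothetical provenance record linking a wheel to its source."""
    return {
        "artifact": os.path.basename(wheel_path),
        "sha256": sha256_of_file(wheel_path),
        "source": {"url": source_url, "commit": source_commit},
        "build_log": build_log_path,
    }


def to_json(record):
    # Sorted keys keep the serialized record stable across runs.
    return json.dumps(record, indent=2, sort_keys=True)
```

A consumer could recompute the SHA-256 of a downloaded wheel and compare it with the recorded value before installing.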
Initial Library Support
These are the first libraries being integrated into the build engine. Some produce wheels, others are source-built CUDA extensions.
| Library | Description | Status |
|---|---|---|
| flash-attn | Fast and memory-efficient exact attention | 🟦 In Progress (Wheel) |
| vLLM | High-throughput LLM serving with Triton kernels | 🟦 In Progress (Wheel) |
| xformers | Transformers building blocks (source-built CUDA ops) | 🟦 In Progress (Source Build) |
| bitsandbytes | 8-bit optimisers and quantisation routines | 🟨 Planned (Wheel) |
| AutoGPTQ | LLM quantisation toolkit (CUDA kernels) | 🟨 Planned (Wheel) |
Note: Not all GPU libraries ship pre-compiled wheels. WheelForge supports both wheel builds and source-built CUDA extension builds, depending on the library's packaging model.
Ready to scale your builds?
Join the waitlist for early access to the GPU build engine.