Platform Overview

A GPU Build Farm that compiles what pip can’t.

Stop compiling locally. Stop fighting version conflicts. WheelForge is a managed build system for the most painful GPU libraries.

We run a dedicated GPU build farm that compiles, tests, and packages hardware-native binary artifacts for your specific deployment targets.
Whether it's FlashAttention, xFormers, or Triton—we handle the compilation so you don't have to.

$ wf build --arch=sm_90 --cuda=12.1
Queued build job #8291 on H100 cluster...
Build complete (45s)
→ Pushing to private registry... Done.

Single GPU Builder

We currently run a dedicated A40 build node designed to crush heavy compilations.
You request a binary artifact, and if it's not pre-built, we queue it for compilation in a clean, isolated environment.

Current Capabilities:

  • Builds flash-attn, xformers, bitsandbytes, triton
  • Targets Ampere (sm_80/86) architectures
  • Isolated venv per build
  • Outputs wheels + metadata

WheelForge handles the nasty compilation steps so your CI doesn't have to.
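
As an illustration of how a request against the current Ampere node might look. This is a sketch only: the positional package argument and the output lines are assumptions, not confirmed CLI behaviour; only the wf build --arch/--cuda form is shown elsewhere on this page.

$ wf build --arch=sm_86 --cuda=12.1 flash-attn==2.5.6   # package argument is hypothetical
Queued build job on A40 node...
Building in isolated venv... done
→ Wheel + build metadata pushed to your registry.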

Status: Live (Internal)
[Pipeline diagram: git push → WheelForge Build Farm → optimized wheel, targeting H100 (SM_90), A100 (SM_80), V100 (SM_70), L40S (SM_89)]

SM-Native Compilers

Generic wheels leave performance on the table.
WheelForge is being built to compile specifically for the Streaming Multiprocessor (SM) architecture of your target GPU.

Why it matters:
FlashAttention compiled for A100 will never run optimally on H100.
We manage this complexity for you.
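
Concretely, the SM target is the GPU's compute capability, and the standard knob for source builds is PyTorch's TORCH_CUDA_ARCH_LIST environment variable (individual libraries can override it in their own setup.py). A minimal sketch of pinning a build to the hardware you actually run, not WheelForge's actual recipe:

$ python -c "import torch; print(torch.cuda.get_device_capability())"
(9, 0)                                   # H100 → sm_90
$ export TORCH_CUDA_ARCH_LIST="9.0"      # emit kernels for sm_90 only; use "8.0;8.6" for Ampere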

Status: Prototype stage

Verified Artifacts

We don't just compile; we verify. Every wheel produced by WheelForge undergoes a suite of GPU smoke tests before it's accepted.

  • Tensor operations check
  • Small forward/backward pass
  • CUDA availability check
  • Metadata verification

If a build fails validation, it never reaches your machine.

We store every verified build with its full build log and test results.
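
The checks themselves are ordinary PyTorch calls. A minimal sketch of what such a smoke test can look like, using flash-attn as the example; this is illustrative, not the exact suite WheelForge runs:

# CUDA availability check
$ python -c "import torch; assert torch.cuda.is_available()"
# Small forward/backward pass on the GPU
$ python -c "
import torch
x = torch.randn(8, 8, device='cuda', requires_grad=True)
(x @ x).sum().backward()
assert x.grad is not None
"
# Metadata / import check for the freshly built wheel
$ python -c "import flash_attn; print(flash_attn.__version__)"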

Status: Live
[Cache diagram: User A's build is written to the cache; User B's identical request is a cache hit]

# requirements.txt
torch==2.3.0
flash-attn==2.5.6
# Resolved: cu121, cp310, sm_90
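
On a cache hit, the consumer side is just a normal pip install pointed at the registry; the index URL below is a placeholder, not a real endpoint:

$ pip install -r requirements.txt --extra-index-url https://wheels.example.internal/cu121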

Strict Matrix

Compatibility hell is usually caused by mismatched versions.
WheelForge enforces a strict, realistic build matrix to ensure everything works together.

Current Target Matrix:

  • Python 3.10
  • CUDA 12.1
  • PyTorch 2.4.1
  • Ampere GPUs (sm_80/86)

We explicitly target this combination because it is the current sweet spot for stability and performance.
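
The same matrix is easy to reproduce locally; the matching PyTorch build is published on the official cu121 index. Shown for orientation only, not as a WheelForge command:

$ pip install torch==2.4.1 --index-url https://download.pytorch.org/whl/cu121
$ python -c "import torch; print(torch.__version__, torch.version.cuda)"
2.4.1+cu121 12.1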

Status: Enforced

Security Layer

Trust is critical for binary distribution.

WheelForge will implement:

  • Malware scanning on all builds
  • Secret scanning
  • SBOM generation (Software Bill of Materials)
  • Build provenance from source to silicon

All automated builds will include binary scanning and provenance.
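
Until those pieces land, standard pip hash-checking already gives a consumer-side integrity check against whatever digests the registry publishes. A sketch; the digest below is a placeholder:

# In requirements.txt, pin the exact digest recorded for the verified wheel:
#   flash-attn==2.5.6 --hash=sha256:<digest-from-registry>
# Then have pip refuse anything that doesn't match:
$ pip install -r requirements.txt --require-hashes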

Status: Not yet available

Initial Library Support

These are the first libraries being integrated into the build engine. Some are built as wheels; others are source-built CUDA extensions.

Library        Description                                             Status
flash-attn     Fast and memory-efficient exact attention               🟦 In Progress (Wheel)
vLLM           High-throughput LLM serving with Triton kernels         🟦 In Progress (Wheel)
xformers       Transformers building blocks (source-built CUDA ops)    🟦 In Progress (Source Build)
bitsandbytes   8-bit optimisers and quantisation routines              🟨 Planned (Wheel)
AutoGPTQ       LLM quantisation toolkit (CUDA kernels)                 🟨 Planned (Wheel)

Note: Not all GPU libraries ship pre-compiled wheels. WheelForge supports both wheel builds and source-built CUDA extension builds, depending on the library's packaging model.
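
The difference matters at install time: a pre-built wheel installs without a compiler, while a source-built CUDA extension needs nvcc and an installed torch during the build. Roughly, with illustrative commands:

# Pre-built wheel: no local compilation
$ pip install bitsandbytes

# Forced source build of a CUDA extension: needs the CUDA toolkit and torch at build time
$ pip install xformers --no-binary xformers --no-build-isolation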

Ready to scale your builds?

Join the waitlist for early access to the GPU build engine.