Platform Overview

A GPU Build Farm that compiles what pip can’t.

Stop compiling locally. Stop fighting version conflicts. WheelForge is a managed build system for the most painful GPU libraries.

We run a dedicated GPU build farm that compiles, tests, and packages hardware-native binary artifacts for your specific deployment targets.
Whether it's FlashAttention, xFormers, or Triton, we handle the compilation so you don't have to.

$ wf build --arch=sm_90 --cuda=12.1

ℹ Queued build job #8291 on H100 cluster...

✓ Build complete (45s)

→ Pushing to private registry... Done.

Single GPU Builder

We currently run a dedicated A40 build node designed to crush heavy compilations.
You request a binary artifact, and if it's not pre-built, we queue it for compilation in a clean, isolated environment.
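
The request flow above (serve a pre-built artifact if one exists, otherwise queue a build) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `BuildRequest`, `BuildFarm`, and the fields that make up the cache key are hypothetical names, not WheelForge's actual API.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BuildRequest:
    """Hypothetical description of one requested artifact."""
    package: str
    version: str
    python_tag: str  # e.g. "cp310"
    cuda: str        # e.g. "12.1"
    arch: str        # e.g. "sm_80"

    def cache_key(self) -> str:
        # Every field that changes the binary must be part of the key.
        raw = "|".join([self.package, self.version,
                        self.python_tag, self.cuda, self.arch])
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()


@dataclass
class BuildFarm:
    cache: dict = field(default_factory=dict)  # cache_key -> wheel filename
    queue: list = field(default_factory=list)  # pending BuildRequest objects

    def request(self, req: BuildRequest):
        """Return ("hit", wheel) if pre-built, else queue it once."""
        key = req.cache_key()
        if key in self.cache:
            return ("hit", self.cache[key])
        if req not in self.queue:
            self.queue.append(req)
        return ("queued", None)

    def complete(self, req: BuildRequest, wheel_name: str) -> None:
        """Record a finished build so later requests are cache hits."""
        self.queue.remove(req)
        self.cache[req.cache_key()] = wheel_name
```

The first user to ask for a given combination pays for the build; anyone asking for the same combination afterwards gets the stored wheel.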

Current Capabilities:

Builds flash-attn, xformers, bitsandbytes, triton
Targets Ampere (sm_80/86) architectures
Isolated venv per build
Outputs wheels + metadata
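
For a sense of what "isolated venv per build" can look like, here is a minimal sketch using only the Python standard library and `pip wheel`. The helper names are hypothetical, a POSIX venv layout is assumed, and real GPU libraries usually need extra build dependencies (a CUDA toolkit, PyTorch) that this sketch does not install.

```python
import subprocess
import tempfile
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Interpreter path inside a venv (POSIX layout assumed)."""
    return env_dir / "bin" / "python"


def pip_wheel_command(python: Path, spec: str, wheel_dir: Path) -> list[str]:
    """Command that builds a wheel for `spec` without its dependencies."""
    return [str(python), "-m", "pip", "wheel",
            "--no-deps", "--wheel-dir", str(wheel_dir), spec]


def build_in_fresh_venv(spec: str, wheel_dir: Path) -> list[Path]:
    """Create a throwaway venv, build `spec` inside it, return the wheels."""
    wheel_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "venv"
        # A new environment per build means no leftover packages leak in.
        venv.EnvBuilder(with_pip=True, clear=True).create(str(env_dir))
        subprocess.run(
            pip_wheel_command(venv_python(env_dir), spec, wheel_dir),
            check=True,
        )
    return sorted(wheel_dir.glob("*.whl"))
```

Calling `build_in_fresh_venv("some-package==1.0", Path("dist"))` would leave the resulting wheels in `dist/` and discard the environment afterwards.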

WheelForge handles the nasty compilation steps so your CI doesn't have to.

Status: Live (Internal)

Git Push → WheelForge Build Farm → Optimized Wheel

H100 (SM_90) · A100 (SM_80) · V100 (SM_70) · L40S (SM_89)

SM-Native Compilers

Generic wheels leave performance on the table.
WheelForge is being built to compile specifically for the Streaming Multiprocessor (SM) architecture of your target GPU.

Why it matters:
A FlashAttention build that targets only A100 (sm_80) cannot take advantage of H100-specific (sm_90) hardware features, so it will not run optimally on H100.
We manage this complexity for you.
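
One common way to pin a build to specific SM architectures is the `TORCH_CUDA_ARCH_LIST` environment variable, which PyTorch's C++/CUDA extension builder reads. The sketch below turns GPU names into that setting. The GPU-to-SM pairs are the ones listed on this page; the helper names are hypothetical, and whether a given library honors the variable is an assumption, since some set their own compiler flags.

```python
import os

# GPU-to-SM pairs as listed on this page.
SM_BY_GPU = {"H100": "sm_90", "A100": "sm_80", "V100": "sm_70", "L40S": "sm_89"}


def sm_to_capability(sm: str) -> str:
    """Convert "sm_86" to the dotted compute capability "8.6"."""
    digits = sm.removeprefix("sm_")
    return f"{digits[:-1]}.{digits[-1]}"


def arch_list(gpus) -> str:
    """Semicolon-separated, de-duplicated, ascending capability list."""
    caps = {sm_to_capability(SM_BY_GPU[g]) for g in gpus}
    ordered = sorted(caps, key=lambda c: tuple(int(p) for p in c.split(".")))
    return ";".join(ordered)


def build_env(gpus) -> dict:
    """Copy of the current environment with the arch list set for a build."""
    env = dict(os.environ)
    env["TORCH_CUDA_ARCH_LIST"] = arch_list(gpus)
    return env
```

Passing `build_env(["A100", "H100"])` as the `env` of a build subprocess would request both the 8.0 and 9.0 targets in a single wheel.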

Status: Prototype stage

Verified Artifacts

We don't just compile; we verify. Every wheel produced by WheelForge undergoes a suite of GPU smoke tests before it's accepted.

Tensor operations check
Small forward/backward pass
CUDA availability check
Metadata verification

If a build fails validation, it never reaches your machine.
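
A gate like this could be sketched as follows. It is an illustration only, assuming PyTorch is installed on the test node; the function names are hypothetical, only the first three checks from the list are covered, and a real suite would also import and exercise the freshly built library rather than PyTorch alone.

```python
def check_cuda_available() -> bool:
    import torch  # imported lazily so the runner works without a GPU stack
    return torch.cuda.is_available()


def check_tensor_ops() -> bool:
    import torch
    a = torch.randn(64, 64, device="cuda")
    b = torch.randn(64, 64, device="cuda")
    return bool(torch.isfinite(a @ b).all())


def check_forward_backward() -> bool:
    import torch
    layer = torch.nn.Linear(16, 4).to("cuda")
    loss = layer(torch.randn(8, 16, device="cuda")).sum()
    loss.backward()
    return layer.weight.grad is not None


def run_smoke_tests(checks: dict) -> dict:
    """Run each named check; an exception counts as a failure."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def accept(results: dict) -> bool:
    """A build is accepted only if at least one check ran and all passed."""
    return bool(results) and all(results.values())
```

On a GPU node the runner would be called with the three checks above, for example `run_smoke_tests({"cuda": check_cuda_available, "ops": check_tensor_ops, "fwd_bwd": check_forward_backward})`, and the wheel is published only when `accept` returns `True`.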

We store every verified build with its full build log and test results.

Status: Live

User A → Cache Write
User B ← Cache Hit

# requirements.txt
torch==2.3.0
flash-attn==2.5.6
# Resolved: cu121, cp310, sm_90

Strict Matrix

Compatibility hell is usually caused by mismatched versions.
WheelForge enforces a strict, realistic build matrix to ensure everything works together.

Current Target Matrix:

Python 3.10
CUDA 12.1
PyTorch 2.4.1
Ampere GPUs (sm_80/86)

We explicitly target this combination because it is the current sweet spot for stability and performance.
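
Enforcing a matrix like this comes down to rejecting any request that steps outside it. A minimal sketch, using the values listed above; the `TARGET_MATRIX` layout and the `violations` helper are hypothetical, not WheelForge's actual configuration format.

```python
# Allowed values, taken from the target matrix listed above.
TARGET_MATRIX = {
    "python": {"3.10"},
    "cuda": {"12.1"},
    "torch": {"2.4.1"},
    "arch": {"sm_80", "sm_86"},
}


def violations(config: dict) -> list[str]:
    """Return one message per setting that falls outside the matrix."""
    problems = []
    for name, allowed in TARGET_MATRIX.items():
        value = config.get(name)
        if value not in allowed:
            problems.append(f"{name}={value!r} not in {sorted(allowed)}")
    return problems


def is_supported(config: dict) -> bool:
    return not violations(config)
```

A request for `{"python": "3.10", "cuda": "12.1", "torch": "2.4.1", "arch": "sm_86"}` passes, while changing any one value produces a message naming the offending setting.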

Status: Enforced

Security Layer

Trust is critical for binary distribution.

WheelForge will implement:

Malware scanning on all builds
Secret scanning
SBOM generation (Software Bill of Materials)
Build provenance from source to silicon

All automated builds will include binary scanning and provenance.
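
As a rough idea of what a provenance record could contain, the sketch below hashes a wheel and ties the digest to its source and builder. Every field name here is hypothetical, and the record is far simpler than a full SBOM or a signed attestation.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large wheels need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(wheel_path, source_ref: str, builder: str) -> dict:
    """Hypothetical record linking an artifact to where it came from."""
    return {
        "artifact": Path(wheel_path).name,
        "sha256": sha256_file(wheel_path),
        "source_ref": source_ref,  # e.g. a git commit identifier
        "builder": builder,        # e.g. the build node's name
    }


def verify(wheel_path, record: dict) -> bool:
    """True if the file on disk still matches the recorded digest."""
    return sha256_file(wheel_path) == record["sha256"]
```

A consumer who downloads the wheel can recompute the digest and compare it with the record before installing.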

Status: Not yet available

🛡️

Scan Passed

Library	Description	Status
flash-attn	Fast and memory-efficient exact attention	🟦 In Progress (Wheel)
vLLM	High-throughput LLM serving with Triton kernels	🟦 In Progress (Wheel)
xformers	Transformers building blocks (source-built CUDA ops)	🟦 In Progress (Source Build)
bitsandbytes	8-bit optimisers and quantisation routines	🟨 Planned (Wheel)
AutoGPTQ	LLM quantisation toolkit (CUDA kernels)	🟨 Planned (Wheel)

Note: Not all GPU libraries ship pre-compiled wheels. WheelForge supports both wheel builds and source-built CUDA extension builds, depending on the library's packaging model.

Ready to scale your builds?

Join the waitlist for early access to the GPU build engine.
