What Works Right Now
WheelForge currently runs as a single GPU builder (an NVIDIA A40) that targets a narrow, realistic build matrix.
Proven Builds
We build FlashAttention 2.5, xFormers 0.0.27, bitsandbytes 0.43.1, and Triton 3.0.0 from source.
Specific Matrix
We target Python 3.10 + CUDA 12.1 + PyTorch 2.4.1 on Ampere-class GPUs (sm_80 / sm_86).
Verified Artifacts
Every build runs GPU smoke tests (tensor ops, small kernels) before emitting a wheel + metadata.
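As a rough illustration of "smoke tests before emitting a wheel + metadata", here is a minimal sketch of a gate that refuses to write metadata unless every smoke test passed. The function name `emit_metadata`, the `.metadata.json` suffix, and the JSON fields are assumptions made for this example, not WheelForge's actual format.

```python
import hashlib
import json
from pathlib import Path


def emit_metadata(wheel_path, matrix, smoke_results):
    """Write a JSON sidecar next to the wheel, but only if all smoke tests passed.

    Hypothetical helper: `smoke_results` maps a test name to True/False,
    and `matrix` is a plain dict describing the build target.
    """
    failed = sorted(name for name, ok in smoke_results.items() if not ok)
    if failed:
        # Do not emit anything for a build that failed verification.
        raise RuntimeError(f"smoke tests failed: {', '.join(failed)}")

    wheel = Path(wheel_path)
    metadata = {
        "wheel": wheel.name,
        # The digest lets consumers check they received the tested artifact.
        "sha256": hashlib.sha256(wheel.read_bytes()).hexdigest(),
        "matrix": matrix,
        "smoke_tests": sorted(smoke_results),
    }
    out = wheel.with_name(wheel.name + ".metadata.json")
    out.write_text(json.dumps(metadata, indent=2, sort_keys=True))
    return out
```

The smoke tests themselves (tensor ops, small kernels) would run on the GPU builder. This sketch covers only the step that ties their results to the emitted artifact.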
More Than Just a Pip Fix
WheelForge is a complete Binary Infrastructure Platform.
Build
On-demand GPU build farm. WheelForge will dispatch GPU runners to compile binary artifacts for the long tail of hardware that PyPI ignores, producing wheels where possible and source-built CUDA extensions where necessary.
- Wheel builds & source-only CUDA ops
- Multi-arch compilation (sm_80, sm_90)
- Custom CUDA + PyTorch versions
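For PyTorch CUDA extensions, multi-arch compilation is commonly driven by the `TORCH_CUDA_ARCH_LIST` environment variable, which takes compute capabilities such as `8.0;9.0`. The sketch below shows one way a builder might translate SM names into that value. The helper names are hypothetical, and nothing here is confirmed to be how WheelForge does it.

```python
import re


def sm_to_capability(arch):
    """Convert an SM name such as 'sm_80' or 'SM_86' to '8.0' or '8.6'."""
    match = re.fullmatch(r"sm_(\d+)(\d)", arch.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised arch: {arch!r}")
    # The last digit is the minor version; everything before it is the major.
    return f"{match.group(1)}.{match.group(2)}"


def torch_cuda_arch_list(archs):
    """Build a de-duplicated, sorted TORCH_CUDA_ARCH_LIST value."""
    caps = {sm_to_capability(a) for a in archs}
    ordered = sorted(caps, key=lambda c: tuple(int(p) for p in c.split(".")))
    return ";".join(ordered)
```

A build step could then pass the result through the environment, for example `env = dict(os.environ, TORCH_CUDA_ARCH_LIST=torch_cuda_arch_list(["sm_80", "sm_90"]))`, before invoking the package's normal build.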
Optimize
Hardware-native optimization. We are building compilers that target the specific Streaming Multiprocessor architecture of your deployment hardware.
- SM-native wheels
- Profile-guided optimization (PGO)
- Cluster profiling
Secure
Enterprise-grade security. All automated builds will include binary scanning and provenance.
- Private PyPI mirrors
- Binary provenance
- Retention policies
Who This Is For
If you’ve ever thought “I don’t want to debug another flash-attn build again”, you’re the target user.
GPU Teams
Teams running GPU-heavy Python workloads (training or inference) who need reliable, optimized binaries.
Frustrated Engineers
Engineers sick of NVCC errors, missing headers, and random compile failures on new machines.
Platform Builders
Tooling / platform teams who want to standardise on a known set of GPU wheels across many environments.
Why WheelForge?
| Feature | PyPI / Pip | Docker | WheelForge |
|---|---|---|---|
| GPU Build Farm | ❌ | ❌ | ✅ Native |
| Optimized Binaries | ❌ Generic | ❌ Generic | ✅ Hardware-Specific |
| Storage Limits | Limited | Large Layers | ✅ Petabyte Scale |
| Security Scanning | Basic | Image only | ✅ Deep Binary Scan |
| CI Integration | ⚠️ Flaky | ⚠️ Heavy | ✅ Instant |
What We're Building Towards
WheelForge is becoming a BinOps layer for GPU AI workloads.
Trusted Catalogue
A small, trusted catalogue of prebuilt CUDA / PyTorch wheels for annoying packages.
Reproducible Builds
Every wheel is tied to an explicit Python / CUDA / Torch / arch matrix. Source-to-wheel repackaging for libraries that ship only sdists.
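One way to picture an "explicit matrix" is as an immutable record that yields a stable tag for each build. This is a sketch only: the `BuildMatrix` class and its tag format are invented for illustration and are not a published WheelForge schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildMatrix:
    """Hypothetical record of the target a wheel was built for."""
    python: str  # e.g. "3.10"
    cuda: str    # e.g. "12.1"
    torch: str   # e.g. "2.4.1"
    arch: str    # e.g. "sm_80"

    def tag(self):
        """Return a stable, human-readable identifier for this target."""
        py = "py" + self.python.replace(".", "")
        cu = "cu" + self.cuda.replace(".", "")
        sm = self.arch.lower().replace("_", "")
        return f"{py}-{cu}-torch{self.torch}-{sm}"
```

Because the dataclass is frozen, instances are hashable, so a catalogue could key its wheels by matrix directly, as in `catalogue[BuildMatrix("3.10", "12.1", "2.4.1", "sm_80")]`.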
API + CLI
So your tooling can say: wf build flash-attn==2.5 --python 3.10 --cuda 12.1 --arch sm_80
...and just get back a tested wheel.
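From Python tooling, calling that command could look roughly like the sketch below. The `wf` CLI is still being built, so its exit codes and output format are unknown here. The wrapper functions are hypothetical and only assemble the exact flags shown above.

```python
import shlex
import subprocess


def wf_build_argv(package, version, python, cuda, arch):
    """Assemble the argument list for the `wf build` command shown above."""
    return [
        "wf", "build", f"{package}=={version}",
        "--python", python,
        "--cuda", cuda,
        "--arch", arch,
    ]


def run_wf_build(package, version, python, cuda, arch):
    """Run the build; raises CalledProcessError on a non-zero exit.

    Assumes a `wf` executable is on PATH, which is not yet the case for
    anyone outside early access.
    """
    argv = wf_build_argv(package, version, python, cuda, arch)
    return subprocess.run(argv, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    # Print the command rather than running it, since `wf` may not be installed.
    print(shlex.join(wf_build_argv("flash-attn", "2.5", "3.10", "12.1", "sm_80")))
```

Passing the arguments as a list rather than a shell string avoids quoting problems when a version pin contains characters the shell would interpret.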
Ready to stop fighting dependencies?
Join the waitlist for early access to the build farm.