Benchmarks | KI:KUBE

Validated Profiles

Benchmark data: measured, documented, reproducible.

We publish tokens-per-second measurements for every Validated Profile at three concurrency levels (n=1 / n=4 / n=8). Method, hardware, engine version, and quantization backend are documented for each row.

Note: The full dashboard with live data and pre-deploy quality-gate reports is in preparation. The table below shows a curated selection from the May/June 2026 test run.

Model	Topology	n=1	n=4	n=8	Gate
Qwen3.6-35B-A3B-NVFP4	4× DGX Spark · TP=4 · EP=1	80	285	438	PASS
Qwen3.6-35B-A3B-FP8	4× DGX Spark · TP=4 · EP=1	77	280	426	PASS
Qwen3-235B-A22B-FP8	4× DGX Spark · TP=4 · EP=1	31	105	188	PASS
Nemotron-3-Super-120B-A12B-NVFP4	4× DGX Spark · TP=4 · EP=1	29	89	135	PASS
Nemotron-3-Ultra-550B-A55B-NVFP4	4× DGX Spark · TP=4 · EP=4	10	29	43	PASS
DeepSeek-R1-distill 70B	4× DGX Spark · TP=4	28	98	172	PASS
Llama-3.3 70B-Instruct	2× DGX Spark · TP=2	24	84	142	PASS

Values are output tokens per second. Method: SGLang continuous batching, input length 512 / output length 512, measured on a warm run after 50 warm-up iterations. The full test configuration for each profile is available on request.
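To read the table in terms of scaling, the sketch below derives two figures from the published values: the speedup relative to n=1 and the throughput per concurrent stream. It assumes the table values are aggregate output throughput across all n concurrent requests, which the page does not state explicitly. The names `RESULTS` and `scaling` are illustrative and not part of any KI:KUBE tooling.

```python
# Output tokens/s copied from three rows of the benchmark table.
# Assumption: each value is the aggregate throughput over all n requests.
RESULTS = {
    "Qwen3.6-35B-A3B-NVFP4": {1: 80, 4: 285, 8: 438},
    "Qwen3-235B-A22B-FP8": {1: 31, 4: 105, 8: 188},
    "Llama-3.3 70B-Instruct": {1: 24, 4: 84, 8: 142},
}


def scaling(row):
    """Return speedup vs. n=1 and per-stream tokens/s for each concurrency level."""
    base = row[1]
    return {
        n: {"speedup": tps / base, "per_stream": tps / n}
        for n, tps in row.items()
    }


if __name__ == "__main__":
    for model, row in RESULTS.items():
        for n, stats in scaling(row).items():
            print(
                f"{model:28s} n={n}: "
                f"speedup {stats['speedup']:.2f}x, "
                f"{stats['per_stream']:.1f} tok/s per stream"
            )
```

Under that assumption, a speedup well below n indicates that each individual request slows down as concurrency rises, even though total throughput increases.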

Public Dashboard: Roadmap

▸ Search and filter by model / topology / quantization
▸ Historical gate reports (before/after per update)
▸ Pareto front: tok/s vs. memory footprint per topology
▸ Reproduction anchors (matrix.yaml + image tag per row)
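The Pareto-front item can be illustrated with a minimal sketch: a profile is on the front if no other profile needs at most as much memory while delivering at least as many tokens per second. The `Point` type, the `pareto_front` function, and all sample values below are hypothetical placeholders. The page publishes no memory figures, so none of these numbers are measurements.

```python
from typing import List, NamedTuple


class Point(NamedTuple):
    name: str
    memory_gb: float  # placeholder unit; real memory data is not published here
    tok_s: float


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep points that are not dominated (lower memory and higher tok/s are better)."""
    front: List[Point] = []
    best_tok_s = float("-inf")
    # Sweep from the smallest memory footprint upward; at equal memory,
    # visit the fastest point first so slower ties are discarded.
    for p in sorted(points, key=lambda p: (p.memory_gb, -p.tok_s)):
        if p.tok_s > best_tok_s:
            front.append(p)
            best_tok_s = p.tok_s
    return front


# Hypothetical sample data, purely to exercise the function.
SAMPLE = [
    Point("profile-a", 10, 50),
    Point("profile-b", 20, 40),  # dominated by profile-a
    Point("profile-c", 30, 90),
    Point("profile-d", 30, 70),  # dominated by profile-c
]

if __name__ == "__main__":
    for p in pareto_front(SAMPLE):
        print(p.name, p.memory_gb, p.tok_s)
```

Sorting once and sweeping keeps the computation at O(n log n), which is enough for a table of this size.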

Current state of the data pipeline: the matrixtest job runner writes MATRIX_SUMMARY indices to NFS · public surface in progress.