| Metric | Value | Detail |
|---|---|---|
| Best Model | gemma4:e2b | 42/42 passed, 38.9s avg |
| Arena Tests | 194 | 188 passed across 5 models |
| Showcase | 27 | deliverables from 12 agents |
| Skills Audited | 256 | avg 5.0/10 completeness |
| Knowledge | 36 | domain articles |
Model Arena (5 models)
| # | Model | Pass Rate | Avg Latency | Avg Tokens |
|---|---|---|---|---|
| 1 | gemma4:e2b | 100% (42/42) | 38.9s | 2085 |
| 2 | gemma4:e4b | 100% (38/38) | 73.7s | 2282 |
| 3 | qwen3:8b | 95% (36/38) | 87.8s | 2011 |
| 4 | deepseek-r1:8b | 95% (36/38) | 96.8s | 2139 |
| 5 | qwen3:14b | 95% (36/38) | 156.8s | 2065 |
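The arena summary figures can be checked against the per-model rows. Below is a minimal sketch that re-derives them. It assumes that the ranking sorts by pass rate (highest first) and breaks ties by average latency (lowest first). That rule is inferred from the row order and is not a documented setting. The names `RESULTS`, `rank_models`, and `totals` are hypothetical helpers written for illustration.

```python
# Per-model arena results transcribed from the table above:
# (model, tests passed, tests run, average latency in seconds)
RESULTS = [
    ("gemma4:e2b", 42, 42, 38.9),
    ("gemma4:e4b", 38, 38, 73.7),
    ("qwen3:8b", 36, 38, 87.8),
    ("deepseek-r1:8b", 36, 38, 96.8),
    ("qwen3:14b", 36, 38, 156.8),
]


def rank_models(results):
    """Sort by pass rate (highest first), breaking ties by latency (lowest first).

    Hypothetical helper: the ordering rule is an assumption inferred from
    the table, not a documented dashboard setting.
    """
    return sorted(results, key=lambda r: (-r[1] / r[2], r[3]))


def totals(results):
    """Return (total tests run, total tests passed) across all models."""
    run = sum(r[2] for r in results)
    passed = sum(r[1] for r in results)
    return run, passed


if __name__ == "__main__":
    for i, (model, passed, run, latency) in enumerate(rank_models(RESULTS), start=1):
        # 36/38 = 0.947..., which formats as "95%" with zero decimal places
        print(f"{i}. {model}: {passed / run:.0%} ({passed}/{run}), {latency}s")
    print("tests run/passed:", totals(RESULTS))
```

Under this assumed rule, the sketch reproduces the table's row order. It also reproduces the header totals of 194 tests run and 188 passed. The 95% figures are 36/38 rounded to the nearest whole percent.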
Workload Runs (13 completed)
| Workload | Status | Tasks | Failed | Date |
|---|---|---|---|---|
| showcase | completed | 26/52 | 0 | 2026-05-16 05:14:42 |
| knowledge | completed | 17/16 | 0 | 2026-05-15 20:03:25 |
| brief-decomp | completed | 14/60 | 0 | 2026-05-15 17:20:00 |
| skill-audit | completed | 255/300 | 0 | 2026-05-15 15:44:27 |
| showcase | completed | 26/52 | 0 | 2026-05-15 14:32:11 |
| arena | completed | 95/200 | 0 | 2026-05-15 11:28:34 |
| knowledge | completed | 17/16 | 0 | 2026-05-15 04:21:42 |
| brief-decomp | completed | 14/60 | 0 | 2026-05-15 01:39:45 |
| skill-audit | completed | 255/300 | 0 | 2026-05-15 00:04:26 |
| showcase | completed | 26/52 | 0 | 2026-05-14 22:52:18 |