| Metric | Value | Detail |
|---|---|---|
| Best Model | gemma4:e2b | 23/23 passed, 44s avg |
| Arena Tests | 99 | 97 passed across 5 models |
| Showcase | 27 | deliverables from 12 agents |
| Skills Audited | 128 | avg 5.0/10 completeness |
| Knowledge | 18 | domain articles |
Model Arena (5 models)
| # | Model | Pass Rate | Avg Latency | Avg Tokens |
|---|---|---|---|---|
| 1 | gemma4:e2b | 100% (23/23) | 44s | 2118 |
| 2 | gemma4:e4b | 100% (19/19) | 80.4s | 2268 |
| 3 | qwen3:8b | 100% (19/19) | 88.2s | 2018 |
| 4 | deepseek-r1:8b | 95% (18/19) | 96.6s | 2173 |
| 5 | qwen3:14b | 95% (18/19) | 155.7s | 2055 |
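The "97 passed across 5 models" summary follows from summing the per-model counts above (23 + 19 + 19 + 18 + 18 = 97 of 23 + 19 × 4 = 99). A minimal Python sketch of that arithmetic is below. The `ArenaResult` record and the `rank`/`totals` helpers are hypothetical and not part of the dashboard. The ranking key (pass rate descending, then latency ascending) is an assumption that happens to reproduce the table's order. The 44s latency is entered as 44.0.

```python
from dataclasses import dataclass


@dataclass
class ArenaResult:
    # Hypothetical record mirroring one row of the Model Arena table.
    model: str
    passed: int
    total: int
    latency_s: float  # average latency per test, in seconds
    avg_tokens: int


RESULTS = [
    ArenaResult("gemma4:e2b", 23, 23, 44.0, 2118),
    ArenaResult("gemma4:e4b", 19, 19, 80.4, 2268),
    ArenaResult("qwen3:8b", 19, 19, 88.2, 2018),
    ArenaResult("deepseek-r1:8b", 18, 19, 96.6, 2173),
    ArenaResult("qwen3:14b", 18, 19, 155.7, 2055),
]


def rank(results):
    # Assumed ordering: highest pass rate first, ties broken by lower latency.
    return sorted(results, key=lambda r: (-r.passed / r.total, r.latency_s))


def totals(results):
    # Sum passed and attempted tests across all models.
    passed = sum(r.passed for r in results)
    total = sum(r.total for r in results)
    return passed, total


if __name__ == "__main__":
    for i, r in enumerate(rank(RESULTS), start=1):
        print(f"{i}. {r.model}: {r.passed / r.total:.0%} ({r.passed}/{r.total}), {r.latency_s}s")
    passed, total = totals(RESULTS)
    print(f"{passed}/{total} passed across {len(RESULTS)} models")
```

Note that 18/19 is about 94.7%, which the `:.0%` format rounds to the 95% shown in the table.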
Workload Runs (7 completed)
| Workload | Status | Tasks | Failed | Date |
|---|---|---|---|---|
| knowledge | completed | 17/16 | 0 | 2026-05-15 04:21:42 |
| brief-decomp | completed | 14/60 | 0 | 2026-05-15 01:39:45 |
| skill-audit | completed | 255/300 | 0 | 2026-05-15 00:04:26 |
| showcase | completed | 26/52 | 0 | 2026-05-14 22:52:18 |
| arena | completed | 1/2 | 0 | 2026-05-14 19:58:34 |
| arena | completed | 1/2 | 0 | 2026-05-14 19:49:17 |
| arena | completed | 95/200 | 0 | 2026-05-14 19:43:14 |