MoE Routing Visualization

How to read this visualization: Color intensity shows each token's computational budget (darker = more compute). Click a token to see which experts were activated at each layer. The same passage is shown under two different expert configurations for comparison. Subscript numbers (e.g., '₁'₂) mark multi-byte characters split by the tokenizer; each byte is a separate token with its own routing.
Note: Both models were trained using the same random seed (1223).
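The subscript convention for multi-byte characters can be illustrated with a small sketch. It assumes a purely byte-level tokenizer in which every UTF-8 byte becomes its own token; the actual tokenizer behind this visualization may merge some bytes. The helper name `byte_tokens` is hypothetical.

```python
SUBSCRIPTS = "₀₁₂₃₄₅₆₇₈₉"

def byte_tokens(ch: str) -> list[str]:
    """Label each UTF-8 byte of a single character (hypothetical helper).

    Assumes one token per byte. Single-byte characters are returned
    unchanged; multi-byte characters get one subscripted label per byte.
    """
    raw = ch.encode("utf-8")
    if len(raw) == 1:
        return [ch]
    # UTF-8 uses at most 4 bytes per character, so one subscript digit suffices.
    return [f"{ch}{SUBSCRIPTS[i]}" for i in range(1, len(raw) + 1)]

print(byte_tokens("a"))       # ASCII: a single byte, no subscript
print(byte_tokens("é"))       # two UTF-8 bytes, two labels
print(byte_tokens("\u2019"))  # right single quotation mark: three UTF-8 bytes
```

Under this assumption, each labelled byte would be routed independently, which is why a single visible character can carry several different compute intensities.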
Color scale: light = less compute, dark = more compute
Text excerpt from "The Art of Living According to Joe Beef"

5:1 Split (2560/512)

23:1 Split (2944/128)
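
The split labels can be checked as plain ratios of the two numbers in parentheses. The sketch below only verifies that arithmetic; what the two numbers measure is not stated here, so the names `large` and `small` are placeholder assumptions.

```python
# Panel labels mapped to the two numbers given in parentheses.
splits = {"5:1": (2560, 512), "23:1": (2944, 128)}

def describe(large: int, small: int) -> tuple[int, int]:
    """Return (integer ratio large:small, combined total)."""
    return large // small, large + small

for label, (large, small) in splits.items():
    ratio, total = describe(large, small)
    print(f"{label}: {large}/{small} -> ratio {ratio}:1, total {total}")
```

Both configurations add up to the same total (3072), so the two panels appear to divide one fixed budget differently rather than differ in overall size.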