How to read this visualization: Color intensity shows the computational budget spent on each token (darker = more compute). Click a token to see which experts were activated at each layer. The same passage is shown under two different expert configurations so they can be compared. Subscript numbers (e.g., ₁, ₂) mark the bytes of a multi-byte character that the tokenizer has split; each byte is a separate token with its own routing.
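To make the subscript convention concrete, the following is a minimal sketch of how a multi-byte character could be broken into per-byte labels. It assumes a byte-level tokenizer in which each UTF-8 byte becomes one token, as described above; the function name `byte_tokens` and the exact label format are hypothetical and are not taken from the visualization's own code.

```python
# Map ASCII digits to their Unicode subscript forms.
SUBSCRIPTS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def byte_tokens(text: str) -> list[str]:
    """Return one label per UTF-8 byte (hypothetical labeling scheme).

    Single-byte characters are kept as-is. A character that encodes to
    several bytes yields one label per byte, each tagged with a
    subscript index starting at 1.
    """
    labels = []
    for ch in text:
        n_bytes = len(ch.encode("utf-8"))
        if n_bytes == 1:
            labels.append(ch)
        else:
            for i in range(1, n_bytes + 1):
                labels.append(ch + str(i).translate(SUBSCRIPTS))
    return labels


print(byte_tokens("né"))  # ['n', 'é₁', 'é₂']
```

Here "é" occupies two bytes in UTF-8, so it appears as two tokens, and each of them would be routed to experts independently.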
Note: Both models were trained using the same random seed (1223).
Less compute
More compute