LPLB: An early research stage MoE load balancer based on linear programming
Linear-Programming-Based Load Balancer (LPLB)

LPLB is a parallel load balancer that leverages linear programming to optimize expert-parallel workload distribution for MoE (Mixture-of-Experts) models.
LPLB is currently in the early research stage, and performance improvements are still under evaluation.
```python
# `r2o` (the mapping from redundant experts to their original experts) is set up beforehand on the CUDA device
planner = Planner(
    r2o,
    n_logical_experts + n_redundants_per_rank * ep_size,
    n_logical_experts,
    group=ep_group,
)
# Initialize from a DeepEP `buffer` (optional)
# planner.init_from_deep_ep(buffer)

N_SMS = 100

# Logical expert indices selected by the model
indices = ...
# Planner returns physical expert indices
redirected_indices = planner.run(indices, avail_counter, N_SMS)
```

How LPLB Works

LPLB extends EPLB (Expert Parallelism Load Balancer) to address dynamic load imbalance in Mixture-of-Experts (MoE) training.
Redundant Experts: Each redundant expert is linked to an original expert, forming edges between GPUs.
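The sketch below illustrates the underlying idea, not LPLB's actual formulation or solver: links between redundant experts and their originals act as edges between GPUs, and a linear program can shift tokens along those edges to minimize the maximum per-GPU load. The `loads`, `edges`, and the use of SciPy's `linprog` are illustrative assumptions; the real Planner handles this internally.

```python
# Toy sketch (not LPLB's implementation): minimize the maximum per-GPU token
# load by moving tokens along edges created by redundant experts.
import numpy as np
from scipy.optimize import linprog

loads = np.array([900.0, 300.0, 600.0])  # hypothetical tokens routed to each GPU
edges = [(0, 1), (0, 2)]                 # GPU 0's expert is replicated on GPUs 1 and 2

n_gpus, n_edges = len(loads), len(edges)
# Variables: one flow per edge, plus t = max per-GPU load; objective: minimize t.
c = np.zeros(n_edges + 1)
c[-1] = 1.0

A_ub, b_ub = [], []
# Per-GPU constraint: load_g - (outgoing flow) + (incoming flow) <= t
for g in range(n_gpus):
    row = np.zeros(n_edges + 1)
    for e, (src, dst) in enumerate(edges):
        if src == g:
            row[e] = -1.0
        elif dst == g:
            row[e] = 1.0
    row[-1] = -1.0
    A_ub.append(row)
    b_ub.append(-loads[g])
# A GPU cannot send away more tokens than it was originally assigned.
for g in range(n_gpus):
    row = np.zeros(n_edges + 1)
    for e, (src, _) in enumerate(edges):
        if src == g:
            row[e] = 1.0
    if row[:-1].any():
        A_ub.append(row)
        b_ub.append(loads[g])

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(0, None)] * (n_edges + 1), method="highs")
flows, max_load = res.x[:-1], res.x[-1]
print("flow along each edge:", flows)  # e.g. ~[300, 0]
print("balanced max load:", max_load)  # e.g. ~600
```

In this toy instance the solver moves roughly 300 tokens from the overloaded GPU 0 to GPU 1, bringing every GPU down to about 600 tokens; the point is only to show how redundant-expert edges turn rebalancing into a small linear program.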