LPLB: An early research stage MoE load balancer based on linear programming

created: Nov. 19, 2025, 1:14 p.m. | updated: Nov. 25, 2025, 8:38 p.m.

Linear-Programming-Based Load Balancer (LPLB)

LPLB is a parallel load balancer that leverages linear programming to optimize expert-parallel workload distribution for MoE (Mixture-of-Experts) models. LPLB is currently in the early research stage, and performance improvements are still under evaluation.

Example usage:

    # Move the redundant-to-original expert mapping onto the GPU
    r2o = r2o.cuda()
    planner = Planner(
        r2o,
        n_logical_experts + n_redundants_per_rank * ep_size,
        n_logical_experts,
        group=ep_group,
    )
    # Initialize from a DeepEP `buffer` (optional)
    # planner.init_from_deep_ep(buffer)

    N_SMS = 100
    # Logical expert indices selected by the model
    indices = ...
    # Planner returns physical expert indices
    redirected_indices = planner.run(indices, avail_counter, N_SMS)

How LPLB Works

LPLB extends EPLB (Expert Parallelism Load Balancer) to address dynamic load imbalance in Mixture-of-Experts (MoE) training. Redundant Experts: Each redundant expert is linked to an original expert, forming edges between GPUs.
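To make the redundant-expert edges concrete, here is a minimal sketch that builds a hypothetical r2o mapping by hand. The ring-style placement, the variable values, and the flat shape of r2o are assumptions chosen purely for illustration, not LPLB's actual construction; the point is that a redundant expert hosted on one rank points at an original expert owned by another rank, and that pair of ranks forms an edge between GPUs.

    import torch

    # All values below are assumptions for this illustration.
    ep_size = 4                # expert-parallel world size (number of GPUs)
    n_logical_experts = 16     # original (logical) experts
    n_redundants_per_rank = 1  # one redundant expert slot per GPU

    experts_per_rank = n_logical_experts // ep_size

    # r2o[i] = index of the original expert mirrored by the i-th redundant expert.
    # With one redundant per rank, redundant expert i is hosted on rank i.
    r2o = torch.empty(ep_size * n_redundants_per_rank, dtype=torch.long)
    for rank in range(ep_size):
        # Hypothetical ring placement: rank g mirrors the first original expert
        # owned by rank (g + 1), so each GPU pair (g, g + 1) forms an edge.
        neighbour = (rank + 1) % ep_size
        r2o[rank] = neighbour * experts_per_rank

    print(r2o)  # tensor([ 4,  8, 12,  0]) with the values above

A tensor of this kind is what is passed as the first argument to Planner in the usage snippet above.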
