Moonlight-16B-A3B training fails in DeepseekV3MoE.forward with UnboundLocalError when shared experts are enabled
Root Cause
In DeepseekV3MoE.forward, the variable y was assigned only in the inference branch:
if not self.training:
    y = self.moe_infer(hidden_states, topk_idx, topk_weight).view(*orig_shape)
if self.config.n_shared_experts is not None:
    y = y + self.shared_experts(identity)
return y
During training, execution skips the if not self.training branch, then reads y in the shared-expert accumulation step before it has ever been assigned.
That raises:
UnboundLocalError: cannot access local variable 'y' where it is not associated with a value
Impact
Inference can proceed.
Fine-tuning/training fails on the first forward pass.
This is an upstream model-code bug, not a dataset or trainer configuration issue.
Local Fix Applied
Patched file:
.cache/huggingface/modules/transformers_modules/moonshotai/Moonlight-16B-A3B/476b36a473d4467f94469414bef6cee75c9c8172/modeling_deepseek.py
Fix applied:
Added a training-mode routed-expert computation path in DeepseekV3MoE.forward.
Ensured y is always initialized before shared experts are added.
Kept the inference path using moe_infer(...) unchanged.
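The shape of the fix can be sketched as follows. This is a hedged, self-contained illustration of the corrected control flow only (FixedMoEForwardSketch and the arithmetic stand-ins are hypothetical; the actual patch computes routed-expert outputs inside DeepseekV3MoE.forward): every path now binds y before the shared experts are added.

```python
# Sketch of the fixed control flow (hypothetical class; stand-in ops).
class FixedMoEForwardSketch:
    def __init__(self, training, has_shared_experts=True):
        self.training = training
        self.has_shared_experts = has_shared_experts

    def forward(self, hidden_states):
        if self.training:
            # New training path: routed-expert computation (stand-in),
            # so y is always initialized before the shared-expert step.
            y = hidden_states * 2
        else:
            # Inference path kept unchanged: moe_infer(...) (stand-in).
            y = hidden_states * 2
        if self.has_shared_experts:
            y = y + 1  # stands in for y + self.shared_experts(identity)
        return y

FixedMoEForwardSketch(training=True).forward(3)   # training now works
FixedMoEForwardSketch(training=False).forward(3)  # inference unchanged
```

Note that because the patch lives in the Hugging Face remote-code cache, re-downloading the model (or a new revision hash) will overwrite it; the durable fix belongs upstream in the model repository.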