Thank you, Siro. One more question: Do we need to modify the attention interface to use Ring-Attention when we import a built-in Transformer implementation like Qwen2ForCausalLM from the Transformers package, or does Accelerate’s maybe_context_parallel handle this?
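For concreteness, here is a minimal sketch of what I have in mind. This is an assumption on my part: it presumes context parallelism is already enabled in the Accelerator config (e.g. a cp_size > 1 set via ParallelismConfig or accelerate launch), and that maybe_context_parallel takes buffer arguments mirroring torch.distributed.tensor.experimental.context_parallel. Would something like this shard attention across ranks on its own, or does Qwen2's attention still need to be switched to a ring implementation?

```python
import torch
from transformers import AutoModelForCausalLM
from accelerate import Accelerator

# Assumption: context parallelism (cp_size > 1) is configured elsewhere,
# e.g. via ParallelismConfig or the accelerate launch config.
accelerator = Accelerator()

# Built-in implementation, no changes to its attention modules.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B", attn_implementation="sdpa"
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model, optimizer = accelerator.prepare(model, optimizer)

def training_step(batch):
    # Hypothetical usage: shard the sequence dimension (dim 1) of each tensor
    # across context-parallel ranks; the argument names below are assumed to
    # mirror torch.distributed.tensor.experimental.context_parallel.
    with accelerator.maybe_context_parallel(
        buffers=[batch["input_ids"], batch["labels"]],
        buffer_seq_dims=[1, 1],
        no_restore_buffers={batch["input_ids"], batch["labels"]},
    ):
        loss = model(input_ids=batch["input_ids"], labels=batch["labels"]).loss
        accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()
```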