Created on December 09, 2025
2025
Our work on checkpointing and failure recovery for LLM training has been accepted to IEEE INFOCOM 2026.