Announcement_4

Our work on checkpointing and failure recovery for LLM training has been accepted to IEEE INFOCOM 2026 :sparkles: :smile: