Training Material Metadata

Title Checkpointing
Abstract When running a job for any substantial length of time, you hope that nothing interrupts it so that you won't have to start over. But no matter the environment, interruption is always a possibility. Checkpoint/Restart (C/R) solutions allow you to resume a job from approximately where it left off. Checkpointing strategies also have benefits beyond fault tolerance, so learning about them is likely to benefit your own work. This roadmap provides an overview of the checkpoint/restart strategy and a survey of different types of C/R solutions.
Authors ['Brandon Barker']
Expertise Level None
Learning Outcome None
Learning Resource Type asynchronous online training
Target Group ['Researchers', 'Research groups', 'Student']
Keywords ['checkpointing', 'fault tolerance', 'best practices', 'containers', 'virtualization']
Cost None
Duration 240
Language en
License None
Resource URL Type URL
Start Datetime None
URL https://cvw.cac.cornell.edu/checkpoint
Version Date 2023-01
Provider ID urn:ogf.org:glue2:access-ci.org:resource:cider:infrastructure.organizations:898
Rating None

Globus Search Metadata

subject urn%3Aogf.org%3Aglue2%3Aaccess-ci.org%3Aresource%3Acider%3Ainfrastructure.organizations%3A898%3ACVW%3Acheckpoint