Commit

Debug checkpoint-restart
matthewcarbone committed Dec 25, 2023
1 parent 0df12b6 commit a274073
Showing 1 changed file with 7 additions and 5 deletions.
12 changes: 7 additions & 5 deletions src/main_utils.cpp
@@ -533,12 +533,14 @@ void execute_process_pool(const utils::SimulationParameters params)
 {
     // TODO add some logic for checkpoint-restart here
     const std::vector<std::string> completed_json_filenames = get_completed_json_filenames();
+    size_t start_index = 0;
     const size_t total_jobs_completed = completed_json_filenames.size();
-
-    // The start index will always be one greater than the last job
-    // finished, even if not every rank completed all its jobs
-    const size_t start_index = get_index(completed_json_filenames[total_jobs_completed - 1]) + 1;
-
+    if (total_jobs_completed > 0)
+    {
+        // The start index will always be one greater than the last job
+        // finished, even if not every rank completed all its jobs
+        start_index = get_index(completed_json_filenames[total_jobs_completed - 1]) + 1;
+    }
     const int jobs_remaining = params.n_tracers - total_jobs_completed;
     printf("Total jobs remaining is %i\n", jobs_remaining);
     printf("Running %i ranks: %i compute, 1 controller\n", mpi_world_size, mpi_world_size-1);
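
The guarded form of the start-index computation in this diff can be sketched in isolation as below. This is a minimal sketch under stated assumptions, not the repository's code: `get_index` here is a hypothetical stand-in that assumes file names of the form `<index>.json` (the real naming scheme is not shown in the diff), and `compute_start_index` is a name introduced only for illustration. It also assumes the list is ordered so that its last entry belongs to the last finished job, as the comment in the diff implies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the repository's get_index(): assumes names look
// like "<index>.json". This is an assumption made for the sketch.
std::size_t get_index(const std::string& filename)
{
    const std::size_t dot = filename.find('.');
    assert(dot != std::string::npos && dot > 0);
    return static_cast<std::size_t>(std::stoul(filename.substr(0, dot)));
}

// Returns the job index at which to resume (illustrative helper name).
// With no completed jobs, start from 0; otherwise resume one past the
// index of the last completed file.
std::size_t compute_start_index(const std::vector<std::string>& completed)
{
    if (completed.empty())
    {
        // Without this guard, completed[completed.size() - 1] would index
        // with a wrapped-around size_t and read out of bounds.
        return 0;
    }
    return get_index(completed.back()) + 1;
}
```

On a fresh run with no completed files, the unguarded expression `completed_json_filenames[total_jobs_completed - 1]` would subscript an empty vector, which is undefined behavior; returning 0 in that case matches the default that the diff assigns to `start_index`.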
