Description
Summary
When max_time is exceeded in runArraySimulation(), the function saves a result file that appears valid but contains no usable simulation results. This contradicts the documentation's promise that "any evaluations completed before the cluster is terminated can be saved."
Expected Behavior
According to the documentation (lines 93-100 in R/runArraySimulation.R):
max_time specifies the maximum time allowed for a single simulation condition to execute... In general, this input should be set to somewhere around 80-90% of the true termination time so that any evaluations completed before the cluster is terminated can be saved.
When max_time is exceeded, the saved file should contain:
- All successfully completed replications in stored_results
- Summary statistics computed from those partial results
- A clear indicator of actual replications completed (e.g., REPLICATIONS should show the actual count, not the target)
Actual Behavior
When max_time is exceeded, the saved RDS file contains:
- Empty stored_results (list())
- No summary statistics (only design conditions and metadata columns)
- Misleading REPLICATIONS count (shows target count, e.g., 10000, not actual completions)
- No way to determine how many replications actually completed
Example from simulation where some designs exceeded the time limit
Successful completion (Design 11):
str(readRDS("results/sim_res-11.rds"))
# SimDesgn [1 × 79] - Contains all 66 summary statistics
# $ REPLICATIONS : num 10000
# $ stored_results: tibble [10,000 × 23] # All replications present
Failed due to max_time (Design 12):
str(readRDS("results/sim_res-12.rds"))
# SimDesgn [1 × 11] - Only 7 design + 4 metadata columns
# $ REPLICATIONS : int 10000 # THIS IS THE TARGET, NOT ACTUAL
# $ SIM_TIME : num 3000 # Hit the time limit
# $ stored_results: list() # EMPTY - no replications saved
Impact
This is a critical issue for HPC users because:
- Silent failure: The file exists, so SimCheck() may not flag it as problematic
- No diagnostic information: Cannot determine if 0, 247, or 9,999 replications completed
- Wasted computation: All completed replications are lost, forcing complete re-runs
- Impossible to optimize: Cannot make informed decisions about time allocation adjustments
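Until this is fixed, affected files can at least be located by the symptom described above. The following is a minimal sketch in base R, not part of SimDesign: the helper names `is_empty_result()` and `find_empty_results()` are hypothetical, and it assumes that `stored_results` is reachable with `$` on the saved object and has length zero when no replications were saved.

```r
# Hypothetical helper (not a SimDesign function): TRUE when the saved object
# has a missing or empty stored_results element, the symptom reported here.
is_empty_result <- function(path) {
  res <- readRDS(path)
  stored <- res$stored_results  # assumption: accessible via `$`
  is.null(stored) || length(stored) == 0L
}

# Scan a results directory and return the paths of files that look affected.
find_empty_results <- function(dir = "results", pattern = "\\.rds$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  flagged <- vapply(files, is_empty_result, logical(1))
  files[flagged]
}
```

Calling `find_empty_results("results")` would list candidates for re-running, but it cannot recover how many replications had actually completed.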
Root Cause Analysis (By Claude Code)
Looking at the code flow:
- lapply_timer() (R/util.R:376-402) correctly handles timeouts and returns partial results with a message:
      if(time_left <= 0){
        message(sprintf("Simulation terminated due to max_time constraint (%i/%i replications evaluated)."), i, length(ret))
        ret <- ret[1L:i]  # Return partial results
        break
      }
- However, when partial results are returned to the analysis workflow (R/analysis.R):
  - obs_reps <- length(results) (line 205) correctly captures the actual count
  - ret <- c(sim_results, 'REPLICATIONS'=obs_reps, ...) (line 231) should save it
  - But if summarise() fails or the workflow terminates early, this never happens
- runSimulation() falls back to the input parameter (R/runSimulation.R:1872):
      REPLICATIONS=replications  # Uses target, not actual
- runArraySimulation() saves whatever is returned (R/runArraySimulation.R:376):
      saveRDS(ret, filename.u)  # Saves incomplete/empty result
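To make the intended behaviour of the first step concrete, here is a small self-contained sketch of a time-limited loop that returns only the replications finished before the limit. It is an illustration, not the SimDesign implementation: `timed_lapply()` and its `now` argument are hypothetical names, and the clock is injectable only so the cut-off can be exercised deterministically.

```r
# Hypothetical stand-in for a time-limited lapply (not SimDesign's lapply_timer).
# `now` returns the current time in seconds and can be replaced in tests.
timed_lapply <- function(X, FUN, max_time,
                         now = function() proc.time()[["elapsed"]]) {
  start <- now()
  ret <- vector("list", length(X))
  done <- 0L
  for (i in seq_along(X)) {
    # Check the budget before starting the next replication.
    if (now() - start >= max_time) {
      message(sprintf(
        "Terminated due to max_time constraint (%i/%i replications evaluated).",
        done, length(X)))
      break
    }
    ret[i] <- list(FUN(X[[i]]))  # list() wrapper keeps NULL results in place
    done <- i
  }
  ret[seq_len(done)]  # only the completed replications
}
```

The length of the returned list is the actual replication count, which is the value the downstream steps would need to carry through to the saved file.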
Suggested Fix
The fix should ensure that when max_time is exceeded:
- Save actual replication count: Even if summarise() isn't called, store the actual number of completed replications
- Save partial results: Ensure stored_results contains all completed replications
- Add completion status flag: Include a field like INCOMPLETE = TRUE or TIMEOUT = TRUE
- Call summarise on partial data: Compute summary statistics from whatever completed, even if incomplete
Example structure for incomplete results:
list(
REPLICATIONS_TARGET = 10000,
REPLICATIONS_COMPLETED = 247,
INCOMPLETE = TRUE,
TIMEOUT_REASON = "max_time",
stored_results = <partial results>,
<summary statistics from partial data>
)
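One way such a structure could be assembled is sketched below. This is an assumption-laden illustration rather than a proposed patch: `package_partial()` and `summarise_fun` are hypothetical names, the field names follow the example structure above, and any shortfall relative to the target is assumed to be caused by max_time.

```r
# Hypothetical wrapper (not SimDesign code): records target and completed
# counts, keeps the partial results, and summarises them when possible.
package_partial <- function(results, target, summarise_fun) {
  completed <- length(results)
  # A failing summarise step must not discard the completed replications.
  summary_stats <- tryCatch(
    if (completed > 0L) summarise_fun(results) else NULL,
    error = function(e) NULL
  )
  c(
    list(
      REPLICATIONS_TARGET    = target,
      REPLICATIONS_COMPLETED = completed,
      INCOMPLETE             = completed < target,
      # Assumption: a shortfall is attributed to the time limit.
      TIMEOUT_REASON         = if (completed < target) "max_time" else NA_character_,
      stored_results         = results
    ),
    as.list(summary_stats)  # named summary values become top-level fields
  )
}
```

Wrapping the summary step in `tryCatch()` means the counts and `stored_results` are still saved even when summary statistics cannot be computed from the partial data.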