Skip to content

Update impute.R to 1) clarify impute_gmt_titers parameters and 2) enforce reproducibility#3

Open
TKMarkCheng wants to merge 3 commits intoshwilks:mainfrom
TKMarkCheng:TKMarkCheng-patch-1
Open

Update impute.R to 1) clarify impute_gmt_titers parameters and 2) enforce reproducibility#3
TKMarkCheng wants to merge 3 commits intoshwilks:mainfrom
TKMarkCheng:TKMarkCheng-patch-1

Conversation

@TKMarkCheng
Copy link

@TKMarkCheng TKMarkCheng commented Mar 19, 2025

  1. update documentation to clarify required result for impute_gmt_titers(result,titer) is from titertools::gmt() rather than circular dependency.

  2. enforces a random seed variable with a default to help with reproducibility of imputation results for less experienced users

Added a random.seed option with a default to assist with reproducibility when set.seed() is not set by the user explicitly during run.

random_seed variable is added (with a default of the year 2025) for posterity requirements e.g. monte-carlo simulations.
@TKMarkCheng TKMarkCheng changed the title Update impute.R to clarify impute_gmt_titers Update impute.R to 1) clarify impute_gmt_titers parameters and 2) enforce reproducibility Mar 19, 2025
@shwilks
Copy link
Owner

shwilks commented Mar 19, 2025

Thanks for these changes, the changes to the documentation look good, thank you for doing that. For the addition of random seeds within functions, typically the method in R packages is simply to use set.seed() at the top of your script, outside of the functions to ensure reproducibility, or if you really want, to set it in your script directly before the function is called.

Unless there is a good reason to set the seed within a function this is generally avoided, since it creates some confusion for the user about what the seed is at any given point. If you absolutely do want to change the seed within a function for some reason, it should also be restored again once the function (e.g. runif()) has been called, otherwise it will mess up the seed for other things the user runs as part of the script in ways they might not be expecting.

Happy to hear if there is a use case where you couldn't simply do the following for reproducibility though

set.seed(100)
impute_gmt_titers(...)

@TKMarkCheng
Copy link
Author

Thanks Sam! I've added the within function set.seed() as a failsafe/reproducibility guarantee for new users who are unaware of set.seed(), or are running them in a more iterative way e.g. in a .rmd since set.seed() calls the seed at run time rather than stores it as a global variable.

I agree with your point that it can create some confusion by changing the random seed mid-script and may produce more issues down the line. Agree with keeping it simple. Happy for just the comment amendment to be pulled.

remove previous commit changes on enforcing a random seed for code simplicity and potential for confusion
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants