[Feature] get_simulated #1172

Open

DominiqueMakowski wants to merge 14 commits into main from get_simulated

Conversation

@DominiqueMakowski (Member)

No description provided.

Copilot AI review requested due to automatic review settings February 20, 2026 15:07
@gemini-code-assist

Summary of Changes

Hello @DominiqueMakowski, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new get_simulated function, significantly expanding the package's capabilities by providing a standardized way to simulate response values from various statistical models. This feature allows for more robust model diagnostics, uncertainty quantification, and predictive analysis across different model types, enhancing the utility for researchers and practitioners working with diverse statistical frameworks.

Highlights

  • New get_simulated function introduced: A new generic function get_simulated has been added to allow users to simulate response values from various fitted statistical models.
  • Support for multiple model types: Specific methods for get_simulated have been implemented for lm, betareg, glmmTMB, and merMod (including lmerMod) model objects, enabling simulation for these common model classes.
  • Enhanced data handling for simulation: The get_simulated function supports an optional data argument for lm and betareg methods, allowing simulations on new data or data grids, and includes a pipe-friendly data.frame method.
  • Comprehensive S3 method registration: The NAMESPACE file has been updated to register get_simulated as an S3 generic and export its methods for a wide range of statistical model classes, with a default method to handle unimplemented types gracefully.
  • Unit tests for new functionality: New unit tests have been added to verify the correct behavior of get_simulated for lm, betareg, glmmTMB, and merMod models, as well as its data.frame dispatch and default error handling.
Changelog
  • NAMESPACE
    • Added S3 method registrations for get_simulated across numerous model classes.
    • Exported the new get_simulated generic function.
  • R/get_simulated.R
    • Created the get_simulated generic function.
    • Implemented get_simulated.lm for linear and generalized linear models (Gaussian family).
    • Implemented get_simulated.betareg for beta regression models.
    • Implemented get_simulated.glmmTMB for generalized linear mixed models from the glmmTMB package.
    • Implemented get_simulated.merMod (and get_simulated.lmerMod alias) for mixed-effects models from the lme4 package.
    • Added get_simulated.default to provide an informative error for unsupported model classes.
    • Included get_simulated.data.frame for pipe-friendly usage.
    • Aliased many model classes to get_simulated.default.
  • man/get_simulated.Rd
    • Added comprehensive documentation for the get_simulated function, including its purpose, arguments, return value, and examples.
  • tests/testthat/test-get_simulated.R
    • Added unit tests for get_simulated.lm to ensure correct simulation and RNG state handling.
    • Added unit tests for get_simulated.data.frame dispatch.
    • Added unit tests for get_simulated.default error handling.
    • Added unit tests for get_simulated.betareg.
    • Added unit tests for get_simulated.glmmTMB, including checks for unsupported data arguments.
    • Added unit tests for get_simulated.merMod, including simulation with new data and allow.new.levels.
Activity
  • The pull request introduces a new feature, get_simulated, with initial implementations and comprehensive tests.
  • No specific review comments or further activity have been recorded yet.
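As noted later in this thread, the `get_simulated()` methods call `stats::simulate()` on the fitted model. A minimal base-R sketch of what that means for a plain `lm` model (the wrapper's actual signature and return format may differ from this):

```r
# Base-R analogue of what get_simulated.lm presumably does:
# stats::simulate() draws new response values from the fitted model.
m <- lm(mpg ~ wt + cyl, data = mtcars)

# 100 simulated response vectors, one column per iteration
sims <- stats::simulate(m, nsim = 100, seed = 42)
dim(sims)  # 32 observations x 100 iterations
```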

@gemini-code-assist bot left a comment

Code Review

The pull request introduces the get_simulated() generic and several S3 methods to simulate responses from fitted models (lm, betareg, glmmTMB, merMod). The implementation is well-structured and follows the package's architecture. However, there are several deviations from the repository's style guide, specifically regarding the placement of curly braces in function definitions, the mandatory use of the :: operator for functions from external packages (especially stats), and the preference for insight::format_error() over stop(). Additionally, optional dependencies should be checked with insight::check_if_installed() before use.
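The style points above can be illustrated with a short sketch (hypothetical method bodies for illustration only, not the code under review):

```r
# Illustrative sketch of the style-guide points raised in this review:
# braces on the definition line, explicit `::` for external packages
# (including stats), insight::format_error() instead of stop(), and
# insight::check_if_installed() before using optional dependencies.
# These method bodies are invented, not the PR's actual code.

get_simulated.default <- function(x, ...) {
  insight::format_error(paste0(
    "`get_simulated()` is not yet implemented for models of class `",
    class(x)[1], "`."
  ))
}

get_simulated.glmmTMB <- function(x, iterations = 100, seed = NULL, ...) {
  insight::check_if_installed("glmmTMB")
  stats::simulate(x, nsim = iterations, seed = seed, ...)
}
```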

Copilot AI (Contributor) left a comment

Pull request overview

This pull request introduces a new get_simulated() function to the insight package that provides a unified interface for simulating response values from fitted statistical models. The implementation mirrors the design pattern of existing get_* functions in the package (like get_predicted()) and adds support for multiple model types including lm/glm, betareg, glmmTMB, and merMod models.

Changes:

  • Added new generic function get_simulated() with methods for lm, betareg, glmmTMB, merMod, data.frame, and a default method for unsupported models
  • Implemented comprehensive test suite covering the main supported model types
  • Added documentation with usage examples for the new functionality

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.

File Description
R/get_simulated.R Core implementation of the get_simulated generic and all its methods, including RNG state management and model-specific simulation logic
tests/testthat/test-get_simulated.R Test suite covering lm, betareg, glmmTMB, merMod models, and error cases
man/get_simulated.Rd Auto-generated documentation describing the function interface and parameters
NAMESPACE Export declarations for the new function and all its S3 methods


@DominiqueMakowski (Member, Author)

DominiqueMakowski commented Feb 21, 2026

My feeling is that get_simulated should be usable from within get_predicted (which would enable seamless integration at higher levels, e.g. modelbased). The question is how to do it in a conceptually consistent and logical manner.

@bwiernik: "specifying bootstrap should be in ci_method not predict"

Current behaviour:

  • (default) get_predicted.lm(m, iterations=NULL, ci_method=NULL): predict.lm + analytical CI
  • get_predicted.lm(m, iterations=100, ci_method=NULL): bootstrapped predict.lm + CI from these draws
  • get_predicted.lm(m, iterations=NULL, ci_method="bootstrapped"): does not exist

(Note: for LMs, predict type doesn't matter here)

New proposal:

  • default doesn't change
  • get_predicted.lm(m, predict="expectation", iterations=100, ci_method=NULL): bootstrapped link-predictions + analytical CI (NOT using these draws for the CI; the CI is computed analytically)
    • Alternative: automatically set ci_method to "bootstrapped" if NULL, but allow the user to force it to be non-bootstrapped (Wald or whatnot)
  • get_predicted.lm(m, predict="prediction", iterations=100, ci_method=NULL): get_simulated() + CI from these draws
  • get_predicted.lm(m, predict="expectation", iterations=NULL, ci_method="bootstrapped"): predict.lm + CI computed from draws, with iterations set to a default number (e.g., 1000). I.e., the CI is based on draws but the predictions are still deterministic
  • get_predicted.lm(m, predict="prediction", iterations=100, ci_method="bootstrapped"): 🤷 one option is to decouple the prediction draws (based on simulate) from the CI computation (based on bootstrap), but bootstrap what? expectations? predictions?
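These combinations amount to a small dispatch table. A toy sketch of the routing logic (argument names follow get_predicted(), but the helper name and return strings are hypothetical illustration, not a proposed implementation):

```r
# Toy routing table for the predict/iterations combinations above.
# Hypothetical helper, for illustration only.
route_predictions <- function(predict, iterations, ci_method = NULL) {
  if (identical(predict, "prediction") && !is.null(iterations)) {
    "get_simulated() draws + CI from these draws"
  } else if (!is.null(iterations)) {
    "bootstrapped link-predictions + analytical CI"
  } else {
    "deterministic predict() + analytical CI"
  }
}

route_predictions("prediction", iterations = 100)
```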

@mattansb @strengejacke

@strengejacke (Member)

I wouldn't add "bootstrapped" as a ci_method option; what kind of CIs would those be?

@strengejacke (Member)

The logic is that we have draws / samples, and then different ways to extract the 95% quantiles via the method argument (https://easystats.github.io/bayestestR/reference/ci.html).
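For concreteness, the "quantiles from draws" step is easy to sketch in base R: an equal-tailed 95% interval is just the 2.5% and 97.5% quantiles per row of the draws matrix (bayestestR's ci(), linked above, additionally offers HDI and other methods):

```r
# ETI from a matrix of simulated draws (rows = observations,
# columns = iterations). Fake draws here, for illustration.
set.seed(1)
draws <- matrix(rnorm(32 * 1000, mean = 5), nrow = 32)

eti <- t(apply(draws, 1, stats::quantile, probs = c(0.025, 0.975)))
colnames(eti) <- c("CI_low", "CI_high")
head(eti, 2)
```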

@DominiqueMakowski (Member, Author)

DominiqueMakowski commented Feb 21, 2026

I suppose what you say also makes sense. Formally, the tension is between these two positions:

  • The CI refers to the credibility of the draws when multiple draws are present, regardless of what these draws are
  • The CI refers to the CI of the estimation, which can in some cases be decoupled from the predictions (e.g., bootstrapped CI + non-bootstrapped predictions)

@strengejacke (Member)

The CI refers to the CI of the estimation, which can in some cases be decoupled from the predictions (e.g., bootstrapped CI + non-bootstrapped predictions)

Are there situations where you really want these? They probably provide no good "dispersion" or "uncertainty" estimates and, in extreme cases, may not even include the point estimate (predicted value)?

@strengejacke (Member)

Why do we have all these

#' @export
get_simulated.polr <- get_simulated.default

?

@DominiqueMakowski (Member, Author)

Why do we have all these

Placeholders to remember to add specific methods I think (I asked to add placeholders to all the classes supported by get_predicted)

Are there situations where you really want these?

Yeah, probably not really in practice. But does it make sense logically? I don't know either 🤷

@strengejacke (Member)

Maybe this?

  • default doesn't change
  • get_predicted.lm(m, predict="expectation", iterations=100, ci_method=NULL): bootstrapped link-predictions + ETI
  • get_predicted.lm(m, predict="expectation", iterations=100, ci_method="HDI"): bootstrapped link-predictions + HDI (or whatever method)
  • get_predicted.lm(m, predict="prediction", iterations=NULL, ci_method=NULL): predicted values + PI
  • get_predicted.lm(m, predict="prediction", iterations=100, ci_method=NULL): get_simulated() + ETI
  • get_predicted.lm(m, predict="expectation", iterations=NULL, ci_method="bootstrapped"): We don't have / support this option
  • get_predicted.lm(m, predict="prediction", iterations=100, ci_method="bootstrapped"): We don't have / support this option

Or we could just change the following: when predict = "prediction", we always rely on get_simulated() and require the iterations argument (or it defaults to 1000, if NULL). That's my personal favorite. So everything stays as it is right now, and we only change the behaviour for predict = "prediction".

@strengejacke (Member)

We should check whether we have binary, ordinal or categorical outcomes. In this case, simulate() doesn't work well.

library(insight)
model <- glm(vs ~ am + wt, data = mtcars, family = "binomial")

# fails, because we call `simulate()` on the entire model,
# but we do not filter data
out <- get_simulated(
  model,
  iterations = 2,
  seed = 123,
  data = insight::get_datagrid(model, "am")
)
#> Error in `dim(val) <- c(n, iterations)`:
#> ! dims [product 4] do not match the length of object [64]

# works, but anyway - simulated values are only 0 or 1, we can't calculate
# a probability or useful intervals
out <- get_simulated(
  model,
  iterations = 2,
  seed = 123
)
out
#>                     iter_1 iter_2
#> Mazda RX4                0      0
#> Mazda RX4 Wag            0      0
#> Datsun 710               1      1
#> Hornet 4 Drive           0      1
#> Hornet Sportabout        1      1
#> Valiant                  0      0
#> Duster 360               0      0
#> Merc 240D                0      1
#> Merc 230                 1      1
#> Merc 280                 0      0
#> Merc 280C                1      0
#> Merc 450SE               0      0
#> Merc 450SL               0      0
#> Merc 450SLC              0      0
#> Cadillac Fleetwood       0      0
#> Lincoln Continental      0      0
#> Chrysler Imperial        0      0
#> Fiat 128                 1      0
#> Honda Civic              1      1
#> Toyota Corolla           1      1
#> Toyota Corona            1      1
#> Dodge Challenger         1      0
#> AMC Javelin              0      0
#> Camaro Z28               1      0
#> Pontiac Firebird         0      0
#> Fiat X1-9                1      1
#> Porsche 914-2            1      0
#> Lotus Europa             1      1
#> Ford Pantera L           0      0
#> Ferrari Dino             0      0
#> Maserati Bora            0      0
#> Volvo 142E               1      0

Created on 2026-02-21 with reprex v2.1.1

@strengejacke (Member)

OK, this should work for binomial now. But we have to add an internal simulate function for the other glm families, too (to make it work with data grids).
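One possible shape for such an internal helper, sketched for the binomial case: predict response probabilities on the (possibly gridded) data, then draw 0/1 values with rbinom(). The helper name and signature here are hypothetical, not the PR's actual internal function:

```r
# Hypothetical internal simulate-helper for binomial models on a data
# grid: predict probabilities on newdata, then sample from rbinom().
simulate_binomial_grid <- function(model, newdata, iterations = 100, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  p <- stats::predict(model, newdata = newdata, type = "response")
  out <- replicate(iterations, stats::rbinom(length(p), size = 1, prob = p))
  colnames(out) <- paste0("iter_", seq_len(iterations))
  as.data.frame(out)
}

model <- glm(vs ~ am + wt, data = mtcars, family = "binomial")
# a 4-row subset stands in for insight::get_datagrid() output
sims <- simulate_binomial_grid(model, mtcars[1:4, ], iterations = 2, seed = 123)
```

Averaging such draws across iterations would recover a probability, which addresses the "simulated values are only 0 or 1" issue above for interval computation.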
