[Feature] get_simulated #1172

Open

DominiqueMakowski wants to merge 14 commits into main from get_simulated

Conversation

@DominiqueMakowski (Member)

No description provided.

Copilot AI review requested due to automatic review settings February 20, 2026 15:07
@gemini-code-assist

Summary of Changes

Hello @DominiqueMakowski, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new get_simulated function, significantly expanding the package's capabilities by providing a standardized way to simulate response values from various statistical models. This feature allows for more robust model diagnostics, uncertainty quantification, and predictive analysis across different model types, enhancing the utility for researchers and practitioners working with diverse statistical frameworks.

Highlights

  • New get_simulated function introduced: A new generic function get_simulated has been added to allow users to simulate response values from various fitted statistical models.
  • Support for multiple model types: Specific methods for get_simulated have been implemented for lm, betareg, glmmTMB, and merMod (including lmerMod) model objects, enabling simulation for these common model classes.
  • Enhanced data handling for simulation: The get_simulated function supports an optional data argument for lm and betareg methods, allowing simulations on new data or data grids, and includes a pipe-friendly data.frame method.
  • Comprehensive S3 method registration: The NAMESPACE file has been updated to register get_simulated as an S3 generic and export its methods for a wide range of statistical model classes, with a default method to handle unimplemented types gracefully.
  • Unit tests for new functionality: New unit tests have been added to verify the correct behavior of get_simulated for lm, betareg, glmmTMB, and merMod models, as well as its data.frame dispatch and default error handling.
Changelog
  • NAMESPACE
    • Added S3 method registrations for get_simulated across numerous model classes.
    • Exported the new get_simulated generic function.
  • R/get_simulated.R
    • Created the get_simulated generic function.
    • Implemented get_simulated.lm for linear and generalized linear models (Gaussian family).
    • Implemented get_simulated.betareg for beta regression models.
    • Implemented get_simulated.glmmTMB for generalized linear mixed models from the glmmTMB package.
    • Implemented get_simulated.merMod (and get_simulated.lmerMod alias) for mixed-effects models from the lme4 package.
    • Added get_simulated.default to provide an informative error for unsupported model classes.
    • Included get_simulated.data.frame for pipe-friendly usage.
    • Aliased many model classes to get_simulated.default.
  • man/get_simulated.Rd
    • Added comprehensive documentation for the get_simulated function, including its purpose, arguments, return value, and examples.
  • tests/testthat/test-get_simulated.R
    • Added unit tests for get_simulated.lm to ensure correct simulation and RNG state handling.
    • Added unit tests for get_simulated.data.frame dispatch.
    • Added unit tests for get_simulated.default error handling.
    • Added unit tests for get_simulated.betareg.
    • Added unit tests for get_simulated.glmmTMB, including checks for unsupported data arguments.
    • Added unit tests for get_simulated.merMod, including simulation with new data and allow.new.levels.
Activity
  • The pull request introduces a new feature, get_simulated, with initial implementations and comprehensive tests.
  • No specific review comments or further activity have been recorded yet.
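As noted later in this thread, the `get_simulated()` methods call `stats::simulate()` on the fitted model. A minimal base-R sketch of what that means for a plain `lm` model (the wrapper's actual signature and return format may differ from this):

```r
# Base-R analogue of what get_simulated.lm presumably does:
# stats::simulate() draws new response values from the fitted model.
m <- lm(mpg ~ wt + cyl, data = mtcars)

# 100 simulated response vectors, one column per iteration
sims <- stats::simulate(m, nsim = 100, seed = 42)
dim(sims)  # 32 observations x 100 iterations
```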

@gemini-code-assist bot left a comment

Code Review

The pull request introduces the get_simulated() generic and several S3 methods to simulate responses from fitted models (lm, betareg, glmmTMB, merMod). The implementation is well-structured and follows the package's architecture. However, there are several deviations from the repository's style guide, specifically regarding the placement of curly braces in function definitions, the mandatory use of the :: operator for functions from external packages (especially stats), and the preference for insight::format_error() over stop(). Additionally, optional dependencies should be checked with insight::check_if_installed() before use.
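The style points above can be illustrated with a short sketch (hypothetical method bodies for illustration only, not the code under review):

```r
# Illustrative sketch of the style-guide points raised in this review:
# braces on the definition line, explicit `::` for external packages
# (including stats), insight::format_error() instead of stop(), and
# insight::check_if_installed() before using optional dependencies.
# These method bodies are invented, not the PR's actual code.

get_simulated.default <- function(x, ...) {
  insight::format_error(paste0(
    "`get_simulated()` is not yet implemented for models of class `",
    class(x)[1], "`."
  ))
}

get_simulated.glmmTMB <- function(x, iterations = 100, seed = NULL, ...) {
  insight::check_if_installed("glmmTMB")
  stats::simulate(x, nsim = iterations, seed = seed, ...)
}
```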

Copilot AI (Contributor) left a comment

Pull request overview

This pull request introduces a new get_simulated() function to the insight package that provides a unified interface for simulating response values from fitted statistical models. The implementation mirrors the design pattern of existing get_* functions in the package (like get_predicted()) and adds support for multiple model types including lm/glm, betareg, glmmTMB, and merMod models.

Changes:

  • Added new generic function get_simulated() with methods for lm, betareg, glmmTMB, merMod, data.frame, and a default method for unsupported models
  • Implemented comprehensive test suite covering the main supported model types
  • Added documentation with usage examples for the new functionality

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.

File Description
R/get_simulated.R Core implementation of the get_simulated generic and all its methods, including RNG state management and model-specific simulation logic
tests/testthat/test-get_simulated.R Test suite covering lm, betareg, glmmTMB, merMod models, and error cases
man/get_simulated.Rd Auto-generated documentation describing the function interface and parameters
NAMESPACE Export declarations for the new function and all its S3 methods


@DominiqueMakowski (Member, Author)

DominiqueMakowski commented Feb 21, 2026

My feeling is that get_simulated should be usable from within get_predicted (which would enable seamless integration at higher levels, e.g. modelbased). The question is how to do it in a conceptually consistent and logical manner.

@bwiernik: "specifying bootstrap should be in ci_method not predict"

Current behaviour:

  • (default) get_predicted.lm(m, iterations=NULL, ci_method=NULL): predict.lm + analytical CI
  • get_predicted.lm(m, iterations=100, ci_method=NULL): bootstrapped predict.lm + CI from these draws
  • get_predicted.lm(m, iterations=NULL, ci_method="bootstrapped"): does not exist

(Note: for LMs, predict type doesn't matter here)

New proposal:

  • default doesn't change
  • get_predicted.lm(m, predict="expectation", iterations=100, ci_method=NULL): bootstrapped link-predictions + analytical CI (NOT using these draws for the CI; the CI is computed analytically)
    • Alternative: automatically set ci_method to "bootstrapped" if NULL, but allow the user to force it to be non-bootstrapped (Wald or whatnot)
  • get_predicted.lm(m, predict="prediction", iterations=100, ci_method=NULL): get_simulated() + CI from these draws
  • get_predicted.lm(m, predict="expectation", iterations=NULL, ci_method="bootstrapped"): predict.lm + CI computed from draws, with iterations set to a default number (e.g., 1000). I.e., the CI is based on draws but the predictions are still deterministic
  • get_predicted.lm(m, predict="prediction", iterations=100, ci_method="bootstrapped"): 🤷 one option is to decouple the prediction draws (based on simulate) from the CI computation (based on bootstrap), but bootstrap what? expectations? predictions?
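These combinations amount to a small dispatch table. A toy sketch of the routing logic (argument names follow get_predicted(), but the helper name and return strings are hypothetical illustration, not a proposed implementation):

```r
# Toy routing table for the predict/iterations combinations above.
# Hypothetical helper, for illustration only.
route_predictions <- function(predict, iterations, ci_method = NULL) {
  if (identical(predict, "prediction") && !is.null(iterations)) {
    "get_simulated() draws + CI from these draws"
  } else if (!is.null(iterations)) {
    "bootstrapped link-predictions + analytical CI"
  } else {
    "deterministic predict() + analytical CI"
  }
}

route_predictions("prediction", iterations = 100)
```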

@mattansb @strengejacke

@strengejacke (Member)

I wouldn't add "bootstrapped" as a ci_method option; what kind of CIs would those be?

@strengejacke (Member)

The logic is that we have draws / samples, and then different ways to extract the 95% quantiles via the method argument (https://easystats.github.io/bayestestR/reference/ci.html).
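For concreteness, the "quantiles from draws" step is easy to sketch in base R: an equal-tailed 95% interval is just the 2.5% and 97.5% quantiles per row of the draws matrix (bayestestR's ci(), linked above, additionally offers HDI and other methods):

```r
# ETI from a matrix of simulated draws (rows = observations,
# columns = iterations). Fake draws here, for illustration.
set.seed(1)
draws <- matrix(rnorm(32 * 1000, mean = 5), nrow = 32)

eti <- t(apply(draws, 1, stats::quantile, probs = c(0.025, 0.975)))
colnames(eti) <- c("CI_low", "CI_high")
head(eti, 2)
```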

@DominiqueMakowski (Member, Author)

DominiqueMakowski commented Feb 21, 2026

I suppose what you say also makes sense. Formally, the tension is between these two positions:

  • The CI refers to the credibility of the draws when multiple draws are present, regardless of what these draws are
  • The CI refers to the CI of the estimation, which can in some cases be decoupled from the predictions (e.g., bootstrapped CI + non-bootstrapped predictions)

@strengejacke (Member)

The CI refers to the CI of the estimation, which can in some cases be decoupled from the predictions (e.g., bootstrapped CI + non-bootstrapped predictions)

Are there situations where you really want these? They probably provide no good "dispersion" or "uncertainty" estimates and, in extreme cases, may not even include the point estimate (predicted value)?

@strengejacke (Member)

Why do we have all these

#' @export
get_simulated.polr <- get_simulated.default

?

@DominiqueMakowski (Member, Author)

Why do we have all these

Placeholders to remember to add specific methods I think (I asked to add placeholders to all the classes supported by get_predicted)

Are there situations where you really want these?

Yeah, probably not really in practice. But does it make sense logically? I don't know either 🤷

@strengejacke (Member)

Maybe this?

  • default doesn't change
  • get_predicted.lm(m, predict="expectation", iterations=100, ci_method=NULL): bootstrapped link-predictions + ETI
  • get_predicted.lm(m, predict="expectation", iterations=100, ci_method="HDI"): bootstrapped link-predictions + HDI (or whatever method)
  • get_predicted.lm(m, predict="prediction", iterations=NULL, ci_method=NULL): predicted values + PI
  • get_predicted.lm(m, predict="prediction", iterations=100, ci_method=NULL): get_simulated() + ETI
  • get_predicted.lm(m, predict="expectation", iterations=NULL, ci_method="bootstrapped"): We don't have / support this option
  • get_predicted.lm(m, predict="prediction", iterations=100, ci_method="bootstrapped"): We don't have / support this option

Or we could just change the following: when predict = "prediction", we always rely on get_simulated() and require the iterations argument (or it defaults to 1000, if NULL). That's my personal favorite. So everything stays as it is right now, and we only change the behaviour for predict = "prediction".

@strengejacke (Member)

We should check whether we have binary, ordinal or categorical outcomes. In this case, simulate() doesn't work well.

library(insight)
model <- glm(vs ~ am + wt, data = mtcars, family = "binomial")

# fails, because we call `simulate()` on the entire model,
# but we do not filter data
out <- get_simulated(
  model,
  iterations = 2,
  seed = 123,
  data = insight::get_datagrid(model, "am")
)
#> Error in `dim(val) <- c(n, iterations)`:
#> ! dims [product 4] do not match the length of object [64]

# works, but anyway - simulated values are only 0 or 1, we can't calculate
# a probability or useful intervals
out <- get_simulated(
  model,
  iterations = 2,
  seed = 123
)
out
#>                     iter_1 iter_2
#> Mazda RX4                0      0
#> Mazda RX4 Wag            0      0
#> Datsun 710               1      1
#> Hornet 4 Drive           0      1
#> Hornet Sportabout        1      1
#> Valiant                  0      0
#> Duster 360               0      0
#> Merc 240D                0      1
#> Merc 230                 1      1
#> Merc 280                 0      0
#> Merc 280C                1      0
#> Merc 450SE               0      0
#> Merc 450SL               0      0
#> Merc 450SLC              0      0
#> Cadillac Fleetwood       0      0
#> Lincoln Continental      0      0
#> Chrysler Imperial        0      0
#> Fiat 128                 1      0
#> Honda Civic              1      1
#> Toyota Corolla           1      1
#> Toyota Corona            1      1
#> Dodge Challenger         1      0
#> AMC Javelin              0      0
#> Camaro Z28               1      0
#> Pontiac Firebird         0      0
#> Fiat X1-9                1      1
#> Porsche 914-2            1      0
#> Lotus Europa             1      1
#> Ford Pantera L           0      0
#> Ferrari Dino             0      0
#> Maserati Bora            0      0
#> Volvo 142E               1      0

Created on 2026-02-21 with reprex v2.1.1

@strengejacke (Member)

OK, this should work for binomial now. But we have to add an internal simulate function for the other glm families, too (to make it work with data grids).
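One possible shape for such an internal helper, sketched for the binomial case: predict response probabilities on the (possibly gridded) data, then draw 0/1 values with rbinom(). The helper name and signature here are hypothetical, not the PR's actual internal function:

```r
# Hypothetical internal simulate-helper for binomial models on a data
# grid: predict probabilities on newdata, then sample from rbinom().
simulate_binomial_grid <- function(model, newdata, iterations = 100, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  p <- stats::predict(model, newdata = newdata, type = "response")
  out <- replicate(iterations, stats::rbinom(length(p), size = 1, prob = p))
  colnames(out) <- paste0("iter_", seq_len(iterations))
  as.data.frame(out)
}

model <- glm(vs ~ am + wt, data = mtcars, family = "binomial")
# a 4-row subset stands in for insight::get_datagrid() output
sims <- simulate_binomial_grid(model, mtcars[1:4, ], iterations = 2, seed = 123)
```

Averaging such draws across iterations would recover a probability, which addresses the "simulated values are only 0 or 1" issue above for interval computation.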
