@lsabor (Contributor) commented Nov 13, 2025

Closes #3707.

Adds the management command: python manage.py reshape_continuous_question
Parameters (bracketed ones are optional):

--question_id 12345
[--make_copy]
[--alter_copy]
[--approve_copy_post]
[--nominal_range_min 2020-01-01]
[--nominal_range_max 2300-01-01]
[--convert_to_discrete]
[--step 1.0]
[--new_scheduled_close_time 2300-01-01]
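For reviewers, here is a minimal sketch of how these flags could be declared. It uses plain argparse, because Django's `BaseCommand.add_arguments(self, parser)` receives an argparse-compatible parser. The argument types (integer ID, string range bounds, ISO date for the close time) are assumptions for illustration and may not match what the PR actually does.

```python
import argparse
from datetime import datetime


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of the reshape_continuous_question flags.
    # In a Django command, the same add_argument calls would go inside
    # BaseCommand.add_arguments(self, parser).
    parser = argparse.ArgumentParser(prog="reshape_continuous_question")
    parser.add_argument("--question_id", type=int, required=True)

    # Boolean switches: absent -> False, present -> True.
    for flag in (
        "--make_copy",
        "--alter_copy",
        "--approve_copy_post",
        "--convert_to_discrete",
    ):
        parser.add_argument(flag, action="store_true")

    # Range bounds kept as strings (assumption): they may be dates for DATE
    # questions or numbers for NUMERIC ones, so the command would parse them later.
    parser.add_argument("--nominal_range_min")
    parser.add_argument("--nominal_range_max")
    parser.add_argument("--step", type=float)

    # Assumed to be an ISO-8601 date or datetime, e.g. 2300-01-01.
    parser.add_argument("--new_scheduled_close_time", type=datetime.fromisoformat)
    return parser
```

For example, `build_parser().parse_args(["--question_id", "12345", "--make_copy"])` yields `question_id=12345`, `make_copy=True` and every other switch `False`.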

This command should rarely (hopefully never) be needed, so the code doesn't need to be polished, but it should be checked for logic faults.

Contributor Author

For review purposes, don't worry about the spline math functions.
Reviewers can probably start around line 293.

Comment on lines +293 to +296
new_question = Question.objects.get(id=question.id)
new_question.id = None
new_question.group_id = None
new_question.save()
Contributor

Let's replace this with questions.services.common.clone_question.

Comment on lines +304 to +316
for k, v in new_post.__dict__.items():
    if (
        k.startswith("_")
        or k == "id"
        or k == "group_of_questions_id"
        or k == "conditional_id"
    ):
        pass
    elif k == "question_id":
        post_dict[k] = new_question.id
    else:
        post_dict[k] = v
new_post = Post(**post_dict)
Contributor

Let's move this into a separate clone_post function:

def clone_post(post: Post, **kwargs):
    ...

clone_post(post, question=new_question)

Comment on lines +323 to +331
new_forecasts: list[Forecast] = []
for forecast in original_forecasts.iterator(chunk_size=100):
    forecast.id = None
    forecast.pk = None
    forecast.question = new_question
    forecast.post = new_post
    new_forecasts.append(forecast)
if new_forecasts:
    Forecast.objects.bulk_create(new_forecasts, batch_size=500)
Contributor

The issue is that this keeps the entire new_forecasts list in memory, which can use a lot of RAM on questions with many forecasts.

I already have a util for batched updates: utils.models.ModelBatchUpdater. Maybe you could create a similar helper for batched creation?

Something like:

class ModelBatchCreator(ModelBatchUpdater):
    def __init__(
        self,
        model_class: type[DjangoModelType],
        batch_size: int = 100,
    ):
        self.model_class = model_class
        self.batch_size = batch_size

        self._batch: list[DjangoModelType] = []

    def append(self, obj: DjangoModelType) -> None:
        self._batch.append(obj)

        if len(self._batch) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self._batch:
            self.model_class.objects.bulk_create(self._batch)
            self._batch.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.flush()

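And then use it like:

# Copying forecasts onto the new question
# (assumes a module-level logger = logging.getLogger(__name__))
batch_size = 500
total = original_forecasts.count()

with ModelBatchCreator(
    model_class=Forecast, batch_size=batch_size
) as creator:
    # start=1 so progress is logged after each full batch, not at idx 0
    for idx, forecast in enumerate(
        original_forecasts.iterator(chunk_size=batch_size), start=1
    ):
        forecast.id = None
        forecast.pk = None
        forecast.question = new_question
        forecast.post = new_post
        creator.append(forecast)

        if idx % batch_size == 0:
            logger.info(f"Created {idx}/{total} forecasts")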
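To sanity-check the flush logic without a database, here is a minimal sketch with the same append/flush/context-manager shape as ModelBatchCreator. It swaps `bulk_create` for an injected `sink` callback, a hypothetical simplification for testing. At most `batch_size` objects are buffered at once, and the trailing partial batch is written in `__exit__`.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class BatchCreator(Generic[T]):
    """Buffer objects and hand them to `sink` in groups of `batch_size`."""

    def __init__(self, sink: Callable[[List[T]], None], batch_size: int = 100) -> None:
        self.sink = sink  # stand-in for Model.objects.bulk_create
        self.batch_size = batch_size
        self._batch: List[T] = []

    def append(self, obj: T) -> None:
        self._batch.append(obj)
        if len(self._batch) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self._batch:
            # Pass a copy so clearing the buffer can't mutate what the sink received.
            self.sink(list(self._batch))
            self._batch.clear()

    def __enter__(self) -> "BatchCreator[T]":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Write the trailing partial batch (this also runs if the loop raised).
        self.flush()
```

Because `__exit__` flushes even when the loop raises, wrapping the whole `with` block in `transaction.atomic()` would be one way to make sure a failed copy leaves no partial forecasts behind.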

question_to_change.scheduled_resolve_time = new_scheduled_close_time
question_to_change.save()
post = question_to_change.get_post()
assert post
Contributor

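assert statements are stripped when Python runs with optimizations enabled (python -O), so this check would silently disappear in such a production setup. Please replace it with an explicit exception.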
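One way to do this is a small helper that raises instead of asserting. `require` is a hypothetical name. Inside a Django management command, raising `django.core.management.base.CommandError` instead of `RuntimeError` would make the command report a readable error message.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def require(value: Optional[T], message: str) -> T:
    """Return `value`, or raise if it is None.

    Unlike `assert`, this check is not removed when Python runs with -O.
    """
    if value is None:
        raise RuntimeError(message)
    return value
```

Usage: `post = require(question_to_change.get_post(), f"Question {question_to_change.id} has no post")`.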

new_cdf = np.cumsum(new_pmf).tolist()[:-1]
return new_cdf

print("rescaling forecasts...")
Contributor

Let's replace it with logging
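A minimal sketch of a logging replacement for the `print(i, "/", c, end="\r")` progress output. Logging every `every` items (a hypothetical parameter) rather than on every forecast keeps logs readable, since the carriage-return trick only works on an interactive terminal.

```python
import logging

logger = logging.getLogger(__name__)


def log_progress(done: int, total: int, every: int = 500) -> bool:
    """Log "done/total" every `every` items and once at the end.

    Returns True when a line was emitted, which makes the throttling easy to test.
    """
    if done % every == 0 or done == total:
        # Lazy %-formatting: the message is only built if INFO is enabled.
        logger.info("Rescaled %d/%d forecasts", done, total)
        return True
    return False
```

Inside the loop, `log_progress(i, c)` would replace the `print` call.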

Comment on lines +454 to +469
for i, forecast in enumerate(forecasts.iterator(chunk_size=100), 1):
    print(i, "/", c, end="\r")
    forecast.continuous_cdf = transform_cdf(forecast.continuous_cdf)
    forecast.distribution_input = None
    updated_forecasts.append(forecast)
print()
print("Done")
if updated_forecasts:
    print("Saving forecasts...", end="\r")
    with transaction.atomic():
        Forecast.objects.bulk_update(
            updated_forecasts,
            fields=["continuous_cdf", "distribution_input"],
            batch_size=500,
        )
    print("Saving forecasts... DONE")
Contributor

Same here -- let's use ModelBatchUpdater
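The exact ModelBatchUpdater signature isn't shown in this thread, so here is a dependency-free sketch of the same idea: stream the transformed forecasts through fixed-size chunks instead of collecting them all in updated_forecasts. `chunked` and `rescaled` are hypothetical names, and the Django calls appear only as comments.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` items without materialising `items`."""
    if size < 1:
        raise ValueError("size must be >= 1")
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


# Intended use (Django calls shown as comments; transform_cdf is from the PR):
#
# def rescaled(forecasts):
#     for forecast in forecasts.iterator(chunk_size=100):
#         forecast.continuous_cdf = transform_cdf(forecast.continuous_cdf)
#         forecast.distribution_input = None
#         yield forecast
#
# with transaction.atomic():
#     for batch in chunked(rescaled(forecasts), 500):
#         Forecast.objects.bulk_update(
#             batch, fields=["continuous_cdf", "distribution_input"]
#         )
```

On Python 3.12+, `itertools.batched` provides similar chunking (it yields tuples rather than lists).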

Comment on lines +611 to +615
if question.type not in [
Question.QuestionType.NUMERIC,
Question.QuestionType.DISCRETE,
Question.QuestionType.DATE,
]:
Contributor

@hlbmtc commented Dec 3, 2025

if question.type not in QUESTION_CONTINUOUS_TYPES:
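A self-contained sketch of the suggested check. The `QuestionType` enum below is an illustration-only stand-in for `Question.QuestionType`, and `QUESTION_CONTINUOUS_TYPES` is assumed to contain the three types listed in the original condition; the real constant may differ.

```python
from enum import Enum


class QuestionType(str, Enum):
    # Illustration-only stand-in for Question.QuestionType.
    BINARY = "binary"
    NUMERIC = "numeric"
    DISCRETE = "discrete"
    DATE = "date"


# Assumed to match the types the PR checks explicitly.
QUESTION_CONTINUOUS_TYPES = frozenset(
    {QuestionType.NUMERIC, QuestionType.DISCRETE, QuestionType.DATE}
)


def ensure_continuous(question_type: QuestionType) -> None:
    """Raise if the question cannot be reshaped as a continuous question."""
    if question_type not in QUESTION_CONTINUOUS_TYPES:
        raise ValueError(f"Cannot reshape a {question_type.value} question")
```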

Linked issue (may be closed by merging this pull request): Rescale Continuous Question

3 participants