-
Notifications
You must be signed in to change notification settings - Fork 25
add command to reshape continuous questions #3789
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For review purposes, don't worry about the math functions regarding splines.
Reviewers can probably start around line 293
| new_question = Question.objects.get(id=question.id) | ||
| new_question.id = None | ||
| new_question.group_id = None | ||
| new_question.save() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's replace with questions.services.common.clone_question
| for k, v in new_post.__dict__.items(): | ||
| if ( | ||
| k.startswith("_") | ||
| or k == "id" | ||
| or k == "group_of_questions_id" | ||
| or k == "conditional_id" | ||
| ): | ||
| pass | ||
| elif k == "question_id": | ||
| post_dict[k] = new_question.id | ||
| else: | ||
| post_dict[k] = v | ||
| new_post = Post(**post_dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this to the separate clone_post function:
def clone_post(post: Post, **kwargs):
...
clone_post(post, question=new_question)
| new_forecasts: list[Forecast] = [] | ||
| for forecast in original_forecasts.iterator(chunk_size=100): | ||
| forecast.id = None | ||
| forecast.pk = None | ||
| forecast.question = new_question | ||
| forecast.post = new_post | ||
| new_forecasts.append(forecast) | ||
| if new_forecasts: | ||
| Forecast.objects.bulk_create(new_forecasts, batch_size=500) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that you keep the entire new_forecasts list in memory, which can eat a lot of RAM.
I already have a nice util for batched updates — utils.models.ModelBatchUpdater. Maybe you can create a similar helper, but for batched creation?
Something like:
class ModelBatchCreator(ModelBatchUpdater):
def __init__(
self,
model_class: type[DjangoModelType],
batch_size: int = 100,
):
self.model_class = model_class
self.batch_size = batch_size
self._batch: list[DjangoModelType] = []
def append(self, obj: DjangoModelType) -> None:
self._batch.append(obj)
if len(self._batch) >= self.batch_size:
self.flush()
def flush(self) -> None:
if self._batch:
self.model_class.objects.bulk_create(self._batch)
self._batch.clear()
def __enter__(self):
return self
def __exit__(self, exc_type, exc_value, traceback):
self.flush()And then use it like:
# Updating posts
with ModelBatchCreator(
model_class=Forecast, batch_size=500
) as creator:
for idx, forecast in enumerate(original_forecasts.iterator(chunk_size=500)):
forecast.id = None
forecast.pk = None
forecast.question = new_question
forecast.post = new_post
creator.append(forecast)
if idx % batch_size == 0:
logger.info(f"Created {idx}/{total} forecasts")| question_to_change.scheduled_resolve_time = new_scheduled_close_time | ||
| question_to_change.save() | ||
| post = question_to_change.get_post() | ||
| assert post |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert won't work in python production mode, please replace with explicit exceptions raise
| new_cdf = np.cumsum(new_pmf).tolist()[:-1] | ||
| return new_cdf | ||
|
|
||
| print("rescaling forecasts...") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's replace it with logging
| for i, forecast in enumerate(forecasts.iterator(chunk_size=100), 1): | ||
| print(i, "/", c, end="\r") | ||
| forecast.continuous_cdf = transform_cdf(forecast.continuous_cdf) | ||
| forecast.distribution_input = None | ||
| updated_forecasts.append(forecast) | ||
| print() | ||
| print("Done") | ||
| if updated_forecasts: | ||
| print("Saving forecasts...", end="\r") | ||
| with transaction.atomic(): | ||
| Forecast.objects.bulk_update( | ||
| updated_forecasts, | ||
| fields=["continuous_cdf", "distribution_input"], | ||
| batch_size=500, | ||
| ) | ||
| print("Saving forecasts... DONE") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here -- let's use ModelBatchUpdater
| if question.type not in [ | ||
| Question.QuestionType.NUMERIC, | ||
| Question.QuestionType.DISCRETE, | ||
| Question.QuestionType.DATE, | ||
| ]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if question.type not in QUESTION_CONTINUOUS_TYPES
…reshape-continuous-command
closes #3707
adds command:
python manage.py reshape_continuous_questionparams:
This command will be rarely (hopefully essentially never) used and the code doesn't need to be polished. But it should be checked over for any logic faults.