[Backend] Fix device mismatch for NLI model in AnswerPredictor#441

Open
Siddhazntx wants to merge 2 commits into AOSSIE-Org:main from Siddhazntx:fix-nli-device

Conversation

Siddhazntx commented Feb 19, 2026

Addressed Issues:

Closes #442
The NLI model in AnswerPredictor was not explicitly moved to the detected device (CPU/GPU), and input tensors were not aligned with the model device.

Screenshots/Recordings:

N/A - This is a backend architectural fix.

Additional Notes:

The Issue:
While reviewing the backend model loading, I noticed that the distilbert-base-uncased-mnli model in the AnswerPredictor class wasn't being pushed to the hardware device during initialization. Additionally, its input tensors were defaulting to the CPU during prediction.

The fix:
Added .to(self.device) to both the NLI model initialization and the input tensors. This ensures the model actually utilizes the GPU when available and prevents potential PyTorch tensor mismatch crashes (RuntimeError: Expected all tensors to be on the same device).
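In code, the fix follows the standard PyTorch device-alignment pattern. The sketch below is illustrative rather than the project's actual code: `nn.Linear` stands in for the `distilbert-base-uncased-mnli` classifier, and `predict` abbreviates `predict_boolean_answer`.

```python
import torch
import torch.nn as nn

class AnswerPredictorSketch:
    """Minimal stand-in; the real class loads distilbert-base-uncased-mnli."""

    def __init__(self):
        # Detect the hardware once, as the real class does.
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.nli_model = nn.Linear(8, 3)   # placeholder for the NLI classifier
        self.nli_model.to(self.device)     # the fix: push the model to the device
        self.nli_model.eval()              # deterministic inference (no dropout)

    @torch.no_grad()                       # skip the autograd graph at inference
    def predict(self, inputs):
        # The fix: align every input tensor with the model's device before the
        # forward pass, avoiding "Expected all tensors to be on the same device".
        inputs = {k: v.to(self.device) for k, v in inputs.items()}
        return self.nli_model(inputs["input_ids"])

predictor = AnswerPredictorSketch()
logits = predictor.predict({"input_ids": torch.randn(1, 8)})
print(logits.shape)  # torch.Size([1, 3])
```

On a CUDA machine the same call path runs entirely on the GPU, because both the parameters and the inputs end up on `self.device`.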

Note on Testing:
I successfully tested the device synchronization locally and ran the official test_server.py suite. All generation endpoints pass successfully with my fix. During testing, I observed a pre-existing failure in the test_server.py suite on the current main branch that is unrelated to this change. I will open a separate Issue/PR to address that independently.

Checklist

  • [x] My PR addresses a single issue, fixes a single bug, or makes a single improvement.
  • [x] My code follows the project's code style and conventions.
  • [ ] If applicable, I have made corresponding changes or additions to the documentation.
  • [ ] If applicable, I have made corresponding changes or additions to tests.
  • [x] My changes generate no new warnings or errors.
  • [x] I have joined the Discord server and I will share a link to this PR with the project maintainers there.
  • [x] I have read the Contribution Guidelines.
  • [x] Once I submit my PR, CodeRabbit AI will automatically review it and I will address CodeRabbit's comments.

Summary by CodeRabbit

  • Refactor

    • Improved device handling for NLI model components to ensure reliable GPU/CPU execution.
    • Models are now run in evaluation mode and inference avoids gradient tracking, improving stability and reducing resource usage.
    • Internal inference inputs are moved to the appropriate device to prevent runtime errors.
  • Chores

    • No public APIs or interfaces were changed.

coderabbitai bot commented Feb 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68b44bc and 13cad6e.

📒 Files selected for processing (1)
  • backend/Generator/main.py

📝 Walkthrough

Walkthrough

Explicitly move NLI models to the detected device and set them to eval(); ensure input tensors are moved to the model device and inference runs under torch.no_grad(), preventing device mismatches during prediction.

Changes

Cohort / File(s) Summary
NLI device & inference fixes
backend/Generator/main.py
In MCQGenerator.__init__ and AnswerPredictor.__init__, move the NLI model to the detected device and call .eval(). In AnswerPredictor.predict_boolean_answer, add @torch.no_grad() and move tokenizer-produced tensors to the model's device before calling the model. No public APIs changed.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped the models onto CUDA's shore,

Tensors followed, no crashes anymore.
Quiet eval, no grads in sight,
Inference cozy, device set right—
A carrot for clean runtime delight! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title directly and specifically describes the main change (fixing a device mismatch for the NLI model in AnswerPredictor) and aligns with the changeset.
  • Linked Issues Check ✅ Passed: the PR addresses all coding requirements from issue #442: moving the NLI model to self.device during initialization, moving input tensors to the model's device before inference, and ensuring consistent device handling.
  • Out of Scope Changes Check ✅ Passed: all changes are scoped to fixing the device mismatch in AnswerPredictor's NLI model usage; no unrelated or out-of-scope modifications are present.



coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/Generator/main.py (2)

253-260: ⚠️ Potential issue | 🟠 Major

Add .eval() to the NLI model after moving it to the device.

The .to(self.device) fix is correct. However, self.nli_model is never put into eval mode. Every other inference model in this file calls .eval() immediately after .to(self.device) (see self.qg_model.eval() at line 418 and self.qae_model.eval() at line 726). Without it, dropout layers remain active during predict_boolean_answer, producing non-deterministic NLI results.

🛠️ Proposed fix

```diff
 self.nli_model = AutoModelForSequenceClassification.from_pretrained(self.nli_model_name)

 # Explicitly push the NLI model to the detected hardware (GPU or CPU)
 self.nli_model.to(self.device)
+self.nli_model.eval()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/Generator/main.py` around lines 253 - 260, The NLI model is moved to
the device but not set to eval mode, causing nondeterministic behavior (dropout
active) during predict_boolean_answer; after the existing
self.nli_model.to(self.device) call, call self.nli_model.eval() to mirror how
other inference models (self.qg_model.eval(), self.qae_model.eval()) are handled
so the NLI model runs deterministically in inference.
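The effect described above can be reproduced with a toy module (illustrative only, not the project's code): in train mode, dropout resamples its mask on every forward pass; after .eval(), dropout becomes an identity op and inference is deterministic.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy module with an active dropout layer.
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

model.train()
train_a = model(x)
train_b = model(x)  # dropout mask resampled; generally differs from train_a

model.eval()
eval_a = model(x)
eval_b = model(x)   # dropout disabled: identical outputs
print(torch.equal(eval_a, eval_b))  # True
```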

296-323: 🛠️ Refactor suggestion | 🟠 Major

Missing @torch.no_grad() on predict_boolean_answer.

Every other inference method in this file uses @torch.no_grad() (see _generate_question at line 646, _evaluate_qa at line 774). The NLI forward pass at line 309 will unnecessarily compute and retain gradient tensors, wasting GPU memory — especially relevant now that GPU execution is the target of this fix.

♻️ Proposed fix

```diff
+@torch.no_grad()
 def predict_boolean_answer(self, payload):
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/Generator/main.py` around lines 296 - 323, The predict_boolean_answer
method is missing the `@torch.no_grad`() decorator so the NLI forward pass
(nli_model(**inputs)) will compute and retain gradients; add `@torch.no_grad`()
above def predict_boolean_answer(self, payload) to disable gradient tracking for
the entire method, keeping the existing device handling (inputs = {key:
value.to(self.device) ...} and torch.cuda.empty_cache()) unchanged so inference
runs without allocating unnecessary GPU gradients.
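The memory point above is easy to demonstrate with a small standalone function (illustrative only): without @torch.no_grad(), a forward pass through a module with trainable parameters records an autograd graph and the output requires gradients; with the decorator, graph construction is skipped entirely.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
x = torch.randn(1, 4)

# Plain forward pass: autograd tracks the computation.
y_tracked = model(x)
print(y_tracked.requires_grad)  # True

# Decorated inference function (the same decorator the review proposes
# for predict_boolean_answer): no gradient tracking for the whole call.
@torch.no_grad()
def predict(m, inputs):
    return m(inputs)

y_free = predict(model, x)
print(y_free.requires_grad)  # False
```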

@Siddhazntx Siddhazntx marked this pull request as draft February 19, 2026 18:15
@Siddhazntx Siddhazntx marked this pull request as ready for review February 19, 2026 18:39
Siddhazntx (Author) commented:

@coderabbitai I've applied the fixes in main.py:

  • Set self.nli_model.eval() for deterministic output.
  • Added @torch.no_grad() to predict_boolean_answer for memory efficiency.
  • Updated input tensors to move to self.device to ensure consistency with the model.

Verified locally with test_server.py and boolean predictions are working as expected.


coderabbitai bot commented Mar 1, 2026


🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Read the main.py file to verify the fixes applied by the user
cat backend/Generator/main.py
```

Length of output: 30370


@Siddhazntx Excellent work! I've verified all three fixes in backend/Generator/main.py:

  • Line 254: self.nli_model.eval() sets the model to evaluation mode for deterministic inference
  • Line 285: the @torch.no_grad() decorator disables gradient tracking for memory efficiency during inference
  • Lines 295-296: device synchronization for input tensors:

    inputs = {key: value.to(self.device) for key, value in inputs.items()}

These are all excellent PyTorch best practices that ensure:

  • Consistent behavior during inference (eval mode)
  • Reduced memory footprint (no gradient computation)
  • Prevention of device mismatch errors (synchronized tensor placement)

Great job testing locally with test_server.py to confirm the boolean predictions are working correctly! The fix comprehensively addresses the device mismatch issue. 🎉


If you found this review helpful, would you consider giving us a shout-out on X?

Thank you for using CodeRabbit!



Development

Successfully merging this pull request may close these issues.

[BUG]: PyTorch device mismatch in AnswerPredictor NLI model
