add support for hybrid layer models: conv, mamba #43
pszemraj wants to merge 6 commits into p-e-w:master from
Conversation
Signed-off-by: Peter Szemraj <74869040+pszemraj@users.noreply.github.com>
- Extract mlp.shared_down_proj, block_sparse_moe.output_linear across variants
- Discover components dynamically per-layer for distinct Optuna parameter tracking
- Separate abliteration stats: "no matrices" vs. "outside configured ranges"
- Apply dedicated parameter ranges to shared MLP components

Signed-off-by: Peter Szemraj <74869040+pszemraj@users.noreply.github.com>

- Extract LFM-specific projections: self_attn.out_proj, layer.feed_forward.w2/output_linear
- Cover LFM conv layers and MoE expert paths
- Maintain conv/mamba/Granite handling

Signed-off-by: Peter Szemraj <74869040+pszemraj@users.noreply.github.com>

…trials

Signed-off-by: Peter Szemraj <74869040+pszemraj@users.noreply.github.com>

- Auto-enable do_sample when sampling fields (temperature/top_k/top_p/typical_p) present
- Validate config to prevent HF validation errors on upload

Signed-off-by: Peter Szemraj <74869040+pszemraj@users.noreply.github.com>
Some quick points/follow-ups @p-e-w:
p-e-w left a comment
I'm super excited about this PR! There have been very few successful abliterations of non-traditional architectures so far, and it's amazing that Heretic will have this ability soon!
My comments are just an initial review based on a first look through the code. My most important request is that you split this PR into at least 3 PRs:
- One with the trial filtering
- One with the generation config sanitization
- One with the changes expanding model support
This is a complex change, and I would like to discuss it in semantically distinct units.
As for your questions:
The package version should probably be bumped to 1.0.2 or so after this. Do you want me to update the code, or do you have a process/want to edit it yourself?
I intend to publish version 1.1.0 next week or so, incorporating all the improvements so far, hopefully including this one :) I will take care of the version bump after wrapping everything up.
You can get slightly better results for LFM2 models with even more dynamic/updated value ranges. Check out that gist, but I decided not to include it in this PR as it makes things more complex and can impact search-space time/quality for standard models.
In principle, it makes sense to have component-dependent ranges for all parameter types if you have them for some of them. However, in my experience it's very difficult to determine what ranges are good to begin with. I've done well over a hundred full optimization runs and I'm still not sure about some of them.
I have a list of ideas/nice-to-haves that I thought of when implementing this PR + standalone usage; want me to create an issue, or how best to share them?
I'm excited to hear those ideas! Please file issues to discuss them.
One important consideration is that as the number of components (and thus parameter sets) exceeds 2, the number of trials needed to converge to a good result also increases. The folklore rule of thumb for TPE is that you should have at least 10-20 trials per dimension, which matches my experience for this optimization problem. While I don't think that n_trials should be increased automatically, it's probably a good idea to warn the user if n_components > 2 and n_trials <= 200.
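For illustration only, a minimal sketch of what that warning could look like (the function and argument names are hypothetical, not the actual Heretic API):

def maybe_warn_about_trial_count(n_components: int, n_trials: int) -> None:
    # TPE folklore: ~10-20 trials per search dimension, and each extra
    # component adds parameters, so >2 components with <=200 trials is risky.
    if n_components > 2 and n_trials <= 200:
        print(
            f"Warning: {n_components} components but only {n_trials} trials; "
            "consider increasing the trial count for reliable convergence."
        )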
        description="Dataset of prompts that tend to result in refusals (used for evaluating model performance).",
    )

    show_all_trials: bool = Field(
What is the reason for this change? Were there too many trials on the Pareto front?
(responding to comments as shown top to bottom on GitHub mobile, this is the first one)
This is intended as a fallback for a filter I added on the KL divergence of the results that can be returned (IIRC applied before the results are presented as the "here's the KL div vs. refusals Pareto-optimal choices" list). Some details:
- this has to do with strange phenomena (described below) that I was encountering with the LFM2 models, not any defaults
- I added a filter to exclude trials with KL divergence > 2.0 prior to any consideration of what appears in the result list/table
- the --show_all_trials flag mainly exists to revert that filtering if desired and return the original, unfiltered list
on strange conv optimization
Models with conv layers, specifically LFM2, are hard to optimize past a certain 'minimum threshold' of refusals. Below, say, 30-40 refusals/100, the KL divergence spikes dramatically and the KL divergence/refusals tradeoff becomes much worse:
- for trials that achieve < 30 refusals on LFM2-2.6B (and the next-smallest model), the KL divergence climbs to 2, 5, sometimes 10+ as you approach the < 5/100 refusals mark
- it does not seem to be a "number of trials" problem; I tried running for much longer and it didn't seem to make much, if any, progress
- the other "alternative"/hybrid layer types included in this PR (Mamba) do not exhibit this behavior and can be optimized and abliterated just fine
- my current best theory on why is that the assumption that "refusal is a single direction" is not true, or not fully true, in conv layers, and some of the model's refusal/alignment does in fact 'live there'
Long story short: with hybrid models/conv layers (LFM2) it was common to see a KL divergence of 2+ in the trial list. I filter those out by default (they damage the model too much) and added this flag in case someone doesn't want that and prefers the original behavior.
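For context, a minimal sketch of the filtering behavior described above (the helper name and exact wiring are illustrative, not Heretic's actual code); it assumes a multi-objective Optuna study whose trials carry a kl_divergence user attribute:

import optuna

KL_THRESHOLD = 2.0  # trials above this are considered too damaged to present

def select_presentable_trials(
    study: optuna.Study, show_all_trials: bool
) -> list[optuna.trial.FrozenTrial]:
    # Pareto-optimal trials (refusals vs. KL divergence).
    trials = list(study.best_trials)
    if show_all_trials:
        # --show_all_trials: skip filtering and show the full Pareto front.
        return trials
    # Default: hide trials whose KL divergence indicates too much damage.
    return [t for t in trials if t.user_attrs["kl_divergence"] <= KL_THRESHOLD]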
| f"{component}.min_weight", | ||
| 0.0, | ||
| 1.0, | ||
| min_weight_fraction_range[0], |
👍 min_weight_fraction is a better name than min_weight here; we should probably change the variable name itself as well.
key=lambda trial: trial.user_attrs["refusals"],
key=lambda trial: (
    trial.user_attrs["refusals"],
    max(trial.user_attrs["kl_divergence"], 0),
KL divergences cannot be negative by construction. Did you encounter negative values? If so, that would indicate a serious bug in KLD computation.
Agreed. This can go; IIRC it was a safeguard/overkill AI suggestion that I kept in.
    max_weight_range = (0.8, 2.0)
    min_weight_fraction_range = (0.0, 0.9)
else:
    max_weight_range = (0.8, 2.5)
Doesn't this case include the MLP projections? Based on my tests, 2.5 is much higher than needed. Did you see different behavior?
I wonder if those ranges should be moved into get_abliterable_components, to keep this code component-agnostic. Better yet, they should probably be in a map in the configuration.
I'll give you more info on this later (flight about to take off), but the 2.5/much larger range is intended to help with hybrid layers only (or at least that is my intention), and specifically with the weird/different behavior of conv layers in hybrid models (detailed in the long comment on show_all_trials).
| f"KL divergence: {trial.user_attrs['kl_divergence']:.2f}" | ||
| ), | ||
| value=trial, | ||
| kl_threshold = 2.0 |
def try_add(component: str, matrix: Any):
    if matrix is None:
        return

    assert torch.is_tensor(matrix)
    if not torch.is_tensor(matrix):
        return
Why did you change this? We suppress exceptions everywhere now, right?
with suppress(Exception):
    try_add("attn.o_proj", layer.self_attn.o_proj.weight)
with suppress(Exception):
    try_add("attn.o_proj", layer.self_attn.out_proj.weight)
LFM2/LFM2-VL (one of them) does this
# Granite MoE Hybrid - attention layers with shared_mlp.
with suppress(Exception):
    try_add("mlp.down_proj", layer.shared_mlp.output_linear.weight)
    try_add("mlp.shared_down_proj", layer.shared_mlp.output_linear.weight)
Why the name distinction? I aliased multiple naming conventions together to simplify configuration and optimization. The assumption was that semantically related components would react similarly to ablation. Did you observe something different?
| try_add("mlp.down_proj", expert.down_proj.weight) | ||
| with suppress(Exception): | ||
| for expert in layer.feed_forward.experts: | ||
| try_add("mlp.down_proj", expert.w2.weight) |
Which models use down_proj and which ones use w2?
Fair enough! Let me take this back and rework it. I appreciate all the comments; it might take me a few days to get to reworking this and addressing them, but I will.
(Finally getting a chance to respond to some comments etc. Doing that on here for the sake of centralization/continuity, but I will split the work/PR itself into 3+ PRs as suggested.)
Sounds great! Just a heads up, there is now #60, which touches a lot of the same code as this PR, so this might get a bit messy. I'll merge whichever change is ready first, and then the other one will have to be adapted accordingly.
No problem! Due to time constraints/just getting back from holiday travel, I'm happy to let #60 get merged and play clean-up crew. I'll target this weekend-ish for the first 1-2 PRs you discussed (those that are still relevant after that one).
Thanks, I really appreciate it! Looking forward to seeing your hybrid/Mamba support land on top of this. :)
…thod
- Add explanatory comments for warning suppression and gc behavior
- Remove redundant gc.collect() calls (empty_cache handles it)
- Fix output message order (ask merge strategy before 'Uploading...')
- Add comment explaining 8-bit quantization doesn't need compute_dtype
- Remove extra newline after dtype comment
- Add future-proofing note for hybrid layer support (p-e-w#43)
- Remove leftover comment in get_merged_model
- Delete test_lora.py (debug script, not a real test)
- Add comment explaining needs_reload flag purpose
- Extract quantization config into _get_quantization_config() helper
- Rename reload_model() to reset_model_for_trial() for clarity
- Fix reload_model to respect quantization config (fixes evaluate_model bug)
- Remove unused gc import
* Add files via upload
* perf: optimize abliteration matrix op (#46)
* perf: optimize abliteration matrix op
* refactor: comments and var names correspond with arditi
* refactor: fix comments and improve var notation
* fix: accidental line change and improve comments
---------
Co-authored-by: mad-cat-lon <113548315+mad-cat-lon@users.noreply.github.com>
* Fix line endings to LF
* Add hybrid approach for GPT-OSS compatibility
  - Check for LoRA adapters before attempting LoRA abliteration
  - Fall back to direct weight modification for nn.Parameter (GPT-OSS)
  - Ensures compatibility across all model architectures
* Fix projector bug, update print statement, revert README
* Revert README changes to match upstream
* Fix import sorting for ruff
* Fix reload_model for evaluate_model, add type hints and validation
* Apply ruff formatting
* Replace load_in_4bit with quantization enum
* Fix precision loss: use FP32 refusal direction directly
* Move r assignment into non-LoRA path
* Fix linting: apply ruff formatting
* Add auto-merge for LoRA adapters on save/upload
* Fix linting: apply ruff formatting
* Implement CPU-based merge for 4-bit models with OOM fallback
* Remove use_lora flag (LoRA always on), add user prompt for 4-bit export
* Fix: PEFT target_modules expects module names without path prefix
* Fix linting: apply ruff formatting
* Add LoRA fallback and fix quantization_config handling
  - Add try/except around LoRA initialization with fallback to direct weight modification
  - Only pass quantization_config when not None (fixes gpt-oss loading)
  - Use simple forward pass instead of generate() for model test (avoids chat template issues)
  - Reset non-LoRA models by reloading in reload_model()
  - Check self.use_lora before accessing LoRA adapters in abliterate()
* Add 8-bit quantization support via bitsandbytes
  - Add BNB_8BIT option to QuantizationMethod enum
  - Add --load-in-8bit CLI support (auto via pydantic-settings)
  - Update documentation in config.py and config.default.toml
  - Useful for mid-range VRAM (12-16 GB) as balance between memory and numeric stability
* Improve LoRA merge warning and fix linting
* Apply final ruff formatting
* Fix CI: apply ruff import sorting
* Use tiny model for CI efficiency
* Fix import sorting in test_lora.py
* Fix formatting in test_lora.py
* feat: Show merge warning for all models (requires high RAM)
* style: Apply ruff fixes
* Fix undefined Style import in main.py
* Fix(model): Support MoE/3D tensors and enforce dtype safety in abliterate
* Fix(ci): Format model.py with ruff
* Fix(main): Remove invalid style argument from prompt_select and unused import
* Fix logic errors, memory leak, and redundant merges in main.py
* Fix linting and formatting issues (isort, ruff)
* chore: Simplify .gitattributes as requested
* refactor: Remove defensive try-except around LoRA initialization
* chore: Update uv.lock with peft and bitsandbytes
* chore: Regenerate uv.lock to include missing peft dependency
* style: Fix import sorting (isort) for CI compliance
* style: Simplify .gitattributes to single line as requested
* Address PR #60 feedback: Remove caching, fix LoRA reload, global LoRA usage, style fixes
* Address PR review comments: clarify code, fix quantization, rename method
  - Add explanatory comments for warning suppression and gc behavior
  - Remove redundant gc.collect() calls (empty_cache handles it)
  - Fix output message order (ask merge strategy before 'Uploading...')
  - Add comment explaining 8-bit quantization doesn't need compute_dtype
  - Remove extra newline after dtype comment
  - Add future-proofing note for hybrid layer support (#43)
  - Remove leftover comment in get_merged_model
  - Delete test_lora.py (debug script, not a real test)
  - Add comment explaining needs_reload flag purpose
  - Extract quantization config into _get_quantization_config() helper
  - Rename reload_model() to reset_model_for_trial() for clarity
  - Fix reload_model to respect quantization config (fixes evaluate_model bug)
  - Remove unused gc import
* Restore gc.collect() before empty_cache() for large models
* refactor: Remove LoRA fallback remnants, simplify code
  - Remove use_lora flag (always true since LoRA is always applied)
  - Remove isinstance(PeftModel) check in get_merged_model() (always true)
  - Simplify reset_model_for_trial() by removing defensive try/except
  - Remove redundant gc.collect() calls (empty_cache handles GC)
  - Remove unused gc import from main.py
* Address p-e-w review feedback: rename reset_model, remove loaded_model_name, fix type hints, remove GPT-OSS MoE, update assertion
* Restore skip logic for non-LoRA modules and fix 4-bit base_layer.weight access
* Remove defensive lora_A check per review - get_layer_modules already filters
* Fix try_add: nest component init inside Module check, add assert for unexpected types
* Add note about module.weight assumption for type checking
* Change 'Reloading model' to 'Resetting model' in logging
---------
Co-authored-by: accemlcc <accemlcc@users.noreply.github.com>
Co-authored-by: mad-cat-lon <113548315+mad-cat-lon@users.noreply.github.com>
Co-authored-by: Hager <Michael.Hager@bruker.com>
Great! I did start taking a look; in the process of that (incidentally testing out v1.1.0) I realized that there is a (hopefully minor) consistency issue and opened #88, which IMO should be handled first (otherwise I can't reliably tell you whether parameter ranges need to be expanded for conv-hybrid models like LFM2, and so on). Other quick comments:
Sounds perfect!
/gemini summary
Summary of Changes

This pull request significantly extends the model compatibility of the system by introducing robust support for hybrid layer architectures, including those with convolution and Mamba components. It refines the process of identifying abliterable components across various complex model structures and dynamically adjusts optimization parameters to suit these new layer types. Furthermore, it enhances the user experience by providing more insightful trial selection options, including filtering based on KL divergence, and ensures smoother model saving and uploading through automatic generation configuration sanitization.
/gemini review
Code Review
This pull request significantly extends the tool's capabilities by adding support for hybrid models with mamba and convolution layers, such as Granite and LFM. The changes are well-structured and include improved component detection, tuned hyperparameter search ranges, and better user experience in trial selection. I've provided a few suggestions to enhance code maintainability and fix a potential bug in how model components are detected within lists of experts. Overall, this is a great contribution that makes the tool more powerful and robust.
with suppress(Exception):
    for expert in layer.moe.experts:
        try_add("mlp.down_proj", expert.output_linear.weight)
The with suppress(Exception) block is outside the for loop. If an exception occurs for one of the experts (e.g., it has an unexpected structure), the entire loop will be aborted, and subsequent experts won't be processed. This could lead to an incomplete list of abliterable components.
To make this more robust, the suppress block should be moved inside the loop to handle exceptions on a per-expert basis. This pattern appears in other parts of this method when iterating over experts, and those should be updated similarly for robustness.
Current:
with suppress(Exception):
    for expert in layer.moe.experts:
        try_add("mlp.down_proj", expert.output_linear.weight)

Suggested:
with suppress(Exception):
    for expert in layer.moe.experts:
        with suppress(Exception):
            try_add("mlp.down_proj", expert.output_linear.weight)
for expert in layer.feed_forward.experts:
    try_add("mlp.down_proj", expert.down_proj.weight)
with suppress(Exception):
    for expert in layer.feed_forward.experts:
        try_add("mlp.down_proj", expert.w2.weight)
These two loops are inefficient and have the same issue I mentioned in another comment: the suppress(Exception) is outside the loop, making it not robust to failures with individual experts.
You can combine these into a single, more robust loop. This avoids iterating over the experts twice and ensures that if one attribute access fails for an expert, you still attempt to access the other.
Current:
for expert in layer.feed_forward.experts:
    try_add("mlp.down_proj", expert.down_proj.weight)
with suppress(Exception):
    for expert in layer.feed_forward.experts:
        try_add("mlp.down_proj", expert.w2.weight)

Suggested:
with suppress(Exception):
    for expert in layer.feed_forward.experts:
        with suppress(Exception):
            try_add("mlp.down_proj", expert.down_proj.weight)
        with suppress(Exception):
            try_add("mlp.down_proj", expert.w2.weight)
| if "conv" in component or "mamba" in component: | ||
| max_weight_range = (0.8, 2.5) | ||
| min_weight_fraction_range = (0.0, 0.8) | ||
| elif "attn" in component: | ||
| max_weight_range = (0.8, 1.8) | ||
| min_weight_fraction_range = (0.0, 1.0) | ||
| elif "shared" in component: | ||
| max_weight_range = (0.8, 2.0) | ||
| min_weight_fraction_range = (0.0, 0.9) | ||
| else: | ||
| max_weight_range = (0.8, 2.5) | ||
| min_weight_fraction_range = (0.0, 0.9) |
This if/elif/else block for setting parameter ranges contains some redundancy. You can make it more concise and maintainable by setting default values first and then overriding them only where they differ in the conditional branches. This makes the logic for the default case more explicit and reduces repeated code.
| if "conv" in component or "mamba" in component: | |
| max_weight_range = (0.8, 2.5) | |
| min_weight_fraction_range = (0.0, 0.8) | |
| elif "attn" in component: | |
| max_weight_range = (0.8, 1.8) | |
| min_weight_fraction_range = (0.0, 1.0) | |
| elif "shared" in component: | |
| max_weight_range = (0.8, 2.0) | |
| min_weight_fraction_range = (0.0, 0.9) | |
| else: | |
| max_weight_range = (0.8, 2.5) | |
| min_weight_fraction_range = (0.0, 0.9) | |
| # Default ranges, also used for MLP components | |
| max_weight_range = (0.8, 2.5) | |
| min_weight_fraction_range = (0.0, 0.9) | |
| if "conv" in component or "mamba" in component: | |
| min_weight_fraction_range = (0.0, 0.8) | |
| elif "attn" in component: | |
| max_weight_range = (0.8, 1.8) | |
| min_weight_fraction_range = (0.0, 1.0) | |
| elif "shared" in component: | |
| max_weight_range = (0.8, 2.0) |
sampling_fields = []
if getattr(gen_config, "temperature", None) not in (None, 1.0):
    sampling_fields.append("temperature")
if getattr(gen_config, "top_k", None) not in (None, 0):
    sampling_fields.append("top_k")
if getattr(gen_config, "top_p", None) not in (None, 1.0):
    sampling_fields.append("top_p")
if getattr(gen_config, "typical_p", None) not in (None, 1.0):
    sampling_fields.append("typical_p")

if sampling_fields and not bool(getattr(gen_config, "do_sample", False)):
    gen_config.do_sample = True
The logic to check for sampling parameters can be made more concise. Instead of building a list of fields just to check if it's non-empty, you can use a boolean expression directly. This improves readability and is slightly more efficient.
Current:
sampling_fields = []
if getattr(gen_config, "temperature", None) not in (None, 1.0):
    sampling_fields.append("temperature")
if getattr(gen_config, "top_k", None) not in (None, 0):
    sampling_fields.append("top_k")
if getattr(gen_config, "top_p", None) not in (None, 1.0):
    sampling_fields.append("top_p")
if getattr(gen_config, "typical_p", None) not in (None, 1.0):
    sampling_fields.append("typical_p")
if sampling_fields and not bool(getattr(gen_config, "do_sample", False)):
    gen_config.do_sample = True

Suggested:
has_sampling_params = (
    getattr(gen_config, "temperature", 1.0) not in (None, 1.0)
    or getattr(gen_config, "top_k", 0) not in (None, 0)
    or getattr(gen_config, "top_p", 1.0) not in (None, 1.0)
    or getattr(gen_config, "typical_p", 1.0) not in (None, 1.0)
)
if has_sampling_params and not getattr(gen_config, "do_sample", False):
    gen_config.do_sample = True
This PR adds support for new layer types (Mamba and convolution layers), as well as for hybrid models whose layer types vary throughout the model. This enables use with models like granite-4.0, LFM2, etc.
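For readers unfamiliar with the approach, a heavily hedged sketch of the per-layer component discovery pattern this PR extends (the conv/mamba attribute paths below are illustrative placeholders, not the exact module names used by Granite or LFM2; only the attention paths appear in the quoted diff):

from contextlib import suppress

import torch

def get_layer_components(layer) -> dict[str, torch.Tensor]:
    """Sketch: discover abliterable weight matrices in a (possibly hybrid) layer."""
    components: dict[str, torch.Tensor] = {}

    def try_add(component: str, matrix) -> None:
        if matrix is None or not torch.is_tensor(matrix):
            return
        components[component] = matrix

    # Attention layers (o_proj vs. out_proj naming varies across architectures).
    with suppress(Exception):
        try_add("attn.o_proj", layer.self_attn.o_proj.weight)
    with suppress(Exception):
        try_add("attn.o_proj", layer.self_attn.out_proj.weight)
    # Hypothetical conv / mamba projections in hybrid layers (names assumed).
    with suppress(Exception):
        try_add("conv.out_proj", layer.conv.out_proj.weight)
    with suppress(Exception):
        try_add("mamba.out_proj", layer.mamba.out_proj.weight)
    return components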