Fuse LlamaRMSNorm class to Circle RMSNorm op #266

Merged
jinevening merged 9 commits into Samsung:main from seockho-kim:fuse_rmsnorm on Aug 11, 2025

Conversation

@seockho-kim (Contributor)

This shows how to fuse the LlamaRMSNorm class to the Circle RMSNorm operation.

TICO-DCO-1.0-Signed-off-by: Seockho Kim seockho.kim@samsung.com

Like #217, it does not match patterns; it simply replaces LlamaRMSNorm with a custom op.

This commit fuses the LlamaRMSNorm class to the Circle RMSNorm op.

TICO-DCO-1.0-Signed-off-by: Seockho Kim seockho.kim@samsung.com
This commit fixes formatting with lint.

TICO-DCO-1.0-Signed-off-by: Seockho Kim seockho.kim@samsung.com
@seockho-kim requested a review from a team on August 6, 2025 at 00:41.


def CircleRMSNorm():
    @custom_op("circle_custom::rms_norm", mutates_args=())
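For orientation, the registration presumably continues with an op body and a fake kernel so torch.export can trace through it. A minimal sketch, assuming torch.library.custom_op and the standard RMSNorm formula; the parameter names and the register_fake part are illustrative guesses, not the PR's actual code:

import torch
from torch.library import custom_op


def CircleRMSNorm():
    @custom_op("circle_custom::rms_norm", mutates_args=())
    def rms_norm(
        hidden_states: torch.Tensor,
        weight: torch.Tensor,
        eps: float = 1e-06,
    ) -> torch.Tensor:
        # Standard RMSNorm: normalize by the root mean square, then scale.
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + eps)
        return weight * hidden_states.to(input_dtype)

    @rms_norm.register_fake
    def _(hidden_states, weight, eps=1e-06):
        # Shape/dtype propagation only; no real computation during tracing.
        return torch.empty_like(hidden_states)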
@glistening (Contributor) commented on Aug 6, 2025:

rms_norm is a circle builtin op, so I think circle::rms_norm is enough. In my op_attention case, @jinevening preferred the onert prefix. I don't know the exact rule; maybe a new op that did not exist in tflite, and that will run on the cpu backend (not the triv npu), gets the onert prefix. @jinevening Is that right? What prefix do you prefer for rms_norm?

@seockho-kim (Author) replied:

I've followed the naming of other custom ops like instance_norm, which is also a circle builtin op.

@glistening (Contributor) commented on Aug 6, 2025:

Again, instance norm is not a custom op. I guess someone wanted to distinguish circle-only ops from tflite-circle-common ops. (Why? 🤔)

@seockho-kim (Author) replied on Aug 6, 2025:

In register_custom_op.py:

def CircleInstanceNorm():
    @custom_op("circle_custom::instance_norm", mutates_args=())
    def instance_norm(
        input_: torch.Tensor,
        weight: Optional[torch.Tensor] = None,
        bias: Optional[torch.Tensor] = None,
        running_mean: Optional[torch.Tensor] = None,
        running_var: Optional[torch.Tensor] = None,
        use_input_stats: bool = False,
        momentum: float = 0.1,
        eps: float = 1e-05,
        cudnn_enabled: bool = False,
    ) -> torch.Tensor:
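        # Circle tensors are NHWC while aten.instance_norm expects NCHW,
        # hence the permute round-trip below.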
        NHWC_to_NCHW = [0, 3, 1, 2]
        NCHW_input = torch.ops.aten.permute.default(input_, NHWC_to_NCHW)

        args = [NCHW_input, weight, bias, None, None, False, momentum, eps, False]
        NCHW_output = torch.ops.aten.instance_norm.default(*args)
        NCHW_to_NHWC = [0, 2, 3, 1]
        NHWC_output = torch.ops.aten.permute.default(NCHW_output, NCHW_to_NHWC)

        return NHWC_output
    # ...

@glistening (Contributor) replied:

@seockho-kim I already understood: some TICO developer wanted to define the circle builtin op InstanceNorm as custom from TICO's point of view. I am wondering why. If there is any reason to distinguish them (though I don't see one), circle_ext would be a better prefix in my personal view, since it would not be confused with the custom_op in circle_schema.

@seockho-kim (Author) replied:

Well, I have no idea why it is named like that. :)
I agree circle_custom is a little confusing alongside custom_op in circle_schema.

@jinevening (Contributor) commented on Aug 8, 2025:

> Again, instance norm is not a custom op. I guess someone wanted to distinguish circle-only ops from tflite-circle-common ops. (Why? 🤔)

There are tflite-circle-common ops too (circle_custom.conv2d, circle_custom.maxpool2d, ...).

circle_custom is just a namespace for circle ops. It would be fine to change the namespace to circle as you suggested (not in this PR). @mhs4670go AFAIK, you created circle_custom. Is it OK to change?

@mhs4670go replied:

Sure. I added the _custom prefix because this is related to torch "custom" operator creation. Just torch.ops.circle looks good as well. Feel free to change it in another PR.
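For mechanics (an aside, not part of this PR): the namespace string passed to @custom_op determines the Python call path, so "circle_custom::rms_norm" is invoked as torch.ops.circle_custom.rms_norm(...), and a rename to circle would change call sites to torch.ops.circle.rms_norm(...).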

@glistening (Contributor) commented:

@seockho-kim Why do you want to fuse rmsnorm? For the npu compiler, for onert, or for something else?

@seockho-kim (Author) replied:

> @seockho-kim Why do you want to fuse rmsnorm? For the npu compiler, for onert, or for something else?

For the npu compiler. I'm also trying to find a way to fuse rmsnorm for quantized models.

This applies review comments:
- Use a contextmanager
- Remove useless format changes
- Define custom RMSNorm args
- Add register_dynamic_cache()

TICO-DCO-1.0-Signed-off-by: Seockho Kim seockho.kim@samsung.com
Comment on lines 27 to 40
A reviewer (Contributor) commented:

How about moving this patcher under the tico project proper, not inside the test directory?

This commit applies the review comments:
- RMSNormCustomArgs is renamed to CircleRMSNormArgs
- The patcher is moved from test to tico utils

TICO-DCO-1.0-Signed-off-by: Seockho Kim seockho.kim@samsung.com
This commit fixes a format error.

TICO-DCO-1.0-Signed-off-by: Seockho Kim seockho.kim@samsung.com
@seockho-kim (Author) commented on Aug 6, 2025:

• Without fusing: [image]
• Fused RMSNorm: [image]

@seockho-kim marked this pull request as ready for review on August 8, 2025 at 00:32.
This commit pins an exact transformers version.
It does not work with the latest version (e.g. 4.53).

TICO-DCO-1.0-Signed-off-by: Seockho Kim seockho.kim@samsung.com
try:
    yield
finally:
    LlamaRMSNorm.forward = orig
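For context, the snippet above is presumably the tail of a context manager along these lines (a sketch reconstructed from the visible lines; patched_rms_norm_forward and its exact arguments are assumptions, though LlamaRMSNorm does expose weight and variance_epsilon):

import torch
from contextlib import contextmanager

from transformers.models.llama.modeling_llama import LlamaRMSNorm


def patched_rms_norm_forward(self, hidden_states):
    # Route through the registered custom op instead of the decomposed math,
    # so the exporter sees a single rms_norm node.
    return torch.ops.circle_custom.rms_norm(
        hidden_states, self.weight, self.variance_epsilon
    )


@contextmanager
def patched_llama_rmsnorm():
    orig = LlamaRMSNorm.forward
    LlamaRMSNorm.forward = patched_rms_norm_forward
    try:
        yield
    finally:
        # Always restore the original forward, even if the body raises.
        LlamaRMSNorm.forward = orig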
@glistening (Contributor) commented on Aug 8, 2025:

I am not sure it is a good idea to put patched_llama_rms_norm and related things in utils/patcher.py.

As the operators to fuse grow, patcher.py will accumulate more and more dependencies:

First, on models (not only modeling_llama, but also modeling_florence, modeling_something_else, ...).

Second, within the same model (e.g. llama), there will be multiple ops to fuse (e.g. attention and so on).

It would be better to break these up by operator. Thus, in my implementation of attention fusing (#217), I put the attention-related adapters in op_attention.py.

@seockho-kim (Author) replied:

Yes, it may get complicated if we need to support other ops.
I referred to your attention implementation, but I'm not sure including an adapter in the op code is a good approach: the op code then depends on the model code, so I thought the two should be separated.

@seockho-kim (Author) commented:

As we discussed offline, I'm going to move the patcher next to each operator, but in separate files, e.g. tico/serialize/operators/adapters/adapter_rmsnorm.py.

- Move patcher.py to serialize/operators/adapters/rmsnorm.py

TICO-DCO-1.0-Signed-off-by: Seockho Kim seockho.kim@samsung.com
This adds __init__.py to the adapters folder to make it a package.

TICO-DCO-1.0-Signed-off-by: Seockho Kim seockho.kim@samsung.com
@glistening previously approved these changes on Aug 8, 2025 and left a comment:

LGTM



@contextmanager
def patched_llama_rmsnorm():
@jinevening (Contributor) commented:

This is specific to the llama model, so I think it would be better to rename the file to llama_rmsnorm.py.

@seockho-kim (Author) replied:

Okay, I'll update it.

@glistening (Contributor) commented on Aug 11, 2025:

I thought rmsnorm.py would be used as a collection of adapters for several rmsnorms (including llama's).

@seockho-kim (Author) replied:

At first, I thought of it the way @glistening explained. But @jinevening's suggestion seems better in terms of SRP (the single-responsibility principle).

class TinyLlamaWithFusedRMSNorm(TestModuleBase):
    def __init__(self):
        super().__init__()
        with patched_llama_rmsnorm():
A reviewer (Contributor) commented:

How can we patch multiple modules? For example, how can we patch both LlamaRMSNorm and LlamaAttention?

@seockho-kim (Author) replied:

Well, I think we can use the same approach (not tested):

@contextmanager
def patched_llama_modules():
    with patched_llama_rmsnorm(), patched_llama_attention():
        yield


class TinyLlamaWithFusedRMSNorm(TestModuleBase):
    def __init__(self):
        super().__init__()
        with patched_llama_modules():
            self.model = AutoModelForCausalLM.from_pretrained(
                "Maykeye/TinyLLama-v0"
            ).to("cpu")

A reviewer (Contributor) replied:

@seockho-kim Yes, I think the same way.

This commit renames the rmsnorm adapter file to llama_rmsnorm.py.

TICO-DCO-1.0-Signed-off-by: Seockho Kim seockho.kim@samsung.com
@jinevening (Contributor) left a comment:

LGTM

@glistening (Contributor) left a comment:

LGTM

@dayo09 (Contributor) left a comment:

LGTM

@jinevening merged commit d0f37b6 into Samsung:main on Aug 11, 2025.
6 checks passed
@seockho-kim deleted the fuse_rmsnorm branch on August 11, 2025 at 06:43.
class TinyLlamaWithFusedRMSNorm(TestModuleBase):
    def __init__(self):
        super().__init__()
        with patched_llama_rmsnorm():
A reviewer (Contributor) commented:

Hmm, it seems this code doesn't work after all, because the with statement ends before the module is exported. I'll patch this code soon.

@seockho-kim (Author) replied:

Hmm, it really doesn't work, though I'm curious how it passed before.
FYI, #304 is another way to fuse rmsnorm, and it works.
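To make the failure mode concrete, a hypothetical sketch (not code from the PR; example_inputs is a stand-in tuple of example inputs):

module = TinyLlamaWithFusedRMSNorm()  # __init__ patches forward, then restores it on exit

# By the time the module is exported, LlamaRMSNorm.forward has already been
# restored, so no circle_custom::rms_norm node appears in the traced graph.
exported = torch.export.export(module, example_inputs)

# One fix would be to keep the patch active across the export itself:
with patched_llama_rmsnorm():
    exported = torch.export.export(module, example_inputs)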
