Fuse LlamaRMSNorm class to Circle RMSNorm op #266
Conversation
This commit fuses the LlamaRMSNorm class to the Circle RMSNorm op. TICO-DCO-1.0-Signed-off-by: Seockho Kim seockho.kim@samsung.com
This commit fixes formatting with lint. TICO-DCO-1.0-Signed-off-by: Seockho Kim seockho.kim@samsung.com
```python
def CircleRMSNorm():
    @custom_op("circle_custom::rms_norm", mutates_args=())
```
rms_norm is a circle builtin op. I think circle::rms_norm is enough. In my op_attention case, @jinevening preferred the onert prefix. I don't know the clear rule; maybe a new op that did not exist in tflite, and that is going to run on the cpu backend (not the triv npu), gets onert. @jinevening Is that right? What prefix do you prefer for rms_norm?
I've followed the naming of other custom ops like instance_norm. It's also a circle builtin op.
Again, instance norm is not a custom op. I guess someone wanted to distinguish circle-only ops from tflite-circle-common ops. (why? 🤔)
In register_custom_op.py:

```python
def CircleInstanceNorm():
    @custom_op("circle_custom::instance_norm", mutates_args=())
    def instance_norm(
        input_: torch.Tensor,
        weight: Optional[torch.Tensor] = None,
        bias: Optional[torch.Tensor] = None,
        running_mean: Optional[torch.Tensor] = None,
        running_var: Optional[torch.Tensor] = None,
        use_input_stats: bool = False,
        momentum: float = 0.1,
        eps: float = 1e-05,
        cudnn_enabled: bool = False,
    ) -> torch.Tensor:
        NHWC_to_NCHW = [0, 3, 1, 2]
        NCHW_input = torch.ops.aten.permute.default(input_, NHWC_to_NCHW)
        args = [NCHW_input, weight, bias, None, None, False, momentum, eps, False]
        NCHW_output = torch.ops.aten.instance_norm.default(*args)
        NCHW_to_NHWC = [0, 2, 3, 1]
        NHWC_output = torch.ops.aten.permute.default(NCHW_output, NCHW_to_NHWC)
        return NHWC_output
    ...
```
@seockho-kim I already understood: some TICO developer wanted to define the circle built-in op InstanceNorm as custom from TICO's point of view. I am wondering why. If there is any reason to distinguish them (though I don't see one), circle_ext would be a better choice in my personal view, since it is not confused with other custom_op in circle_schema.
Well, I don't have any idea why it is named like that. :)
I agree that circle_custom is a little confusing alongside custom_op in circle_schema.
> Again, instance norm is not a custom op. I guess someone wanted to distinguish circle-only ops from tflite-circle-common ops. (why? 🤔)
There are tflite-circle-common Ops too (circle_custom.conv2d, circle_custom.maxpool2d, ..).
circle_custom is just a namespace for circle Ops. It would be ok to change the namespace to circle as you suggested (not in this PR). @mhs4670go AFAIK, you made circle_custom. Is it ok to change?
Sure. I added the _custom part because this is related to torch "custom" operator creation. Just torch.ops.circle looks good as well. Feel free to change them in another PR.
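For readers following the thread: by analogy with the CircleInstanceNorm registration quoted above, the circle_custom::rms_norm registration discussed here would look roughly like the sketch below. This is a minimal illustration, not the PR's exact code; the parameter names, the eps default, and the fake-kernel registration are assumptions.

```python
import torch
from torch.library import custom_op


def CircleRMSNorm():
    @custom_op("circle_custom::rms_norm", mutates_args=())
    def rms_norm(
        hidden_states: torch.Tensor,
        weight: torch.Tensor,
        eps: float = 1e-06,
    ) -> torch.Tensor:
        # Reference RMSNorm: scale by the reciprocal root-mean-square,
        # then apply the learned per-channel weight.
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        normalized = hidden_states * torch.rsqrt(variance + eps)
        return weight * normalized

    # A fake (meta) kernel so the op can be traced with FakeTensor during
    # torch.export; whether the PR registers one exactly this way is an assumption.
    @rms_norm.register_fake
    def _(hidden_states, weight, eps=1e-06):
        return torch.empty_like(hidden_states)
```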
@seockho-kim Why do you want to fuse rmsnorm? For the npu compiler? onert? Or something else?
For the npu compiler.
This applies review comments:
- It uses contextmanager
- Useless format change removed
- Custom RMSNorm Args defined
- register_dynamic_cache() added
TICO-DCO-1.0-Signed-off-by: Seockho Kim seockho.kim@samsung.com
How about moving this patcher under the tico project, not inside the test directory?
This commit applies the review comments.
- RMSNormCustomArgs is changed to CircleRMSNormArgs
- Patcher is moved from test to tico utils.
TICO-DCO-1.0-Signed-off-by: Seockho Kim seockho.kim@samsung.com
This commit fixes format error. TICO-DCO-1.0-Signed-off-by: Seockho Kim seockho.kim@samsung.com
This commit pins the exact transformers version; it cannot work with the latest version (e.g. 4.53). TICO-DCO-1.0-Signed-off-by: Seockho Kim seockho.kim@samsung.com
```python
    try:
        yield
    finally:
        LlamaRMSNorm.forward = orig
```
I am not sure it is a good idea to put patched_llama_rms_norm and related things in utils/patcher.py.
As the operators to fuse grow, patcher.py gets more and more dependencies.
First, on models (not only modeling_llama, but also modeling_florence, modeling_something_else, ...).
Second, within the same model (e.g. llama), there will be multiple ops to fuse (e.g. attention and so on).
It would be better to break these up by operator.
That is why, in my implementation of fusing attention (#217), I put the attention-related adapters in op_attention.py.
Yes, it may get complicated if we need to support other ops.
I've referred to your attention implementation, but I'm not sure it is a good idea to include an adapter in the op code.
The op code would then depend on the model code, so I thought they need to be separated.
As we discussed offline, I'm going to move the patcher next to each operator, but in separate files,
like tico/serialize/operators/adapters/adapter_rmsnorm.py.
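For context, a minimal sketch of what such an adapter could look like, assuming the circle_custom::rms_norm op sketched earlier; the PR's actual patcher may differ in details:

```python
from contextlib import contextmanager

import torch
from transformers.models.llama.modeling_llama import LlamaRMSNorm


@contextmanager
def patched_llama_rmsnorm():
    # Temporarily replace LlamaRMSNorm.forward so that torch.export captures
    # a single fused rms_norm node instead of the decomposed mean/rsqrt graph.
    orig = LlamaRMSNorm.forward

    def forward(self, hidden_states):
        return torch.ops.circle_custom.rms_norm(
            hidden_states, self.weight, self.variance_epsilon
        )

    LlamaRMSNorm.forward = forward
    try:
        yield
    finally:
        # Always restore the original forward, even if tracing fails.
        LlamaRMSNorm.forward = orig
```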
- Move patcher.py to serialize/operators/adapters/rmsnorm.py TICO-DCO-1.0-Signed-off-by: Seockho Kim seockho.kim@samsung.com
This adds __init__.py to the adapter folder to make it a package. TICO-DCO-1.0-Signed-off-by: Seockho Kim seockho.kim@samsung.com
```python
@contextmanager
def patched_llama_rmsnorm():
```
This is specific to the llama model, so I think it would be better to rename the file to llama_rmsnorm.py.
Okay, I'll update it.
I thought rmsnorm.py would be used as a collection of adapters for several rmsnorms (including llama).
At first, I thought of it the way @glistening explained.
But @jinevening's suggestion is better in terms of SRP (single responsibility principle).
```python
class TinyLlamaWithFusedRMSNorm(TestModuleBase):
    def __init__(self):
        super().__init__()
        with patched_llama_rmsnorm():
```
How can we patch multiple modules? For example, how can we patch both LlamaRMSNorm and LlamaAttention?
Well, I think we can use the same approach (not tested):
```python
@contextmanager
def patched_llama_modules():
    with patched_llama_rmsnorm(), patched_llama_attention():
        yield


class TinyLlamaWithFusedRMSNorm(TestModuleBase):
    def __init__(self):
        super().__init__()
        with patched_llama_modules():
            self.model = AutoModelForCausalLM.from_pretrained(
                "Maykeye/TinyLLama-v0"
            ).to("cpu")
```

This commit renames the rmsnorm adapter file to llama_rmsnorm.py. TICO-DCO-1.0-Signed-off-by: Seockho Kim seockho.kim@samsung.com
```python
class TinyLlamaWithFusedRMSNorm(TestModuleBase):
    def __init__(self):
        super().__init__()
        with patched_llama_rmsnorm():
```
Hmm.. it seems that this code doesn't work well, because the with statement ends before the module is exported. I'll patch this code soon.
Hmm, it really doesn't work, but I'm curious how it worked before.
FYI, #304 is another way to fuse rmsnorm and it works.
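As a note on the scoping issue above: the patch has to stay active while the model is actually exported, not only while it is constructed. A sketch, assuming tico.convert is the export entry point; the example input shape and token values are illustrative:

```python
import tico
import torch
from transformers import AutoModelForCausalLM

with patched_llama_rmsnorm():
    # Both model construction and export happen inside the patch, so the
    # patched forward is the one that gets traced.
    model = AutoModelForCausalLM.from_pretrained("Maykeye/TinyLLama-v0").to("cpu")
    example_inputs = (torch.randint(0, 100, (1, 8)),)  # illustrative token ids
    circle_model = tico.convert(model, example_inputs)
```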


This shows how to fuse the LlamaRMSNorm class to the Circle RMSNorm operation.
TICO-DCO-1.0-Signed-off-by: Seockho Kim seockho.kim@samsung.com
Like #217, it does not match patterns in the graph, but replaces LlamaRMSNorm with a custom op.