If I understand correctly, ComfyUI's base CLIP loader loads only the text encoder part of the LLM to generate the embeddings used as text conditioning for the image/video models. Your node loads the full model (including the LM head) so it can also generate text. But that's a superset that already contains the text encoder.
Could we save some memory swapping here by loading the full model only once and using it for both LLM tasks and conditioning encoding? Basically, a new loader node that provides a CLIP output as well as a second output that can be fed into the LLM-inference-only variant of "Qwen_TE_LLM". Something like the sketch below.
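A rough sketch of what I mean, using the standard ComfyUI custom-node layout; `load_full_qwen_model` and `wrap_text_encoder_as_clip` are hypothetical placeholders for whatever your pack actually uses internally, and the `LLM_MODEL` socket type is likewise assumed:

```python
class QwenCombinedLoader:
    """Loads the full Qwen model once and exposes it on two sockets."""

    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"model_name": ("STRING", {"default": "Qwen2.5-VL-7B"})}}

    RETURN_TYPES = ("CLIP", "LLM_MODEL")
    RETURN_NAMES = ("clip", "llm_model")
    FUNCTION = "load"
    CATEGORY = "loaders"

    def load(self, model_name):
        # Load the full model once: transformer trunk + LM head.
        # (hypothetical helper, standing in for the pack's real loader)
        full_model = load_full_qwen_model(model_name)
        # Wrap the shared trunk as a CLIP-compatible text encoder so the
        # conditioning nodes reuse the weights already in memory.
        # (hypothetical helper)
        clip = wrap_text_encoder_as_clip(full_model)
        # Hand the same object to the LLM-inference-only node, so no
        # second copy of the weights is loaded or swapped in and out.
        return (clip, full_model)
```

The point being that both outputs reference the same underlying weights, so the model is loaded (and kept resident) exactly once.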