Different results in ViT-B/16 and ViT-L/14@336px #36

Hello dear authors, I have a question about model choice and parameter tuning.
I set the edit to "Human->Zombie". Here are the results:
a. ViT-B/16: a single epoch is enough to get a good result, as shown below.
![image](https://github.com/gwang-kim/DiffusionCLIP/assets/57710177/0e659c4f-fcb6-4a67-b5be-b5b66f05edfc)
b. ViT-L/14@336px: with this model the results look strange. I don't know whether it needs different parameters to fine-tune the diffusion model. (Left to right: epochs 1-5.)
![image](https://github.com/gwang-kim/DiffusionCLIP/assets/57710177/789b7695-7729-4bf0-a7a8-fbd4cb3a38e2)
![image](https://github.com/gwang-kim/DiffusionCLIP/assets/57710177/e1a7b5d4-c5a2-4632-9dd7-f854df6a24fd)
![image](https://github.com/gwang-kim/DiffusionCLIP/assets/57710177/a11e9d73-f798-4fbf-9608-7f7d6f528187)
![image](https://github.com/gwang-kim/DiffusionCLIP/assets/57710177/94a85002-b922-48ba-bf90-978edbff0131)
![image](https://github.com/gwang-kim/DiffusionCLIP/assets/57710177/4cd2dc6b-8255-40c5-aff3-9a3f2e902f4d)




