Why is the model divided into smaller chunks, and how does this benefit inference? Does it help with shared memory, or does it reduce peak RAM usage? Any insights or suggestions would be appreciated. Also, is there a guide covering best practices for this?