Transformers built with no external libraries, not even NumPy for the Python version.
Currently writing it in Python for learning purposes. The plan is to then rewrite it in C and optimise it heavily with handwritten SIMD assembly, aiming for maximum CPU performance and the shortest possible training time.
Pushing the Python framework now. It is currently beauty with no brains: the structure is in place, but it cannot learn yet. Gradient descent is in progress.
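As a rough illustration of what a NumPy-free gradient descent step can look like, here is a minimal sketch using only plain Python lists. It assumes a single linear layer `y = W x` trained with mean squared error and per-sample updates; the helper names (`matvec`, `mse_loss`, `sgd_step`, `train`) are hypothetical and are not taken from this project's code, which would need to backpropagate through attention and feed-forward blocks instead.

```python
# Hypothetical sketch: gradient descent for one linear layer, no external libraries.

def matvec(W, x):
    # y[i] = sum_j W[i][j] * x[j], computed with plain lists
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def mse_loss(y_pred, y_true):
    # Mean squared error over the output vector
    n = len(y_true)
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n


def sgd_step(W, x, y_true, lr):
    # Forward pass
    y_pred = matvec(W, x)
    n = len(y_true)
    # Gradient of the loss w.r.t. each output: dL/dy[i] = 2 * (y[i] - t[i]) / n
    dy = [2.0 * (p - t) / n for p, t in zip(y_pred, y_true)]
    # Chain rule: dL/dW[i][j] = dy[i] * x[j]; update W in place
    for i, row in enumerate(W):
        for j in range(len(row)):
            row[j] -= lr * dy[i] * x[j]
    # Return the loss measured before this update
    return mse_loss(y_pred, y_true)


def train(W, data, lr=0.1, epochs=100):
    # Plain stochastic gradient descent: one update per (x, y) pair
    avg = 0.0
    for _ in range(epochs):
        total = 0.0
        for x, y in data:
            total += sgd_step(W, x, y, lr)
        avg = total / len(data)
    # Average loss over the last epoch
    return avg
```

For example, starting from `W = [[0.0, 0.0], [0.0, 0.0]]` and training on pairs generated by a known matrix, `W` should converge towards that matrix. Per-sample updates keep the sketch short; a real training loop would more likely accumulate gradients over a mini-batch before updating.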