ref: support causal mask mode in flash attention #101

Closed
Connor1996 wants to merge 2 commits into skyzh:main from Connor1996:codex/remove-flash-attention-debug-fill
Conversation

@Connor1996 (Collaborator)

Summary

  • Add the flash attention and quantized matmul extension implementations for the week 2 tasks
  • Wire the attention path updates for causal-mask-capable flash attention
  • Remove a leftover debug sentinel write (`-233.0`) from `src/extensions_ref/src/flash_attention.metal`
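
For reference, causal-mask mode means each query position may attend only to itself and earlier positions. A minimal NumPy sketch of that semantics (an illustration of the masking behavior, not the Metal kernel in this PR; the name `causal_attention` is hypothetical):

```python
import numpy as np

def causal_attention(q, k, v):
    """Naive scaled dot-product attention with a causal mask.

    Reference semantics only: position i attends to positions j <= i.
    A real flash attention kernel computes the same result in tiles
    with an online softmax, never materializing the full score matrix.
    """
    L, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # Mask strictly upper-triangular entries (j > i) to -inf so that
    # softmax assigns them zero weight.
    mask = np.triu(np.ones((L, L), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax over the unmasked positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

The `no_mask` test variant mentioned under Testing would correspond to skipping the `np.where` step.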

Testing

  • `pdm run test --week 2 --day 4 -- -k 'task_2_flash_attention_cpu_small and no_mask'`
  • `pdm run test --week 2 --day 4 -- -k task_3_flash_attention_gpu`
    • runtime error: `[metal::Device] Unable to load kernel flash_attention`
    • the causal case also hits `ValueError: Invalid type str received in array initialization.` in `src/tiny_llm/attention.py:149`

Signed-off-by: Connor1996 <zbk602423539@gmail.com>
@Connor1996 Connor1996 closed this Mar 3, 2026