From 9b93b1c93295ec863db9371eaa6615304a97b779 Mon Sep 17 00:00:00 2001
From: Arrsh Khusaria
Date: Tue, 11 Nov 2025 06:19:51 +0530
Subject: [PATCH] Update index.md

added C snippet for better understanding
---
 lesson_03/index.md | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/lesson_03/index.md b/lesson_03/index.md
index 9895392..799b848 100644
--- a/lesson_03/index.md
+++ b/lesson_03/index.md
@@ -100,6 +100,15 @@ The loads are then done with widthq being negative. So on the first iteration [s
 
 mmsize is added to the negative widthq bringing it closer to zero. The loop condition is now jl (jump if less than zero). This trick means widthq is used as a pointer offset **and** as a loop counter at the same time, saving a cmp instruction. It also allows the pointer offset to be used in multiple loads and stores, as well as using multiples of the pointer offsets if needed (remember this for the assignment).
 
+This is equivalent to the C code:
+```C
+uint8_t *end = src + width;
+for (ptrdiff_t off = -width; off < 0; off += mmsize) {
+    uint8_t *ptr = end + off;
+    //operate on ptr
+}
+```
+
 **Alignment**
 
 In all our examples we have been using movu to avoid the topic of alignment. Many CPUs can load and store data faster if the data is aligned, i.e if the memory address is divisible by the SIMD register size. Where possible we try to use aligned loads and stores in FFmpeg using mova.
@@ -199,4 +208,4 @@ pshufb m0, m1 ; shuffle m0 based on m1
 
 Note that -1 for easy reading is used as the shuffle index to zero out the output byte: -1 as a byte is the 0b11111111 bitfield (two’s complement), and thus the MSB (0x80) is set.
 
-[image1]:
\ No newline at end of file
+[image1]: