Optimize the search for byte sequences #26

Open
Xzonn wants to merge 1 commit into RoadrunnerWMC:master from Xzonn:lz10
Xzonn commented May 27, 2025

In certain cases, the end of a repeated byte sequence may extend past the position where the data being encoded begins, i.e. the match length may exceed the match distance. As Wikipedia puts it:

It is not only acceptable but frequently useful to allow length-distance pairs to specify a length that actually exceeds the distance. As a copy command, this is puzzling: "Go back four characters and copy ten characters from that position into the current position". How can ten characters be copied over when only four of them are actually in the buffer? Tackling one byte at a time, there is no problem serving this request, because as a byte is copied over, it may be fed again as input to the copy command. When the copy-from position makes it to the initial destination position, it is consequently fed data that was pasted from the beginning of the copy-from position. The operation is thus equivalent to the statement "copy the data you were given and repetitively paste it until it fits". As this type of pair repeats a single copy of data multiple times, it can be used to incorporate a flexible and easy form of run-length encoding.
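
The byte-at-a-time copy described in the quote can be sketched as follows. This is an illustrative helper only, not code from ndspy; the name copy_back_reference is hypothetical.

```python
def copy_back_reference(out: bytearray, distance: int, length: int) -> None:
    """Append `length` bytes to `out`, reading from `distance` bytes back.

    Copying one byte at a time lets `length` exceed `distance`: bytes
    written earlier in this same call become valid source bytes.
    """
    if not 1 <= distance <= len(out):
        raise ValueError("distance must point inside the existing output")
    for _ in range(length):
        out.append(out[-distance])


buf = bytearray(b"abcd")
copy_back_reference(buf, distance=4, length=10)  # "go back four, copy ten"
print(buf.decode())  # abcdabcdabcdab
```

A slice-based copy such as out += out[-distance:][:length] would stop after `distance` bytes, which is why the loop appends one byte per iteration.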

For example:

import ndspy._lzCommon

uncompressed = bytes.fromhex("00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00")
compressed_1 = ndspy._lzCommon.compress(uncompressed, 1, 0x1000, 18, False, False)[0]
print(len(compressed_1))
# 10

print(compressed_1.hex(" "))
# 1c 00 00 00 00 02 30 05 40 0b
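
To see what this output contains, it can help to split it into tokens. The sketch below assumes the bytes are a header-less LZ10 body: a flag byte read from the most significant bit down, followed by up to eight items, where a set bit marks a two-byte back-reference that stores length minus 3 in the high nibble and distance minus 1 in the remaining 12 bits. The helper parse_lz10_body is hypothetical and not part of ndspy.

```python
def parse_lz10_body(body: bytes, out_len: int) -> list:
    """Split a header-less LZ10 body into tokens covering `out_len` output bytes.

    Returns ("lit", byte) and ("ref", distance, length) tuples.
    Assumption: `body` carries no 4-byte header, so the caller supplies `out_len`.
    """
    tokens = []
    pos = 0        # read position in `body`
    produced = 0   # number of output bytes the tokens account for so far
    while produced < out_len:
        flags = body[pos]
        pos += 1
        for bit in range(7, -1, -1):
            if produced >= out_len:
                break  # remaining flag bits are padding
            if flags & (1 << bit):
                hi, lo = body[pos], body[pos + 1]
                pos += 2
                length = (hi >> 4) + 3
                distance = (((hi & 0x0F) << 8) | lo) + 1
                tokens.append(("ref", distance, length))
                produced += length
            else:
                tokens.append(("lit", body[pos]))
                pos += 1
                produced += 1
    return tokens


for token in parse_lz10_body(bytes.fromhex("1c 00 00 00 00 02 30 05 40 0b"), 19):
    print(token)
```

Under that assumption, the stream is three literals followed by references of (distance 3, length 3), (distance 6, length 6) and (distance 12, length 7). No reference has a length greater than its distance, so none of the matches overlaps the position being written.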

The original compression method produces 10 bytes. With the modified search method (called as compressCommon below), the compressed length drops to 4 bytes:

compressed_2 = compressCommon(uncompressed, 1, 0x1000, 18, False, False)[0]
print(len(compressed_2))
# 4

print(compressed_2.hex(" "))
# 40 00 f0 00
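
The difference between the two results can be illustrated with a brute-force longest-match search. This is only a sketch of the general idea, not the implementation in this pull request or in ndspy. The function longest_match and its allow_overlap flag are hypothetical names, and the defaults 0x1000 and 18 are chosen to match the values passed in the calls above, on the assumption that those are the window size and the maximum match length. The sketch also ignores any minimum match length.

```python
def longest_match(data: bytes, pos: int, max_dist: int = 0x1000,
                  max_len: int = 18, allow_overlap: bool = True):
    """Return (distance, length) of the longest match for data[pos:], or (0, 0)."""
    best_dist, best_len = 0, 0
    limit = min(max_len, len(data) - pos)
    for start in range(max(0, pos - max_dist), pos):
        # Without overlap, the source range must end before `pos`.
        cap = limit if allow_overlap else min(limit, pos - start)
        length = 0
        while length < cap and data[start + length] == data[pos + length]:
            length += 1
        if length > best_len:
            best_dist, best_len = pos - start, length
    return best_dist, best_len


zeros = bytes(19)
print(longest_match(zeros, 1, allow_overlap=False))  # (1, 1)
print(longest_match(zeros, 1, allow_overlap=True))   # (1, 18)
```

With overlap allowed, one literal followed by a single reference of distance 1 and length 18 covers all 19 bytes, which is consistent with the 4-byte result above: one flag byte, one literal, and one two-byte reference.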
