@akshayrai akshayrai commented Oct 13, 2025

Summary

  • This PR introduces a performance optimization. The existing MD5-based partitioning strategy caused chunking and bootstrapping of large tables to be extremely slow and prone to connection timeouts.
  • The partitioning strategy has been updated to use CRC32 instead of MD5 for MySQL bootstrapping.
  • Benchmarks show a ~10x performance improvement when this change is combined with a few other optimizations (running the COUNT(*) query from task 0 only, and setting innodb_parallel_read_threads). For example, on a 0.64 GB table:
    • The previous MD5-based bootstrap took approximately 2 hours 20 minutes at a throughput of ~3 MB/s.
    • With CRC32, the bootstrap completed in ~20 minutes at a throughput of ~30 MB/s.
  • CRC32() is also significantly faster than MD5() in MySQL. On large tables, MD5 can easily become a CPU bottleneck.
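
As a rough illustration of hash-based partitioning, the sketch below assigns a row key to one of N partitions with either hash. It is not the code from this PR: the function names, the modulo scheme, and the shape of the WHERE clause are assumptions made for illustration, and Python's `zlib.crc32` stands in for MySQL's `CRC32()`.

```python
import hashlib
import zlib


def crc32_partition(key: str, num_partitions: int) -> int:
    """Partition index from the CRC32 checksum of the key's UTF-8 bytes."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def md5_partition(key: str, num_partitions: int) -> int:
    """Partition index from the MD5 digest read as a 128-bit integer.

    How the old strategy reduced the digest to a partition is not stated
    in the PR; taking it modulo the partition count is an assumption.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def crc32_predicate(key_column: str, num_partitions: int, partition: int) -> str:
    """Hypothetical MySQL WHERE clause selecting the rows of one partition."""
    return f"MOD(CRC32({key_column}), {num_partitions}) = {partition}"


if __name__ == "__main__":
    print(crc32_partition("row-1", 8), md5_partition("row-1", 8))
    print(crc32_predicate("id", 8, 3))
```

CRC32 yields a 32-bit integer directly, whereas an MD5 digest is a 128-bit value that has to be computed and then reduced, which fits the CPU-cost point above.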

Release notes

  • Deploy this change only when no bootstraps are in progress. Switching the partitioning strategy while a bootstrap is running can lead to data inconsistency.
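
One plausible reason for this constraint, assumed here rather than stated in the PR, is that the two hashes assign the same key to different partitions, so a bootstrap that started under MD5 cannot resume under CRC32 without skipping or re-reading rows. The sketch below counts how many sample keys change partition; the key format, partition count, and modulo scheme are hypothetical.

```python
import hashlib
import zlib


def partition_of(key: str, num_partitions: int, use_crc32: bool) -> int:
    """Partition index under CRC32 or MD5 (modulo scheme is an assumption)."""
    data = key.encode("utf-8")
    if use_crc32:
        value = zlib.crc32(data)
    else:
        value = int(hashlib.md5(data).hexdigest(), 16)
    return value % num_partitions


def moved_keys(keys, num_partitions):
    """Keys whose partition differs between the MD5 and CRC32 strategies."""
    return [
        k for k in keys
        if partition_of(k, num_partitions, use_crc32=False)
        != partition_of(k, num_partitions, use_crc32=True)
    ]


# Hypothetical sample keys; real bootstraps would hash the table's key column.
keys = [f"row-{i}" for i in range(1000)]
moved = moved_keys(keys, 8)
print(f"{len(moved)} of {len(keys)} keys change partition")
```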

Testing Done

  • Verified on a 0.64 GB MySQL table in the EI environment.
  • Detailed benchmark results are documented internally.

@akshayrai akshayrai marked this pull request as ready for review October 13, 2025 05:31

@kanishkjaiswal2015 kanishkjaiswal2015 left a comment

LGTM

@harshcum harshcum left a comment

LGTM

@akshayrai akshayrai merged commit dde4650 into linkedin:master Oct 13, 2025
1 check passed

3 participants