This repository contains two Python scripts that simulate and analyze matrix multiplication performance, with a focus on the cache hierarchy, tiling strategies, and computational acceleration techniques. The simulation goes beyond traditional benchmarking by modeling memory access patterns and hardware-specific optimizations.
Matrix multiplication is a fundamental computational operation in many scientific computing, machine learning, and signal processing applications. The performance of this operation depends critically on:
- Memory hierarchy (L1, L2 caches, main memory)
- Data access patterns
- Computational acceleration techniques
- Tiling and parallel execution strategies
`latencyMatrixMultiplicationAcc.py`:
- Simulates matrix multiplication with a detailed memory hierarchy model
- Incorporates an accelerator for dot product operations
- Uses tiling to optimize cache utilization
- Tracks multiple performance metrics:
  - Total computation time
  - Memory access patterns (L1 and L2 cache)
  - Operation count
  - Accelerator offloading
- Configurable latency for different memory levels
- Probabilistic cache access model
- Support for various matrix sizes
- Detailed statistical analysis and visualization
- Generates log-log plot of computation time vs. matrix size
- Creates statistical visualizations (violin and box plots)
- Saves simulation results as JSON files for further analysis
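
The configurable latencies and probabilistic cache access model listed above can be illustrated with a minimal sketch. The latency values, hit probabilities, and the names `LATENCY`, `access_latency`, and `simulate_accesses` are assumptions chosen for illustration; they are not taken from the script itself. Each simulated access is served by L1, L2, or main memory according to fixed probabilities, and the result is serialized as JSON.

```python
import json
import random

# Hypothetical latencies in cycles; the actual script may use other values.
LATENCY = {"L1": 4, "L2": 12, "main": 100}


def access_latency(rng, p_l1=0.9, p_l2=0.08):
    """Pick the memory level serving one access and return (level, latency).

    p_l1 and p_l2 are assumed hit probabilities; the remainder goes to
    main memory.
    """
    r = rng.random()
    if r < p_l1:
        level = "L1"
    elif r < p_l1 + p_l2:
        level = "L2"
    else:
        level = "main"
    return level, LATENCY[level]


def simulate_accesses(n_accesses, seed=0):
    """Accumulate per-level access counts and total latency in cycles."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    counts = {"L1": 0, "L2": 0, "main": 0}
    total_cycles = 0
    for _ in range(n_accesses):
        level, latency = access_latency(rng)
        counts[level] += 1
        total_cycles += latency
    return {"accesses": n_accesses, "counts": counts, "total_cycles": total_cycles}


result = simulate_accesses(10_000)
serialized = json.dumps(result, indent=2)  # text that could be written to a .json file
```

Seeding the generator keeps the statistics reproducible between runs, which matters when comparing results across matrix sizes.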
`latencyMatrixMultiplicationTilesParallel.py`:
- Implements a parallel tiling approach for matrix multiplication
- Simulates computation across different parallel execution factors
- Models memory hierarchy impact on performance
- Tracks computational metrics
- Configurable tile sizes
- Multiple parallel execution factors
- Detailed memory access simulation
- Cycle-accurate performance modeling
- Performance results for different matrix sizes and parallel factors
- Detailed metrics on memory access and computational intensity
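
The parallel tiling approach listed above can be sketched in pure Python as follows. This is a simplified illustration, not the script's implementation: the names `tiled_matmul` and `estimated_cycles`, the fixed `cycles_per_tile` cost, and the assumption that tile operations can be distributed evenly across parallel units (ignoring the reduction needed when several tiles accumulate into the same output block) are all hypothetical.

```python
import math


def tiled_matmul(a, b, tile):
    """Multiply matrices a (n x k) and b (k x m) block by block.

    Returns the product and the number of tile-level multiply steps.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    tile_ops = 0
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                tile_ops += 1
                # Work on one tile of a and one tile of b at a time.
                for i in range(i0, min(i0 + tile, n)):
                    for kk in range(k0, min(k0 + tile, k)):
                        a_ik = a[i][kk]
                        for j in range(j0, min(j0 + tile, m)):
                            c[i][j] += a_ik * b[kk][j]
    return c, tile_ops


def estimated_cycles(tile_ops, parallel_factor, cycles_per_tile):
    """Assumed cost model: tile steps run in batches of parallel_factor."""
    return math.ceil(tile_ops / parallel_factor) * cycles_per_tile


a = [[i + j for j in range(4)] for i in range(4)]
b = [[i * j + 1 for j in range(4)] for i in range(4)]
c, tile_ops = tiled_matmul(a, b, tile=2)
cycles_pf4 = estimated_cycles(tile_ops, parallel_factor=4, cycles_per_tile=100)
cycles_pf3 = estimated_cycles(tile_ops, parallel_factor=3, cycles_per_tile=100)
```

With a 4 x 4 input and 2 x 2 tiles there are 8 tile steps, so a parallel factor of 4 needs two batches while a factor of 3 needs three.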
Common parameters across both scripts include:
- Memory latency (L1, L2, main memory)
- Cache sizes
- Floating-point data size
- Operational timing characteristics
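
One way to group parameters like these is a small configuration object. The sketch below is an assumption about how such a configuration could look, not the scripts' actual interface: the class name `MemoryConfig`, its field names, and every default value are hypothetical. The helper shows how cache size and floating-point data size together bound the tile size, assuming three square tiles (one each from A, B, and C) must fit in the cache at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryConfig:
    # All defaults are illustrative placeholders, not values from the scripts.
    l1_latency_cycles: int = 4
    l2_latency_cycles: int = 12
    main_latency_cycles: int = 100
    l1_size_bytes: int = 32 * 1024
    l2_size_bytes: int = 256 * 1024
    float_size_bytes: int = 4

    def max_square_tile(self, cache_bytes):
        """Largest edge t such that three t x t float tiles fit in cache_bytes."""
        t = 0
        while 3 * (t + 1) ** 2 * self.float_size_bytes <= cache_bytes:
            t += 1
        return t


cfg = MemoryConfig()
l1_tile = cfg.max_square_tile(cfg.l1_size_bytes)
l2_tile = cfg.max_square_tile(cfg.l2_size_bytes)
```

Making the dataclass frozen prevents a simulation run from silently modifying its own parameters partway through.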
These simulation tools are valuable for:
- Computer architecture research
- Performance optimization studies
- Understanding cache and memory hierarchy impacts
- Developing efficient matrix multiplication strategies
- Python 3.7+
- NumPy
- Matplotlib
- `time` module (part of the Python standard library, no installation needed)
- Clone the repository
- Install required dependencies
- Run the scripts directly or import functions for custom analysis
pip install numpy matplotlib
python latencyMatrixMultiplicationAcc.py
python latencyMatrixMultiplicationTilesParallel.py

Possible future improvements:
- Add more detailed accelerator models
- Implement more advanced tiling strategies
- Create visualization tools for performance analysis
- Support for different matrix types and sparsity
Contributions are welcome! Please submit pull requests or open issues to discuss potential improvements or extensions to the simulation toolkit.
MIT (2024)
Philippe Velha, University of Trento
NET4EXA: https://eurohpc-ju.europa.eu/net4exa-advancing-european-interconnect-hpc-and-ai-2024-12-13_en