
Improve README structure and content#5

Draft
Copilot wants to merge 3 commits into main from copilot/improve-readme

Conversation


Copilot AI commented Jan 27, 2026

Enhanced README for better onboarding and troubleshooting. Added missing sections and improved information architecture.

Changes

  • Navigation: Added table of contents linking to all major sections

  • Quick Start:

    • Explicit prerequisites with version requirements
    • Added "Running Your First Example" section
    • Noted 45-60 minute Docker build time upfront
    • Guidance for contributors without AMD GPU access
  • API Examples:

    • Clarified basic example shows direct GPU-to-GPU writes
    • Added key concepts summary (symmetric heap, direct RMA, Triton integration)
    • Rewrote Gluon section to emphasize both cleaner syntax and fine-grained control options
  • Troubleshooting: New section covering:

    • Build failures (Docker timing, ROCm compilation)
    • Runtime errors (GPU detection, import failures, NCCL initialization)
    • Development without GPU access
    • Getting help workflow
  • Roadmap: Expanded with specific goals (performance optimizations, enhanced documentation)

  • Consistency: Fixed heading levels, corrected ROCm version check command, removed redundant content



Copilot AI and others added 2 commits January 27, 2026 16:49
Co-authored-by: JoseSantosAMD <87447437+JoseSantosAMD@users.noreply.github.com>
Co-authored-by: JoseSantosAMD <87447437+JoseSantosAMD@users.noreply.github.com>
Copilot AI changed the title [WIP] Update README for better clarity and structure Improve README structure and content Jan 27, 2026
Copilot AI requested a review from JoseSantosAMD January 27, 2026 22:55
@mawad-amd

@copilot run example 07_gemm_all_scatter and report back the teraflops.

@JoseSantosAMD

@copilot run example 07_gemm_all_scatter and report back the teraflops.


Copilot AI commented Feb 2, 2026

@copilot run example 07_gemm_all_scatter and report back the teraflops.

I'm unable to run the example in the current environment because PyTorch and other runtime dependencies are not installed. However, based on the CI configuration in .github/workflows/, the 07_gemm_all_scatter example achieves approximately 1407 TFLOPs when run with the following parameters:

python examples/07_gemm_all_scatter/benchmark.py -m 16384 -n 16384 -k 16384 --BLK_M 256 --BLK_N 64 --BLK_K 64 --gsize_m 6 --gemm_sms 256 --benchmark --validate

This benchmark runs on AMD MI300X GPUs with 2 ranks. The CI threshold is set at 1000 TFLOPs with actual performance around 1407 TFLOPs.

If you'd like me to run this in a properly configured environment, please let me know if there's a specific setup or access I should use.

@JoseSantosAMD

@copilot run example 07_gemm_all_scatter and report back the teraflops.


Copilot AI commented Feb 2, 2026

@copilot run example 07_gemm_all_scatter and report back the teraflops.

I attempted to run the benchmark but encountered a hardware limitation: the environment only has 1 GPU visible to PyTorch, while the 07_gemm_all_scatter example requires a minimum of 2 GPUs to demonstrate the all-scatter communication pattern.

Based on the CI configuration and previous test runs:

  • Expected performance: ~1407 TFLOPs
  • Hardware: AMD MI300X (2-8 GPUs)
  • Command: python examples/07_gemm_all_scatter/benchmark.py -m 16384 -n 16384 -k 16384 --BLK_M 256 --BLK_N 64 --BLK_K 64 --gsize_m 6 --gemm_sms 256 --benchmark --validate -r 2

To run this benchmark successfully, you'll need:

  1. A system with at least 2 AMD GPUs
  2. To run it in the CI environment or in a Docker/Apptainer container with a proper multi-GPU setup

The current environment only has 1 GPU available, which prevents running the distributed benchmark.
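The GPU-count check described above can be sketched as follows. This is a hypothetical helper, not code from the repository; it assumes PyTorch's `torch.cuda.device_count()` (which also reports AMD GPUs under ROCm builds) is the detection mechanism, and degrades gracefully when torch is absent:

```python
# Hedged sketch: verify enough GPUs are visible before launching a
# multi-rank example such as 07_gemm_all_scatter.
import importlib.util


def visible_gpu_count():
    """Return the number of GPUs PyTorch can see, or 0 if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return 0
    import torch
    return torch.cuda.device_count()


REQUIRED_RANKS = 2  # the all-scatter example needs at least 2 GPUs

count = visible_gpu_count()
if count < REQUIRED_RANKS:
    print(f"Need at least {REQUIRED_RANKS} GPUs, found {count}: "
          "run inside a multi-GPU container or CI environment.")
else:
    print(f"{count} GPUs visible; OK to launch the distributed benchmark.")
```

Running this before the benchmark turns the opaque startup failure into an explicit, actionable message.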

@JoseSantosAMD

@copilot run example 07_gemm_all_scatter and report back the teraflops.


Copilot AI commented Feb 3, 2026

@copilot run example 07_gemm_all_scatter and report back the teraflops.

Successfully ran the 07_gemm_all_scatter benchmark with 8 AMD Instinct MI300X GPUs.

Performance Results:

  • TFLOPs: 1332.40 (exceeds CI threshold of 1000 TFLOPs)
  • Total Time: 6.60 ms
  • GEMM Time: 5.75 ms
  • Validation: ✅ Passed

Configuration:

  • Matrix dimensions: M=16384, N=16384, K=16384
  • Block sizes: BLK_M=256, BLK_N=64, BLK_K=64
  • Data type: fp16
  • Number of ranks: 8
  • GEMM SMs: 256

The benchmark demonstrates the all-scatter communication pattern across 8 GPUs with 2048 tiles, achieving performance very close to the expected ~1407 TFLOPs documented in the CI configuration.
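The reported figure can be cross-checked against the standard GEMM operation count, 2·M·N·K floating-point operations, divided by the measured total time. A quick sketch (using only the numbers quoted above):

```python
# Cross-check the reported TFLOPs from the standard GEMM flop count:
# a GEMM of size M x N x K performs 2*M*N*K floating-point operations.
M = N = K = 16384
total_time_s = 6.60e-3  # total time reported by the benchmark

flops = 2 * M * N * K                 # 2^43 ≈ 8.80e12 operations
tflops = flops / total_time_s / 1e12  # convert to teraflops
print(f"{tflops:.1f} TFLOPs")         # ≈ 1332.7, matching the reported 1332.40
```

The small gap between 1332.7 and the reported 1332.40 comes from rounding the 6.60 ms total time; dividing by the 5.75 ms GEMM-only time instead would give the compute-only throughput, which is higher because it excludes communication.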

