-
Notifications
You must be signed in to change notification settings - Fork 62
Feature: add t8_cmesh_get_mpicomm #1884
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1884 +/- ##
==========================================
- Coverage 75.54% 75.53% -0.01%
==========================================
Files 103 103
Lines 19316 19313 -3
==========================================
- Hits 14592 14588 -4
- Misses 4724 4725 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
sandro-elsweijer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the implementation! Only minor comments
| if (t8_cmesh_is_committed (cmesh)) { | ||
| comm = t8_shmem_array_get_comm (cmesh->tree_offsets); | ||
| T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm)); | ||
| T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still use t8_cmesh_comm_is_valid here because of the straighforward naming
| T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm); | |
| T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change was proposed because it felt clearer that way, since the term "valid" is kind of vague in this context. I would personally argue for endorsing the == spelling as the canonical way of checking this and deprecating the is_valid spelling, but if you guys feel otherwise I'm not against reverting this either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just had another look concerning the is_valid function and there is an easily addressable problem:
The cmesh now saves the comm along with the mpirank and mpisize. Then afterwards, is_valid only checks if the already saved comm and the provided comm are the same. The mpisize and rank are not checked. This can lead to possible errors in other routines which create a new cmesh. For example in t8_cmesh_copy ().
This function takes an empty cmesh, a committed cmesh and a communicator. The new cmesh copies everything of the committed cmesh, including the mpisize and rank. Then the provided comm is checked with the new cmesh. In your case this would lead to a segfault, because the copy routine does not fill the comm of the new cmesh. But even if it filled the new one with the provided comm, the is_valid function would return true, even though it was never checked if the comm has the same mpisize and rank as the already saved rank and size. So is_valid is still necessary and should not only check if the comms are the same, but also if the comm is in alignment with the stored mpisize and rank.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that there is indeed room for inconsistency here, which could lead to subtle bugs, so this can't make it into production as-is.
A possible solution is effectively to harden the checks to have this bug class be catched by the debug mode. Although probably effective for debugging purposes, this approach does not prevent the bugs from appearing in the first place, thus implying a (hopefully minor) maintenance burden in the long run.
Another solution could be prevent the inconsistency from appearing by only storing the communicator which already holds all the relevant information. The rank and size could then be queried when necessary using MPI_Comm_rank / MPI_Comm_size. This solution requires a little bit more effort right now, but is arguably superior because it would completely eliminate this class of bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sandro-elsweijer While reading the forest implementation, it seems like this mpicomm+mpirank+mpisize pattern could lead to the same kind of issues on the forest side. Maybe this should be addressed in a separate PR then, and your solution is the best strategy for this PR. What do you think?
| if (!meta_info.pre_commit) { | ||
| T8_ASSERT (t8_cmesh_is_committed (cmesh_out)); | ||
| T8_ASSERT (t8_cmesh_comm_is_valid (cmesh_out, comm)); | ||
| T8_ASSERT (t8_cmesh_get_mpicomm (cmesh_out) == comm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still use t8_cmesh_comm_is_valid here, because of the naming
| T8_ASSERT (t8_cmesh_get_mpicomm (cmesh_out) == comm); | |
| T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm); |
| #endif | ||
|
|
||
| T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm)); | ||
| T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still use t8_cmesh_comm_is_valid here because of the straighforward naming
| T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm); | |
| T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm); |
| cmesh->set_partition = cmesh_from->set_partition; | ||
| cmesh->set_partition_level = cmesh_from->set_partition_level; | ||
| T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm)); | ||
| T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still use t8_cmesh_comm_is_valid here because of the straighforward naming
| T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm); | |
| T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm); |
| int ieclass; | ||
|
|
||
| T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm)); | ||
| T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still use t8_cmesh_comm_is_valid here because of the straighforward naming
| T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm); | |
| T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm); |
| T8_ASSERT (t8_cmesh_is_committed (cmesh)); | ||
| } | ||
| T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm)); | ||
| T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still use t8_cmesh_comm_is_valid here because of the straighforward naming
| T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm); | |
| T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm); |
| const t8_gloidx_t *offset_from, *offset_to; | ||
|
|
||
| T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm)); | ||
| T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still use t8_cmesh_comm_is_valid here because of the straighforward naming
| T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm); | |
| T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm); |
|
|
||
| T8_ASSERT (t8_cmesh_is_committed (cmesh)); | ||
| T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm)); | ||
| T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still use t8_cmesh_comm_is_valid here because of the straighforward naming
| T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm); | |
| T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm); |
| } | ||
| if (cmesh != NULL) { | ||
| T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm)); | ||
| T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still use t8_cmesh_comm_is_valid here because of the straighforward naming
| T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm); | |
| T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm); |
sandro-elsweijer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should check also the other cmesh creation routines like copy and bcast
| if (t8_cmesh_is_committed (cmesh)) { | ||
| comm = t8_shmem_array_get_comm (cmesh->tree_offsets); | ||
| T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm)); | ||
| T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just had another look concerning the is_valid function and there is an easily addressable problem:
The cmesh now saves the comm along with the mpirank and mpisize. Then afterwards, is_valid only checks if the already saved comm and the provided comm are the same. The mpisize and rank are not checked. This can lead to possible errors in other routines which create a new cmesh. For example in t8_cmesh_copy ().
This function takes an empty cmesh, a committed cmesh and a communicator. The new cmesh copies everything of the committed cmesh, including the mpisize and rank. Then the provided comm is checked with the new cmesh. In your case this would lead to a segfault, because the copy routine does not fill the comm of the new cmesh. But even if it filled the new one with the provided comm, the is_valid function would return true, even though it was never checked if the comm has the same mpisize and rank as the already saved rank and size. So is_valid is still necessary and should not only check if the comms are the same, but also if the comm is in alignment with the stored mpisize and rank.
I am not familiar with these functions. What is precisely the expected behavior for each of them? |
Describe your changes here:
This PR adds the possibility to query a cmesh for its associated MPI communicator.
All these boxes must be checked by the AUTHOR before requesting review:
Documentation:,Bugfix:,Feature:,Improvement:orOther:.All these boxes must be checked by the REVIEWERS before merging the pull request:
As a reviewer please read through all the code lines and make sure that the code is fully understood, bug free, well-documented and well-structured.
General
Tests
If the Pull request introduces code that is not covered by the github action (for example coupling with a new library):
Scripts and Wiki
script/find_all_source_files.scpto check the indentation of these files.License
doc/(or already has one).