Skip to content

Conversation

@dutkalex
Copy link
Contributor

Describe your changes here:
This PR adds the possibility to query a cmesh for its associated MPI communicator.

All these boxes must be checked by the AUTHOR before requesting review:

  • The PR is small enough to be reviewed easily. If not, consider splitting up the changes in multiple PRs.
  • The title starts with one of the following prefixes: Documentation:, Bugfix:, Feature:, Improvement: or Other:.
  • If the PR is related to an issue, make sure to link it.
  • The author made sure that, as a reviewer, he/she would check all boxes below.

All these boxes must be checked by the REVIEWERS before merging the pull request:

As a reviewer please read through all the code lines and make sure that the code is fully understood, bug free, well-documented and well-structured.

General

  • The reviewer executed the new code features at least once and checked the results manually.
  • The code follows the t8code coding guidelines.
  • New source/header files are properly added to the CMake files.
  • The code is well documented. In particular, all function declarations, structs/classes and their members have a proper doxygen documentation.
  • All new algorithms and data structures are sufficiently optimal in terms of memory and runtime (If this should be merged, but there is still potential for optimization, create a new issue).

Tests

  • The code is covered in an existing or new test case using Google Test.
  • The code coverage of the project (reported in the CI) should not decrease. If coverage is decreased, make sure that this is reasonable and acceptable.
  • Valgrind doesn't find any bugs in the new code. This script can be used to check for errors; see also this wiki article.

If the Pull request introduces code that is not covered by the github action (for example coupling with a new library):

  • Should this use case be added to the github action?
  • If not, does the specific use case compile and all tests pass (check manually).

Scripts and Wiki

  • If a new directory with source files is added, it must be covered by the script/find_all_source_files.scp to check the indentation of these files.
  • If this PR introduces a new feature, it must be covered in an example or tutorial and a Wiki article.

License

  • The author added a BSD statement to doc/ (or already has one).

@codecov
Copy link

codecov bot commented Sep 30, 2025

Codecov Report

❌ Patch coverage is 69.23077% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.53%. Comparing base (50b167e) to head (3526184).

Files with missing lines Patch % Lines
src/t8_cmesh/t8_cmesh.cxx 40.00% 3 Missing ⚠️
src/t8_cmesh/t8_cmesh_partition.cxx 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1884      +/-   ##
==========================================
- Coverage   75.54%   75.53%   -0.01%     
==========================================
  Files         103      103              
  Lines       19316    19313       -3     
==========================================
- Hits        14592    14588       -4     
- Misses       4724     4725       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@sandro-elsweijer sandro-elsweijer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the implementation! Only minor comments

if (t8_cmesh_is_committed (cmesh)) {
comm = t8_shmem_array_get_comm (cmesh->tree_offsets);
T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm));
T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still use t8_cmesh_comm_is_valid here because of the straighforward naming

Suggested change
T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm);
T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm);

Copy link
Contributor Author

@dutkalex dutkalex Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change was proposed because it felt clearer that way, since the term "valid" is kind of vague in this context. I would personally argue for endorsing the == spelling as the canonical way of checking this and deprecating the is_valid spelling, but if you guys feel otherwise I'm not against reverting this either

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just had another look concerning the is_valid function and there is an easily addressable problem:
The cmesh now saves the comm along with the mpirank and mpisize. Then afterwards, is_valid only checks if the already saved comm and the provided comm are the same. The mpisize and rank are not checked. This can lead to possible errors in other routines which create a new cmesh. For example in t8_cmesh_copy ().
This function takes an empty cmesh, a committed cmesh and a communicator. The new cmesh copies everything of the committed cmesh, including the mpisize and rank. Then the provided comm is checked with the new cmesh. In your case this would lead to a segfault, because the copy routine does not fill the comm of the new cmesh. But even if it filled the new one with the provided comm, the is_valid function would return true, even though it was never checked if the comm has the same mpisize and rank as the already saved rank and size. So is_valid is still necessary and should not only check if the comms are the same, but also if the comm is in alignment with the stored mpisize and rank.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that there is indeed room for inconsistency here, which could lead to subtle bugs, so this can't make it into production as-is.
A possible solution is effectively to harden the checks to have this bug class be catched by the debug mode. Although probably effective for debugging purposes, this approach does not prevent the bugs from appearing in the first place, thus implying a (hopefully minor) maintenance burden in the long run.
Another solution could be prevent the inconsistency from appearing by only storing the communicator which already holds all the relevant information. The rank and size could then be queried when necessary using MPI_Comm_rank / MPI_Comm_size. This solution requires a little bit more effort right now, but is arguably superior because it would completely eliminate this class of bugs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sandro-elsweijer While reading the forest implementation, it seems like this mpicomm+mpirank+mpisize pattern could lead to the same kind of issues on the forest side. Maybe this should be addressed in a separate PR then, and your solution is the best strategy for this PR. What do you think?

if (!meta_info.pre_commit) {
T8_ASSERT (t8_cmesh_is_committed (cmesh_out));
T8_ASSERT (t8_cmesh_comm_is_valid (cmesh_out, comm));
T8_ASSERT (t8_cmesh_get_mpicomm (cmesh_out) == comm);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still use t8_cmesh_comm_is_valid here, because of the naming

Suggested change
T8_ASSERT (t8_cmesh_get_mpicomm (cmesh_out) == comm);
T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm);

#endif

T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm));
T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still use t8_cmesh_comm_is_valid here because of the straighforward naming

Suggested change
T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm);
T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm);

cmesh->set_partition = cmesh_from->set_partition;
cmesh->set_partition_level = cmesh_from->set_partition_level;
T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm));
T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still use t8_cmesh_comm_is_valid here because of the straighforward naming

Suggested change
T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm);
T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm);

int ieclass;

T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm));
T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still use t8_cmesh_comm_is_valid here because of the straighforward naming

Suggested change
T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm);
T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm);

T8_ASSERT (t8_cmesh_is_committed (cmesh));
}
T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm));
T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still use t8_cmesh_comm_is_valid here because of the straighforward naming

Suggested change
T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm);
T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm);

const t8_gloidx_t *offset_from, *offset_to;

T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm));
T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still use t8_cmesh_comm_is_valid here because of the straighforward naming

Suggested change
T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm);
T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm);


T8_ASSERT (t8_cmesh_is_committed (cmesh));
T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm));
T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still use t8_cmesh_comm_is_valid here because of the straighforward naming

Suggested change
T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm);
T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm);

}
if (cmesh != NULL) {
T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm));
T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still use t8_cmesh_comm_is_valid here because of the straighforward naming

Suggested change
T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm);
T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm);

Copy link
Member

@sandro-elsweijer sandro-elsweijer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check also the other cmesh creation routines like copy and bcast

if (t8_cmesh_is_committed (cmesh)) {
comm = t8_shmem_array_get_comm (cmesh->tree_offsets);
T8_ASSERT (t8_cmesh_comm_is_valid (cmesh, comm));
T8_ASSERT (t8_cmesh_get_mpicomm (cmesh) == comm);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just had another look concerning the is_valid function and there is an easily addressable problem:
The cmesh now saves the comm along with the mpirank and mpisize. Then afterwards, is_valid only checks if the already saved comm and the provided comm are the same. The mpisize and rank are not checked. This can lead to possible errors in other routines which create a new cmesh. For example in t8_cmesh_copy ().
This function takes an empty cmesh, a committed cmesh and a communicator. The new cmesh copies everything of the committed cmesh, including the mpisize and rank. Then the provided comm is checked with the new cmesh. In your case this would lead to a segfault, because the copy routine does not fill the comm of the new cmesh. But even if it filled the new one with the provided comm, the is_valid function would return true, even though it was never checked if the comm has the same mpisize and rank as the already saved rank and size. So is_valid is still necessary and should not only check if the comms are the same, but also if the comm is in alignment with the stored mpisize and rank.

@dutkalex
Copy link
Contributor Author

We should check also the other cmesh creation routines like copy and bcast

I am not familiar with these functions. What is precisely the expected behavior for each of them?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants