@shurickdaryin

I propose adding simple exception handlers for failed links, so that discovery can finish even when such links are present.

@jgunthorpe
Owner

Thank you for the patch.

What scenarios were you able to use this in?

There are many reasons an SMP send during discovery could fail. This seems to deal with the forward direction failing. Is that because an SMA is non-responsive, or something similar?

I would think the most common reason would be a change in the already discovered region, i.e. a link going down?

@shurickdaryin
Author

This helps in two cases that we have observed in practice:

  1. an active link is faulty: the switch port is up, but cannot transmit anything;
  2. a device is non-responsive: its ports are up, but do not respond to MADs.

In both cases python-rdma's discovery does not finish because of exceptions. By contrast, standard tools like ibnetdiscover do finish, while reporting the errors they observe. The results of discovery with the proposed patch are consistent with those of ibnetdiscover.
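
To illustrate the general idea (this is not the patch itself, and it does not use python-rdma's API), here is a minimal sketch of a breadth-first discovery loop that records a failed link and keeps going instead of aborting. The names `ProbeError`, `discover`, `toy_probe` and the `FABRIC` table are all hypothetical and exist only for this example.

```python
from collections import deque


class ProbeError(Exception):
    """Hypothetical stand-in for whatever a failed SMP send would raise."""


def discover(start, probe):
    """Breadth-first walk that records failed links instead of aborting.

    `probe(node)` is a hypothetical callable: it returns the node's
    neighbours, or raises ProbeError when the node does not respond.
    """
    seen = {start}                    # every node we know exists
    failed = []                       # (from_node, to_node, reason)
    queue = deque([(None, start)])
    while queue:
        parent, node = queue.popleft()
        try:
            neighbours = probe(node)
        except ProbeError as exc:
            # Report the error and continue with the rest of the fabric.
            failed.append((parent, node, str(exc)))
            continue
        for peer in neighbours:
            if peer not in seen:
                seen.add(peer)
                queue.append((node, peer))
    return seen, failed


# Toy fabric: "hca2" is attached to "sw1" but never answers (case 2 above).
FABRIC = {
    "hca1": ["sw1"],
    "sw1": ["hca1", "sw2", "hca2"],
    "sw2": ["sw1", "hca3"],
    "hca3": ["sw2"],
}


def toy_probe(node):
    if node not in FABRIC:
        raise ProbeError("timeout")
    return FABRIC[node]


seen, failed = discover("hca1", toy_probe)
print(sorted(seen))   # all five nodes, including the silent one
print(failed)         # [('sw1', 'hca2', 'timeout')]
```

In this sketch the non-responsive device still shows up in `seen`, since its port is up and a neighbour reported it, while the failure itself is collected in `failed` for reporting at the end.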
