Conversation
@muliby-lb the scenario is that a volume is deleted through lbcli and then deleted/detach in openstack. |
49ee2cb to
d047940
Compare
so "not supposed to happen". But a reasonable piece of defensive programming, so fine by me. |
cinder/volume/drivers/lightos.py
Outdated
| hostnqn) | ||
|
|
||
| project_name = self._get_lightos_project_name(volume) | ||
| status, resp = self._get_lightos_volume_with_retry(project_name, vol_uuid=volume.id) |
There was a problem hiding this comment.
IMHO, this is too much of protection against unexpected use-case, there are two reasons
1 - Admin deleting an Openstack volume outside openstack UI or CLI.
2 - API failure of volume Get Operation.
If an Openstack volume is deleted using Lighbtis CLI or API then it's guaranteed this code will retry max = 10 and all retries will fail, because of getting unavailable volume.
I would suggest
- Retries are not needed at all or reduce it to MAX=2.
There was a problem hiding this comment.
@rahman-lb this exactly the case its solving. if the admin delete a volume - it will return with 404 and exist. the loop stops if the volume is not found or exists.
There was a problem hiding this comment.
I see,
how about changing this condition if status == httpstatus.NOT_FOUND: to if status == httpstatus.OK
There are many possible cases e.g 404, 401, 403, 204 etc. and You want to continue deletion only when you find the volume and status is OK and in all other cases you don't want to proceed/delete.
7b99551 to
e4dda51
Compare
e4dda51 to
dd51372
Compare
When a volume is deleted in the LightBits cluster and then detached/deleted through OpenStack, the operation will fail and the volume will remain in 'detaching' state. This patch introduces two main improvements: 1. Added _get_lightos_volume_with_retry() method: - Implements retry logic with configurable timeout (default 5 seconds) - Handles transient errors (HTTP 500) by retrying - Returns immediately for definitive status codes (OK, NOT_FOUND, etc.) - Prevents false negatives due to temporary API unavailability 2. Enhanced terminate_connection() to check volume existence: - Uses _get_lightos_volume_with_retry() to verify volume exists - Returns early if volume is NOT_FOUND (already deleted) - Prevents unnecessary ACL operations on non-existent volumes - Avoids leaving volumes stuck in 'detaching' state Also adds a new constant STOP_LIGHTOS_CMD_FOR_RETURN_STATUS_LIST to define HTTP status codes that should stop retries immediately. Test added: test_terminate_connection_no_api_response to verify the early exit behavior when volume is not found. Change-Id: If9cbaf6c73cad9e02a266f3038b64bff37fca00d
dd51372 to
0150d08
Compare
|
uploaded to upstread: |
We check if the volume exists before trying to change the acl - if its return "not found" just return since there is no volume and no acl to change.
Change-Id: If9cbaf6c73cad9e02a266f3038b64bff37fca00d