-
Notifications
You must be signed in to change notification settings - Fork 135
Description
Hi all,
I believe I have encountered a bug with the optimization around the reduction of wake signals for qat offload which causes unneeded latency. The issue seems to be around the usage of thread-local.
Commit which added this optimization: 32f3710
This optimization seems to make the assumption that a offload request will be resumed on the same thread that started it, see this pseudo-example:
This example considers a single epoll instance, with multiple worker threads waiting for events.
int global_epoll_inst = epoll_create()
Thread 1:
- app: bssl_qat_async_start_job -> ASYNC_WAIT_CTX_get_all_fds -> epollctl(global_epoll_inst. EPOLL_CTL_ADD, fd)
- engine: tlv->localOpsInFlight = 0
- engine: qat_rsa_decrypt -> QAT_INC_IN_FLIGHT_REQS -> sem_post
- engine: tlv->localOpsInFlight = 1
Intel QAT Polling Thread:
- sem_timedwait -> WAKE
- poll
Thread 2:
- app: epoll_wait(global_epoll_inst) --> resumed
- app: bssl_qat_async_ctx_copy_result(ctx)
- tlv->localOpsInFlight = 0
- engine: QAT_DEC_IN_FLIGHT_REQS
- tlv->localOpsInFlight = -1
Thread 1:
- app: bssl_qat_async_start_job -> ASYNC_WAIT_CTX_get_all_fds -> epollctl(global_epoll_inst. EPOLL_CTL_ADD, fd)
- engine: tlv->localOpsInFlight = 1
- engine: qat_rsa_decrypt -> QAT_INC_IN_FLIGHT_REQS (*bug* no call to sem_post!)
- engine: tlv->localOpsInFlight = 2
Intel QAT Polling Thread:
- sem_timedwait -> TIMEOUT
- poll
...
The side-effect is that the TLV get's into a state where localOpsInFlight it will never == 1 and so sem_post never get's called anymore and then the polling thread's sem times out.
The relevant code section seems to be:
Lines 324 to 327 in ba2035c
| QAT_INC_IN_FLIGHT_REQS(num_requests_in_flight, tlv); | |
| if (qat_use_signals()) { | |
| if (tlv->localOpsInFlight == 1) { | |
| if (sem_post(&hw_polling_thread_sem) != 0) { |
I'm wondering if it would be possible to use num_requests_in_flight == 1 (atomic) in the if ? I did not test this, but the use of a TLV here seems like it could be problematic to consumers