Rebasing Multi EE support in QCE and PM runtime patches#742
Open
uditqcom wants to merge 14 commits intoqualcomm-linux:tech/security/cryptofrom
Open
Rebasing Multi EE support in QCE and PM runtime patches#742uditqcom wants to merge 14 commits intoqualcomm-linux:tech/security/cryptofrom
uditqcom wants to merge 14 commits intoqualcomm-linux:tech/security/cryptofrom
Conversation
There's no reason for the instances of this struct to be modifiable. Constify the pointer in struct dma_async_tx_descriptor and all drivers currently using it. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251219-qcom-qce-cmd-descr-v10-1-ff7e4bf7dad4@oss.qualcomm.com
…data In preparation for supporting the pipe locking feature flag, extend the amount of information we can carry in device match data: create a separate structure and make the register information one of its fields. This way, in subsequent patches, it will be just a matter of adding a new field to the device data. Reviewed-by: Dmitry Baryshkov <lumag@kernel.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251219-qcom-qce-cmd-descr-v10-2-ff7e4bf7dad4@oss.qualcomm.com
Extend the device match data with a flag indicating whether the IP supports the BAM lock/unlock feature. Set it to true on BAM IP versions 1.4.0 and above. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251219-qcom-qce-cmd-descr-v10-3-ff7e4bf7dad4@oss.qualcomm.com
Use metadata operations in DMA descriptors to allow BAM users to pass additional information to the engine. To that end: define a new structure - struct bam_desc_metadata - as a medium and define two new commands: for locking and unlocking the BAM respectively. Handle the locking in the .attach() callback. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251219-qcom-qce-cmd-descr-v10-4-ff7e4bf7dad4@oss.qualcomm.com
The header defines a struct embedding struct crypto_queue whose size needs to be known and which is defined in crypto/algapi.h. Move the inclusion from core.c to core.h. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251219-qcom-qce-cmd-descr-v10-5-ff7e4bf7dad4@oss.qualcomm.com
It's unclear what the purpose of this field is. It has been here since the initial commit but without any explanation. The driver works fine without it. We still keep allocating more space in the result buffer, we just don't need to store its address. While at it: move the QCE_IGNORE_BUF_SZ definition into dma.c as it's not used outside of this compilation unit. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251219-qcom-qce-cmd-descr-v10-6-ff7e4bf7dad4@oss.qualcomm.com
This function can extract all the information it needs from struct qce_device alone so simplify its arguments. This is done in preparation for adding support for register I/O over DMA which will require accessing even more fields from struct qce_device. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251219-qcom-qce-cmd-descr-v10-7-ff7e4bf7dad4@oss.qualcomm.com
…est() Switch to devm_kmalloc() and devm_dma_alloc_chan() in devm_qce_dma_request(). This allows us to drop two labels and shrink the function. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251219-qcom-qce-cmd-descr-v10-8-ff7e4bf7dad4@oss.qualcomm.com
As the first step in converting the driver to using DMA for register I/O, let's map the crypto memory range. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251219-qcom-qce-cmd-descr-v10-9-ff7e4bf7dad4@oss.qualcomm.com
Implement the infrastructure for performing register I/O over BAM DMA, not CPU. No functional change yet. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251219-qcom-qce-cmd-descr-v10-10-ff7e4bf7dad4@oss.qualcomm.com
Implement the infrastructure for using the new DMA controller lock/unlock feature of the BAM driver. No functional change for now. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251219-qcom-qce-cmd-descr-v10-11-ff7e4bf7dad4@oss.qualcomm.com
With everything else in place, we can now switch to actually using the BAM DMA for register I/O with DMA engine locking. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251219-qcom-qce-cmd-descr-v10-12-ff7e4bf7dad4@oss.qualcomm.com
…ling support The Qualcomm Crypto Engine (QCE) driver currently lacks support for runtime power management (PM) and interconnect bandwidth control. As a result, the hardware remains fully powered and clocks stay enabled even when the device is idle. Additionally, static interconnect bandwidth votes are held indefinitely, preventing the system from reclaiming unused bandwidth. Address this by enabling runtime PM and dynamic interconnect bandwidth scaling to allow the system to suspend the device when idle and scale interconnect usage based on actual demand. Improve overall system efficiency by reducing power usage and optimizing interconnect resource allocation. Make the following changes as part of this integration: - Add support for pm_runtime APIs to manage device power state transitions. - Implement runtime_suspend() and runtime_resume() callbacks to gate clocks and vote for interconnect bandwidth only when needed. - Replace devm_clk_get_optional_enabled() with devm_pm_clk_create() + pm_clk_add() and let the PM core manage device clocks during runtime PM and system sleep. - Register dev_pm_ops with the platform driver to hook into the PM framework. Tested: - Verify that ICC votes drop to zero after probe and upon request completion. - Confirm that runtime PM usage count increments during active requests and decrements afterward. - Observe that the device correctly enters the suspended state when idle. Signed-off-by: Udit Tiwari <quic_utiwari@quicinc.com> Link: https://lore.kernel.org/r/20260220072818.2921517-1-quic_utiwari@quicinc.com
Signed-off-by: uditqcom <utiwari@qti.qualcomm.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Currently the QCE crypto driver accesses the crypto engine registers
directly via CPU. Trust Zone may perform crypto operations simultaneously
resulting in a race condition. To remedy that, let's introduce support
for BAM locking/unlocking using DMA descriptor metadata as medium for
passing the relevant information from the QCE engine driver to the BAM
driver.
In the specific case of the BAM DMA this translates to sending command
descriptors performing dummy writes with the relevant flags set. The BAM
will then lock all other pipes not related to the current pipe group, and
keep handling the current pipe only until it sees the the unlock bit.
In order for the locking to work correctly, we also need to switch to
using DMA for all register I/O.
On top of this, the series contains some additional tweaks and
refactoring.
The goal of this is not to improve the performance but to prepare the
driver for supporting decryption into secure buffers in the future.
Tested with tcrypt.ko, kcapi and cryptsetup.
Shout out to Daniel and Udit from Qualcomm for helping me out with some
DMA issues we encountered.
Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org
Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@oss.qualcomm.com
Changes in v10:
series
The Qualcomm Crypto Engine (QCE) driver currently lacks support for runtime power management (PM) and interconnect bandwidth control. As a result, the hardware remains fully powered and clocks stay enabled even when the device is idle. Additionally, static interconnect bandwidth votes are held indefinitely, preventing the system from reclaiming unused bandwidth.
Address this by enabling runtime PM and dynamic interconnect bandwidth scaling to allow the system to suspend the device when idle and scale interconnect usage based on actual demand. Improve overall system efficiency by reducing power usage and optimizing interconnect resource allocation.
Make the following changes as part of this integration:
Add support for pm_runtime APIs to manage device power state transitions.
Implement runtime_suspend() and runtime_resume() callbacks to gate clocks and vote for interconnect bandwidth only when needed.
Replace devm_clk_get_optional_enabled() with devm_pm_clk_create() + pm_clk_add() and let the PM core manage device clocks during runtime PM and system sleep.
Register dev_pm_ops with the platform driver to hook into the PM framework.
Tested:
Verify that ICC votes drop to zero after probe and upon request completion.
Confirm that runtime PM usage count increments during active requests and decrements afterward.
Observe that the device correctly enters the suspended state when idle.
Link: https://lore.kernel.org/r/20251120062443.2016084-1-quic_utiwari@quicinc.com
Signed-off-by: Udit Tiwari quic_utiwari@quicinc.com