@PlaidCat
Collaborator

This is an attempt at a re-builder built on cron and some internal tools. The process is the same as in previous rebuilds:

  • Download all unprocessed src.rpm files
  • For each src.rpm:
    • Find all commits in the changelog back to the last known tag (in this case 5.14.0-611)
    • Replay the commits in reverse changelog order (oldest changelog entry to newest) with git cherry-pick
    • After the replay, replace the ENTIRE code in the branch with the output of rpmbuild -bp from the corresponding src.rpm
    • Tag the rebuild branch
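
The replay ordering in the steps above can be sketched roughly as follows. This is an illustration only, not the actual re-builder: `plan_replay`, the `(sha, subject)` tuples, and the placeholder SHAs are hypothetical, and it assumes the changelog is read newest-first, so the list is reversed before cherry-picking. The `-x` flag is git's own option for appending the "(cherry picked from commit ...)" line; whether the internal tooling uses it is an assumption.

```python
from typing import List, Tuple


def plan_replay(changelog: List[Tuple[str, str]]) -> List[str]:
    """Return git commands that replay changelog commits oldest-first.

    `changelog` is assumed to be ordered newest-first; each item is a
    hypothetical (upstream_sha, subject) pair.
    """
    commands = []
    for sha, _subject in reversed(changelog):
        # -x asks git to record the source commit in the new message.
        commands.append(f"git cherry-pick -x {sha}")
    return commands


if __name__ == "__main__":
    # Placeholder SHAs, not real commits.
    newest_first = [
        ("bbb2222", "newer change"),
        ("aaa1111", "older change"),
    ]
    for cmd in plan_replay(newest_first):
        print(cmd)
```

The sketch only builds the command list, so it can be inspected before anything touches the tree.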

Rebuild Splat Inspection

kernel-5.14.0-611.16.1.el9_7

[jmaple@devbox kernel-src-tree]$ cat ciq/ciq_backports/kernel-5.14.0-611.16.1.el9_7/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v5.14~1..kernel-mainline: 337791
Number of commits in rpm: 20
Number of commits matched with upstream: 16 (80.00%)
Number of commits in upstream but not in rpm: 337775
Number of commits NOT found in upstream: 4 (20.00%)

Rebuilding Kernel on Branch rocky9_7_rebuild_kernel-5.14.0-611.16.1.el9_7 for kernel-5.14.0-611.16.1.el9_7
Clean Cherry Picks: 14 (87.50%)
Empty Cherry Picks: 2 (12.50%)
_______________________________

__EMPTY COMMITS__________________________
4e034bf045b12852a24d5d33f2451850818ba0c1 iommufd: Fix race during abort for file descriptors
c28f922c9dcee0e4876a2c095939d77fe7e15116 clone_private_mnt(): make sure that caller has CAP_SYS_ADMIN in the right userns

__CHANGES NOT IN UPSTREAM________________
Porting to Rocky Linux 9, debranding and Rocky branding'
Ensure aarch64 kernel is not compressed'
scsi: lpfc: avoid crashing in lpfc_nlp_get() if lpfc_nodelist was freed
scsi: lpfc: Fix reusing an ndlp that is marked NLP_DROPPED during FLOGI
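
The percentages in the details file can be re-derived from the raw counts. The following is a small sanity check, assuming the denominators are what the numbers suggest (rpm changelog entries for the match rate, matched commits for the cherry-pick rates). That reading is inferred from the output, not from the tool's source.

```python
def pct(part: int, whole: int) -> str:
    """Format part/whole as a percentage with two decimals."""
    return f"{100.0 * part / whole:.2f}%"


upstream_range = 337791   # commits in v5.14~1..kernel-mainline
in_rpm = 20               # commits listed in the rpm changelog
matched = 16              # changelog commits matched with upstream
clean, empty = 14, 2      # cherry-pick outcomes for the matched commits

print(pct(matched, in_rpm))            # matched with upstream
print(pct(in_rpm - matched, in_rpm))   # not found in upstream
print(upstream_range - matched)        # in upstream but not in rpm
print(pct(clean, matched), pct(empty, matched))
```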

Build

[jmaple@devbox code]$ egrep -B 5 -A 5 "\[TIMER\]|^Starting Build" $(ls -t kbuild* | head -n1)
/mnt/code/kernel-src-tree-build
Running make mrproper...
  CLEAN   scripts/basic
  CLEAN   scripts/kconfig
  CLEAN   include/config include/generated
[TIMER]{MRPROPER}: 5s
x86_64 architecture detected, copying config
'configs/kernel-x86_64-rhel.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-rocky9_7_rebuild-03fef51f5457"
Making olddefconfig
--
  HOSTCC  scripts/kconfig/util.o
  HOSTLD  scripts/kconfig/conf
#
# configuration written to .config
#
Starting Build
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_32.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_64.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_x32.h
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h
--
  BTF [M] sound/usb/usx2y/snd-usb-usx2y.ko
  LD [M]  sound/xen/snd_xen_front.ko
  BTF [M] sound/x86/snd-hdmi-lpe-audio.ko
  BTF [M] sound/virtio/virtio_snd.ko
  BTF [M] sound/xen/snd_xen_front.ko
[TIMER]{BUILD}: 1498s
Making Modules
  INSTALL /lib/modules/5.14.0-rocky9_7_rebuild-03fef51f5457/kernel/arch/x86/crypto/blake2s-x86_64.ko
  INSTALL /lib/modules/5.14.0-rocky9_7_rebuild-03fef51f5457/kernel/arch/x86/crypto/blowfish-x86_64.ko
  INSTALL /lib/modules/5.14.0-rocky9_7_rebuild-03fef51f5457/kernel/arch/x86/crypto/camellia-aesni-avx-x86_64.ko
  INSTALL /lib/modules/5.14.0-rocky9_7_rebuild-03fef51f5457/kernel/arch/x86/crypto/camellia-aesni-avx2.ko
--
  SIGN    /lib/modules/5.14.0-rocky9_7_rebuild-03fef51f5457/kernel/sound/x86/snd-hdmi-lpe-audio.ko
  SIGN    /lib/modules/5.14.0-rocky9_7_rebuild-03fef51f5457/kernel/sound/xen/snd_xen_front.ko
  SIGN    /lib/modules/5.14.0-rocky9_7_rebuild-03fef51f5457/kernel/sound/virtio/virtio_snd.ko
  SIGN    /lib/modules/5.14.0-rocky9_7_rebuild-03fef51f5457/kernel/sound/usb/snd-usb-audio.ko
  DEPMOD  /lib/modules/5.14.0-rocky9_7_rebuild-03fef51f5457
[TIMER]{MODULES}: 8s
Making Install
sh ./arch/x86/boot/install.sh 5.14.0-rocky9_7_rebuild-03fef51f5457 \
        arch/x86/boot/bzImage System.map "/boot"
[TIMER]{INSTALL}: 22s
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-5.14.0-rocky9_7_rebuild-03fef51f5457 and Index to 0
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 5s
[TIMER]{BUILD}: 1498s
[TIMER]{MODULES}: 8s
[TIMER]{INSTALL}: 22s
[TIMER]{TOTAL} 1538s
Rebooting in 10 seconds
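
For comparing build times across rebuilds, the `[TIMER]` lines can be pulled out of the log with a short parser. This is a sketch under the assumption that the lines always look like the ones above; `parse_timers` is a hypothetical helper, not part of the build script. Note that the TOTAL line has no colon, so the pattern treats the colon as optional.

```python
import re

# Matches "[TIMER]{NAME}: 5s" and the colon-less "[TIMER]{TOTAL} 1538s".
TIMER_RE = re.compile(r"^\[TIMER\]\{(\w+)\}:?\s+(\d+)s$")


def parse_timers(log_text: str) -> dict:
    """Map each timer name to its duration in seconds (last value wins)."""
    timers = {}
    for line in log_text.splitlines():
        match = TIMER_RE.match(line.strip())
        if match:
            timers[match.group(1)] = int(match.group(2))
    return timers


SUMMARY = """\
[TIMER]{MRPROPER}: 5s
[TIMER]{BUILD}: 1498s
[TIMER]{MODULES}: 8s
[TIMER]{INSTALL}: 22s
[TIMER]{TOTAL} 1538s
"""

timers = parse_timers(SUMMARY)
phases = sum(v for k, v in timers.items() if k != "TOTAL")
print(phases, timers["TOTAL"] - phases)
```

For this run the four phase timers add up to 1533s, 5s short of the reported total. The log does not say where the remainder is spent; presumably it is in steps outside the timed phases.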

KSelfTest

[jmaple@devbox code]$ ~/workspace/auto_kernel_history_rebuild/Rocky10/rocky10/code/get_kselftest_diff.sh
kselftest.5.14.0-jmaple_rlc-9_5.14.0-611.9.1.el9_7-4edb6878037a+.log
311
kselftest.5.14.0-rocky9_7_rebuild-386d677a861c.log
313
kselftest.5.14.0-rocky9_7_rebuild-2d407efe1dcc.log
313
kselftest.5.14.0-rocky9_7_rebuild-03fef51f5457.log
313
Before: kselftest.5.14.0-rocky9_7_rebuild-2d407efe1dcc.log
After: kselftest.5.14.0-rocky9_7_rebuild-03fef51f5457.log
Diff:
No differences found.
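
The before/after comparison can be approximated as a set difference over passing tests. The sketch below is not `get_kselftest_diff.sh`. It assumes the logs contain TAP-style result lines ("ok N selftests: ..." and "not ok N selftests: ..."), and `passed_tests` and `diff_runs` are hypothetical names. Whether the counts printed above (311, 313) are counts of such lines is also an assumption.

```python
import re

# Assumed result-line shape, e.g.
#   ok 1 selftests: net: foo.sh
#   not ok 2 selftests: net: bar.sh # exit=1
RESULT_RE = re.compile(r"^(not ok|ok) \d+ (selftests: \S+ \S+)")


def passed_tests(log_text: str) -> set:
    """Return the names of tests reported as "ok" (skips included)."""
    passed = set()
    for line in log_text.splitlines():
        match = RESULT_RE.match(line.strip())
        if match and match.group(1) == "ok":
            passed.add(match.group(2))
    return passed


def diff_runs(before: str, after: str) -> dict:
    """Report tests that pass in only one of the two runs."""
    old, new = passed_tests(before), passed_tests(after)
    return {"fixed": sorted(new - old), "regressed": sorted(old - new)}
```

Two runs with identical passing sets yield empty "fixed" and "regressed" lists, which corresponds to the "No differences found." case.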

jira KERNEL-393
cve CVE-2025-39966
Rebuild_History Non-Buildable kernel-5.14.0-611.16.1.el9_7
commit-author Jason Gunthorpe <jgg@ziepe.ca>
commit 4e034bf
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-5.14.0-611.16.1.el9_7/4e034bf0.failed

fput() doesn't actually call file_operations release() synchronously, it
puts the file on a work queue and it will be released eventually.

This is normally fine, except for iommufd the file and the iommufd_object
are tied together. The file has the object as its private_data and holds
a users refcount, while the object is expected to remain alive as long as
the file is.

When the allocation of a new object aborts before installing the file it
will fput() the file and then go on to immediately kfree() the obj. This
causes a UAF once the workqueue completes the fput() and tries to
decrement the users refcount.

Fix this by putting the core code in charge of the file lifetime, and call
__fput_sync() during abort to ensure that release() is called before
kfree. __fput_sync() is a bit too tricky to open code in all the object
implementations. Instead the objects tell the core code where the file
pointer is and the core will take care of the life cycle.

If the object is successfully allocated then the file will hold a users
refcount and the iommufd_object cannot be destroyed.

It is worth noting that close(); ioctl(IOMMU_DESTROY); doesn't have an
issue because close() is already using a synchronous version of fput().

The UAF looks like this:

    BUG: KASAN: slab-use-after-free in iommufd_eventq_fops_release+0x45/0xc0 drivers/iommu/iommufd/eventq.c:376
    Write of size 4 at addr ffff888059c97804 by task syz.0.46/6164

    CPU: 0 UID: 0 PID: 6164 Comm: syz.0.46 Not tainted syzkaller #0 PREEMPT(full)
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025
    Call Trace:
     <TASK>
     __dump_stack lib/dump_stack.c:94 [inline]
     dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
     print_address_description mm/kasan/report.c:378 [inline]
     print_report+0xcd/0x630 mm/kasan/report.c:482
     kasan_report+0xe0/0x110 mm/kasan/report.c:595
     check_region_inline mm/kasan/generic.c:183 [inline]
     kasan_check_range+0x100/0x1b0 mm/kasan/generic.c:189
     instrument_atomic_read_write include/linux/instrumented.h:96 [inline]
     atomic_fetch_sub_release include/linux/atomic/atomic-instrumented.h:400 [inline]
     __refcount_dec include/linux/refcount.h:455 [inline]
     refcount_dec include/linux/refcount.h:476 [inline]
     iommufd_eventq_fops_release+0x45/0xc0 drivers/iommu/iommufd/eventq.c:376
     __fput+0x402/0xb70 fs/file_table.c:468
     task_work_run+0x14d/0x240 kernel/task_work.c:227
     resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
     exit_to_user_mode_loop+0xeb/0x110 kernel/entry/common.c:43
     exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
     syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
     syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
     do_syscall_64+0x41c/0x4c0 arch/x86/entry/syscall_64.c:100
     entry_SYSCALL_64_after_hwframe+0x77/0x7f

Link: https://patch.msgid.link/r/1-v1-02cd136829df+31-iommufd_syz_fput_jgg@nvidia.com
	Cc: stable@vger.kernel.org
Fixes: 07838f7 ("iommufd: Add iommufd fault object")
	Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
	Reviewed-by: Nirmoy Das <nirmoyd@nvidia.com>
	Reviewed-by: Kevin Tian <kevin.tian@intel.com>
	Tested-by: Nicolin Chen <nicolinc@nvidia.com>
	Reported-by: syzbot+80620e2d0d0a33b09f93@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/68c8583d.050a0220.2ff435.03a2.GAE@google.com
	Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
(cherry picked from commit 4e034bf)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	drivers/iommu/iommufd/eventq.c
#	drivers/iommu/iommufd/main.c
jira KERNEL-393
Rebuild_History Non-Buildable kernel-5.14.0-611.16.1.el9_7
commit-author Ashish Kalra <ashish.kalra@amd.com>
commit 6131e11

If SNP host support (SYSCFG.SNPEn) is set, then the RMP table must
be initialized before calling SEV INIT.

In other words, if SNP_INIT(_EX) is not issued or fails then
SEV INIT will fail if SNP host support (SYSCFG.SNPEn) is enabled.

	Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
	Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 6131e11)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira KERNEL-393
Rebuild_History Non-Buildable kernel-5.14.0-611.16.1.el9_7
commit-author Ashish Kalra <ashish.kalra@amd.com>
commit 9770b42

Move dev_info and dev_err messages related to SEV/SNP initialization
and shutdown into __sev_platform_init_locked(), __sev_snp_init_locked()
and __sev_platform_shutdown_locked(), __sev_snp_shutdown_locked() so
that they don't need to be issued from callers.

This allows both _sev_platform_init_locked() and various SEV/SNP ioctls
to call __sev_platform_init_locked(), __sev_snp_init_locked() and
__sev_platform_shutdown_locked(), __sev_snp_shutdown_locked() for
implicit SEV/SNP initialization and shutdown without additionally
printing any errors/success messages.

	Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
	Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
	Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 9770b42)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira KERNEL-393
Rebuild_History Non-Buildable kernel-5.14.0-611.16.1.el9_7
commit-author Ashish Kalra <ashish.kalra@amd.com>
commit ceac7fb

Modify the behavior of implicit SEV initialization in some of the
SEV ioctls to do both SEV initialization and shutdown and add
implicit SNP initialization and shutdown to some of the SNP ioctls
so that the change of SEV/SNP platform initialization not being
done during PSP driver probe time does not break userspace tools
such as sevtool, etc.

Prior to this patch, SEV has always been initialized before these
ioctls as SEV initialization is done as part of PSP module probe,
but now with SEV initialization being moved to KVM module load instead
of PSP driver probe, the implied SEV INIT actually makes sense and gets
used and additionally to maintain SEV platform state consistency
before and after the ioctl SEV shutdown needs to be done after the
firmware call.

It is important to do SEV Shutdown here with the SEV/SNP initialization
moving to KVM, an implicit SEV INIT here as part of the SEV ioctls not
followed with SEV Shutdown will cause SEV to remain in INIT state and
then a future SNP INIT in KVM module load will fail.

Also ensure that for these SEV ioctls both implicit SNP and SEV INIT is
done followed by both SEV and SNP shutdown as RMP table must be
initialized before calling SEV INIT if SNP host support is enabled.

Similarly, prior to this patch, SNP has always been initialized before
these ioctls as SNP initialization is done as part of PSP module probe,
therefore, to keep a consistent behavior, SNP init needs to be done
here implicitly as part of these ioctls followed with SNP shutdown
before returning from the ioctl to maintain the consistent platform
state before and after the ioctl.

	Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
	Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
	Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
	Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit ceac7fb)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira KERNEL-393
Rebuild_History Non-Buildable kernel-5.14.0-611.16.1.el9_7
commit-author Ashish Kalra <ashish.kalra@amd.com>
commit 65a895a

Implicit SNP initialization as part of some SNP ioctls modify TMR size
to be SNP compliant which followed by SNP shutdown will leave the
TMR size modified and then subsequently cause SEV only initialization
to fail, hence, reset TMR size to default at SNP Shutdown.

	Acked-by: Dionna Glaze <dionnaglaze@google.com>
	Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
	Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
	Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 65a895a)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira KERNEL-393
Rebuild_History Non-Buildable kernel-5.14.0-611.16.1.el9_7
commit-author Ashish Kalra <ashish.kalra@amd.com>
commit 19860c3

Currently, the SNP panic notifier is registered on module initialization
regardless of whether SNP is being enabled or initialized.

Instead, register the SNP panic notifier only when SNP is actually
initialized and unregister the notifier when SNP is shutdown.

	Reviewed-by: Dionna Glaze <dionnaglaze@google.com>
	Reviewed-by: Alexey Kardashevskiy <aik@amd.com>
	Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
	Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
	Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 19860c3)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira KERNEL-393
Rebuild_History Non-Buildable kernel-5.14.0-611.16.1.el9_7
commit-author Ashish Kalra <ashish.kalra@amd.com>
commit f7b86e0

Add new API interface to do SEV/SNP platform shutdown when KVM module
is unloaded.

	Reviewed-by: Dionna Glaze <dionnaglaze@google.com>
	Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
	Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
	Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit f7b86e0)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira KERNEL-393
Rebuild_History Non-Buildable kernel-5.14.0-611.16.1.el9_7
commit-author Ashish Kalra <ashish.kalra@amd.com>
commit 6f1d5a3

Move platform initialization of SEV/SNP from CCP driver probe time to
KVM module load time so that KVM can do SEV/SNP platform initialization
explicitly if it actually wants to use SEV/SNP functionality.

Add support for KVM to explicitly call into the CCP driver at load time
to initialize SEV/SNP. If required, this behavior can be altered with KVM
module parameters to not do SEV/SNP platform initialization at module load
time. Additionally, a corresponding SEV/SNP platform shutdown is invoked
during KVM module unload time.

Continue to support SEV deferred initialization as the user may have the
file containing SEV persistent data for SEV INIT_EX available only later
after module load/init.

	Suggested-by: Sean Christopherson <seanjc@google.com>
	Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
	Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
	Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 6f1d5a3)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira KERNEL-393
Rebuild_History Non-Buildable kernel-5.14.0-611.16.1.el9_7
commit-author Ashish Kalra <ashish.kalra@amd.com>
commit 3f8f013

SNP initialization is forced during PSP driver probe purely because SNP
can't be initialized if VMs are running.  But the only in-tree user of
SEV/SNP functionality is KVM, and KVM depends on PSP driver for the same.
Forcing SEV/SNP initialization because a hypervisor could be running
legacy non-confidential VMs makes no sense.

This patch removes SEV/SNP initialization from the PSP driver probe
time and moves the requirement to initialize SEV/SNP functionality
to KVM if it wants to use SEV/SNP.

	Suggested-by: Sean Christopherson <seanjc@google.com>
	Reviewed-by: Alexey Kardashevskiy <aik@amd.com>
	Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
	Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 3f8f013)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira KERNEL-393
Rebuild_History Non-Buildable kernel-5.14.0-611.16.1.el9_7
commit-author Ashish Kalra <ashish.kalra@amd.com>
commit 9af6339

Fix smatch warning:
	drivers/crypto/ccp/sev-dev.c:1755 __sev_snp_shutdown_locked()
	error: uninitialized symbol 'dfflush_error'.

Fixes: 9770b42 ("crypto: ccp - Move dev_info/err messages for SEV/SNP init and shutdown")
	Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/linux-crypto/d9c2e79c-e26e-47b7-8243-ff6e7b101ec3@stanley.mountain/
	Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
	Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
	Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 9af6339)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira KERNEL-393
Rebuild_History Non-Buildable kernel-5.14.0-611.16.1.el9_7
commit-author Ashish Kalra <ashish.kalra@amd.com>
commit 0fa7667

Fix below smatch warnings:
drivers/crypto/ccp/sev-dev.c:1312 __sev_platform_init_locked()
error: we previously assumed 'error' could be null

Fixes: 9770b42 ("crypto: ccp - Move dev_info/err messages for SEV/SNP init and shutdown")
	Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202505071746.eWOx5QgC-lkp@intel.com/
	Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
	Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
	Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 0fa7667)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira KERNEL-393
Rebuild_History Non-Buildable kernel-5.14.0-611.16.1.el9_7
commit-author Ashish Kalra <ashish.kalra@amd.com>
commit ab8b9fd

Panic notifiers are invoked with RCU read lock held and when the
SNP panic notifier tries to unregister itself from the panic
notifier callback itself it causes a deadlock as notifier
unregistration does RCU synchronization.

Code flow for SNP panic notifier:
snp_shutdown_on_panic() ->
__sev_firmware_shutdown() ->
__sev_snp_shutdown_locked() ->
atomic_notifier_chain_unregister(.., &snp_panic_notifier)

Fix SNP panic notifier to unregister itself during SNP shutdown
only if panic is not in progress.

	Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
	Cc: stable@vger.kernel.org
Fixes: 19860c3 ("crypto: ccp - Register SNP panic notifier only if SNP is enabled")
	Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
	Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit ab8b9fd)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
…own_locked()

jira KERNEL-393
Rebuild_History Non-Buildable kernel-5.14.0-611.16.1.el9_7
commit-author Borislav Petkov (AMD) <bp@alien8.de>
commit 46834d9

When

  9770b42 ("crypto: ccp - Move dev_info/err messages for SEV/SNP init and shutdown")

moved the error messages dumping so that they don't need to be issued by
the callers, it missed the case where __sev_firmware_shutdown() calls
__sev_platform_shutdown_locked() with a NULL argument which leads to
a NULL ptr deref on the shutdown path, during suspend to disk:

  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: Oops: 0000 [#1] SMP NOPTI
  CPU: 0 UID: 0 PID: 983 Comm: hib.sh Not tainted 6.17.0-rc4+ #1 PREEMPT(voluntary)
  Hardware name: Supermicro Super Server/H12SSL-i, BIOS 2.5 09/08/2022
  RIP: 0010:__sev_platform_shutdown_locked.cold+0x0/0x21 [ccp]

That rIP is:

  00000000000006fd <__sev_platform_shutdown_locked.cold>:
   6fd:   8b 13                   mov    (%rbx),%edx
   6ff:   48 8b 7d 00             mov    0x0(%rbp),%rdi
   703:   89 c1                   mov    %eax,%ecx

  Code: 74 05 31 ff 41 89 3f 49 8b 3e 89 ea 48 c7 c6 a0 8e 54 a0 41 bf 92 ff ff ff e8 e5 2e 09 e1 c6 05 2a d4 38 00 01 e9 26 af ff ff <8b> 13 48 8b 7d 00 89 c1 48 c7 c6 18 90 54 a0 89 44 24 04 e8 c1 2e
  RSP: 0018:ffffc90005467d00 EFLAGS: 00010282
  RAX: 00000000ffffff92 RBX: 0000000000000000 RCX: 0000000000000000
  			     ^^^^^^^^^^^^^^^^
and %rbx is nice and clean.

  Call Trace:
   <TASK>
   __sev_firmware_shutdown.isra.0
   sev_dev_destroy
   psp_dev_destroy
   sp_destroy
   pci_device_shutdown
   device_shutdown
   kernel_power_off
   hibernate.cold
   state_store
   kernfs_fop_write_iter
   vfs_write
   ksys_write
   do_syscall_64
   entry_SYSCALL_64_after_hwframe

Pass in a pointer to the function-local error var in the caller.

With that addressed, suspending the ccp shows the error properly at
least:

  ccp 0000:47:00.1: sev command 0x2 timed out, disabling PSP
  ccp 0000:47:00.1: SEV: failed to SHUTDOWN error 0x0, rc -110
  SEV-SNP: Leaking PFN range 0x146800-0x146a00
  SEV-SNP: PFN 0x146800 unassigned, dumping non-zero entries in 2M PFN region: [0x146800 - 0x146a00]
  ...
  ccp 0000:47:00.1: SEV-SNP firmware shutdown failed, rc -16, error 0x0
  ACPI: PM: Preparing to enter system sleep state S5
  kvm: exiting hardware virtualization
  reboot: Power down

Btw, this driver is crying to be cleaned up to pass in a proper I/O
struct which can be used to store information between the different
functions, otherwise stuff like that will happen in the future again.

Fixes: 9770b42 ("crypto: ccp - Move dev_info/err messages for SEV/SNP init and shutdown")
	Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
	Cc: <stable@kernel.org>
	Reviewed-by: Ashish Kalra <ashish.kalra@amd.com>
	Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
	Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 46834d9)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira KERNEL-393
Rebuild_History Non-Buildable kernel-5.14.0-611.16.1.el9_7
commit-author Ondrej Mosnacek <omosnace@redhat.com>
commit 81ccca3

sock_{send,recv}msg() internally calls security_socket_{send,recv}msg(),
which does security checks (e.g. SELinux) for socket access against the
current task. However, _sock_xmit() in drivers/block/nbd.c may be called
indirectly from a userspace syscall, where the NBD socket access would
be incorrectly checked against the calling userspace task (which simply
tries to read/write a file that happens to reside on an NBD device).

To fix this, temporarily override creds to kernel ones before calling
the sock_*() functions. This allows the security modules to recognize
this as internal access by the kernel, which will normally be allowed.

A way to trigger the issue is to do the following (on a system with
SELinux set to enforcing):

    ### Create nbd device:
    truncate -s 256M /tmp/testfile
    nbd-server localhost:10809 /tmp/testfile

    ### Connect to the nbd server:
    nbd-client localhost

    ### Create mdraid array
    mdadm --create -l 1 -n 2 /dev/md/testarray /dev/nbd0 missing

After these steps, assuming the SELinux policy doesn't allow the
unexpected access pattern, errors will be visible on the kernel console:

[  142.204243] nbd0: detected capacity change from 0 to 524288
[  165.189967] md: async del_gendisk mode will be removed in future, please upgrade to mdadm-4.5+
[  165.252299] md/raid1:md127: active with 1 out of 2 mirrors
[  165.252725] md127: detected capacity change from 0 to 522240
[  165.255434] block nbd0: Send control failed (result -13)
[  165.255718] block nbd0: Request send failed, requeueing
[  165.256006] block nbd0: Dead connection, failed to find a fallback
[  165.256041] block nbd0: Receive control failed (result -32)
[  165.256423] block nbd0: shutting down sockets
[  165.257196] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[  165.257736] Buffer I/O error on dev md127, logical block 0, async page read
[  165.258263] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[  165.259376] Buffer I/O error on dev md127, logical block 0, async page read
[  165.259920] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[  165.260628] Buffer I/O error on dev md127, logical block 0, async page read
[  165.261661] ldm_validate_partition_table(): Disk read failed.
[  165.262108] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[  165.262769] Buffer I/O error on dev md127, logical block 0, async page read
[  165.263697] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[  165.264412] Buffer I/O error on dev md127, logical block 0, async page read
[  165.265412] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[  165.265872] Buffer I/O error on dev md127, logical block 0, async page read
[  165.266378] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[  165.267168] Buffer I/O error on dev md127, logical block 0, async page read
[  165.267564]  md127: unable to read partition table
[  165.269581] I/O error, dev nbd0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[  165.269960] Buffer I/O error on dev nbd0, logical block 0, async page read
[  165.270316] I/O error, dev nbd0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[  165.270913] Buffer I/O error on dev nbd0, logical block 0, async page read
[  165.271253] I/O error, dev nbd0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[  165.271809] Buffer I/O error on dev nbd0, logical block 0, async page read
[  165.272074] ldm_validate_partition_table(): Disk read failed.
[  165.272360]  nbd0: unable to read partition table
[  165.289004] ldm_validate_partition_table(): Disk read failed.
[  165.289614]  nbd0: unable to read partition table

The corresponding SELinux denial on Fedora/RHEL will look like this
(assuming it's not silenced):
type=AVC msg=audit(1758104872.510:116): avc:  denied  { write } for  pid=1908 comm="mdadm" laddr=::1 lport=32772 faddr=::1 fport=10809 scontext=system_u:system_r:mdadm_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=tcp_socket permissive=0

The respective backtrace looks like this:
@security[mdadm, -13,
        handshake_exit+221615650
        handshake_exit+221615650
        handshake_exit+221616465
        security_socket_sendmsg+5
        sock_sendmsg+106
        handshake_exit+221616150
        sock_sendmsg+5
        __sock_xmit+162
        nbd_send_cmd+597
        nbd_handle_cmd+377
        nbd_queue_rq+63
        blk_mq_dispatch_rq_list+653
        __blk_mq_do_dispatch_sched+184
        __blk_mq_sched_dispatch_requests+333
        blk_mq_sched_dispatch_requests+38
        blk_mq_run_hw_queue+239
        blk_mq_dispatch_plug_list+382
        blk_mq_flush_plug_list.part.0+55
        __blk_flush_plug+241
        __submit_bio+353
        submit_bio_noacct_nocheck+364
        submit_bio_wait+84
        __blkdev_direct_IO_simple+232
        blkdev_read_iter+162
        vfs_read+591
        ksys_read+95
        do_syscall_64+92
        entry_SYSCALL_64_after_hwframe+120
]: 1

The issue has started to appear since commit 060406c ("block: add
plug while submitting IO").

	Cc: Ming Lei <ming.lei@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2348878
Fixes: 060406c ("block: add plug while submitting IO")
	Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
	Acked-by: Paul Moore <paul@paul-moore.com>
	Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
	Reviewed-by: Ming Lei <ming.lei@redhat.com>
	Tested-by: Ming Lei <ming.lei@redhat.com>
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 81ccca3)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira KERNEL-393
cve CVE-2025-40176
Rebuild_History Non-Buildable kernel-5.14.0-611.16.1.el9_7
commit-author Sabrina Dubroca <sd@queasysnail.net>
commit b8a6ff8

Async decryption calls tls_strp_msg_hold to create a clone of the
input skb to hold references to the memory it uses. If we fail to
allocate that clone, proceeding with async decryption can lead to
various issues (UAF on the skb, writing into userspace memory after
the recv() call has returned).

In this case, wait for all pending decryption requests.

Fixes: 84c61fe ("tls: rx: do not use the standard strparser")
	Reported-by: Jann Horn <jannh@google.com>
	Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/b9fe61dcc07dab15da9b35cf4c7d86382a98caf2.1760432043.git.sd@queasysnail.net
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit b8a6ff8)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
…ight userns

jira KERNEL-393
cve CVE-2025-38499
Rebuild_History Non-Buildable kernel-5.14.0-611.16.1.el9_7
Rebuild_CHGLOG: - CVE-2025-38499 kernel: clone_private_mnt(): make sure that caller has CAP_SYS_ADMIN in the right userns (Abhi Das) [RHEL-129261] {CVE-2025-38499}
Rebuild_FUZZ: 87.43%
commit-author Al Viro <viro@zeniv.linux.org.uk>
commit c28f922
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-5.14.0-611.16.1.el9_7/c28f922c.failed

What we want to verify is that the clone won't expose something
hidden by a mount we wouldn't be able to undo.  "Wouldn't be able to undo"
may be a result of MNT_LOCKED on a child, but it may also come from
lacking admin rights in the userns of the namespace mount belongs to.

clone_private_mnt() checks the former, but not the latter.

There's a number of rather confusing CAP_SYS_ADMIN checks in various
userns during the mount, especially with the new mount API; they serve
different purposes and in case of clone_private_mnt() they usually,
but not always end up covering the missing check mentioned above.

	Reviewed-by: Christian Brauner <brauner@kernel.org>
	Reported-by: "Orlando, Noah" <Noah.Orlando@deshaw.com>
Fixes: 427215d ("ovl: prevent private clone if bind mount is not allowed")
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit c28f922)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	fs/namespace.c
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v5.14~1..kernel-mainline: 337791
Number of commits in rpm: 20
Number of commits matched with upstream: 16 (80.00%)
Number of commits in upstream but not in rpm: 337775
Number of commits NOT found in upstream: 4 (20.00%)

Rebuilding Kernel on Branch rocky9_7_rebuild_kernel-5.14.0-611.16.1.el9_7 for kernel-5.14.0-611.16.1.el9_7
Clean Cherry Picks: 14 (87.50%)
Empty Cherry Picks: 2 (12.50%)
_______________________________

Full Details Located here:
ciq/ciq_backports/kernel-5.14.0-611.16.1.el9_7/rebuild.details.txt

Includes:
* git commit header above
* Empty Commits with upstream SHA
* RPM ChangeLog Entries that could not be matched

Individual Empty Commit failures contained in the same containing directory.
The git message for empty commits will have the path for the failed commit.
File names are the first 8 characters of the upstream SHA
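
The naming rule above is easy to reproduce when looking for a failed cherry-pick by hand. The sketch below uses a hypothetical helper, `failed_ref_path`, and assumes the directory layout shown in the commit messages in this PR.

```python
def failed_ref_path(kernel_nvr: str, upstream_sha: str) -> str:
    """Build the path of the .failed file for an empty cherry-pick."""
    # The file name is the first 8 characters of the upstream SHA.
    return f"ciq/ciq_backports/{kernel_nvr}/{upstream_sha[:8]}.failed"


print(failed_ref_path(
    "kernel-5.14.0-611.16.1.el9_7",
    "4e034bf045b12852a24d5d33f2451850818ba0c1",
))
```

For the iommufd commit this gives ciq/ciq_backports/kernel-5.14.0-611.16.1.el9_7/4e034bf0.failed, the same path referenced in its Empty-Commit note.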
@PlaidCat PlaidCat requested review from a team December 22, 2025 14:00
@PlaidCat PlaidCat self-assigned this Dec 22, 2025
Collaborator

@bmastbergen bmastbergen left a comment

🥌

@jdieter jdieter left a comment

🚢

@PlaidCat PlaidCat merged commit 03fef51 into rocky9_7 Dec 22, 2025
4 checks passed
@PlaidCat PlaidCat deleted the rocky9_7_rebuild branch December 22, 2025 15:46