StarlingX stopped supporting CentOS builds after release 7.0.
This update strips CentOS from our code base. It also removes
references to the failed OpenSUSE feature.
Story: 2011110
Task: 49955
Change-Id: I927a02d39114862c6a4ebd12c8c88640be18e370
Signed-off-by: Scott Little <scott.little@windriver.com>
This commit upgrades iavf to version 4.5.3.4 from 4.5.3.2 to fix the
issue "iavf 0000:17:01.6: Never saw reset".
The following root cause analysis comes from Intel.
"""
The iavf_adminq_task() function processes the device Admin queue,
which is used to handle receiving messages from the PF driver.
It calls iavf_clean_arq_element() to extract the message at the head
of the queue, and processes it by calling iavf_virtchnl_completion().
There is a subtle race between iavf_adminq_task() and
iavf_watchdog_task() involving the processing of
VIRTCHNL_EVENT_RESET_IMPENDING. The race results in the iavf driver
getting stuck waiting for a reset that has already completed, printing
"Never saw reset" once every 5 seconds, and locking the driver in the
__IAVF_RESET state, preventing normal operations from proceeding.
The entire race can be avoided if the iavf_adminq_task() stops holding
onto potentially stale data. To do this, acquire the
__IAVF_IN_CRITICAL_TASK at the start of the function. With this, it is
no longer possible for the function to be blocked holding the data in
its event buffer while the iavf_watchdog_task() function processes the
entire hardware reset.
Instead of sleeping with a while loop, just re-queue the
iavf_adminq_task() when we are unable to acquire the bit lock.
Additionally, align with upstream and check the removal status to
avoid re-queuing in the event that the driver has already started
remove.
This new flow also aligns with the way the upstream driver handles
locking and completely avoids the race. If the iavf_adminq_task()
happens to be delayed until the hardware reset completes, it will no
longer see the VIRTCHNL_EVENT_RESET_IMPENDING data, as this will have
been cleared by the hardware reset.
"""
Verification:
- The following command with this commit results in a successful iavf
kernel module build for standard and PREEMPT_RT kernels:
build-pkgs -c -p iavf
- A StarlingX ISO image was installed onto an All-in-One Dell XR11 lab
server with one Intel E810 NIC, in low-latency mode.
- The user who reported this issue was provided with a StarlingX
designer patch that incorporates this change. The user in question
did not encounter any issues during their testing with the designer
patch.
Closes-Bug: 2058858
Change-Id: I448ee1e302bdc7277a6c5db990d4d5cfc485a0f4
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
Intel 4th generation Xeon Scalable Processor (Sapphire Rapids) support
has been introduced for the platform. In order to leverage the
integrated QAT device of the SP-MCC SKUs, the QAT user space package
QATengine needs to be integrated.
QATengine provides cryptographic acceleration for both hardware
and optimized software using Intel QuickAssist Technology enabled Intel
platforms. The Intel QATengine project repository is at
https://github.com/intel/QAT_Engine
The qat_hw target with OpenSSL 1.1.1 is built from source.
New package qat2.0.l-dev is added to contain the *.c and *.h files in
the qat2.0.l driver package. Some of these files are required by the
user-space packages' build procedures.
Test plan:
- PASS: build-pkgs -a && build-image
- PASS: /usr/bin/openssl engine -t -c qatengine
- PASS: Test engine with openssl utility
Story: 2010796
Task: 49675
Change-Id: Id174fd06580e693a305b3e9ebaa09f550418b51c
Signed-off-by: Peng Zhang <Peng.Zhang2@windriver.com>
This commit uprevisions the octeon_ep, octeon_ep_vf and oct_ep_phc
drivers from v23.04 to v23.11 to enable use cases that utilize the Dell
Open RAN Accelerator (DORA) card based on Marvell's Octeon
system-on-chip (SoC).
As the driver source code available on Sourceforge does not appear to be
kept up-to-date, the build system configuration files are updated to
acquire the driver source code from a Marvell-maintained git repository
on GitHub.
This commit also accommodates the minor differences between the
directory structures of the source code tar archive on Sourceforge and
the git repository on GitHub by modifying the debian/rules file.
We also block the automatic loading of the oct_ep_phc driver via a
modprobe.d configuration entry, for two reasons:
1) The oct_ep_phc driver does not appear to be needed by the major user
whose use cases are enabled by this driver uprevision.
2) The oct_ep_phc driver triggers a kernel crash when being unloaded,
due to an initialization error handling bug related to the DORA card,
as reported at:
https://github.com/MarvellEmbeddedProcessors/pcie_ep_octeon_host/issues/2
The two patches applied to the driver package as part of StarlingX are
refreshed and adapted to apply cleanly onto the newer driver package
version acquired from GitHub.
The modprobe configuration file is renamed to octeon-ep.conf to adhere
to inclusive language guidelines.
Finally, the "debian/copyright" file is updated to adhere to Debian's
formatting guidelines published at [1], to update the name of the source
package, to note that most files are licensed under the GPL-2 license
and that the "apps" directory is licensed under the Apache-2.0 license.
Also, please note that the Makefile in the source code package acquired
from GitHub does not have a specific/different license, unlike the
package acquired from Sourceforge, so the special case for that file is
removed.
[1] https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
== Additional Information ==
We would like to note that the DORA card is a bit special with respect
to its configuration interface. In summary, the octeon_ep physical
function (PF) driver instantiates a network interface managed by the
kernel, which acts as the configuration interface to the accelerator
card. The card sends DHCP discovery requests via the network interface.
If a DHCP server is listening on the network interface, then the card
acquires an IP address and a firmware download can be carried out via
the interface to fully initialize the accelerator card.
Unfortunately, we do not have access to the firmware images and the
software packages necessary to test the accelerator card end-to-end, so
our verification has been limited to ensuring that the DHCP discovery
requests are observed on the network interface created by the PF driver
after the driver is loaded.
The accelerator also has a serial console that can be attached to the
host via a USB-to-serial adapter, but our understanding is that the labs
we have been using for verification did not have this serial connection
set up.
== Verification ==
* An ISO image can be built with this commit applied to a repo project
of a StarlingX-based distribution tracking StarlingX's master branch.
* The ISO image can be installed to a Dell XR11 server with a DORA card,
and the system is successfully Ansible-bootstrapped.
* The octeon_ep, octeon_ep_vf and oct_ep_phc drivers are observed to not
be automatically loaded.
* The octeon_ep driver can be loaded manually with modprobe, and a PF
interface is instantiated by the kernel. Once the PF interface is
brought up with the "ip" command, DHCP discovery packets are observed
on the interface by running (as root):
tcpdump -i <pf_iface> -nn -e 'udp port 67 or udp port 68'
* Virtual function (VF) interfaces can be instantiated by loading the
octeon_ep_vf driver with modprobe and then writing (for example) the
string "2" to the magic sysfs file at:
/sys/class/net/<pf_iface>/device/sriov_numvfs
* The VF interfaces can be brought up with the "ip" command.
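The VF instantiation step above can be sketched as a small shell helper. This is only a sketch: the function name create_vfs and the SYSFS_ROOT override (added so the helper can be dry-run against a scratch directory) are assumptions and not part of this commit. Only the sriov_numvfs path comes from the steps above.

```shell
#!/bin/sh
# Hypothetical helper: request N virtual functions on a PF interface by
# writing to its sriov_numvfs file. SYSFS_ROOT is an assumption added so
# that the helper can be pointed at a scratch directory for a dry run.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

create_vfs() {
    pf_iface="$1"
    num_vfs="$2"
    numvfs_file="$SYSFS_ROOT/class/net/$pf_iface/device/sriov_numvfs"

    if [ ! -e "$numvfs_file" ]; then
        echo "no sriov_numvfs file for $pf_iface" >&2
        return 1
    fi
    # Reset to 0 first: the kernel generally refuses to change a non-zero
    # VF count to another non-zero value in one step.
    echo 0 > "$numvfs_file" || return 1
    echo "$num_vfs" > "$numvfs_file"
}
```

On a real system this would be run as root, for example "create_vfs <pf_iface> 2", after the octeon_ep and octeon_ep_vf drivers have been loaded with modprobe.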
Story: 2010047
Task: 49651
Change-Id: I11965bf1be278030934b4b517860bc28683a6673
Signed-off-by: M. Vefa Bicakci <vefa.bicakci@windriver.com>
Intel 4th generation Xeon Scalable Processor (Sapphire Rapids) support
has been introduced for the platform. In order to leverage the
integrated QAT device of the SP-MCC SKUs, the QAT user space package
QATzip needs to be integrated.
QATzip provides extended accelerated compression and decompression
services by offloading the actual compression and decompression
request(s) to the Intel® Chipset Series. The Intel QATzip project
repository is at
https://github.com/intel/QATzip
Test plan:
- PASS: build test
- PASS: qzip -O 7z FILE1 FILE2 FILE3... -o result.7z
- PASS: qzip -d result.7z
- PASS: qzip -k $your_input_file -O gzipext -A deflate
Story: 2010796
Task: 48568
Change-Id: I59e62d81e40b8d062bf780c681a38bed79fb520e
Signed-off-by: Peng Zhang <Peng.Zhang2@windriver.com>
The constraints file referenced by tox.ini was removed. We need to
update tox.ini to use the StarlingX Debian constraints file.
Test Plan:
PASS - Run tox command
Closes-bug: 2055734
Change-Id: I02d8d7e65cd889a24ffb6e9f9d3a5cc36a0f4248
Signed-off-by: Hugo Brito <hugo.brito@windriver.com>
This commit resolves the following error message printed out during
qat2.0.l and qatzip package builds:
dpkg-shlibdeps: warning: can't extract name and version from \
library name 'libqat_s.so'
This is caused by the lack of a version number in the "libqat_s.so" and
"libusdm_drv_s.so" shared libraries' file names as well as their soname
fields, and it is resolved by adding a placeholder version number to the
soname field of the shared libraries built by the qat2.0.l package. For
further information, please see the description of the included patch.
This commit also adds symbolic links from the non-versioned library file
names to the versioned library file names, to adhere to the shared
library conventions.
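The naming problem and the symbolic link convention described above can be illustrated with a short shell sketch. The split_soname function is a simplified approximation of the "name plus version" split that the warning refers to; it is not dpkg-shlibdeps' actual code. Both function names are hypothetical.

```shell
#!/bin/sh
# Split a shared library name of the form <name>.so.<version> into its
# name and version. Prints "<name> <version>", or fails when the file
# name carries no version (as with the original "libqat_s.so").
split_soname() {
    soname="$1"
    case "$soname" in
        *.so.?*)
            echo "${soname%%.so.*} ${soname#*.so.}"
            ;;
        *)
            echo "cannot extract name and version from '$soname'" >&2
            return 1
            ;;
    esac
}

# Create the non-versioned symbolic link next to a versioned library,
# e.g. libqat_s.so -> libqat_s.so.0.
link_unversioned() {
    lib_path="$1"
    lib_dir="$(dirname "$lib_path")"
    lib_file="$(basename "$lib_path")"
    ln -sf "$lib_file" "$lib_dir/${lib_file%%.so.*}.so"
}
```

With this sketch, "split_soname libqat_s.so.0" prints "libqat_s 0", while "split_soname libqat_s.so" fails, which mirrors the situation before and after the placeholder version number is added.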
Verification:
* An ISO image was successfully built with this commit and a
cherry-picked version of the commit at the following link, and the
build logs for both the qat2.0.l-common and the qatzip packages did
not exhibit the aforementioned warning message:
https://review.opendev.org/c/starlingx/kernel/+/890744
* The "qatzip" Debian package resulting from the build automatically
included the qat2.0.l-common package in its dependencies list:
```
$ dpkg-deb -f \
/localdisk/.../std/qatzip/qatzip_1.1.2-1.stx.1_amd64.deb \
Depends
libc6 (>= 2.17), liblz4-1 (>= 0.0~r127), \
qat2.0.l-common (>= 1.0.20), zlib1g (>= 1:1.2.2)
```
* The ISO image built with this commit was installed into a
qemu/KVM-based virtual machine in All-in-One simplex low-latency mode,
and running the "cpa_sample_code" and "qzip" executables did not
result in shared library resolution-related error messages.
Furthermore, "ldd" indicated that the libraries were successfully
located by the dynamic linker. An example:
$ ldd /usr/bin/qzip | grep -e libusdm_drv_s -e libqat_s
libusdm_drv_s.so.0 => /lib/x86_64-linux-gnu/libusdm_drv_s.so.0 \
(0x00007fd6150c6000)
libqat_s.so.0 => /lib/x86_64-linux-gnu/libqat_s.so.0 \
(0x00007fd614fcf000)
Closes-Bug: 2046175
Change-Id: I2039b09be89bc75540550d94acb779a489326dce
Signed-off-by: M. Vefa Bicakci <vefa.bicakci@windriver.com>
We encountered a version compatibility issue after upgrading
mlnx-ofa_kernel to 5.9: mlnx-ofa_kernel-5.5 is based on Linux kernel
5.13-rc4, whereas mlnx-ofa_kernel-5.9 is based on Linux kernel v6.0-rc5.
We adapt bnxt_re to mlnx-ofa_kernel-5.9 by referring to the following
two upstream commits and the bnxt_re-227.0.130.0 source code.
The definition of create_qp() was changed by the following commit:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.15-rc6&id=514aee660df493cd673154a6ba6bab745ec47b8c
IB_DEVICE_LOCAL_DMA_LKEY was removed by the following commit:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.0.y&id=e945c653c8e972d1b81a88e474d79f801b60213a
Without the adaptation, the build fails with the following errors:
bnxt_re-220.0.12.0/main.c:1340:16: error: initialization of int \
(*)(struct ib_qp *, struct ib_qp_init_attr *, struct ib_udata *) \
from incompatible pointer type struct ib_qp * (*)(struct ib_pd *,\
struct ib_qp_init_attr *, struct ib_udata *) \
[-Werror=incompatible-pointer-types]
1340 | .create_qp = bnxt_re_create_qp,
| ^~~~~~~~~~~~~~~~~
bnxt_re-220.0.12.0/ib_verbs.c:160:11: error: IB_DEVICE_LOCAL_DMA_LKEY \
undeclared (first use in this function); did you mean \
IBK_LOCAL_DMA_LKEY?
160 | | IB_DEVICE_LOCAL_DMA_LKEY
| ^~~~~~~~~~~~~~~~~~~~~~~~
Note: The dependency mlnx-ofed-kernel-dev package's name in debian/rules
and debian/control files was updated to have a @KERNEL_TYPE@ suffix, to
accommodate a similar change in the packaging of the Mellanox drivers.
Verification:
- The modules build successfully for kernel-std/kernel-rt.
- Installation of the ISO image is successful with standard and
low-latency profiles.
- Physical function interfaces are up and pass packets for rt and std.
- Virtual functions (VFs) were created, and the VF interfaces can come up and pass packets.
- RDMA/Infiniband over Ethernet functionalities of the Broadcom adapters
were successfully tested using the Linux RDMA community's perftest
package.
Story: 2010958
Task: 49057
Depends-On: https://review.opendev.org/c/starlingx/kernel/+/900742
Change-Id: Ib2e597811f9289c7840fcef662d44ca6dbf26270
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
This upgrades the OFED driver related packages to the ones that are
located in https://linux.mellanox.com/public/repo/mlnx_ofed/5.9-0.5.6.0/SRPMS/
That includes rdma-core and the mlnx-tools package that
mlnx-ofa_kernel depends on, and the firmware tool mstflint.
The new versions are:
mlnx-ofa_kernel-5.9.tgz
rdma-core-59mlnx44.tgz
mstflint-4.16.1-2.tar.gz
mlnx-tools-5.2.0.tar.gz
Verification:
- Install onto a StarlingX system with two controller and two compute
nodes whose network adapters use Mellanox's OFED drivers. The
controllers' network adapters are Mellanox Technologies MT27710 Family
[ConnectX-4 Lx]; the compute nodes' network adapters are Mellanox
Technologies MT27800 Family [ConnectX-5].
- Use mstflint to query the firmware on the device.
- Use mstflint to verify firmware.
- Use mstconfig to query configurations.
- Use mstvpd to dump the on-card VPD.
- Use mstregdump to dump hardware registers from Mellanox hardware.
- RDMA/Infiniband over Ethernet functionalities of the Mellanox adapters
were successfully tested using the Linux RDMA community's perftest
package.
Story: 2010958
Task: 49056
Depends-On: https://review.opendev.org/c/starlingx/kernel/+/900742
Change-Id: I7811eb10682e204225933316cd45a0ea8e84fb96
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
This upgrades the OFED driver package to the mlnx-ofa_kernel-5.9.tgz
located in https://linux.mellanox.com/public/repo/mlnx_ofed/5.9-0.5.6.0/SRPMS/
In addition, it removes the irq_update_affinity_hint related patch
because the fix is already included in the source code.
Reason:
The required Mellanox drivers must be upgraded to the latest version
(5.8+) to support Dell 15G and 16G platforms.
Verification:
- The modules build successfully for kernel-std/kernel-rt.
- The rdma-core, mstflint and mlnx-tools packages build successfully.
- Install onto a StarlingX All-in-One lab whose network adapters use
Mellanox's OFED drivers. The controllers' network adapters
are Mellanox Technologies MT27800 Family [ConnectX-5].
- Install onto StarlingX systems in labs whose network adapters use
Mellanox's OFED drivers. The controllers' network adapters
are [ConnectX-6 DX] and [ConnectX-6 LX].
- The physical function interfaces are up and pass packets for rt
and std.
- Virtual functions (VFs) were created, and the VF interfaces can come up and pass packets.
- RDMA/Infiniband over Ethernet functionalities of the Mellanox adapters
were successfully tested using the Linux RDMA community's perftest
package.
Story: 2010958
Task: 49055
Change-Id: I824e5e07b597e8b7cc518388a2ab93264b3f0947
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
This commit resolves the following issue in StarlingX: When running
sysbench's disk I/O workload in a pod for an extended period, the
following warning shows up in the kernel logs, the fall-out from which
can eventually trigger a kernel panic, such as the following:
------------[ cut here ]------------
WARNING: CPU: 49 PID: 0 at kernel/sched/core.c:2503 \
set_task_cpu+0x1cd/0x1e0
Modules linked in: ...
...
CPU: 49 PID: 0 Comm: swapper/49 Kdump: loaded \
Tainted: G S O \
5.10.0-6-rt-amd64 #1 Debian 5.10.177-1.stx.60
Hardware name: HPE Edgeline e920t/Edgeline e920t, BIOS H10 04/20/2023
RIP: 0010:set_task_cpu+0x1cd/0x1e0
Code: ...
<snip register dump>
Call Trace:
<IRQ>
push_rt_task.part.0+0x1bf/0x410
task_woken_rt+0x5d/0x70
ttwu_do_wakeup+0x45/0x190
try_to_wake_up+0x194/0x690
__handle_irq_event_percpu+0x86/0x1f0
? mwait_idle+0x76/0x90
handle_irq_event+0xa5/0x110
handle_edge_irq+0x93/0x290
asm_call_irq_on_stack+0xf/0x20
</IRQ>
common_interrupt+0xb3/0x130
asm_common_interrupt+0x1e/0x40
RIP: 0010:mwait_idle+0x76/0x90
Code: ...
<snip register dump>
default_idle_call+0x3b/0x150
do_idle+0x251/0x2f0
cpu_startup_entry+0x19/0x20
secondary_startup_64_no_verify+0xc2/0xcb
---[ end trace 0000000000000002 ]---
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(l->owner != current)
WARNING: CPU: 0 PID: 1069 at include/linux/local_lock_internal.h:68 \
__local_bh_enable+0x119/0x160
Modules linked in: ...
...
CPU: 0 PID: 1069 Comm: irq/1862-nvme0q Kdump: loaded \
Tainted: G S W O \
5.10.0-6-rt-amd64 #1 Debian 5.10.177-1.stx.60
Hardware name: HPE Edgeline e920t/Edgeline e920t, BIOS H10 04/20/2023
RIP: 0010:__local_bh_enable+0x119/0x160
Code: ...
<snip register dump>
Call Trace:
__local_bh_enable_ip+0x5e/0xd0
irq_forced_thread_fn+0x73/0x80
irq_thread+0x102/0x1d0
? irq_finalize_oneshot.part.0+0xe0/0xe0
? irq_thread_check_affinity+0xa0/0xa0
kthread+0x176/0x190
? __kthread_parkme+0xa0/0xa0
ret_from_fork+0x1f/0x30
---[ end trace 0000000000000003 ]---
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 4a4b49067 P4D 0
Oops: 0002 [#1] PREEMPT_RT SMP NOPTI
CPU: 0 PID: 1000 Comm: irq/1795-nvme0q Kdump: loaded \
Tainted: G S W O \
5.10.0-6-rt-amd64 #1 Debian 5.10.177-1.stx.60
Hardware name: HPE Edgeline e920t/Edgeline e920t, BIOS H10 04/20/2023
RIP: 0010:rb_erase+0x1b4/0x350
Code: ...
<snip register dump>
Call Trace:
mark_wakeup_next_waiter+0x73/0x140
rt_mutex_futex_unlock+0x60/0xb0
dma_pool_free+0xa7/0xc0
nvme_unmap_data.part.0+0x7b/0xc0 [nvme]
nvme_pci_complete_rq+0x45/0xc0 [nvme]
nvme_process_cq+0x173/0x290 [nvme]
? irq_thread_fn+0x60/0x60
nvme_irq+0x10/0x20 [nvme]
irq_forced_thread_fn+0x2e/0x80
irq_thread+0x102/0x1d0
? irq_finalize_oneshot.part.0+0xe0/0xe0
? irq_thread_check_affinity+0xa0/0xa0
kthread+0x176/0x190
? __kthread_parkme+0xa0/0xa0
ret_from_fork+0x1f/0x30
Modules linked in: ....
...
CR2: 0000000000000000
In addition to the kernel panic mentioned above, we have also observed
three cases (one of which was locally reproduced) where the system under
test becomes unresponsive after a number of kernel warnings, starting
with the first warning quoted above. Reviewing the vmcore file generated
via the use of the magic system request key (while the system was
unresponsive) indicated that an NVMe-related IRQ thread may have been
migrated incorrectly while the thread had disabled migration.
We cherry-pick commit feffe5bb274d ("sched/rt: Fix bad task migration
for rt tasks") to resolve the aforementioned issues. These issues are
caused by a commit that was cherry-picked to the PREEMPT_RT kernel by
upstream:
The "fixed" commit was inherited by StarlingX from linux-yocto's
v5.10/standard/preempt-rt/base branch:
* commit ad592ffad5a7 ("sched,rt: Use the full cpumask for balancing")
https://git.yoctoproject.org/linux-yocto/commit/?h=ad592ffad5a7
That commit in turn was likely inherited from the linux-stable-rt
project's rt-stable/v5.10-rt branch:
* commit 0523fce6f661 ("sched,rt: Use the full cpumask for balancing")
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/?id=0523fce6f661
And that commit was cherry-picked from the following mainline commit
(a.k.a., v5.11-rc1~7^2~30^2~5):
* commit 95158a89dd50 ("sched,rt: Use the full cpumask for balancing")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=95158a89dd50
Unfortunately, the aforementioned commit introduced the bug we describe
above, which necessitated the following mainline bug-fix commit (a.k.a.
v6.4-rc1~94^2~1), which we are applying to the StarlingX kernel with
this commit:
* commit feffe5bb274d ("sched/rt: Fix bad task migration for rt tasks")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=feffe5bb274d
Verification confirming that the issue is fixed, carried out with a
StarlingX-based distribution using the v5.10.177 PREEMPT_RT kernel:
* Different manifestations of the same issue were reproduced on two
All-in-One Simplex systems installed with the low-latency profile
(i.e., the PREEMPT_RT kernel) which were running the sysbench workload
in a loop.
* One server reproduced the "set_task_cpu" warning followed by the
kernel panic quoted above, after about 9 hours of running the workload.
At the time, we were not aware of the relationship of the
"set_task_cpu" warning and the unresponsive system scenario. We also
noticed that the panic occurred while the sysbench workload was disk
I/O-intensive.
* We prepared a designer patch, incorporating an earlier version of this
commit with the same cherry-picked mainline commit, and we installed
the designer patch onto the same server. The sysbench workload was
modified so that only the disk I/O-intensive parts of the workload
were repeatedly executed. The workload was kept running for more than
seven days, and the "set_task_cpu" warning was not encountered during
this time frame.
* Concurrently, near the end of the sixth day of the tests running with
the designer patch applied, another server (that was *not* patched)
reproduced the "set_task_cpu" warning followed by the "unresponsive
system" scenario after about 40 hours of uptime, during most of
which the workload was running. Based on a review of the vmcore file
(as discussed above) we connected the two issues (unresponsive system
and kernel panic).
Verification carried out for due diligence, with a distribution based on
the StarlingX master branch using the v5.10.198 PREEMPT_RT kernel:
* An ISO image was successfully built with this commit (via "build-pkgs
--reuse"), and the resulting ISO image was installed into a
qemu/KVM-based virtual machine in All-in-One Simplex and low-latency
configuration. The installation was Ansible bootstrapped successfully.
* We should note that we did encounter one issue after the Ansible
bootstrap autonomously/expectedly rebooted the system. About 20
minutes of uptime after the reboot, two "dockerd" tasks and one "sync"
task were reported by the kernel to be blocked for more than 122
seconds, with backtraces implicating file system sync operations. We
currently believe that this issue is related to disk I/O contention on
the host running the virtual machine, and that this issue is not
related to this commit.
Closes-Bug: 2043023
Change-Id: Ifd186473c33d221a1e3e51c44edd7325f59a7c7f
Signed-off-by: M. Vefa Bicakci <vefa.bicakci@windriver.com>
Problem:
The Dell R750 hangs after the following command is executed:
$sudo -i /bin/bash -c 'echo b > /proc/sysrq-trigger'
The issue can be reproduced within about 5 test cycles.
The active controller sends a reboot command to mtcClient on the
standby controller due to the SM failure (missed heartbeat), and
mtcClient then tries to reboot the system gracefully. If the standby
controller has not rebooted within 120s, mtcClient tries to force a
reboot with the command "echo b > /proc/sysrq-trigger".
Unfortunately, the Dell PowerEdge R750 gets stuck and the BMC
console does not show anything.
Solution:
We searched for relevant clues about this machine and found nothing
except the kernel parameter 'reboot=p', which changes the reboot type
to pci_reboot for the sysrq magic key. After running the test cycle
multiple times with this kernel option, the issue no longer occurred
and the system rebooted properly, as expected. So this approach is
suitable for resetting the Dell R750.
Because this kernel option should not be applied to all target
machines, we change the reboot type only for the R750, based on a DMI
table quirk. Other machines still use the default reboot type; this
commit affects only the R750.
Based on the above, we add a PCI reboot quirk to the DMI table that
changes reboot_type to pci_reboot, to make sure that the kernel on the
Dell PowerEdge R750 reboots properly.
On the R750 target we can see the following dmidecode information:
$sudo dmidecode |grep 'Product Name'
Product Name: PowerEdge R750
$sudo dmidecode |grep 'Vendor'
Vendor: Dell Inc.
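To check from user space whether a machine would be selected by a vendor and product match like the one above, a sketch along the following lines may help. It is not the kernel quirk itself, which lives in kernel C code. The function name, the DMI_ROOT override, and the choice of the sys_vendor and product_name sysfs attributes are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical user-space check mirroring a DMI vendor/product match.
# DMI_ROOT is an assumption added so that the check can be exercised
# against a scratch directory instead of the real sysfs tree.
DMI_ROOT="${DMI_ROOT:-/sys/class/dmi/id}"

matches_r750() {
    vendor="$(cat "$DMI_ROOT/sys_vendor" 2>/dev/null)"
    product="$(cat "$DMI_ROOT/product_name" 2>/dev/null)"
    # Substring comparisons, so a longer product string that contains
    # "PowerEdge R750" would also match.
    case "$vendor" in *"Dell Inc."*) ;; *) return 1 ;; esac
    case "$product" in *"PowerEdge R750"*) ;; *) return 1 ;; esac
    return 0
}
```

For example, "matches_r750 && echo 'pci reboot quirk would apply'" prints the message only on a matching machine.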
TestPlan:
PASS: downloader && build-pkgs && build-image
PASS: Jenkins Installation on R750 machine and the other labs.
PASS: Execute the following testing cycle more than 20 times:
$sudo -i /bin/bash -c 'echo b > /proc/sysrq-trigger'
The system can reboot properly every time during test cycles.
The stuck issue after reset hasn't been seen anymore.
Closes-Bug: 2041606
Signed-off-by: Zhixiong Chi <zhixiong.chi@windriver.com>
Change-Id: I05467cc6d5105aa813852dca0c935278741b043f
This commit updates the default Intel NIC driver bundle version of the
iavf driver from v4.5.3 to v4.5.3.2 to resolve an issue involving system
hangs after the following messages are printed out by the iavf driver:
```
iavf 0000:51:11.0: Failed to init adminq: -53
iavf 0000:51:11.0: failed to allocate resources during reinit
```
This is reproduced with the following commands on iavf-4.5.3, which
carry out rapid virtual function (VF) interface resets:
```
while true; do
# enp81s17 is the first VF interface
ip l set dev enp81s17 up;
# enp81s0f2 is the corresponding PF interface
ip l set dev enp81s0f2 vf 0 trust on;
ip l set dev enp81s0f2 vf 0 vlan 333;
ip l set dev enp81s0f2 vf 0 trust off;
ip l set dev enp81s0f2 vf 0 vlan 310;
ip l set dev enp81s17 down;
sleep 0.1 ;
done
```
Eventually, iavf reports the aforementioned error messages, and the VF
bring down operation hangs. This is followed by the hang of many
unrelated processes, likely due to the "rtnl" mutex.
This commit updates iavf from v4.5.3 to v4.5.3.2 to resolve this issue
and other issues that Intel has recommended to fix. Please note that
this version of the iavf driver is found in the "unsupported" directory
on Intel's Sourceforge project for NIC drivers, despite Intel having
recommended this version of the iavf driver to fix the reported issue.
This is how Intel provides fixed intermediate versions of their older
NIC drivers on Sourceforge. Furthermore, this version of iavf has gone
through testing by Intel as well as by the StarlingX community, despite
the driver having been declared as an "unsupported" version by Intel.
The corresponding mainline commits are as follows, but note that the
changes in iavf 4.5.3.2 are only loosely based on these commits, due to
the divergence between the out-of-tree and mainline versions of the iavf
source code:
* Commit 31071173771e ("iavf: Fix reset error handling")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=31071173771e
(This is the commit that resolves the issue the user in question has
encountered.)
* Commit c2ed2403f12c ("iavf: Wait for reset in callbacks which trigger
it")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2ed2403f12c
* Commit 7598f4b40bd6 ("iavf: Move netdev_update_features() into
watchdog task")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7598f4b40bd6
The iavf driver versions belonging to other Intel NIC driver bundle
versions are not updated due to the following reasons:
- intel-iavf-cvl-2.54: We do not yet know if this version of iavf
(v4.0.1) is affected by this issue. The user reporting the issue fixed
by this commit is currently using iavf v4.5.3, and we have not
received field reports regarding a similar issue encountered with iavf
v4.0.1.
- intel-iavf-cvl-4.10: This version of iavf (v4.6.1) is not affected by
this issue, as the changes included in iavf v4.5.3.2 were backported
by Intel from iavf v4.6.1.
Verification
- The following command with this commit results in a successful iavf
kernel module build for standard and PREEMPT_RT kernels:
build-pkgs -c -p iavf
- A StarlingX ISO image from 2023-09-28 was installed onto an All-in-One
Duplex Dell XR11 lab with one quad-port Intel E810 NIC per server in
low-latency mode (i.e., with the PREEMPT_RT kernel).
- The issue was reproduced using a script similar to the one depicted at
the beginning of this commit message. We should note that the issue
manifests itself usually within ~200 iterations of the loop.
- Afterwards, in a StarlingX build environment, the kernel and all of
the kernel modules were built with this commit from scratch. The
resulting *.deb files were copied to controller-1 of the StarlingX
installation and converted into a "sneaky" designer patch with a
customized version of the "sneaky_patch.py" script, the original
version of which is available in StarlingX.
- The resulting designer patch was successfully applied onto
controller-0 of the aforementioned StarlingX ISO image installation.
Afterwards, it was confirmed that the iavf driver version changed from
4.5.3 (prior to the designer patch) to 4.5.3.2 (after the application
of the designer patch).
- Afterwards, a shell script based on the snippet quoted above was
executed for 4000 iterations of the loop, without the reproduction of
the original issue.
- Furthermore, basic tests with iavf-managed VF interfaces were carried
out, involving creating two network namespaces on controller-0,
assigning one iavf-managed VF interface to each network namespace, and
finally, running iperf3 across the VF interfaces, from within the
network namespaces.
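For a bounded run such as the 4000-iteration check above, the reproduction loop can be wrapped in a function that takes an iteration count. This is a sketch only: the function name, its arguments, and the IP_CMD override (added so that the loop can be dry-run without touching real interfaces) are assumptions and not the script that was actually used.

```shell
#!/bin/sh
# IP_CMD is a hypothetical override; it defaults to the real "ip" tool.
IP_CMD="${IP_CMD:-ip}"

# Usage: vf_reset_loop <pf_iface> <vf_iface> <iterations>
vf_reset_loop() {
    pf_iface="$1"
    vf_iface="$2"
    iterations="$3"
    i=0
    while [ "$i" -lt "$iterations" ]; do
        # Same sequence as the reproduction snippet: bring the VF up,
        # toggle trust and VLAN settings via the PF, bring the VF down.
        $IP_CMD link set dev "$vf_iface" up
        $IP_CMD link set dev "$pf_iface" vf 0 trust on
        $IP_CMD link set dev "$pf_iface" vf 0 vlan 333
        $IP_CMD link set dev "$pf_iface" vf 0 trust off
        $IP_CMD link set dev "$pf_iface" vf 0 vlan 310
        $IP_CMD link set dev "$vf_iface" down
        sleep 0.1
        i=$((i + 1))
    done
}
```

With the interface names from the snippet at the beginning of this commit message, the check would correspond to "vf_reset_loop enp81s0f2 enp81s17 4000", run as root.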
Closes-Bug: 2037692
Change-Id: I75415e5668b002b91c2208bff081775c9eced083
Signed-off-by: M. Vefa Bicakci <vefa.bicakci@windriver.com>
Remove the kernel abiname/version from Build-Depends in OOT kernel
modules. After commit <Add pkgs without abiname for image/headers>,
the new dependency chain is as follows:
linux-headers-5.10.0-6-amd64 depends on linux-kbuild-5.10;
linux-headers-stx-amd64 depends on linux-headers-5.10.0-6-amd64.
Package linux-keys-5.10 is renamed to linux-keys.
As a result, the version numbers and abiname can be completely removed
from the Build-Depends of the OOT kernel modules.
The same is done for the RT kernel modules.
This prepares for kernel upgrades to a new major version.
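One way to check that no kernel version or abiname is left in a Build-Depends value is sketched below. The function name and the matching rule (a "linux-" package name containing "-<major>.<minor>") are assumptions chosen for illustration; the sample values are illustrative and not copied from any particular debian/control file.

```shell
#!/bin/sh
# Print the "linux-*" entries of a Build-Depends value that still embed
# a kernel version or abiname, one per line. Returns non-zero when
# there are none.
find_versioned_kernel_deps() {
    # $1: a Build-Depends value such as
    #     "debhelper-compat (= 13), linux-keys-5.10"
    printf '%s\n' "$1" | tr ',' '\n' |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]].*$//' |
        grep -E '^linux-.*-[0-9]+\.[0-9]+'
}
```

For instance, a value containing "linux-headers-5.10.0-6-amd64" or "linux-keys-5.10" is reported, while "linux-headers-stx-amd64" and "linux-keys" are not.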
Test plan:
PASS: Build all the packages and iso successfully.
PASS: The rt/std installations are fine for both qemu and lab.
PASS: No warning appears for insmod/modprobe.
Depends-On: https://review.opendev.org/c/starlingx/kernel/+/896187
Story: 2010643
Task: 48815
Signed-off-by: Li Zhou <li.zhou@windriver.com>
Change-Id: I860a751cf4c11f64c81877714ecddb10b488fa96
Add 2 packages, linux-image-stx-amd64 and linux-headers-stx-amd64,
which do not have the abiname in their names. They depend on the
packages that do have the abiname in their names. We can then use these
2 packages anywhere that involves the image/headers packages (e.g.,
Build-Depends, yaml config and so on). When the abiname changes in a
later kernel upgrade, those places no longer need to change.
We do not use linux-image-amd64/linux-headers-amd64 as Debian does
because they are built by linux-signed-amd64 and are coupled with the
signed kernel. We do not follow Debian's signing process, so we create
2 new packages that are coupled with the unsigned image.
Also, rename package "linux-keys-@version@" to "linux-keys" because the
"@version@" is not necessary for this package. The version numbers
can then be completely removed from the Build-Depends of the OOT kernel
modules.
All of the above is done for the rt kernel too.
This prepares for kernel upgrades to a new major version.
Test plan:
PASS: 2 new pkgs linux-image-stx-amd64/linux-headers-stx-amd64 can be
built successfully for linux.
PASS: 2 new pkgs linux-rt-image-stx-amd64/linux-rt-headers-stx-amd64
can be built successfully for linux-rt.
Story: 2010643
Task: 48815
Signed-off-by: Li Zhou <li.zhou@windriver.com>
Change-Id: I63f968d3b24728b2b5b08e889c26c3c4f6a0e1df
mirror.starlingx.cengn.ca no longer exists. CENGN is kindly forwarding
requests to the new location mirror.starlingx.windriver.com for now, but
that will only last a few months. We need to replace all the references
with the new URL.
I will also remove as many 'cengn' references as possible, replacing
them with 'stx_mirror'
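The host replacement described above could be scripted roughly as
follows. This is a minimal sketch, not the tooling used for this change:
the function name and the sample URL path are made up for illustration;
only the two host names come from the text above.

```python
OLD_HOST = "mirror.starlingx.cengn.ca"
NEW_HOST = "mirror.starlingx.windriver.com"


def update_mirror_host(text: str) -> str:
    """Return text with every reference to the old mirror host replaced."""
    return text.replace(OLD_HOST, NEW_HOST)


# Hypothetical line from a download list; the path is illustrative only.
sample = "https://mirror.starlingx.cengn.ca/mirror/example.tar.gz"
print(update_mirror_host(sample))
```

Renaming the remaining 'cengn' identifiers to 'stx_mirror' is a separate,
case-by-case edit and is not covered by this sketch.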
Partial-Bug: 2033555
Change-Id: I250f7aff90f71ea67b1502c21b4e914ba682946c
Signed-off-by: Scott Little <scott.little@windriver.com>
When the system is stressed running pods on isolated cores (using
stress-ng for instance [1]) and the Power Metrics App [2] is also
being executed, the system hangs.
[1] https://github.com/ColinIanKing/stress-ng
[2] https://opendev.org/starlingx/app-power-metrics
Dmesg shows the following output:
WARNING: CPU: 16 PID: 207561 at
kernel/events/core.c:868 perf_cgroup_switch+0x222/0x230
RIP: 0010:perf_cgroup_switch+0x222/0x230
Call Trace:
? __warn+0x79/0xc0
? perf_cgroup_switch+0x222/0x230
? report_bug+0x9e/0xc0
? handle_bug+0x41/0x90
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x12/0x20
? perf_cgroup_switch+0x222/0x230
? perf_cgroup_switch+0xff/0x230
__perf_event_task_sched_in+0x169/0x330
? __perf_event_task_sched_out+0x27c/0x6d0
? newidle_balance+0x3fd/0x480
finish_task_switch.isra.0+0x118/0x4b0
__schedule+0x2ae/0x930
? hrtimer_start_range_ns+0x2fc/0x420
schedule+0xa7/0x110
do_nanosleep+0x7c/0x1a0
hrtimer_nanosleep+0x9b/0x140
? __hrtimer_init+0xe0/0xe0
__x64_sys_nanosleep+0xad/0xe0
do_syscall_64+0x30/0x40
entry_SYSCALL_64_after_hwframe+0x61/0xc6
There is an upstream patch set that fixes a race condition in
perf_cgroup_switch(). Applying these patches to the stx kernel resolved
the issue.
* commit a0827713e298
("perf/core: Don't pass task around when ctx sched in")
(v5.18-rc2~8^2~3)
* commit 6875186aea5c
("perf/core: Use perf_cgroup_info->active to check if cgroup is
active") (v5.18-rc2~8^2~2)
* commit 96492a6c558a
("perf/core: Fix perf_cgroup_switch()") (v5.18-rc2~8^2~1)
* commit e19cd0b6fa59
("perf/core: Always set cpuctx cgrp when enable cgroup event")
(v5.18-rc2~8^2)
Note: It was verified that the mainline kernel has no "fixes" commits
for the commits mentioned above.
Test plan:
PASS: Build iso success for rt and std.
PASS: Install success onto an AIO-SX lab with both rt and std kernel.
PASS: Apply power-metrics app, launch stress pods and confirm the
system is stable.
Closes-Bug: 2035124
Change-Id: I30fcb63e4564a23cdb26794f4dfefa748eaa0cee
Signed-off-by: Alyson Deives Pereira <alyson.deivespereira@windriver.com>
This commit prevents the ice and iavf drivers from causing kernel
panics when a forced reboot is initiated with "reboot -f".
Issue #1: iavf driver
If the netdev pointer is NULL, then iavf_remove() returns early to
ensure that it does not proceed with an already-freed netdev instance.
However, the drvdata field of the iavf driver's pci_dev structure continues
to keep the former value of the netdev pointer, and this value can be
acquired from the pci_dev structure via pci_get_drvdata(). This causes
a kernel panic when a forced reboot/shutdown is in progress due to the
following sequence of events:
- The iavf_shutdown() callback is called by the kernel. This function
detaches the device, brings it down if it was running and frees
resources.
- Later, the associated PF driver's shutdown callback is called:
ice_shutdown(). That callback calls, among others, sriov_disable(),
which then indirectly calls iavf_remove() again.
- A kernel WARNING is reported because adminq_task->func is NULL when
iavf_remove() calls cancel_work_sync(&adapter->adminq_task): the
resources had already been freed during the first run of
iavf_remove().
"WARNING: CPU: 63 PID: 93678 at kernel/workqueue.c:3047
__flush_work.isra.0+0x6b/0x80"
The patch for iavf resolves this issue by checking the pci_dev
structure's is_busmaster field at the beginning of iavf_remove(). If the
PCI device had already been disabled by an earlier call to
iavf_shutdown() or iavf_remove(), via a call to pci_disable_device(),
then the is_busmaster field would be set to zero. Based on this logic,
if the is_busmaster field is set to zero, then the iavf_remove function
returns early. This in turn avoids the aforementioned kernel panic
caused by multiple calls to iavf_remove().
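The guard described above can be illustrated with a small toy model. This
is a sketch in Python, not driver code: the real fix is a C change in the
iavf driver, and the class and function names below are invented for
illustration. Only the idea (return early from remove when is_busmaster
is already zero) comes from the description above.

```python
class FakePciDev:
    """Stand-in for struct pci_dev; only the field the guard looks at."""

    def __init__(self):
        self.is_busmaster = True
        self.resources_freed = False


def disable_device(pdev):
    # Models the effect described above: disabling the PCI device
    # clears the bus-master flag.
    pdev.is_busmaster = False


def shutdown(pdev):
    """Models the shutdown callback: free resources, disable the device."""
    pdev.resources_freed = True
    disable_device(pdev)


def remove(pdev):
    """Models the patched remove path; returns True if teardown ran."""
    if not pdev.is_busmaster:
        # Device already disabled by an earlier shutdown/remove: bail out
        # instead of touching resources that were already freed.
        return False
    pdev.resources_freed = True
    disable_device(pdev)
    return True


pdev = FakePciDev()
shutdown(pdev)        # forced reboot: the VF shutdown callback runs first
print(remove(pdev))   # later call via the PF's sriov_disable(): skipped
```

The point of the model is that the second teardown becomes a no-op, so a
later call never reaches the work-cancellation step with freed state.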
Note that the description above is applicable to iavf-4.6.1 (in NIC
driver bundle cvl-4.10); however, a similar issue occurs in earlier
versions of the iavf driver as well, which necessitates the same fix.
Issue #2: ice driver
When the system is rebooted, then the PTP-related resources are released
by the ice driver's ice_remove() function before the irq_msix_misc
interrupt is disabled. However, the interrupt handler continues to use
these resources, and when the interrupt in question occurs, then a
kernel panic occurs.
This issue is fixed by disabling the irq_msix_misc interrupt before the
call to ice_ptp_release() in ice_remove().
Please note that colleagues at Intel have reviewed the fixes included in
this commit, and they have confirmed that these changes could be used as
a temporary workaround for now. The changes introduced by this commit
can be reverted once Intel resolves the aforementioned issues in the
official ice and iavf driver releases.
This issue can be reproduced with the following steps:
1. Install the sts-silicom app.
2. Make sure sts-silicom is in the running state.
3. reboot -f
Verification:
- build-pkgs; build-iso; install and boot up on aio-sx lab.
- The issue cannot be reproduced with the steps above after the fix is
applied.
Closes-Bug: 2030725
Change-Id: Ib296dc3180023230c46aa028a7d7c4283b17cff0
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
Intel 4th generation Xeon Scalable Processor (Sapphire Rapids) support
has been introduced for the platform. In order to leverage the
integrated QAT device of the SP-MCC SKUs, the QAT driver needs to be
upgraded to version 2.0.
To upgrade to version 2.0, the following has been done:
1. Update the QAT-related patches for code context changes;
2. Update the control and rules files, and related packaging, for QAT
version 2.0.
Test plan:
- PASS: build-pkgs -a && build-image
- PASS: lsmod |grep qat
- PASS: /etc/init.d/qat_service status/start/stop
- PASS: ./cpa_sample_code
Story: 2010796
Task: 48248
Change-Id: I1cba1660a13d1f28eee2b35713a54d31c048c609
Signed-off-by: Peng Zhang <Peng.Zhang2@windriver.com>
This commit resolves the Zuul/tox failures encountered when running
sphinx to generate documentation, which in turn prevents merging changes
that are otherwise fine:
```
docs: 350 W commands[1]> sphinx-build -a -E -W -d doc/build/doctrees \
-b html doc/source doc/build/html [tox/tox_env/api.py:427]
Running Sphinx v6.2.1
Warning, treated as error:
Invalid configuration value found: 'language = None'. Update your \
configuration to a valid language code. Falling back to 'en' \
(English).
docs: 723 C exit 2 (0.37 seconds) \
/home/zuul/src/opendev.org/starlingx/kernel> \
sphinx-build -a -E -W -d doc/build/doctrees -b html doc/source \
doc/build/html pid=1720 [tox/execute/api.py:279]
docs: FAIL code 2 (0.44=setup[0.07]+cmd[0.00,0.37] seconds)
evaluation failed :( (0.54 seconds)
```
This issue was fixed for another StarlingX repository (tools) with
https://review.opendev.org/c/starlingx/tools/+/893165
from which this commit is inspired.
The issue is related to a Sphinx update that requires the language
parameter to be specified:
https://github.com/sphinx-doc/sphinx/issues/10062
https://github.com/sphinx-doc/sphinx/issues/10474
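For reference, the kind of change involved looks roughly like the
fragment below. This is a sketch under assumptions: the file location
(doc/source/conf.py, inferred from the sphinx-build source directory in
the log above) and the chosen value are not copied from the actual diff.

```python
# doc/source/conf.py (fragment; location assumed from the log above)
# Sphinx rejects "language = None" as shown in the error output, so name
# a language code explicitly. 'en' matches the fallback Sphinx reports.
language = 'en'
```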
Partial-Bug: 1976377
Partial-Bug: 2033431
Change-Id: Ic20fec5145b1a4ddb12051f018614562a4773b95
Signed-off-by: M. Vefa Bicakci <vefa.bicakci@windriver.com>
Under PREEMPT_RT, __put_task_struct() indirectly acquires sleeping
locks. Therefore, it can't be called from a non-preemptible context.
Instead of calling __put_task_struct() directly, we defer it using
call_rcu(). A more natural approach would use a workqueue, but since
in PREEMPT_RT, we can't allocate dynamic memory from atomic context,
the code would become more complex because we would need to put the
work_struct instance in the task_struct and initialize it when we
allocate a new task_struct.
We hit the same panic 5 times: __put_task_struct() was called while a
lock was held, which triggered the kernel BUG_ON. The call trace is
shown below.
We also need to cherry-pick the following commits, because the necessary
context is not present in 5.10.18x; for example, the definition of
DEFINE_WAIT_OVERRIDE_MAP is missing.
* commit 5f2962401c6e
("locking/lockdep: Exclude local_lock_t from IRQ inversions")
* commit 175b1a60e880
("locking/lockdep: Clean up check_redundant() a bit")
* commit bc2dd71b2836
("locking/lockdep: Add a skip() function to __bfs()")
* commit 0cce06ba859a
("debugobjects,locking: Annotate debug_object_fill_pool() wait type
violation")
kernel BUG at kernel/locking/rtmutex.c:1331!
invalid opcode: 0000 [#1] PREEMPT_RT SMP NOPTI
......
Call Trace:
rt_spin_lock_slowlock_locked+0xb2/0x2a0
? update_load_avg+0x80/0x690
rt_spin_lock_slowlock+0x50/0x80
? update_load_avg+0x80/0x690
rt_spin_lock+0x2a/0x30
free_unref_page+0xc5/0x280
__vunmap+0x17f/0x240
put_task_stack+0xc6/0x130
__put_task_struct+0x3d/0x180
rt_mutex_adjust_prio_chain+0x365/0x7b0
task_blocks_on_rt_mutex+0x1eb/0x370
rt_spin_lock_slowlock_locked+0xb2/0x2a0
rt_spin_lock_slowlock+0x50/0x80
rt_spin_lock+0x2a/0x30
free_unref_page_list+0x128/0x5e0
release_pages+0x2b4/0x320
tlb_flush_mmu+0x44/0x150
tlb_finish_mmu+0x3c/0x70
zap_page_range+0x12a/0x170
? find_vma+0x16/0x70
do_madvise+0x99d/0xba0
? do_epoll_wait+0xa2/0xe0
? __x64_sys_madvise+0x26/0x30
__x64_sys_madvise+0x26/0x30
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Verification:
- build-pkgs; build-iso; install and boot up on aio-sx lab.
- Cannot reproduce the issue during almost 24 hours of stress-ng testing.
while true; do sudo stress-ng --sched rr --mmapfork 23 -t 20; done
while true; do sudo stress-ng --sched fifo --mmapfork 23 -t 20; done
Closes-Bug: 2031597
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
Change-Id: If022441d61492eaec88eede8603a6cb052af99d1
This commit cherry-picks commits from the mainline kernel to improve
Sapphire Rapids CPU support in the following components of the StarlingX
kernel: intel_idle, perf/x86/RAPL and powercap, and perf/x86/cstate.
(RAPL stands for "Running Average Power Limit", which is a CPU feature
for measuring and limiting power consumption.)
These improvements are required to support a new power metrics
application in StarlingX, which is intended to work with Sapphire Rapids
CPUs: https://opendev.org/starlingx/app-power-metrics
The following commits are cherry-picked as part of this effort, in
chronological order, organized by component:
=> intel_idle
* commit 9edf3c0ffef0
("intel_idle: add SPR support")
(v5.18-rc1~203^2~3^3~5)
* commit da0e58c038e6
("intel_idle: add 'preferred_cstates' module argument")
(v5.18-rc1~203^2~3^3~4)
* commit 3a9cf77b60dc
("intel_idle: add core C6 optimization for SPR")
(v5.18-rc1~203^2~3^3~3)
* commit 39c184a6a9a7
("intel_idle: Fix the 'preferred_cstates' module parameter")
(v5.18-rc5~22^2^2~1)
* commit 7eac3bd38d18
("intel_idle: Fix SPR C6 optimization")
(v5.18-rc5~22^2^2)
* commit 1548fac47a11
("intel_idle: make SPR C1 and C1E be independent")
(v6.0-rc1~184^2~2^2^2)
=> perf/x86/rapl + powercap
* commit ffb20c2e52e8
("perf/x86/rapl: Add msr mask support")
(v5.12-rc1~146^2~3)
* commit b6f78d3fba7f
("perf/x86/rapl: Only check lower 32bits for RAPL energy counters")
(v5.12-rc1~146^2~2)
* commit 838342a6d6b7
("perf/x86/rapl: Fix psys-energy event on Intel SPR platform")
(v5.12-rc1~146^2~1)
* commit 931da6a0de5d
("powercap: intel_rapl: support new layout of Psys PowerLimit Register
on SPR")
(v5.17-rc1~167^2^4^2~1)
* commit 80275ca9e525
("perf/x86/rapl: Use standard Energy Unit for SPR Dram RAPL domain")
(v6.1-rc4~3^2~3)
=> perf/x86/cstate
* commit 87bf399f86ec
("perf/x86/cstate: Add ICELAKE_X and ICELAKE_D support")
(v5.14-rc1~7^2~1)
* commit 528c9f1daf20
("perf/x86/cstate: Add SAPPHIRERAPIDS_X CPU support")
(v5.18-rc4~3^2)
The set of commits listed above is a reduced version of a slightly
larger superset of commits we had originally considered for
cherry-picking. We opted for the commits listed above to limit potential
impact on the StarlingX kernel by focusing on Sapphire Rapids support
and direct dependencies only.
We should note that we encountered a number of merge conflicts while
cherry-picking these commits; however, none of the merge conflict
resolutions required significantly altering the modifications made by
the original commits. The individual patch files denote the nature of
the merge conflicts.
Verification:
* The kernel recipes and all kernel modules were built from scratch with
this commit, using the following command in a StarlingX build
environment:
$ build-pkgs -c -p linux,linux-rt,bnxt-en,i40e,i40e-cvl-2.54,\
i40e-cvl-4.10,iavf,iavf-cvl-2.54,iavf-cvl-4.10,ice,ice-cvl-2.54,\
ice-cvl-4.10,igb-uio,iqvlinux,kmod-opae-fpga-driver,mlnx-ofed-kernel,\
octeon-ep,qat1.7.l
These packages were further packaged into a StarlingX (ostree) patch
for easier deployment.
* An Ansible-bootstrapped low-latency All-in-One simplex StarlingX
set-up was prepared on a server with a Sapphire Rapids CPU.
* The ostree patch was installed onto the server to start testing our
changes. The kernel was confirmed to boot up as expected.
* We enabled RAPL Psys domain reporting in the server's BIOS (originally
disabled), and we also disabled the BIOS-enforced limit on the CPU
*package* C-states (originally set to C0/C1).
* We forcibly removed the "intel_idle.max_cstate=0" kernel command line
argument by modifying the sysinv daemon's Python source code on the
server (with a systemd service that bind-mounts a replacement *.py
file, to avoid another ostree patch). This was required to prevent the
intel_idle driver from disabling itself, so that we could confirm the
sanity of the cherry-picked commits.
* The following tests were carried out, first with the patched
preempt-rt kernel, and next with the original unpatched preempt-rt
kernel:
* Confirm that the intel_idle CPU idling driver is active:
$ cat /sys/devices/system/cpu/cpuidle/current_driver
* Confirm the CPU idling state names and parameters:
$ grep -s '^' \
/sys/devices/system/cpu/cpu0/cpuidle/state[0-9]*/\
{name,desc,time,latency,residency}
* Confirm that the RAPL/powercap and C-state related performance
monitor unit (PMU) counters are usable by the kernel and with perf:
$ sudo perf list
* Confirm that the CPU and package C-state residency counters are
working:
$ perf stat -a \
-e cstate_core/c1-residency/ -e cstate_core/c6-residency/ \
-e cstate_pkg/c2-residency/ -e cstate_pkg/c6-residency/ \
-- sleep 5
* Confirm that RAPL/powercap-related performance counters are working:
$ perf stat -a \
-e power/energy-pkg/ -e power/energy-ram/ -e power/energy-psys/ \
-- sleep 5
With the unpatched kernel, we observed that the intel_idle driver used
CPU idling information exposed by the ACPI tables, with the following
idle state names: POLL, C1_ACPI, C2_ACPI. With the patched kernel the
C-state tables embedded in the intel_idle driver were used as
expected, with the following idle state names: POLL, C1, C1E, C6.
With the unpatched kernel, we observed that the CPU/package C-state
residency counters were not detected, whereas they were detected with
the patched kernel, as expected.
With both the unpatched and the patched kernels, the RAPL/powercap
related performance counters were detected. We observed that the units
for the DRAM domain were incorrect for the unpatched kernel, which was
expected due to the lack of commit 80275ca9e525 ("perf/x86/rapl: Use
standard Energy Unit for SPR Dram RAPL domain").
* To confirm the sanity of our results acquired with the patched kernel
in the previous step, we also carried out the following experiment
with the v6.4.3-rt6 kernel available in the linux-yocto repository as
commit 917d160a84f6 ("Merge branch 'v6.4/standard/base' into
v6.4/standard/preempt-rt/base") in the "v6.4/standard/preempt-rt/base"
branch.
The "notification of death" StarlingX kernel patch was forward-ported
to the v6.4.3-rt6 kernel and the "kernel.sched_nr_migrate" sysctl was
reintroduced to make this kernel work with the aforementioned
Ansible-bootstrapped StarlingX system.
Furthermore, to ensure that the RAPL/powercap features are aligned to
the most recent mainline kernel version, we cherry-picked the
following commits from v6.5-rc1 onto the v6.4.3-rt6 kernel:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?qt=range&q=44c026a73be8..49776c712eb6
Afterwards, this v6.4.3-rt6-based test kernel was built and installed
onto the test server, and test procedures discussed in the previous
step were repeated.
Compared to the patched StarlingX v5.10 kernel, we observed that the
RAPL/powercap measurements were similar, and the CPU and package
C-state residency counters were not extremely different with the
v6.4.3-rt6-based test kernel.
* We should note that we have repeated tests with the patched StarlingX
v5.10 kernel as well, but we did not reinstall the system to acquire a
standard/non-low-latency set-up. Instead, we opted for running the
following command, rebooting the system into the standard kernel,
followed by repeating the test procedures, which had similar results.
sudo grub-editenv /boot/1/kernel.env set kernel=vmlinuz-5.10.0-6-amd64
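The idle-state name check from the verification steps above can also be
scripted. The following is a minimal sketch, assuming the standard
cpuidle sysfs layout used by the grep command above; the function name is
ours and not part of any tool.

```python
import glob
import os


def idle_state_names(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return the idle state names for one CPU, ordered by state index."""
    names = {}
    for path in glob.glob(os.path.join(cpuidle_dir, "state[0-9]*", "name")):
        state_dir = os.path.basename(os.path.dirname(path))
        index = int(state_dir[len("state"):])
        with open(path) as f:
            names[index] = f.read().strip()
    return [names[i] for i in sorted(names)]


# Prints an empty list on systems without cpuidle sysfs entries.
print(idle_state_names())
```

On the patched kernel described above, the expected names would be the
ones reported in the results (POLL, C1, C1E, C6).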
Acknowledgements:
* Thanks to Alyson Deives Pereira for his extensive help in pruning the
commits that we had originally thought of cherry-picking with this
commit.
* Thanks to Mark Asselstine for his advice on the second phase of the
commit pruning activity.
Story: 2010773
Task: 48449
Change-Id: Ibe6bff65e8a415ac027a5d493a0e65fe58c9e344
Signed-off-by: M. Vefa Bicakci <vefa.bicakci@windriver.com>
Octeon-ep debian package includes PF, VF and virtual PHC device drivers.
These drivers are loaded as part of octeon operator deployment.
This commit includes release 11.23.04.
Test Plan:
Pass: build-pkgs -p octeon-ep
Pass: build-image
Tested and verified on hardware.
Story: 2010047
Task: 48033
Change-Id: Id505d96e190dd95e9200d2134142a17174712b64
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
This change enables support for the Intel RAPL (Running Average Power
Limit) technology via MSR interface, which allows power limits to be
enforced and monitored on modern Intel processors, and the
intel-uncore-frequency driver, which allows control of Uncore
frequency limits on supported server platforms.
This is achieved by enabling the following kernel configuration items:
- CONFIG_POWERCAP
- CONFIG_INTEL_RAPL
- CONFIG_INTEL_RAPL_CORE
- CONFIG_INTEL_UNCORE_FREQ_CONTROL
This change also adds intel-uncore-frequency support for Sapphire
Rapids processors by including the following upstream kernel commit:
* commit 60accc011af0 (v5.12-rc1~123^2~41)
("platform/x86/intel-uncore-freq: Add Sapphire Rapids server support")
TEST PLAN:
PASS: Build iso success for rt and std.
PASS: Install success onto an AIO-DX lab with both rt and std kernel.
PASS: The following kernel modules are successfully enabled on a
Sapphire Rapids server, on both std and rt kernels:
- sudo modprobe rapl
- sudo modprobe msr
- sudo modprobe intel_rapl_common
- sudo modprobe intel_rapl_msr
- sudo modprobe intel-uncore-frequency
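A quick way to confirm the modules above are loaded is to compare the
list against /proc/modules. The sketch below works on the file's text so
it can be tried anywhere; the sample excerpt is made up, and the helper
names are ours.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules content."""
    return {line.split()[0]
            for line in proc_modules_text.splitlines() if line.strip()}


def missing_modules(wanted, proc_modules_text):
    """Return the wanted modules that are not listed as loaded.

    modprobe treats '-' and '_' as equivalent, while /proc/modules
    lists names with '_', so normalize before comparing.
    """
    loaded = loaded_modules(proc_modules_text)
    return [m for m in wanted if m.replace("-", "_") not in loaded]


WANTED = ["rapl", "msr", "intel_rapl_common", "intel_rapl_msr",
          "intel-uncore-frequency"]

# Made-up /proc/modules excerpt for illustration.
SAMPLE = """\
intel_rapl_msr 20480 0 - Live 0x0000000000000000
intel_rapl_common 28672 1 intel_rapl_msr, Live 0x0000000000000000
intel_uncore_frequency 16384 0 - Live 0x0000000000000000
rapl 20480 0 - Live 0x0000000000000000
"""
print(missing_modules(WANTED, SAMPLE))
```

Note that a driver built into the kernel image does not appear in
/proc/modules, so a "missing" entry is only meaningful for drivers built
as modules.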
Story: 2010773
Task: 48304
Change-Id: I1b618f65483c657d8c936f6f8494a8611ab09e70
Signed-off-by: Alyson Deives Pereira <alyson.deivespereira@windriver.com>
This commit updates the kernel to 5.10.180 to fix the following CVE issues:
CVE-2023-32233: https://nvd.nist.gov/vuln/detail/CVE-2023-32233
CVE-2023-31436: https://nvd.nist.gov/vuln/detail/CVE-2023-31436
CVE-2023-2513: https://nvd.nist.gov/vuln/detail/CVE-2023-2513
CVE-2023-1859: https://nvd.nist.gov/vuln/detail/CVE-2023-1859
CVE-2023-34256: https://nvd.nist.gov/vuln/detail/CVE-2023-34256
One of our source patches was deleted because its content is already
included in the new kernel source:
xfs-drop-submit-side-trans-alloc-for-append-ioends.patch
Verification:
- Build kernel and out of tree modules success for rt and std.
- Build iso success for rt and std.
- Install success onto an AIO-DX lab with rt kernel.
- Boot up successfully in the lab.
- The sanity testing was done by our test team and no regression
defect was found.
- The cyclictest benchmark was also run on the starlingx lab. The
result is "samples: 259200000 avg: 1660 max: 10167 99.9999th
percentile: 2527 overflows: 0". The avg and max values are not
significantly different from those of 5.10.177.
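To compare such summaries between kernel versions, the quoted result
line can be parsed into numbers. This is a minimal sketch that assumes
the "key: value" layout quoted above; it is not tied to any particular
benchmark tool's output format, and the function name is ours.

```python
import re


def parse_summary(summary):
    """Parse 'key: value' pairs from a latency summary line into ints."""
    pairs = re.findall(r"(\S+(?: percentile)?): (\d+)", summary)
    return {key: int(value) for key, value in pairs}


line = ("samples: 259200000 avg: 1660 max: 10167 "
        "99.9999th percentile: 2527 overflows: 0")
result = parse_summary(line)
print(result["avg"], result["max"], result["99.9999th percentile"])
```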
Closes-Bug: 2021927
Change-Id: Ia676889d752715dc404132ed66e2f2ddb7d17d62
Signed-off-by: Peng Zhang <Peng.Zhang2@windriver.com>
As there is no plan to support the livepatch feature in the StarlingX
community, we now drop the packages and components of this feature.
Since this component depends on the kpatch userspace tools, this
patch should be merged first (another commit in the integ repo will
depend on this one).
TestPlan:
PASS: build-pkgs -a
PASS: build-image
PASS: Jenkins installation.
Signed-off-by: Zhixiong Chi <zhixiong.chi@windriver.com>
Change-Id: Iaf96daddab40f87d6155333eae7d1780a3696764
The following warning message is printed out starting with kernel
version 5.10.105 in response to a newer Spectre-type security issue:
Spectre V2: WARNING: Unprivileged eBPF is enabled with eIBRS on, data
leaks possible via Spectre v2 BHB attacks!
This message is printed out when Spectre v2 mitigations are enabled and
unprivileged eBPF is enabled.
This warning message was introduced with commit afc2d635b5e1
("x86/speculation: Include unprivileged eBPF status in spectre v2
mitigation reporting") in the Linux stable team's linux-5.10.y branch.
The first tag that includes this change in that branch is "v5.10.105".
This commit sets the "CONFIG_BPF_UNPRIV_DEFAULT_OFF" Kconfig option to
suppress the aforementioned warning message. Note that unprivileged eBPF
is disabled by default in most distributions. Disabling unprivileged
eBPF is recommended as a (partial) mitigation against attack primitives
known as Spectre-v2-BHB ("Spectre v2 aided by the Branch History
Buffer"), as documented at the following links:
- https://www.vusec.net/projects/bhi-spectre-bhb/
- https://www.intel.com/content/www/us/en/developer/articles/\
technical/software-security-guidance/technical-documentation/\
branch-history-injection.html
Also note that if unprivileged eBPF is re-enabled at runtime via
"sysctl" or by writing to "/proc/sys/kernel/unprivileged_bpf_disabled",
then the warning message in question will appear in the kernel logs.
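The current setting can be inspected by reading the sysctl file named
above. The sketch below only interprets the value; the meanings of 0, 1
and 2 reflect our understanding of the kernel's sysctl documentation and
should be treated as an assumption, and the function name is ours.

```python
STATES = {
    0: "unprivileged eBPF enabled",
    1: "unprivileged eBPF disabled (cannot be re-enabled until reboot)",
    2: "unprivileged eBPF disabled (an administrator may change it)",
}


def describe_unpriv_bpf(raw_value):
    """Map the content of unprivileged_bpf_disabled to a description."""
    return STATES.get(int(raw_value.strip()), "unknown value")


# Example with the value we expect CONFIG_BPF_UNPRIV_DEFAULT_OFF to
# produce by default (assumption, see above).
print(describe_unpriv_bpf("2\n"))
```

On a live system the input would come from
"/proc/sys/kernel/unprivileged_bpf_disabled".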
Verification:
- On a test system, remove the kernel command line argument
'nospectre_v2' from "/boot/efi/EFI/BOOT/boot.env", and save the file.
- Reboot.
- With this commit, the following warning message does not appear in the
kernel's logs, as confirmed with "dmesg | grep eIBRS": "Spectre V2:
WARNING: Unprivileged eBPF is enabled with eIBRS on, data leaks
possible via Spectre v2 BHB attacks!". (Without this commit, the
aforementioned warning message would appear in the kernel logs.)
Closes-Bug: 2019268
Signed-off-by: Haiqing Bai <haiqing.bai@windriver.com>
Change-Id: I03d9ef494384c52cd4d81d02d8c76cd0fef6edb5
This reverts commit ce36f16cd721b2b267d8658492f6eid3a6f64c81b.
The legacy ice driver was upgraded from v1.5.8 to v1.5.8.2 to fix
Bug-2016445. The fixed issue is a memory corruption bug that affects
use cases involving the ice driver and virtual function (VF)
interface reset operations, to the best of our current knowledge.
Despite efforts to validate ice driver v1.5.8.2, concerns remained that
the upgrade might negatively affect deployed systems. Hence, this commit
downgrades the legacy ice driver from v1.5.8.2 back to v1.5.8.
Verification:
- Run ethtool and confirm the driver version is 1.5.8
- links come up and pass traffic
Closes-Bug: 2019769
Signed-off-by: Jiping Ma <Jiping.ma2@windriver.com>
Change-Id: Ie25ceab4b8c677f15ce589dab207eb0b696b912f
This ports the Red Hat feature forward from the 3.10 kernel version.
This feature allows one to specify a loose maximum of total memory
which is allowed to be used for negative dentries. This is done
via setting a sysctl variable which is used to calculate a
negative dentry limit for the system. Every 15 seconds a kworker
task will prune back the negative dentries that exceed the limit,
plus an extra 1% for hysteresis purposes.
Intent is that the feature code is kept as close to the 3.10 version
as possible.
Main differences from the 3.10 version of the code:
- count of dentries associated with a superblock is kept in a
different location, requiring a procedure call to obtain
- superblocks are now kept by node id and memcg, requiring
more calls into iterate_super
Verification
- ensure the sysctl variable is set to 20:
sysctl fs.negative-dentry-limit
- run a test program that continuously tries to access a lot of
non-existent files, causing a lot of negative dentries to build up
- monitor the number of negative dentries building up in the system:
cat /proc/sys/fs/dentry-state
the fifth number is the negative dentries
- get the calculated negative dentry limit:
dmesg | grep Negative
- watch the negative dentries you are monitoring periodically drop
to below the limit before they start building back up again. The
drop should happen about 4 times per minute
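The monitoring step above can be scripted by extracting the fifth field
of dentry-state. This is a minimal sketch; the sample numbers are made
up, and the function name is ours.

```python
def negative_dentries(dentry_state_text):
    """Return the fifth field of /proc/sys/fs/dentry-state content."""
    fields = dentry_state_text.split()
    return int(fields[4])


# Made-up sample content; on a live system, read the text from
# /proc/sys/fs/dentry-state instead.
sample = "250000 210000 45 0 180000 0"
print(negative_dentries(sample))
```

Sampling this value every few seconds and comparing it with the limit
printed in dmesg shows the periodic drop described above.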
Closes-Bug: 2017703
Signed-off-by: Jim Somerville <jim.somerville@windriver.com>
Change-Id: I3f55249aab45471802d123ed2253b6f36cc4af50
Updating the rsa ssh host key based on:
https://github.blog/2023-03-23-we-updated-our-rsa-ssh-host-key/
Note: In the future, StarlingX should have a zuul job and
secret setup for all repos so we do not need to do this
for every repo.
Needed to rename the secret, because zuul fails if like-named
secrets have different values in different branches of the same
repo.
Partial-Bug: #2015246
Change-Id: Ia60a3b7e0725182edd64078aa7f8c6bb4b35a373
Signed-off-by: Davlet Panech <davlet.panech@windriver.com>
This commit cherry-picks commits from the mainline kernel tree to let
the intel_pstate support the following newer CPUs in certain cases:
* Ice Lake
* Sapphire Rapids
Support for the following cases is added:
* When hardware P-states (HWP) is disabled in the BIOS/firmware, then
the intel_pstate driver will still get enabled (albeit in 'passive'
mode) for these CPUs, as opposed to the intel_pstate driver reporting
that the CPU is not supported.
* When out-of-band (OOB) P-state management is enabled with Ice Lake and
Sapphire Rapids CPUs, then the intel_pstate driver gracefully disables
itself, as the BIOS/platform firmware is responsible for managing the
HWP in such cases.
The following bullet point list depicts the commits that have been
cherry-picked, along with the output of "git describe --contains" for
each commit, to provide a sense of how recent each commit is:
* commit fbdc21e9b038 ("cpufreq: intel_pstate: Add Icelake servers
support in no-HWP mode") (v5.14-rc1~144^2~1^2~1^2~6)
* commit cd23f02f1668 ("cpufreq: intel_pstate: Add Ice Lake server to
out-of-band IDs") (v5.16-rc3~24^2~3)
* commit bbd67f1b5a94 ("cpufreq: intel_pstate: Support Sapphire Rapids
OOB mode") (v5.19-rc1~182^2~2^2~11)
* commit df51f287b5de ("cpufreq: intel_pstate: Add Sapphire Rapids
support in no-HWP mode") (v6.2-rc1~189^2~3^2~4)
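The "git describe --contains" names above start with a tag from which
the commit is reachable, followed by a ~/^ path from that tag back to the
commit. A minimal sketch for pulling out just the tag (the function name
is ours):

```python
import re


def base_tag(describe_contains):
    """Return the tag part of a 'git describe --contains' name."""
    return re.split(r"[~^]", describe_contains, maxsplit=1)[0]


picked = {
    "fbdc21e9b038": "v5.14-rc1~144^2~1^2~1^2~6",
    "cd23f02f1668": "v5.16-rc3~24^2~3",
    "bbd67f1b5a94": "v5.19-rc1~182^2~2^2~11",
    "df51f287b5de": "v6.2-rc1~189^2~3^2~4",
}
for commit, name in picked.items():
    print(commit, base_tag(name))
```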
We should note that we considered cherry-picking the following commits
too, but we opted to not do that due to the reasons discussed below:
* commit 706c5328851d ("cpufreq: intel_pstate: Add Cometlake support in
no-HWP mode") (v5.14-rc1~144^2~1^2~1^2~5)
* commit b6e6f8beec98 ("cpufreq: intel_pstate: Update EPP for AlderLake
mobile") (v5.17-rc1~167^2~2^2~21)
* commit 71bb5c82aaae ("cpufreq: intel_pstate: Add Tigerlake support in
no-HWP mode") (v6.1-rc1~205^2~1^2~1)
* commit 60675225ebee ("cpufreq: intel_pstate: Adjust
balance_performance EPP for Sapphire Rapids") (v6.3-rc1~21^2~6)
The commits related to Comet Lake and Tiger Lake were skipped, because
they refer to non-server class CPUs to the best of our knowledge, and
StarlingX is usually run on server-class hardware.
The commit for Alder Lake mobile CPUs is a dependency for the commit
that adjusts the Energy/Performance Preference (EPP) setting for
Sapphire Rapids CPUs. The latter commit improves performance when the
'powersave' governor is used, or when balance_performance setting is
used in the BIOS. While it is possible to cherry-pick both of these
commits (which was done in an earlier iteration of this commit), we
opted to not do that, mainly to keep the scope of this commit smaller,
and also because the two commits appeared optional (as opposed to
necessary) to us.
Verification
- The standard and real-time kernel packages were successfully built
with this commit.
- An ISO image, built with a slightly different version of this commit
for an older CentOS-based StarlingX version, containing Ice Lake
CPU-related changes only, was installed onto a server that has an Ice
Lake Xeon CPU, with HWP disabled and with out-of-band hardware P-state
management enabled in the BIOS. This resulted in the intel_pstate
driver being disabled, as expected.
- A StarlingX ISO image based on the StarlingX master branch with this
commit included was successfully installed onto a server with a
Sapphire Rapids Xeon CPU, in low-latency All-in-One simplex mode.
(Please note that the server was not Ansible-bootstrapped due to
unrelated difficulties.)
The intel_pstate driver was successfully initialized for the case
where HWP was enabled ("Native Mode") in the BIOS settings, and for
the case where HWP was turned off ("Disabled") in the BIOS settings.
For the former case, intel_pstate's status was reported as "active",
and for the latter case, intel_pstate's status was reported as
"passive", as indicated by
"/sys/devices/system/cpu/intel_pstate/status".
Prior to this commit, with HWP disabled in the BIOS, intel_pstate
would not load due to the CPU not being recognized.
- On the same server with a Sapphire Rapids CPU, enabling out-of-band
HWP management caused the intel_pstate driver to disable itself by
printing out "P-states controlled by the platform" and returning
-ENODEV, as expected.
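The status checks above can be automated by reading the sysfs file
mentioned in the verification steps. This is a minimal sketch; treating a
missing file as "driver not active" is our assumption, and the function
name is ours.

```python
def intel_pstate_status(path="/sys/devices/system/cpu/intel_pstate/status"):
    """Return the driver status string, or None if the file is absent."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        # Assumption: no status file means the driver did not register.
        return None


print(intel_pstate_status())
```

With HWP enabled in the BIOS we would expect "active", with HWP disabled
"passive", and with out-of-band management enabled no status file at all.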
Partial-Bug: 2016028
Change-Id: I55384c2239d6543662eeef62e86a4b8951887fd7
Signed-off-by: M. Vefa Bicakci <vefa.bicakci@windriver.com>
This commit updates the kernel to 5.10.177 to fix the following CVE issue:
CVE-2022-4379: https://nvd.nist.gov/vuln/detail/CVE-2022-4379
One of our source patches requires refresh against the new kernel
source. It was modified to accommodate the context changes in the new
kernel:
0001-Notification-of-death-of-arbitrary-processes.patch
Verification:
- Build kernel and out of tree modules success for rt and std.
- Build iso success for rt and std.
- Install success onto an All-in-One lab with rt kernel.
- Boot up successfully in the lab.
- The sanity testing was run including kernel and applications
by our test team.
- The cyclictest benchmark was also run on the starlingx lab. The result
is "samples: 259199999 avg: 1614 max: 4759 99.9999th percentile: 2572
overflows: 0". The avg and max values are not significantly different
from those of 5.10.162, but the percentile appears slightly lower than
with 5.10.162.
Closes-Bug: 2015711
Change-Id: I98a92534154989446ba6eda9529cd799498ee800
Signed-off-by: Peng Zhang <Peng.Zhang2@windriver.com>
This is a backport of a collection of 12 upstream patches.
The main one switches to using a rwsem instead.
The next most important one changes the rwsem from a global lock to a
per-filesystem lock.
See the individual patches for details. They did not require
much work or wiggling to get them applied.
They all come from Linus' tree and are easily located. As such
I have not modified their individual headers with upstream
commit ids.
Verification:
- two scripts, the concept behind them supplied by Vefa Bicakci.
The first one causes a lot of concurrent contention in sysfs.
The second script highlights how well systemd is also contending.
Run Script1 followed by Script2
Without this change, Script2 has timeouts and fails.
Script1:
for i in `seq 20`; do
(while :; do find /sys/fs/cgroup/ -type f -readable -print0 \
2>/dev/null | xargs -0 -n 20 -r cat >&/dev/null ; done) &
done
for i in `seq 10`; do
(while :; do systemd-run --scope -q sleep 0.5 >/dev/null; done) &
done
Script2:
while true; do
date -Is
/usr/bin/time -f %e systemctl enable -q lighttpd.service ||
break
/usr/bin/time -f %e systemctl disable -q lighttpd.service ||
break
/usr/bin/time -f %e systemctl restart -q lighttpd.service ||
break
sleep 0.5 || break
done
- also soak testing to ensure that these patches don't introduce issues
Partial-Bug: 2016028
Signed-off-by: Jim Somerville <jim.somerville@windriver.com>
Change-Id: I6ad64cd7c90f756c6eb904065febfeb516e73009
Correct the date format in the changelog file.
Error logs:
stderr: dch: warning: debian/changelog(l6): badly formatted
trailer line
stderr: LINE: -- Jiping Ma <jiping.ma2@windriver.com> Wed Jan 11
9:33:12 CST 2023
stderr: dch: warning: debian/changelog(l8): found start of
entry where expected more change data or trailer
stderr: LINE: linux (5.10.152-1) unstable; urgency=medium
stderr: dch: warning: debian/changelog(l8): found end of file
where expected more change data or trailer
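For reference, a trailer date in the format dch accepts can be generated
as sketched below. The +08:00 offset used for "CST" is an assumption
(the abbreviation is ambiguous), and the function name is ours.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def changelog_date(when):
    """Format an aware datetime as an RFC 2822 style date string."""
    return format_datetime(when)


# The time from the rejected trailer line; +08:00 for "CST" is assumed.
cst = timezone(timedelta(hours=8))
print(changelog_date(datetime(2023, 1, 11, 9, 33, 12, tzinfo=cst)))
# prints: Wed, 11 Jan 2023 09:33:12 +0800
```

The same format is produced on the command line by "date -R".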
Closes-Bug: 2017385
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
Change-Id: Ia9f0abd3d30b1e755e56609b673014ad6d5002a4