Crashlog with pstore
The Linux kernel supports the automatic storage of stack traces for
crashes in a special memory area. This can be in a non-volatile memory
area (like the efivars) or a reserved memory area in the system ram. In
older OpenWrt releases, this was for example implemented via the
/sys/kernel/debug/crashlog
file which was created after a system
rebooted after a kernel panic. Newer OpenWrt releases (with kernel 5.4
or newer) don’t use the old (OpenWrt-only) implementation and the user
has to manually enable the generic kernel implementation. This is is
most cases PSTORE_RAM
(previously called ramoops).
Enable PSTORE_RAM for ath79 board
Enabling kernel feature
The kernel feature has to be enabled via make kernel_menuconfig
. The
option for it is called CONFIG_PSTORE_RAM and requires the pstore
subystem (CONFIG_PSTORE
) to be enabled.
diff --git a/target/linux/ath79/config-5.4 b/target/linux/ath79/config-5.4
index e37b728554..7e5ee88817 100644
--- a/target/linux/ath79/config-5.4
+++ b/target/linux/ath79/config-5.4
@@ -210,7 +159,23 @@ CONFIG_PHYLIB=y
# CONFIG_PHY_AR7200_USB is not set
# CONFIG_PHY_ATH79_USB is not set
CONFIG_PINCTRL=y
+CONFIG_PSTORE=y
+# CONFIG_PSTORE_842_COMPRESS is not set
+CONFIG_PSTORE_COMPRESS=y
+CONFIG_PSTORE_COMPRESS_DEFAULT="deflate"
+# CONFIG_PSTORE_CONSOLE is not set
+CONFIG_PSTORE_DEFLATE_COMPRESS=y
+CONFIG_PSTORE_DEFLATE_COMPRESS_DEFAULT=y
+# CONFIG_PSTORE_LZ4HC_COMPRESS is not set
+# CONFIG_PSTORE_LZ4_COMPRESS is not set
+# CONFIG_PSTORE_LZO_COMPRESS is not set
+# CONFIG_PSTORE_PMSG is not set
+CONFIG_PSTORE_RAM=y
+# CONFIG_PSTORE_ZSTD_COMPRESS is not set
CONFIG_RATIONAL=y
+CONFIG_REED_SOLOMON=y
+CONFIG_REED_SOLOMON_DEC8=y
+CONFIG_REED_SOLOMON_ENC8=y
CONFIG_REGMAP=y
CONFIG_REGMAP_MMIO=y
CONFIG_REGULATOR=y
Reserving memory for the board
The PSTORE_RAM is using a small RAM area for a crashlog ring-buffer. It is recommended to use a 64KB area in the system RAM area in the address space of the SoC which neither the bootloader nor Linux need for booting the system. The “System RAM” region of the system can be found using cat /proc/iomem. A good place must not conflict with any other entry in iomem - but there can still be conflicts with the bootloader which are not listed here. Board specific information like the data from uboot’s bdinfo, the map files (generated using the u-boot build) and the boot scripts must be checked here.
The old crashlog implementation used a couple of kilobytes at the 63MB address (or the highest memory address - 1MB when the board has less than 64MB of RAM) for the storage. This would be the 0x03f00000 address (because the system RAM starts at 0x0 for ath79) which we can add manually to the devicetree file of the board:
--- a/target/linux/ath79/dts/xxx_xxx.dts
+++ b/target/linux/ath79/dts/xxx_xxx.dts
@@ -6,4 +18 @@
#include <dt-bindings/input/input.h>
/ {
+ reserved-memory {
+ #address-cells = <1>;
+ #size-cells = <1>;
+ ranges;
+
+ /* 64 KiB reserved for ramoops/pstore */
+ ramoops@03f00000 {
+ compatible = "ramoops";
+ reg = <0x03f00000 0x10000>;
+ record-size = <0x1000>;
+ console-size = <0x1000>;
+ };
+ };
+
chosen {
Other options how to specify the ramoops region are defined in the official documentation
The system will initialize the area on bootup:
root@OpenWrt:/# dmesg|grep -e ramoops -e pstore
[ 0.134694] pstore: Registered ramoops as persistent store backend
[ 0.141271] ramoops: using 0x10000@0x3f00000, ecc: 0
[ 1.253190] pstore: Using crash dump compression: deflate
Accessing the pstore
OpenWrt 21.02 (or newer) will automatically handle the mounting of the pstore filesystem. For other systems, it is necessary to manually mount the pstore using
root@OpenWrt:/# /bin/mount -o noatime -t pstore pstore /sys/fs/pstore
After a kernel panic, one or more files called dmesg-ramoops-* can be
found in the directory /sys/fs/pstore/
.
root@OpenWrt:/# ls -ltr /sys/fs/pstore/
-r--r--r-- 1 root root 9001 Oct 19 12:45 dmesg-ramoops-1
-r--r--r-- 1 root root 9024 Oct 19 12:45 dmesg-ramoops-0
It should be easy to extract the relevant stacktrace from such a file. It could for example look like this:
<1>[ 95.770888] CPU 0 Unable to handle kernel paging request at virtual address 00000000, epc == 8699f200, ra == 8699f1f4
<4>[ 95.781691] Oops[#1]:
<4>[ 95.783999] CPU: 0 PID: 2323 Comm: batctl Not tainted 5.4.152 #0
<4>[ 95.790082] $ 0 : 00000000 00000001 00000017 00000000
<4>[ 95.795379] $ 4 : 86f5d61c 86d7dc08 86d7dc08 869b3218
<4>[ 95.800677] $ 8 : 00000034 805464c8 86d7dc6c 00000002
<4>[ 95.805975] $12 : fffffffd 00000402 80691574 00000040
<4>[ 95.811273] $16 : 86d7dc08 86e85460 86f5d600 8689f3c0
<4>[ 95.816569] $20 : 00000000 86d7dc6c 00000000 8068fed8
<4>[ 95.821867] $24 : 00000000 86d7dde4
<4>[ 95.827164] $28 : 86d7c000 86d7dbc8 868cea00 8699f1f4
<4>[ 95.832463] Hi : 00000000
<4>[ 95.835375] Lo : 00000003
<4>[ 95.838627] epc : 8699f200 batadv_netlink_set_mesh+0x40/0x320 [batman_adv]
<4>[ 95.846091] ra : 8699f1f4 batadv_netlink_set_mesh+0x34/0x320 [batman_adv]
<4>[ 95.853231] Status: 1100fc03 KERNEL EXL IE
<4>[ 95.857474] Cause : 0080000c (ExcCode 03)
<4>[ 95.861529] BadVA : 00000000
<4>[ 95.864443] PrId : 00019750 (MIPS 74Kc)
<4>[ 95.868411] Modules linked in: ath9k ath9k_common pppoe ppp_async iptable_nat batman_adv ath9k_hw ath10k_pci ath10k_core ath xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD pppox ppp_generic nf_nat nf_flow_table_hw nf_flow_table nf_conntrack mac80211 ipt_REJECT cfg80211 xt_time xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_LOG slhc nf_reject_ipv4 nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle iptable_filter ip_tables crc_ccitt compat nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 sha256_generic libsha256 seqiv jitterentropy_rng drbg hmac ghash_generic gf128mul gcm ctr cmac ccm fsl_mph_dr_of ehci_platform ehci_fsl ehci_hcd gpio_button_hotplug usbcore nls_base usb_common crc16 aead crypto_null cryptomgr crc32c_generic crypto_hash
<4>[ 95.944865] Process batctl (pid: 2323, threadinfo=92bd6437, task=bec29b92, tls=77e34dcc)
<4>[ 95.953057] Stack : 86f5d600 869b31c8 869b4b64 86f5d600 869b31c8 869b4b64 86f5d600 8042ab40
<4>[ 95.961532] 80691570 86d7dc54 00000000 00000002 00000000 86d7dc6c 868cea00 00000913
<4>[ 95.970005] 616ebdec 00000913 86f5d600 86f5d610 86f5d614 868cea00 8068fed8 86e85460
<4>[ 95.978480] 00000000 86d7dc6c 86f5d600 8689f3c0 fffffffc 8042a96c 80670000 00000000
<4>[ 95.986955] 00000000 00000000 8042962c 80429e2c 8689f3c0 00000006 00000006 8042b2dc
<4>[ 95.995429] ...
<4>[ 95.997905] Call Trace:
<4>[ 96.000709] [<8699f200>] batadv_netlink_set_mesh+0x40/0x320 [batman_adv]
<4>[ 96.007823] [<8042ab40>] genl_rcv_msg+0x1d4/0x4b0
<4>[ 96.012595] [<80429e2c>] netlink_rcv_skb+0xb0/0x160
<4>[ 96.017540] [<8042a630>] genl_rcv+0x30/0x48
<4>[ 96.021799] [<80429538>] netlink_unicast+0x1a4/0x298
<4>[ 96.026836] [<8042990c>] netlink_sendmsg+0x2e0/0x3c8
<4>[ 96.031877] [<803b8790>] ____sys_sendmsg+0xc4/0x26c
<4>[ 96.036823] [<803b9144>] ___sys_sendmsg+0x7c/0xcc
<4>[ 96.041595] [<803ba3fc>] sys_sendmsg+0x4c/0x94
<4>[ 96.046098] [<8006e1ac>] syscall_common+0x34/0x58
<4>[ 96.050867] Code: 0002102b ae22000c 24020017 <a4020000> 8e020014 8c4400a8 10800004 00000000 02202825
<4>[ 96.060751]
<4>[ 96.062319] ---[ end trace ff181e2552b1e823 ]---
<0>[ 96.077613] Kernel panic - not syncing: Fatal exception
Decoding the stack trace
Entries like batadv_netlink_set_mesh+0x40/0x320
hard to understand when
not knowing what source code line this would be. Various tools can be
used here:
binutils’ addr2line
The addr2line
can be used when the relative address is known. In our
example, the batman-adv module was loaded at address 0x86e80000 (see
/proc/modules
). The relative address is therefore 0x0001f200
(0x8699f200 - 0x86980000
)
$ mips-linux-gnu-addr2line -f -e ./build_dir/target-mips_24kc_musl/linux-ath79_generic/batman-adv-2021.3/net/batman-adv/batman-adv.ko 0x0001f200
batadv_netlink_set_mesh
./build_dir/target-mips_24kc_musl/linux-ath79_generic/batman-adv-2021.3/net/batman-adv/netlink.c:448
gdb
gdb(-multiarch) can also parse the symbols and is able to decode the symbolic name + offset.
$ gdb-multiarch -q -ex 'list *(batadv_netlink_set_mesh+0x40)' -ex quit ./build_dir/target-mips_24kc_musl/linux-ath79_generic/batman-adv-2021.3/net/batman-adv/batman-adv.ko
Reading symbols from ./build_dir/target-mips_24kc_musl/linux-ath79_generic/batman-adv-2021.3/net/batman-adv/batman-adv.ko...
0x1f240 is in batadv_netlink_set_mesh (./build_dir/target-mips_24kc_musl/linux-ath79_generic/batman-adv-2021.3/net/batman-adv/netlink.c:448).
443 if (info->attrs[BATADV_ATTR_AGGREGATED_OGMS_ENABLED]) {
444 attr = info->attrs[BATADV_ATTR_AGGREGATED_OGMS_ENABLED];
445
446 atomic_set(&bat_priv->aggregated_ogms, !!nla_get_u8(attr));
447 attr = NULL;
448 attr->nla_len = 23;
449 }
450
451 if (info->attrs[BATADV_ATTR_AP_ISOLATION_ENABLED]) {
452 attr = info->attrs[BATADV_ATTR_AP_ISOLATION_ENABLED];
Or when trying to inspect the actual instructions together with the source code:
$ gdb-multiarch -q -ex 'disassemble /m *(batadv_netlink_set_mesh+0x40)' -ex quit ./build_dir/target-mips_24kc_musl/linux-ath79_generic/batman-adv-2021.3/net/batman-adv/batman-adv.ko
Reading symbols from ./build_dir/target-mips_24kc_musl/linux-ath79_generic/batman-adv-2021.3/net/batman-adv/batman-adv.ko...
Dump of assembler code for function batadv_netlink_set_mesh:
[...]
442
443 if (info->attrs[BATADV_ATTR_AGGREGATED_OGMS_ENABLED]) {
0x0001f21c <+28>: lw v0,20(a1)
0x0001f220 <+32>: lw a0,164(v0)
0x0001f224 <+36>: beqz a0,0x1f244 <batadv_netlink_set_mesh+68>
0x0001f228 <+40>: nop
444 attr = info->attrs[BATADV_ATTR_AGGREGATED_OGMS_ENABLED];
445
446 atomic_set(&bat_priv->aggregated_ogms, !!nla_get_u8(attr));
0x0001f22c <+44>: jal 0x1d41c <nla_get_u32+8>
0x0001f230 <+48>: nop
0x0001f234 <+52>: sltu v0,zero,v0
447 attr = NULL;
448 attr->nla_len = 23;
0x0001f23c <+60>: li v0,23
0x0001f240 <+64>: sh v0,0(zero)
449 }
[...]
Linux’s faddr2line
The kernel has a script which can be used to decode one or multiple line entries - similar to what can also be done with gdb(-multiarch).
$ LXBASE=./build_dir/target-mips_24kc_musl/linux-ath79_generic
$ CROSS_COMPILE=mips-linux-gnu- $LXBASE/linux-5.4.152/scripts/faddr2line --list $LXBASE/batman-adv-2021.3/ipkg-mips_24kc/kmod-batman-adv/lib/modules/5.4.152/batman-adv.ko 'batadv_netlink_set_mesh+0x40/0x320'
batadv_netlink_set_mesh+0x40/0x320:
batadv_netlink_set_mesh at ./build_dir/target-mips_24kc_musl/linux-ath79_generic/batman-adv-2021.3/net/batman-adv/netlink.c:448
443 if (info->attrs[BATADV_ATTR_AGGREGATED_OGMS_ENABLED]) {
444 attr = info->attrs[BATADV_ATTR_AGGREGATED_OGMS_ENABLED];
445
446 atomic_set(&bat_priv->aggregated_ogms, !!nla_get_u8(attr));
447 attr = NULL;
>448< attr->nla_len = 23;
449 }
450
451 if (info->attrs[BATADV_ATTR_AP_ISOLATION_ENABLED]) {
452 attr = info->attrs[BATADV_ATTR_AP_ISOLATION_ENABLED];
453
Linux’s decode_stacktrace
The faddr2line has no real functional benefits when comparing it with gdb. But the has another useful script which can decode the whole stacktrace in a single run. It can even handle multiple modules when they are under a common folder:
$ LXBASE=./build_dir/target-mips_24kc_musl/linux-ath79_generic
$ CROSS_COMPILE=mips-linux-gnu- $LXBASE/linux-5.4.152/scripts/decode_stacktrace.sh $LXBASE/linux-5.4.152/vmlinux $LXBASE/linux-5.4.152/ $LXBASE < dmesg-ramoops-1
[...]
<1>[ 95.770888] CPU 0 Unable to handle kernel paging request at virtual address 00000000, epc == 8699f200, ra == 8699f1f4
<4>[ 95.781691] Oops[#1]:
<4>[ 95.783999] CPU: 0 PID: 2323 Comm: batctl Not tainted 5.4.152 #0
<4>[ 95.790082] $ 0 : 00000000 00000001 00000017 00000000
<4>[ 95.795379] $ 4 : 86f5d61c 86d7dc08 86d7dc08 869b3218
<4>[ 95.800677] $ 8 : 00000034 805464c8 86d7dc6c 00000002
<4>[ 95.805975] $12 : fffffffd 00000402 80691574 00000040
<4>[ 95.811273] $16 : 86d7dc08 86e85460 86f5d600 8689f3c0
<4>[ 95.816569] $20 : 00000000 86d7dc6c 00000000 8068fed8
<4>[ 95.821867] $24 : 00000000 86d7dde4
<4>[ 95.827164] $28 : 86d7c000 86d7dbc8 868cea00 8699f1f4
<4>[ 95.832463] Hi : 00000000
<4>[ 95.835375] Lo : 00000003
<4>[ 95.838627] epc : 8699f200 batadv_netlink_set_mesh (./build_dir/target-mips_24kc_musl/linux-ath79_generic/batman-adv-2021.3/net/batman-adv/netlink.c:448) batman_adv
<4>[ 95.846091] ra : 8699f1f4 batadv_netlink_set_mesh (./build_dir/target-mips_24kc_musl/linux-ath79_generic/batman-adv-2021.3/net/batman-adv/netlink.c:446) batman_adv
<4>[ 95.853231] Status: 1100fc03 KERNEL EXL IE
<4>[ 95.857474] Cause : 0080000c (ExcCode 03)
<4>[ 95.861529] BadVA : 00000000
<4>[ 95.864443] PrId : 00019750 (MIPS 74Kc)
<4>[ 95.868411] Modules linked in: ath9k ath9k_common pppoe ppp_async iptable_nat batman_adv ath9k_hw ath10k_pci ath10k_core ath xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD pppox ppp_generic nf_nat nf_flow_table_hw nf_flow_table nf_conntrack mac80211 ipt_REJECT cfg80211 xt_time xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_LOG slhc nf_reject_ipv4 nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle iptable_filter ip_tables crc_ccitt compat nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 sha256_generic libsha256 seqiv jitterentropy_rng drbg hmac ghash_generic gf128mul gcm ctr cmac ccm fsl_mph_dr_of ehci_platform ehci_fsl ehci_hcd gpio_button_hotplug usbcore nls_base usb_common crc16 aead crypto_null cryptomgr crc32c_generic crypto_hash
<4>[ 95.944865] Process batctl (pid: 2323, threadinfo=92bd6437, task=bec29b92, tls=77e34dcc)
<4>[ 95.953057] Stack : 86f5d600 869b31c8 869b4b64 86f5d600 869b31c8 869b4b64 86f5d600 8042ab40
<4>[ 95.961532] 80691570 86d7dc54 00000000 00000002 00000000 86d7dc6c 868cea00 00000913
<4>[ 95.970005] 616ebdec 00000913 86f5d600 86f5d610 86f5d614 868cea00 8068fed8 86e85460
<4>[ 95.978480] 00000000 86d7dc6c 86f5d600 8689f3c0 fffffffc 8042a96c 80670000 00000000
<4>[ 95.986955] 00000000 00000000 8042962c 80429e2c 8689f3c0 00000006 00000006 8042b2dc
<4>[ 95.995429] ...
<4>[ 95.997905] Call Trace:
<4>[ 96.000709] batadv_netlink_set_mesh (./build_dir/target-mips_24kc_musl/linux-ath79_generic/batman-adv-2021.3/net/batman-adv/netlink.c:448) batman_adv
<4>[ 96.007823] genl_rcv_msg (./build_dir/target-mips_24kc_musl/linux-ath79_generic/linux-5.4.152/net/netlink/genetlink.c:631 ./build_dir/target-mips_24kc_musl/linux-ath79_generic/linux-5.4.152/net/netlink/genetlink.c:654)
<4>[ 96.012595] netlink_rcv_skb (./build_dir/target-mips_24kc_musl/linux-ath79_generic/linux-5.4.152/net/netlink/af_netlink.c:2481)
<4>[ 96.017540] genl_rcv (./build_dir/target-mips_24kc_musl/linux-ath79_generic/linux-5.4.152/net/netlink/genetlink.c:667)
<4>[ 96.021799] netlink_unicast (./build_dir/target-mips_24kc_musl/linux-ath79_generic/linux-5.4.152/net/netlink/af_netlink.c:1306 ./build_dir/target-mips_24kc_musl/linux-ath79_generic/linux-5.4.152/net/netlink/af_netlink.c:1331)
<4>[ 96.026836] netlink_sendmsg (./build_dir/target-mips_24kc_musl/linux-ath79_generic/linux-5.4.152/net/netlink/af_netlink.c:1920)
<4>[ 96.031877] ____sys_sendmsg (./build_dir/target-mips_24kc_musl/linux-ath79_generic/linux-5.4.152/net/socket.c:637 ./build_dir/target-mips_24kc_musl/linux-ath79_generic/linux-5.4.152/net/socket.c:657 ./build_dir/target-mips_24kc_musl/linux-ath79_generic/linux-5.4.152/net/socket.c:2286)
<4>[ 96.036823] ___sys_sendmsg (./build_dir/target-mips_24kc_musl/linux-ath79_generic/linux-5.4.152/net/socket.c:2342)
<4>[ 96.041595] sys_sendmsg (./build_dir/target-mips_24kc_musl/linux-ath79_generic/linux-5.4.152/./include/linux/file.h:30 ./build_dir/target-mips_24kc_musl/linux-ath79_generic/linux-5.4.152/net/socket.c:2388 ./build_dir/target-mips_24kc_musl/linux-ath79_generic/linux-5.4.152/net/socket.c:2395 ./build_dir/target-mips_24kc_musl/linux-ath79_generic/linux-5.4.152/net/socket.c:2393)
<4>[ 96.046098] syscall_common (./build_dir/target-mips_24kc_musl/linux-ath79_generic/linux-5.4.152/arch/mips/kernel/scall32-o32.S:101)
<4>[ 96.050867] Code: 0002102b ae22000c 24020017 <a4020000> 8e020014 8c4400a8 10800004 00000000 02202825
All code
========
0: 0002102b sltu v0,zero,v0
4: ae22000c sw v0,12(s1)
8: 24020017 li v0,23
c:* a4020000 sh v0,0(zero) <-- trapping instruction
10: 8e020014 lw v0,20(s0)
14: 8c4400a8 lw a0,168(v0)
18: 10800004 beqz a0,0x2c
1c: 00000000 nop
20: 02202825 move a1,s1
...
Code starting with the faulting instruction
===========================================
0: a4020000 sh v0,0(zero)
4: 8e020014 lw v0,20(s0)
8: 8c4400a8 lw a0,168(v0)
c: 10800004 beqz a0,0x20
10: 00000000 nop
14: 02202825 move a1,s1
...
<4>[ 96.060751]
<4>[ 96.062319] ---[ end trace ff181e2552b1e823 ]---
<0>[ 96.077613] Kernel panic - not syncing: Fatal exception