Mechanism to detect kernel hang at bootup

August 31, 2025 — BarryK

This is an unresolved problem in Easy Excalibur: a random hang at bootup. It gets as far as displaying "Loading kernel modules...", which is printed from inside /etc/rc.d/rc.sysinit, and that's it, stuck there.

The kernel has a mechanism to detect hung processes, described here:

https://blog.cloudflare.com/es-la/searching-for-the-cause-of-hung-tasks-in-the-linux-kernel/

So, I have compiled the 6.12.44 kernel with this configuration, in the "Kernel hacking -> Debug Oops, Lockups and Hangs" section:

CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=60
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
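
Whether those settings took effect can be checked on the running system. This is only a minimal sketch, assuming the usual sysctl files /proc/sys/kernel/hung_task_timeout_secs and /proc/sys/kernel/hung_task_panic, which are present only when CONFIG_DETECT_HUNG_TASK is enabled. The function name is made up for illustration:

```shell
#!/bin/sh
# Sketch: print the hung-task settings of the running kernel.
# show_hung_task_settings is a hypothetical helper name, not part of EasyOS.
show_hung_task_settings() {
  for f in /proc/sys/kernel/hung_task_timeout_secs \
           /proc/sys/kernel/hung_task_panic; do
    if [ -r "$f" ]; then
      # With the configuration above, expect 60 and 0 respectively
      echo "$f = $(cat "$f")"
    else
      echo "$f not available (hung-task detection not compiled in?)"
    fi
  done
}

show_hung_task_settings
```

If the first file is missing, the kernel was most likely built without CONFIG_DETECT_HUNG_TASK.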

In /etc/rc.d/rc.sysinit, inserted this at line 636:

#20250831 if rc.sysinit completes, renamed to .syslogd.log, see further down...
syslogd -O /mnt/wkg/.syslogd.log.${$}
klogd

Then much later in the script:

#20250831
killall klogd
killall syslogd
mv -f /mnt/wkg/.syslogd.log.${$} /mnt/wkg/.syslogd.log
#...see /etc/init.d/00sys_logger

If the boot hangs in between those two blocks, wait several seconds, then reboot. The rename never ran, so /mnt/wkg/.syslogd.log.${$} will still exist under that name. This is the kernel log up to the hang, and can be studied for hang or timeout reports.
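
A leftover log can be found and searched along these lines. This is a rough sketch, not something in rc.sysinit: WKG_DIR and scan_leftover_logs are names made up here, and the two search patterns are assumptions about what is worth looking for (the kernel hung-task detector reports "blocked for more than N seconds", and udevd reports "timeout: killing"):

```shell
#!/bin/sh
# Sketch: list boot logs left behind by a hung boot and show hang reports.
# WKG_DIR is a hypothetical variable; /mnt/wkg is the path used in rc.sysinit.
WKG_DIR="${WKG_DIR:-/mnt/wkg}"

scan_leftover_logs() {
  dir="$1"
  for f in "$dir"/.syslogd.log.*; do
    # Skip the unexpanded pattern when nothing matches
    [ -f "$f" ] || continue
    echo "== $f"
    # Kernel hung-task reports and udevd timeout reports
    grep -E 'blocked for more than|timeout: killing' "$f" || true
  done
}

scan_leftover_logs "$WKG_DIR"
```

A log from a boot that completed has been renamed to .syslogd.log, with no PID suffix, so the pattern does not match it.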

I also edited /etc/init.d/00sys_logger, which runs only if execution gets past the second code block above:

#!/bin/sh

case $1 in
 start)
  #20250831 .syslogd.log created in /etc/rc.d/rc.sysinit
  if [ -f /mnt/wkg/.syslogd.log ];then
   cat /mnt/wkg/.syslogd.log > /var/log/messages
  fi
  syslogd #writes to /var/log/messages
  klogd
 ;;
 stop)
  killall klogd
  killall syslogd
 ;;
esac

...the second startup of syslogd and klogd will append to /var/log/messages.

My Lenovo PC had not hung at bootup for at least a week, and I wondered when it would happen again. Well, serendipity: I rebooted after setting up the above, and it hung, right at "Loading kernel modules...".

I rebooted into Easy Scarthgap, to be careful not to modify /mnt/wkg/.syslogd.log.${$} (where $$ is 335 in my case), and looked at the file. Very interesting: it just keeps repeating this, over and over:

Aug 30 22:03:45 (none) daemon.warn kernel: [ 1271.605276] udevd[420]: slow: 'ata_id --export /dev/sr0' [475]
Aug 30 22:03:46 (none) daemon.err kernel: [ 1272.606540] udevd[420]: timeout: killing 'ata_id --export /dev/sr0' [475]

That 'ata_id' is a binary executable called by a udev rule. I don't think the kernel's hung-task detection has anything to do with these messages, as it has a timeout of 60 seconds (see above). Instead, this is the 20-second event timeout in udevd; see line 700 in rc.sysinit:

udevd --daemon --resolve-names=early --children-max=32 --event-timeout=20 >/tmp/udevd-debug.log 2>&1

It looks like udevd tries to kill 'ata_id' but fails, and just keeps retrying. I need to study ata_id and what it does.
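
To see at a glance which command udevd keeps trying to kill, and how often, the repeated lines can be condensed. A small sketch, assuming the message format shown above; the function name is made up, and the default path is just the log file from my case:

```shell
#!/bin/sh
# Sketch: count udevd "timeout: killing" reports per command in a saved log.
# summarise_udevd_timeouts is a hypothetical helper name.
summarise_udevd_timeouts() {
  # Pull out the quoted command from each "timeout: killing '...'" line,
  # then count identical commands, most frequent first.
  sed -n "s/.*timeout: killing '\([^']*\)'.*/\1/p" "$1" | sort | uniq -c | sort -rn
}

# Path from this post; the 335 suffix will differ on other boots
LOG="${LOG:-/mnt/wkg/.syslogd.log.335}"
if [ -f "$LOG" ]; then
  summarise_udevd_timeouts "$LOG"
fi
```

Each output line is a count followed by the command. A single command with a large count points to one stuck helper, not a general slowdown.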

Anyway, we have progress.

EDIT:
I have removed /usr/lib/udev/rules.d/60-persistent-storage.rules; this is what calls /usr/lib/udev/ata_id

Actually, I had removed it some time ago, as I was suspicious of it; but an EasyOS user asked why the /dev/disk folder was missing, so I put it back. EasyOS does not need that folder. Google's AI says this:

udev_ata_id is a callout program for the udev device manager that reads product and serial numbers from ATA drives to provide udev with unique, stable identifiers. Udev then uses this information to create symbolic links in /dev/disk/by-id/ and /dev/disk/by-label/.

Nor do users need that folder. You can use the 'blkid' utility to find out that information.

Tags: easy