[Bug 229106] intr_event_handle is unsafe with respect to interrupt handler list


b***@freebsd.org

2018-06-18 11:36:38 UTC

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229106

Bug ID: 229106
Summary: intr_event_handle is unsafe with respect to interrupt
handler list
Product: Base System
Version: CURRENT
Hardware: Any
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: ***@FreeBSD.org
Reporter: ***@FreeBSD.org

Created attachment 194354
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=194354&action=edit
code changes to greatly increase likelihood of the race

I must state upfront that I discovered the issue through code review and I had
to make special arrangements to provoke the problem.
The core of the issue is that intr_event_handle iterates the list of handlers,
ie_handlers, without any protection whatsoever. Also, removal and installation
of a filter-only handler make no attempt to synchronize with
intr_event_handle.

As such, it is possible (although very improbable) that intr_event_handle
iterates onto an element just before it is removed, and then dereferences that
element's pointer to the next element after the element has been freed and the
pointer has been overwritten.
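
The interleaving described above can be replayed deterministically in user
space. The following is only an illustrative sketch, not the kernel code: the
struct handler type, the hand-rolled list, and the "stranger" element are all
hypothetical, and the two racing contexts are simulated as sequential steps on
one thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an interrupt handler list entry. */
struct handler {
	const char *name;
	struct handler *next;
};

/*
 * Replays the problematic ordering step by step.
 * Returns 1 if the walker ends up on an element that was never
 * part of the list, 0 otherwise.
 */
int
replay_race(void)
{
	struct handler c = { "c", NULL };
	struct handler b = { "b", &c };
	struct handler a = { "a", &b };
	struct handler stranger = { "stranger", NULL };	/* never on the list */
	struct handler *head = &a;
	struct handler *cur;

	/* Walker (interrupt path): has advanced from 'a' onto 'b'. */
	cur = head->next;
	assert(cur == &b);

	/* Remover (detach path): unlinks 'b' without waiting for the walker. */
	a.next = b.next;

	/* Simulate 'b' being freed and reused: its link gets overwritten. */
	b.next = &stranger;

	/* Walker resumes and trusts the stale link stored in 'b'. */
	cur = cur->next;

	return (cur == &stranger);
}
```

In this sketch the walker lands on an unrelated but valid object. In the
kernel the overwritten link can hold an arbitrary value, so following it may
fault instead.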

This problem affects only shared interrupts. When an interrupt is not shared,
it should be disabled before its handler is torn down.

Here is a stack trace of the crash:
fault virtual address = 0xffffffffffffffff
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80b64ff0
stack pointer = 0x28:0xfffffe0000434970
frame pointer = 0x28:0xfffffe00004349b0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 11 (idle: cpu2)
trap number = 12
panic: page fault
cpuid = 2
time = 1529319165
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0000434630
vpanic() at vpanic+0x1a3/frame 0xfffffe0000434690
panic() at panic+0x43/frame 0xfffffe00004346f0
trap_fatal() at trap_fatal+0x35f/frame 0xfffffe0000434740
trap_pfault() at trap_pfault+0x62/frame 0xfffffe0000434790
trap() at trap+0x2ba/frame 0xfffffe00004348a0
calltrap() at calltrap+0x8/frame 0xfffffe00004348a0
--- trap 0xc, rip = 0xffffffff80b64ff0, rsp = 0xfffffe0000434970, rbp =
0xfffffe00004349b0 ---
intr_event_handle() at intr_event_handle+0xa0/frame 0xfffffe00004349b0
intr_execute_handlers() at intr_execute_handlers+0x58/frame 0xfffffe00004349e0
lapic_handle_intr() at lapic_handle_intr+0x6d/frame 0xfffffe0000434a20
Xapic_isr1() at Xapic_isr1+0xd0/frame 0xfffffe0000434a20
--- interrupt, rip = 0xffffffff80bd3b49, rsp = 0xfffffe0000434af0, rbp =
0xfffffe0000434bb0 ---
sched_idletd() at sched_idletd+0x4a9/frame 0xfffffe0000434bb0
fork_exit() at fork_exit+0x84/frame 0xfffffe0000434bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0000434bf0

This is what I did to get the crash.
1. Found hardware with shared interrupts. Specifically, I had three USB OHCI
controllers sharing PCI IRQ 18.
2. Modified the ohci driver, so that it installed a dummy filter instead of its
ithread handler. This made the driver non-functional, of course.
3. Modified IO-APIC code, so that it kept re-raising the interrupt thus
increasing the chances of getting the race within a reasonable time frame.
4. Re-compiled kern_intr.c with QUEUE_MACRO_DEBUG_TRASH to make the race more
probable by immediately corrupting a removed handler.
5. Triggered the interrupt storm for IRQ 18.
6. Ran a continuous loop of devctl detach followed by devctl attach for ohci
driver instances sharing the interrupt.
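
The effect of step 4 can be sketched in user space as follows. This is a
simplified illustration under stated assumptions: struct node and both
functions are hypothetical, and the trash value is assumed to be an all-ones
pointer, which would be consistent with the fault virtual address reported in
the panic above. The stale link is only read here, never dereferenced.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list node; only the link matters for this sketch. */
struct node {
	struct node *next;
};

/* Assumed trash value: a pointer with all bits set. */
#define	TRASH_PTR	((struct node *)(uintptr_t)-1)

/* Unlink the element that follows 'prev' and trash its link at once. */
void
remove_after_and_trash(struct node *prev)
{
	struct node *victim = prev->next;

	prev->next = victim->next;
	victim->next = TRASH_PTR;
}

/*
 * Returns the address a walker would dereference next if it was
 * standing on the element at the moment that element was removed.
 */
uintptr_t
next_address_seen_by_walker(void)
{
	struct node c = { NULL };
	struct node b = { &c };
	struct node a = { &b };
	struct node *cur = a.next;	/* walker is currently on 'b' */

	remove_after_and_trash(&a);	/* 'b' is removed underneath it */

	return ((uintptr_t)cur->next);	/* stale link, read only */
}
```

Without trashing, the removed element usually keeps a valid-looking link for a
while, so the walker often survives. With trashing, the very next step of the
walk uses a bad address, which makes the race much easier to observe.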

All the code modifications are in the attachment.
The devctl command line was:
while true ; do devctl detach ohci3 && devctl attach pci0:0:19:1 ; devctl detach ohci4 && devctl attach pci0:0:20:5 ; done

The rate of interrupts was about 570K per second:
569k ohci2 ohci

The stack trace in kgdb:
(kgdb) bt
#0 __curthread () at ./machine/pcpu.h:231
#1 doadump (textdump=1) at /usr/devel/svn/head/sys/kern/kern_shutdown.c:366
#2 0xffffffff80ba33e2 in kern_reboot (howto=260) at
/usr/devel/svn/head/sys/kern/kern_shutdown.c:446
#3 0xffffffff80ba39c3 in vpanic (fmt=<optimized out>, ap=0xfffffe00004346d0)
at /usr/devel/svn/head/sys/kern/kern_shutdown.c:863
#4 0xffffffff80ba3a13 in panic (fmt=<unavailable>) at
/usr/devel/svn/head/sys/kern/kern_shutdown.c:790
#5 0xffffffff8107c6ff in trap_fatal (frame=0xfffffe00004348b0,
eva=18446744073709551615) at /usr/devel/svn/head/sys/amd64/amd64/trap.c:892
#6 0xffffffff8107c772 in trap_pfault (frame=0xfffffe00004348b0,
usermode=<optimized out>) at /usr/devel/svn/head/sys/amd64/amd64/trap.c:728
#7 0xffffffff8107bd7a in trap (frame=0xfffffe00004348b0) at
/usr/devel/svn/head/sys/amd64/amd64/trap.c:427
#8 <signal handler called>
#9 intr_event_handle (ie=0xfffff80003349300, frame=0xfffffe0000434a30) at
/usr/devel/svn/head/sys/kern/kern_intr.c:1180
#10 0xffffffff811f2118 in intr_execute_handlers (isrc=0xfffff800033845b0,
frame=0xfffffe0000434a30) at /usr/devel/svn/head/sys/x86/x86/intr_machdep.c:285
#11 0xffffffff811f841d in lapic_handle_intr (vector=49,
frame=0xfffffe0000434a30) at /usr/devel/svn/head/sys/x86/x86/local_apic.c:1270
#12 <signal handler called>
#13 sched_idletd (dummy=<optimized out>) at
/usr/devel/svn/head/sys/kern/sched_ule.c:2803
#14 0xffffffff80b62204 in fork_exit (callout=0xffffffff80bd36a0 <sched_idletd>,
arg=0x0, frame=0xfffffe0000434c00) at
/usr/devel/svn/head/sys/kern/kern_fork.c:1039