Discussion:
[Bug 229106] intr_event_handle is unsafe with respect to interrupt
(too old to reply)
b***@freebsd.org
2018-06-18 11:36:38 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229106

Bug ID: 229106
Summary: intr_event_handle is unsafe with respect to interrupt
handler list
Product: Base System
Version: CURRENT
Hardware: Any
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: ***@FreeBSD.org
Reporter: ***@FreeBSD.org

Created attachment 194354
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=194354&action=edit
code changes to greately increase likelihood of the race

I must state upfront that I discovered the issue through code review and I had
to make special arrangements to provoke the problem.
The core of the issue is that intr_event_handle iterates the list of handlers,
ie_handlers, without any protection whatsoever. Also, removal and installation
of a filter-only handler does not make any attempt to synchronize with with
intr_event_handle.

As such, it is possible (although very improbable) that intr_event_handle may
iterate into an element just before it is removed and derefence its pointer to
a next element after the former element is freed and the pointer is
overwritten.

This problem is only for a shared interrupts. When an interrupt is not shared,
then it should be disabled before its handler is torn down.

Here is a stack trace of the crash:
fault virtual address = 0xffffffffffffffff
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80b64ff0
stack pointer = 0x28:0xfffffe0000434970
frame pointer = 0x28:0xfffffe00004349b0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 11 (idle: cpu2)
trap number = 12
panic: page fault
cpuid = 2
time = 1529319165
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0000434630
vpanic() at vpanic+0x1a3/frame 0xfffffe0000434690
panic() at panic+0x43/frame 0xfffffe00004346f0
trap_fatal() at trap_fatal+0x35f/frame 0xfffffe0000434740
trap_pfault() at trap_pfault+0x62/frame 0xfffffe0000434790
trap() at trap+0x2ba/frame 0xfffffe00004348a0
calltrap() at calltrap+0x8/frame 0xfffffe00004348a0
--- trap 0xc, rip = 0xffffffff80b64ff0, rsp = 0xfffffe0000434970, rbp =
0xfffffe00004349b0 ---
intr_event_handle() at intr_event_handle+0xa0/frame 0xfffffe00004349b0
intr_execute_handlers() at intr_execute_handlers+0x58/frame 0xfffffe00004349e0
lapic_handle_intr() at lapic_handle_intr+0x6d/frame 0xfffffe0000434a20
Xapic_isr1() at Xapic_isr1+0xd0/frame 0xfffffe0000434a20
--- interrupt, rip = 0xffffffff80bd3b49, rsp = 0xfffffe0000434af0, rbp =
0xfffffe0000434bb0 ---
sched_idletd() at sched_idletd+0x4a9/frame 0xfffffe0000434bb0
fork_exit() at fork_exit+0x84/frame 0xfffffe0000434bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0000434bf0

This is what I did to get the crash.
1. Found hardware with shared interrupts. Specifically, I had three USB OHCI
controllers sharing PCI IRQ 18.
2. Modified the ohci driver, so that it installed a dummy filter instead of its
ithread handler. This made the driver non-functional, of course.
3. Modified IO-APIC code, so that it kept re-raising the interrupt thus
increasing the chances of getting the race within a reasonable time frame.
4. Re-compiled kern_intr.c with QUEUE_MACRO_DEBUG_TRASH to make the race more
probable by immediately corrupting a removed handler.
5. Triggered the interrupt storm for IRQ 18.
6. Ran a continuous loop of devctl detach followed by devctl attach for ohci
driver instances sharing the interrupt.

All the code modifications are in the attachment.
The devctl command line was:
while true ; do devctl detach ohci3 && devctl attach pci0:0:19:1 ; devctl
detach ohci4 && devctl attach pci0:0:20:5 ; done

The rate of interrupts was about 570K per second:
569k ohci2 ohci

The stack trace in kgdb:
(kgdb) bt
#0 __curthread () at ./machine/pcpu.h:231
#1 doadump (textdump=1) at /usr/devel/svn/head/sys/kern/kern_shutdown.c:366
#2 0xffffffff80ba33e2 in kern_reboot (howto=260) at
/usr/devel/svn/head/sys/kern/kern_shutdown.c:446
#3 0xffffffff80ba39c3 in vpanic (fmt=<optimized out>, ap=0xfffffe00004346d0)
at /usr/devel/svn/head/sys/kern/kern_shutdown.c:863
#4 0xffffffff80ba3a13 in panic (fmt=<unavailable>) at
/usr/devel/svn/head/sys/kern/kern_shutdown.c:790
#5 0xffffffff8107c6ff in trap_fatal (frame=0xfffffe00004348b0,
eva=18446744073709551615) at /usr/devel/svn/head/sys/amd64/amd64/trap.c:892
#6 0xffffffff8107c772 in trap_pfault (frame=0xfffffe00004348b0,
usermode=<optimized out>) at /usr/devel/svn/head/sys/amd64/amd64/trap.c:728
#7 0xffffffff8107bd7a in trap (frame=0xfffffe00004348b0) at
/usr/devel/svn/head/sys/amd64/amd64/trap.c:427
#8 <signal handler called>
#9 intr_event_handle (ie=0xfffff80003349300, frame=0xfffffe0000434a30) at
/usr/devel/svn/head/sys/kern/kern_intr.c:1180
#10 0xffffffff811f2118 in intr_execute_handlers (isrc=0xfffff800033845b0,
frame=0xfffffe0000434a30) at /usr/devel/svn/head/sys/x86/x86/intr_machdep.c:285
#11 0xffffffff811f841d in lapic_handle_intr (vector=49,
frame=0xfffffe0000434a30) at /usr/devel/svn/head/sys/x86/x86/local_apic.c:1270
#12 <signal handler called>
#13 sched_idletd (dummy=<optimized out>) at
/usr/devel/svn/head/sys/kern/sched_ule.c:2803
#14 0xffffffff80b62204 in fork_exit (callout=0xffffffff80bd36a0 <sched_idletd>,
arg=0x0, frame=0xfffffe0000434c00) at
/usr/devel/svn/head/sys/kern/kern_fork.c:1039
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2018-06-18 14:35:42 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229106

--- Comment #1 from John Baldwin <***@FreeBSD.org> ---
This is an old race. I have a very old p4 branch that dedicates a spin lock to
protecting the list of handlers (and then uses that as the thread_lock of idle
interrupt threads so that the number of spin locks taken for an interrupt
remains the same) that fixes this.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2018-06-18 14:40:34 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229106

--- Comment #2 from Andriy Gapon <***@FreeBSD.org> ---
Yes, it's an old one.
I think that using a spinlock is perfectly fine.
I also have a work-in-progress that tries to solve the problem with some atomic
magic so that intr_event_handle is completely wait / spin free (if the
interrupt thread does not need to run, of course). I guess that a difference
between those two approaches would be negligible.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2018-06-18 15:37:00 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229106

Conrad Meyer <***@freebsd.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@freebsd.org

--- Comment #3 from Conrad Meyer <***@freebsd.org> ---
Just FYI for next time, if you're making code changes, fail_point(9) can make
this kind of scenario much easier to hit. In particular, you can expand races
by adding in an empty KFAIL_POINT_CODE() and then inducing a delay with sysctl
my.failpoint="50%delay(1000)" (i.e., DELAY(1000) 50% of times we hit that
failpoint; the 50% can help avoid excessive foot-shooting).
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2018-06-18 15:38:37 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229106

--- Comment #4 from Conrad Meyer <***@freebsd.org> ---
Is this list a good candidate for epoch(9)-based reclamation? :-)
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2018-06-18 17:10:14 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229106

--- Comment #5 from John Baldwin <***@FreeBSD.org> ---
I suspect you cannot use epoch(9) in primary interrupt context (e.g. filters)
just as you can't use regular mutexes there.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2018-06-18 17:12:01 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229106

--- Comment #6 from Andriy Gapon <***@FreeBSD.org> ---
But a similar approach might work.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2018-06-18 17:43:22 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229106

--- Comment #7 from Conrad Meyer <***@freebsd.org> ---
Where does epoch(9) take a regular mtx? I may just be missing it, but I see
only spin mutexes.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2018-06-18 19:46:12 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229106

--- Comment #8 from Andriy Gapon <***@FreeBSD.org> ---
(In reply to Conrad Meyer from comment #7)
I don't know epoch(9) / ck_epoch implementation details, but given the
constraints[*] of interrupt handling, I think that it could be an overkill. I
mean, it probably has some overhead that's avoidable. And for that latency
sensitive code it could matter.

[*] The interrupt handler list can be iterated only by a single thread /
context at a time. Also, we reduce modifications to a single thread at a time
using a mutex. Performance of adding and removing interrupt handlers is not
critical.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2018-06-18 20:11:45 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229106

--- Comment #9 from Conrad Meyer <***@freebsd.org> ---
(In reply to Andriy Gapon from comment #8)
Yes, performance of add/remove doesn't matter. But performance of
intr_event_handle() matters very much, and adding a contested spin lock to it
seems like a bad idea. I suppose it only needs to be taken for shared
interrupts, not MSI, so maybe the impact is not so bad.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2018-06-18 22:56:10 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229106

--- Comment #10 from John Baldwin <***@FreeBSD.org> ---
To be clear, we are not talking about adding a spin mutex. We are talking
about replacing the existing sched_lock mutex used to schedule the ithread
anyway with a dedicated spin mutex that protects the list of handlers and can
be used as the THREAD_LOCK when scheduling the ithread since ithreads are
changed in this patch to set td_lock to the new mutex when they are idle. You
can't get away from the mutex to call sched_add(), but this lets you change
what lock is now used for that in essence.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2018-06-18 23:20:12 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229106

--- Comment #11 from Conrad Meyer <***@freebsd.org> ---
Sure. It'd still be a single global spin lock, right?
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2018-06-19 00:47:11 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229106

--- Comment #12 from John Baldwin <***@FreeBSD.org> ---
No, it is kind of per-IRQ, not global.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2018-06-19 01:48:10 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229106

--- Comment #13 from Conrad Meyer <***@freebsd.org> ---
Oh, that's much better.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2018-06-19 15:27:36 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229106

--- Comment #14 from Andriy Gapon <***@FreeBSD.org> ---
I have created https://reviews.freebsd.org/D15905
It's a hybrid between the approach I mentioned in comment #2 and what John
mentioned (comment #1 and others) and what the current code has.
--
You are receiving this mail because:
You are the assignee for the bug.
Loading...