[Rtai] debugging information on my problem

Alex Ivanov aivanov at ligo.caltech.edu
Sun Sep 28 01:25:40 CEST 2008



On Sat, 27 Sep 2008, Emmanuel Pacaud wrote:

> Cheers from another gravitational wave detector experiment !

Hi, Emmanuel, good to hear from Virgo! :)

> On Sat, 2008-09-27 at 12:12 -0700, Alex Ivanov wrote:
> > I just wanted to pass on some debugging information I am getting from the
> > kernel. Here is what I sometimes get after things lock up on my system
> > when I start my "infinite" loop code (I am running the kernel with
> > isolcpus=1 and NO maxcpus parameter this time):
> > 
> > BUG: soft lockup - CPU#3 stuck for 11s! [bash:4511]
> > 
> > Pid: 4511, comm: bash Tainted: PF       (2.6.24 #5)
> > EIP: 0060:[<c0418f78>] EFLAGS: 00000297 CPU: 3
> > EIP is at native_smp_call_function_mask+0xec/0x112
> > EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
> > ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: c3031278
> >  DS: 0000 ES: 0000 FS: 0000 GS: 0033 SS: 0068
> > CR0: 8005003b CR2: 0811a2f8 CR3: 36c2b000 CR4: 000006d0
> > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > DR6: ffff0ff0 DR7: 00000400
> >  =======================
> > BUG: soft lockup - CPU#2 stuck for 11s! [sshd:4512]
> > 
> > Pid: 4512, comm: sshd Tainted: PF       (2.6.24 #5)
> > EIP: 0060:[<c06290b5>] EFLAGS: 00200286 CPU: 2
> > EIP is at _spin_lock+0x7/0xf
> > EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
> > ESI: 00000000 EDI: 00000000 EBP: 0000000f ESP: c3027278
> >  DS: 0000 ES: 0000 FS: 0000 GS: 0033 SS: 0068
> > CR0: 8005003b CR2: 081229ef CR3: 36d0c000 CR4: 000006d0
> > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > DR6: ffff0ff0 DR7: 00000400
> >  =======================
> 
> During the development of our new software on an RTAI/Linux platform,
> we've tried to minimize latency by using one of the isolated CPUs for
> an infinite polling loop.
>
> But we've faced the same issue as yours: after a short time, the machine
> is completely frozen, and we have the same kernel warnings.
> 
> We've seen that if we stop the infinite loop, the system becomes
> responsive again. It's as if a kernel task on the non-isolated CPU has
> to communicate with the isolated ones, and is waiting for them to become
> available. My knowledge of the kernel internals is too weak to explain
> what's really going on.

Same here, but I am eager and willing to learn! I really hope we can figure
this one out, otherwise we would have to pay thousands of dollars per year
for a licensed version of real-time Linux. That's not good...

> One possible workaround we've tried was, in the infinite loop, to give
> the control back to the kernel from time to time.
> 
> The use of the cpuset API sounds promising, but we haven't
> investigated it yet (we can live with a regular interrupt handler and
> its latency for now).

I have tried cpusets as suggested by Bernhard, but it didn't make any
difference. I think it is equivalent to the isolcpus kernel parameter.
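
For reference, the workaround Emmanuel describes (handing control back to
the kernel from inside the polling loop every so often) could look roughly
like the sketch below. It is plain POSIX/Linux user-space C, not the RTAI
API and not the code from either experiment; an RTAI hard real-time task
would need the corresponding RTAI call instead of sched_yield(). The names
pin_to_cpu, poll_with_yield, struct countdown and countdown_done are made
up for illustration, and the yield interval is an arbitrary assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pin the calling thread to a single CPU,
 * e.g. the one reserved with isolcpus. Returns 0 on success, -1 on error. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Hypothetical polling loop: call poll_once() until it reports completion
 * or max_iters calls have been made. Every yield_every iterations the loop
 * calls sched_yield() so the kernel gets a chance to run on this CPU.
 * Returns how many times the loop yielded. */
uint64_t poll_with_yield(bool (*poll_once)(void *), void *ctx,
                         uint64_t max_iters, uint64_t yield_every)
{
    uint64_t yields = 0;

    assert(poll_once);
    for (uint64_t i = 1; i <= max_iters; i++) {
        if (poll_once(ctx))
            break;                      /* work finished, leave the loop */
        if (yield_every != 0 && i % yield_every == 0) {
            sched_yield();              /* give control back to the kernel */
            yields++;
        }
    }
    return yields;
}

/* Stand-in for a real device poll: reports "done" after a fixed number
 * of calls. Purely illustrative. */
struct countdown {
    uint64_t remaining;
};

bool countdown_done(void *ctx)
{
    struct countdown *c = ctx;
    c->remaining--;
    return c->remaining == 0;
}
```

A caller would first do something like pin_to_cpu(1) and then run
poll_with_yield() with its own polling function. Whether an occasional
yield is enough to avoid the soft lockup above is exactly the open
question in this thread, so treat this as something to experiment with,
not a fix.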

Cheers, Alex


