Bug #228
Closed
Workqueue: bat_events batadv_send_outstanding_bat_ogm_packet
Added by Paulo da Silva almost 9 years ago. Updated almost 8 years ago.
Description
For the past few days I have been getting several kernel crashes at the same point in the code. I'm using batman-adv 2015.1 from linux-image-4.2.0-0.bpo.1-amd64/4.2.6-3~bpo8+2 on Debian 8.2.
A kernel trace is attached.
Files
trace-2016-01-09-01.txt (3.82 KB) | Trace | Paulo da Silva, 01/09/2016 02:33 PM
Bildschirmfoto vom 2016-01-09 12_40_22.png (16.3 KB) | console screenshot | Paulo da Silva, 01/09/2016 10:28 PM
Updated by Sven Eckelmann almost 9 years ago
- Assignee set to Simon Wunderlich
This is not a crash but a warning that is shown when a queued packet (OGM) for an interface no longer "belongs" to its original batman interface: https://kernel.googlesource.com/pub/scm/linux/kernel/git/tytso/ext4/+/v4.2.6/net/batman-adv/bat_iv_ogm.c#522
This warning was introduced by: https://git.open-mesh.org/batman-adv.git/commit/29b9256e6631876d4f1719f4d5e13d7ee140c61b
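To illustrate what such a check could look like, here is a minimal user-space sketch. It is not the kernel code: the struct and function names (`soft_iface`, `hard_iface`, `forw_packet`, `ogm_still_belongs`) are simplified, hypothetical stand-ins, and it assumes the check simply compares the interface a queued packet was scheduled for against the batman interface that is about to send it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct soft_iface {
    const char *name;               /* batman interface, e.g. "bat0" */
};

struct hard_iface {
    const char *name;               /* underlying interface, e.g. "eth0" */
    struct soft_iface *soft_iface;  /* NULL once removed from batman */
};

struct forw_packet {
    struct hard_iface *if_outgoing; /* interface the OGM was queued for */
};

/*
 * Returns true if the queued OGM may still be sent on behalf of `soft`.
 * Otherwise prints a warning and returns false so the caller can drop
 * the packet instead of sending it.
 */
static bool ogm_still_belongs(const struct forw_packet *pkt,
                              const struct soft_iface *soft)
{
    if (pkt->if_outgoing == NULL) {
        fprintf(stderr, "warning: queued OGM has no outgoing interface\n");
        return false;
    }

    if (pkt->if_outgoing->soft_iface != soft) {
        fprintf(stderr, "warning: %s no longer belongs to %s\n",
                pkt->if_outgoing->name, soft->name);
        return false;
    }

    return true;
}
```

In this sketch, a packet queued while `eth0` was part of `bat0` fails the check as soon as `eth0` has been removed from `bat0`, which matches the situation described above: the work item fires after the interface has already changed ownership.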
Updated by Paulo da Silva almost 9 years ago
Sorry! I looked at the trace again and found that, shortly after that warning, there was another similar warning. I have no experience with kernel traces, but it looks like a crash.
Unfortunately I only have a screenshot. Maybe this is related?
Updated by Paulo da Silva almost 9 years ago
I have set up both servers to save crashes in /var/crashes in case someone needs that information for debugging. The servers are crashing about once a day.
Updated by Stefan Hoffmann almost 9 years ago
We have the same issue. Every supernode crashes with the same kernel panic every time an interface is removed from batman.
Updated by Antonio Quartulli almost 9 years ago
Stefan,
have you tried all the latest bugfixes?
In any case, they can be found in the new 2016.0 release.
Updated by Stefan Hoffmann almost 9 years ago
Hi,
I've tested the release today, but I have the same issue.
Updated by Paulo da Silva almost 9 years ago
I changed the design so that interfaces are no longer added to bat0 directly, but to a bridge that is added to bat0. This should be a feasible workaround. I will report back in a few days.
Updated by Antonio Quartulli almost 9 years ago
@Stefan Hoffmann: is your kernel crash exactly the same as the one shown by Paulo in Bildschirmfoto vom 2016-01-09 12_40_22.png? If not, could you please share the stack trace?
Sounds like something you can reproduce easily, right?
Would you mind explaining a bit more about your setup?
What do you mean by "Supernode"?
How are its interfaces configured?
What exactly do you do to trigger the crash?
How long does it take to crash?
@Paulo: this is not really the same, because batman-adv will treat all the peers behind the bridge as being on the same link and won't be able to distinguish between the interfaces. But it might be a temporary workaround to avoid the crash.
Updated by Paulo da Silva over 8 years ago
Sorry, I lost my test bed. The system is in production now with 300 nodes connected. As I described, I connect all subinterfaces to a bridge, which is "stably" connected to the batman interface. I'll try to get a new system, but I'm not sure whether I'll try the old setup, as the new one is working very stably.
Updated by Sven Eckelmann over 8 years ago
Maybe waiting for v2016.2 is also not a bad idea. At least I hope to get the following patch (or a variant of it) merged for this release: https://patchwork.open-mesh.org/project/b.a.t.m.a.n./patch/1464588694-19855-1-git-send-email-sven@narfation.org/
It tackles a weird memory corruption problem, possibly like the one you have here.
Updated by Sven Eckelmann over 8 years ago
- Related to Bug #223: Kernel Crash when using more than one interface in bat0 added
Updated by Sven Eckelmann over 8 years ago
- Related to Bug #217: Oops: "Unable to handle kernel paging request" in batadv_tt_local_remove added
Updated by Sven Eckelmann over 8 years ago
- Status changed from New to Feedback
- Assignee changed from Simon Wunderlich to Paulo da Silva
batman-adv 2016.2 was released last week. I suspect that this release fixes this problem. At least I have reports from Freifunk Darmstadt and Freifunk Chemnitz that an included patch solved a similar problem for them.
This ticket doesn't seem to show a lot of activity anymore, and thus I would like to close it soon to avoid a dead but still open ticket without a chance to mark it as fixed. I will wait until mid-July for feedback but will close this ticket if nothing happens.
Updated by Sven Eckelmann over 8 years ago
- Status changed from Feedback to Closed
Closing due to inactivity (and success reports from #223)