Bug #228

closed

Workqueue: bat_events batadv_send_outstanding_bat_ogm_packet

Added by Paulo da Silva almost 9 years ago. Updated almost 8 years ago.

Status: Closed
Priority: Normal
Target version:
Start date: 01/09/2016
Due date:
% Done: 0%
Estimated time:

Description

For the past few days I have been getting several kernel crashes at the same point in the code. I'm using batman-adv 2015.1 from linux-image-4.2.0-0.bpo.1-amd64/4.2.6-3~bpo8+2 on Debian 8.2.

A kernel trace is attached.


Files

trace-2016-01-09-01.txt (3.82 KB), Trace, Paulo da Silva, 01/09/2016 02:33 PM
Bildschirmfoto vom 2016-01-09 12_40_22.png (16.3 KB), console screenshot, Paulo da Silva, 01/09/2016 10:28 PM

Related issues 2 (0 open, 2 closed)

Related to batman-adv - Bug #223: Kernel Crash when using more than one interface in bat0 (Closed, 08/20/2015)
Related to batman-adv - Bug #217: Oops: "Unable to handle kernel paging request" in batadv_tt_local_remove (Closed, 06/04/2015)

Actions #1

Updated by Sven Eckelmann almost 9 years ago

  • Assignee set to Simon Wunderlich

This is not a crash but a warning that is shown when a queued packet (OGM) for an interface no longer "belongs" to its original batman interface: https://kernel.googlesource.com/pub/scm/linux/kernel/git/tytso/ext4/+/v4.2.6/net/batman-adv/bat_iv_ogm.c#522

This warning was introduced by: https://git.open-mesh.org/batman-adv.git/commit/29b9256e6631876d4f1719f4d5e13d7ee140c61b

Actions #2

Updated by Paulo da Silva almost 9 years ago

Sorry! I looked at the trace again and found that shortly after that warning there was another, similar warning. I have no experience with kernel traces, but it looks like a crash.

Unfortunately I only have a screenshot. Maybe this is related?

Actions #3

Updated by Paulo da Silva almost 9 years ago

I have set up both servers to save crash information in /var/crashes in case someone needs it for debugging. The servers are crashing about once a day.

Actions #4

Updated by Stefan Hoffmann almost 9 years ago

We have the same issue. Every supernode crashes with the same kernel panic every time an interface is removed from batman.

Actions #5

Updated by Antonio Quartulli almost 9 years ago

Stefan,
have you tried all the latest bugfixes?
In any case, they can be found in the new 2016.0 release.

Actions #6

Updated by Stefan Hoffmann almost 9 years ago

Hi,

I've tested the release today, but I still have the same issue.

Actions #7

Updated by Paulo da Silva almost 9 years ago

I changed the design so that the interfaces are no longer added to bat0 directly, but to a bridge that is in turn added to bat0. This should be a feasible workaround. I will report back in a few days.

Actions #8

Updated by Antonio Quartulli almost 9 years ago

@Stefan Hoffmann: is your kernel crash exactly the same as the one shown by Paulo in Bildschirmfoto vom 2016-01-09 12_40_22.png? If not, could you please share the stack trace?

It sounds like something you can reproduce easily, right?
Would you mind explaining a bit more about your setup?
What do you mean by "Supernode"?
How are its interfaces configured?
What exactly do you do to trigger the crash?
How long does it take to crash?

@Paulo: this is not really the same, because batman-adv will treat all the peers behind the bridge as being on the same link and won't be able to distinguish the interfaces. But it might serve as a temporary workaround to avoid the crash.

Actions #9

Updated by Sven Eckelmann over 8 years ago

Did anyone try v2016.1?

Actions #10

Updated by Paulo da Silva over 8 years ago

Sorry. I lost my test bed. The system is now in production with 300 nodes connected. As I described, I connect all subinterfaces to a bridge, which stays permanently connected to the batman interface. I'll try to get a new system, but I'm not sure whether I'll try the old setup again, as the new one is working very reliably.

Actions #11

Updated by Sven Eckelmann over 8 years ago

Maybe waiting for v2016.2 is also not a bad idea. At least I hope to get the following patch (or a variant of it) merged for this release: https://patchwork.open-mesh.org/project/b.a.t.m.a.n./patch/1464588694-19855-1-git-send-email-sven@narfation.org/

It tackles a weird memory corruption problem, possibly like the one you have here.

Actions #12

Updated by Sven Eckelmann over 8 years ago

  • Related to Bug #223: Kernel Crash when using more than one interface in bat0 added
Actions #13

Updated by Sven Eckelmann over 8 years ago

  • Related to Bug #217: Oops: "Unable to handle kernel paging request" in batadv_tt_local_remove added
Actions #14

Updated by Sven Eckelmann over 8 years ago

  • Status changed from New to Feedback
  • Assignee changed from Simon Wunderlich to Paulo da Silva

batman-adv 2016.2 was released last week. I suspect that this release fixes this problem. At least I have reports from Freifunk Darmstadt and Freifunk Chemnitz that an included patch solved a similar problem for them.

This ticket doesn't seem to show a lot of activity anymore, so I would like to close it soon to avoid a dead but still open ticket without a chance to mark it as fixed. I will wait until mid-July for feedback and will close this ticket if nothing happens.

Actions #15

Updated by Sven Eckelmann over 8 years ago

  • Status changed from Feedback to Closed

Closing due to inactivity (and success reports from #223)

Actions #16

Updated by Sven Eckelmann almost 8 years ago

  • Target version set to 2016.2