Bug #163

Starving routes since "batman-adv: avoid temporary routing loops by being strict on forwarded OGMs"

Added by Linus Lüssing over 12 years ago. Updated almost 8 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version:
Start date: 09/02/2012
Due date:
% Done: 0%
Estimated time:
Description

In a four-node setup (see attached topology.png/dia) I'm experiencing starving routes. Node A frequently loses track of node D in most cases, but also of node C. This seems to happen as soon as node B switches its route towards D from one interface to the other.

Checking with 'batctl td' it looks like before commit:f76d019 any OGM leading to a "Changing route towards..." event got forwarded. With commit:f76d019 they do not get forwarded anymore.

Additionally, looking at the provided logs, it seems odd that the OGM of node D, when received a second time at node B via C's other interface, results in such a "Changing route towards..." event at all, even though the new OGM has the same TQ (225) as the old one. This then leads to route flapping with every new pair of OGMs from originator D.

These two things together very often lead to node A not getting any OGM at all from originator D, for instance. This is the case whenever no packet loss is present, which is the usual situation in this setup. The route from node A to D starves.

The attached number-of-route-changes-B-to-D.svg visualizes when many route change events happened in the provided logs. For instance, at about 1500s (uptime 113499280 or seqno 2259557124) in nodeB.log, the route towards D starves for node A.
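A count like the one plotted in the SVG could be produced along the following lines. This is only a rough sketch, not the script used for the attachment: the function name `route_changes_per_bucket`, the `(timestamp, message)` input shape and the 60 second bucket size are all assumptions, and the only thing taken from this report is the "Changing route towards" message text.

```python
from collections import Counter


def route_changes_per_bucket(entries, bucket_seconds=60):
    """Count route change events per time bucket.

    entries: iterable of (timestamp_in_seconds, log_message) pairs.
    The pair format is an assumption for this sketch; the real log
    lines would need to be parsed into this shape first.
    """
    counts = Counter()
    for timestamp, message in entries:
        # Only the event text quoted in this report is matched.
        if "Changing route towards" in message:
            counts[int(timestamp // bucket_seconds)] += 1
    # Return buckets in ascending time order.
    return dict(sorted(counts.items()))


# Illustrative input, not taken from the attached logs.
sample = [
    (1.0, "Changing route towards: nodeD"),
    (30.0, "some unrelated message"),
    (59.9, "Changing route towards: nodeD"),
    (61.0, "Changing route towards: nodeC"),
]
buckets = route_changes_per_bucket(sample)
```

To look only at the route from B to D, the message check would additionally have to filter on the originator.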

I tried reproducing the same setup with virtual machines and wirefilter, using a '-d 50' parameter to simulate the second, slower wifi interface from the real setup, but so far mostly without success. Some OGMs do not get through on the route between A and D, but only for a few seconds. Instead of a second route switch with the OGM from the alternate interface, like in the physical setup, I'm getting a "Drop packet: packet within seqno protection time (sender: fe:fe:00:00:03:04)" on node B (which is the intended behaviour?).


Files

bat-hosts (324 Bytes), Linus Lüssing, 09/02/2012 08:30 PM
nodeA.log.xz (322 KB): The node which has starving routes (towards C/D), Linus Lüssing, 09/02/2012 08:30 PM
nodeB.log.xz (992 KB): The node creating the route starvation, Linus Lüssing, 09/02/2012 08:30 PM
nodeC.log.xz (539 KB): The node which is sometimes missing for node A, Linus Lüssing, 09/02/2012 08:30 PM
nodeD.log.xz (546 KB): The node which is frequently missing for node A, Linus Lüssing, 09/02/2012 08:30 PM
topology.dia (1.89 KB): Topology of the virtual and real setup, Linus Lüssing, 09/02/2012 08:30 PM
topology.png (6.21 KB): Topology of the virtual and real setup, Linus Lüssing, 09/02/2012 08:30 PM
number-of-route-changes-B-to-D.svg (155 KB), Linus Lüssing, 09/02/2012 09:00 PM
Actions #2

Updated by Linus Lüssing over 12 years ago

Update:

  • commit:716c8c9a8bb7ac1e30e959e50ed74caa7dabe60a: Fixed the observed "Drop packet: packet within seqno protection" issue, making things reproducible within kvm.
  • [PATCH] batman-adv: Fix symmetry check...: Fixes the observed route flapping issues.

With these changes, the cause of the starving route issue seems to become clearer:

This issue occurs every time node B switches to the slower (i.e. higher latency) link towards C (i.e. the -d50 wirefilter link in kvm). I guess this happens when a single OGM occasionally gets lost on the faster link, even in a kvm/wirefilter setup with no packet loss configured.

This then results in:

while (no packet loss on: fast link && slow link):
  • OGM via fast link gets accepted, seqno updated, but no route switch and not rebroadcast [because of (!is_from_best_next_hop && !is_single_hop_neigh) in batadv_iv_ogm_forward -> 2nd return statement]
  • OGM via slow link gets dropped as a duplicate, does not get rebroadcast either

Which means no OGM ever gets forwarded to A until a packet loss on the slow link occurs.
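The loop described above can be illustrated with a small toy model of node B. This is a sketch under simplifying assumptions, not batman-adv code: the function `simulate_node_b` and its parameters are hypothetical, TQ is not modelled at all, and the route switch rule (switch when the current next hop misses an OGM that the other link delivers) is a simplification chosen only to match the behaviour described in this comment.

```python
def simulate_node_b(rounds, losses, router="fast"):
    """Toy model of node B handling OGMs of originator D.

    rounds: number of seqnos originated by D.
    losses: dict mapping seqno -> set of links ("fast"/"slow")
            on which that OGM is lost.
    router: link currently used as best next hop towards D.
    Returns (list of seqnos forwarded towards A, final router).
    """
    forwarded = []
    last_seqno = 0
    for seqno in range(1, rounds + 1):
        lost = losses.get(seqno, set())
        # The fast link delivers first, the slow (higher latency) one second.
        for link in ("fast", "slow"):
            if link in lost:
                continue
            if seqno <= last_seqno:
                # Second copy of the same seqno: dropped as a duplicate.
                continue
            last_seqno = seqno  # accepted, seqno updated
            # Assumed switch rule: the current next hop missed this OGM.
            if link != router and router in lost:
                router = link
            # Strict forwarding: only OGMs from the best next hop go out.
            if link == router:
                forwarded.append(seqno)
    return forwarded, router


# One loss on the fast link (seqno 2), later one on the slow link (seqno 5).
forwarded, router = simulate_node_b(6, {2: {"fast"}, 5: {"slow"}})
starved, _ = simulate_node_b(5, {}, router="slow")
```

In this model, seqnos 3 and 4 are never forwarded: the copy via the fast link is accepted but does not come from the best next hop, and the copy via the slow link is a duplicate. With the route stuck on the slow link and no loss at all, nothing is forwarded, which corresponds to the starvation seen on node A.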

Actions #3

Updated by Antonio Quartulli over 11 years ago

  • Status changed from New to Resolved

fixed in batman-adv 2013.3.0

Actions #4

Updated by Linus Lüssing over 11 years ago

Just for the record, this commit has very likely fixed it: commit:3d999e5116f44b47c742aa16d6382721c360a6d0

Actions #5

Updated by Antonio Quartulli over 11 years ago

  • Status changed from Resolved to Closed
Actions #6

Updated by Sven Eckelmann almost 8 years ago

  • Target version set to 2013.3.0