Advanced Ethernet Client Detection

General

This feature adds a mechanism to monitor the connection status of wired clients through periodic ARP requests and responses.

In the earlier design, ethernet clients connected to OpenSync nodes age out of the MAC learning table after an inactivity timeout. This causes the Cloud controller to remove the device from its client list. This feature adds automated logic that keeps the device active in the list as long as it is connected.

This feature works in both the OVS and the Linux SDN mode of operation.

Forwarding Database and OVS_MAC_Learning Table

At the core is the layer 2 forwarding database (FDB). The part of it that concerns locally connected ethernet clients is reflected in the OVS_MAC_Learning table.
The FDB is dumped with a different command for each bridge type: ovs-appctl fdb/show for OVS and brctl showmacs for Linux SDN.

OVS example:

    root@node:~# ovs-appctl fdb/show br-home
     port  VLAN  MAC                Age
      204     0  aa:aa:aa:aa:aa:aa  166
      210   700  bb:bb:bb:bb:bb:bb  161
       30     0  cc:cc:cc:cc:cc:cc   57
    LOCAL     0  dd:dd:dd:dd:dd:dd   14
    root@node:~#

Linux SDN example:

    root@node:~# brctl showmacs br-home
    port no  mac addr           is local?  ageing timer
    ...
      4      aa:aa:aa:aa:aa:aa  yes          0.00
      6      bb:bb:bb:bb:bb:bb  yes          0.00
      2      cc:cc:cc:cc:cc:cc  no           8.89
      7      dd:dd:dd:dd:dd:dd  no         192.35
      7      ee:ee:ee:ee:ee:ee  no           4.01
      7      ab:ab:ab:ab:ab:ab  no          81.55
      1      ac:ac:ac:ac:ac:ac  yes          0.00
    root@node:~#

Of the many entries in the FDB, only those identified as ethernet clients are inserted into the OVS_MAC_Learning table:

    root@node:~# ovsh s OVS_MAC_Learning
    ------------------------------
    _uuid     | 2057~a172         |
    _version  | 4b09~3f89         |
    brname    | br-home           |
    hwaddr    | dd:dd:dd:dd:dd:dd |
    ifname    | eth1              |
    vlan      | 0                 |
    ------------------------------
    root@node:~#
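To illustrate what reading the FDB involves, here is a minimal C sketch that parses one data row of the brctl showmacs output shown above and keeps only non-local entries as client candidates. The struct fdb_row and both functions are hypothetical, and the non-local filter is an assumption: the exact criteria OpenSync uses to select ethernet clients for OVS_MAC_Learning are not described here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One data row of `brctl showmacs` output (hypothetical struct). */
struct fdb_row {
    int port_no;
    char mac[18];         /* "aa:bb:cc:dd:ee:ff" plus NUL */
    bool is_local;
    double ageing_timer;  /* seconds since the last packet from this MAC */
};

/* Parse a row such as "  2  cc:cc:cc:cc:cc:cc  no  8.89".
 * Returns false for anything else, e.g. the header line or "...". */
static bool fdb_row_parse(const char *line, struct fdb_row *row)
{
    char local[4];

    assert(line != NULL && row != NULL);
    if (sscanf(line, "%d %17s %3s %lf", &row->port_no, row->mac, local,
               &row->ageing_timer) != 4)
        return false;
    if (strcmp(local, "yes") == 0)
        row->is_local = true;
    else if (strcmp(local, "no") == 0)
        row->is_local = false;
    else
        return false;
    return true;
}

/* Assumed filter: local entries (addresses of the bridge's own ports) are
 * never clients. OpenSync's real selection may apply further criteria. */
static bool fdb_row_is_client_candidate(const struct fdb_row *row)
{
    return !row->is_local;
}
```

With the sample output, rows 2 and 7 would be candidates, while the "yes" rows would be skipped.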

Packet Age Out

Note the Age column in the OVS FDB and the ageing timer column in the Linux SDN FDB. The value is reset whenever a new packet is received from that MAC address; while no new packets arrive, Age / ageing timer keeps increasing. An entry is deleted when its Age / ageing timer reaches the age out limit.

Setting Age / ageing timer limit

The typical default value for the Age / ageing timer limit is 300 seconds. When an entry reaches this value, it is deleted from the FDB.
The limit can be set to a value other than 300.
For OVS, it is set by updating the other_config column of the Bridge table with the key mac-aging-time. For Linux SDN, the command is brctl setageing.

OVS Example:

First inspect other_config in the Bridge table, then update it with the mac-aging-time value.

If the mac-aging-time key-value pair is not present in other_config, the age out limit is the default of 300 seconds. To change it to 30 seconds, add the pair `["mac-aging-time","30"]` to other_config.
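The default rule above can be captured in a small helper. This is a minimal C sketch that assumes the caller has already fetched the string stored under mac-aging-time (or NULL when the key is absent). The function name and the fallback to 300 s for malformed values are assumptions, not documented OpenSync behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define FDB_AGE_LIMIT_DEFAULT_S 300L  /* used when mac-aging-time is absent */

/* value: string stored under other_config:mac-aging-time in the Bridge
 * table, or NULL when the key is absent. Writes the limit in seconds to
 * *limit_s and returns true only if an explicit, valid value was found.
 * Falling back to the default on malformed input is an assumption. */
static bool fdb_age_limit_from_other_config(const char *value, long *limit_s)
{
    char *end;
    long secs;

    assert(limit_s != NULL);
    *limit_s = FDB_AGE_LIMIT_DEFAULT_S;
    if (value == NULL)
        return false;
    errno = 0;
    secs = strtol(value, &end, 10);
    if (errno != 0 || end == value || *end != '\0' || secs <= 0 || secs > INT_MAX)
        return false;
    *limit_s = secs;
    return true;
}
```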

Linux SDN example:

To check the current age out limit, look at the ageing time value in the output of brctl showstp br-home.

To change it, for example to 30 seconds, run brctl setageing br-home 30.
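Reading the limit back programmatically amounts to locating the ageing time line in the brctl showstp output. This is a minimal C sketch that assumes the whole command output has already been captured into a string; showstp_ageing_time is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Find the "ageing time" line in `brctl showstp <bridge>` output and store
 * the limit in seconds in *limit_s. Returns false if it cannot be found. */
static bool showstp_ageing_time(const char *output, double *limit_s)
{
    static const char key[] = "ageing time";
    const char *p;

    assert(output != NULL && limit_s != NULL);
    p = strstr(output, key);
    if (p == NULL)
        return false;
    /* %lf skips the whitespace between the label and the value */
    return sscanf(p + strlen(key), "%lf", limit_s) == 1;
}
```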

Why Not Simply Set Longer Packet Age Out Limit

Setting a longer ageing time may solve the problem, but it can also produce false positives: a client that disconnects in the meantime remains listed as connected until its longer timer finally expires.

What About Having a Short Lease Time

Setting the DHCP lease time to less than 5 minutes would force clients to send IP renewal requests more frequently, hopefully at shorter intervals than the FDB age out limit. However, this is not an ideal solution: the DHCP server is often provisioned and managed externally, and for ethernet clients the lease time is typically longer than the FDB packet age out time.
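The relation between lease time and renewal interval can be checked with simple arithmetic. Per RFC 2131, a client by default starts renewing at T1 = 0.5 × lease duration, unless the server sends an explicit T1 (option 58), which this sketch ignores. The C sketch below, with hypothetical function names, compares that interval to the FDB age out limit. It suggests that with the default 300 s limit, renewals alone keep the entry fresh only for leases shorter than 10 minutes, so the 5-minute figure above leaves a safety margin.

```c
#include <assert.h>
#include <stdbool.h>

/* RFC 2131 default renewal time: T1 = 0.5 * lease duration. */
static unsigned dhcp_default_t1_s(unsigned lease_s)
{
    return lease_s / 2;
}

/* True if the client's periodic DHCP renewal alone would refresh its FDB
 * entry before it ages out, i.e. the renewal interval is below the limit. */
static bool renewal_keeps_fdb_entry(unsigned lease_s, unsigned fdb_limit_s)
{
    assert(fdb_limit_s > 0);
    return dhcp_default_t1_s(lease_s) < fdb_limit_s;
}
```

For a typical one-day lease (86400 s), T1 is 43200 s, far above the 300 s limit.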

Solution

Sending the inactive ethernet client packets that it has to respond to resolves the issue: the client's response resets the age of its FDB entry.

The question is which packets to send. The most straightforward option would be a broadcast ping: simply ping 255.255.255.255, hoping that everyone receiving the packet replies. Unfortunately, for security reasons the vast majority of devices drop broadcast pings by default; see, for example, https://github.com/ThomasHabets/arping/blob/arping-2.x/README#L113. A Linux device responds only if /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts is explicitly set to 0, and we have no way of doing that on our ethernet clients.

So we need to know the IP address of the ethernet client, i.e. the destination IP for our traffic. In the general case, however, the IP address is available neither in the FDB, nor in the OVS_MAC_Learning table, nor in any other OVSDB table.
The ARP table does not help either: if there was no IP traffic with endpoints at both the client and the node, the client's IP address will not land in the node's ARP table, even if the node forwards lots of traffic from that client. The OpenSync node simply never needed to know that IP address.

Move to the Gateway

As explained, on a general OpenSync node with an ethernet client connected, there is no guarantee that the client's IP address is available in addition to its MAC address from the FDB.

On the gateway node, we can read both from the DHCP_leased_IP table. However, on the gateway we cannot tell what kind of client it is: ethernet or wifi.

In addition, the FDB packet age out limit, which is read to calculate how often to touch the client with traffic, may differ between the gateway and the node the ethernet client is actually connected to. Fortunately, so far these values appear to stay at the default of 300 s in practice: no use cases were found in either OpenSync or the Cloud controller software where they get configured.
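How often to touch the clients can be derived from the age out limit. The C sketch below assumes that the period is half the limit, which matches the default 150 s period mentioned under How to Debug for the default 300 s limit; the actual OpenSync formula is not stated here, and both function names are hypothetical. The second helper checks that one period plus the time needed to send a whole batch stays below the limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: arping period as half the FDB age out limit
 * (300 s default limit -> 150 s period). */
static unsigned arping_period_s(unsigned fdb_limit_s)
{
    assert(fdb_limit_s >= 2);
    return fdb_limit_s / 2;
}

/* Worst case, a client is touched again one period later plus the time
 * needed to work through the whole batch before reaching it. That gap
 * must stay below the age out limit. */
static bool arping_period_is_safe(unsigned period_s, unsigned batch_ms,
                                  unsigned fdb_limit_s)
{
    unsigned long long gap_ms = (unsigned long long)period_s * 1000ULL + batch_ms;
    return gap_ms < (unsigned long long)fdb_limit_s * 1000ULL;
}
```

Halving the limit leaves plenty of room for batch duration and timer jitter; a period equal to the limit would leave none.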

To summarize, be aware of the following trade-offs and assumptions:

  • to rescue ethernet clients, we need to arping all clients, as we cannot distinguish between wifi and ethernet clients in the DHCP_leased_IP table or anywhere else on the gateway. At this stage, no communication between the gateway and other OpenSync nodes is attempted.

  • the FDB packet age out limit is assumed to be the same on all OpenSync nodes in the location

A more optimal treatment would need a higher-level view, perhaps orchestrated from the Cloud or through some other form of communication between OpenSync nodes.
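On the gateway, building the arping target list therefore means taking every lease. This is a minimal C sketch that assumes each DHCP_leased_IP row has already been read into a pair of strings for its hwaddr and inet_addr columns. struct lease_row, struct arping_target and build_targets are hypothetical names, and OVSDB access is left out.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>

/* Assumed shape of a DHCP_leased_IP row: only the two columns needed. */
struct lease_row {
    const char *hwaddr;     /* e.g. "dd:dd:dd:dd:dd:dd" */
    const char *inet_addr;  /* e.g. "192.168.40.23" */
};

/* One arping target in binary form, ready to be copied into an ARP frame. */
struct arping_target {
    uint8_t mac[6];
    struct in_addr ip;      /* network byte order */
};

static int parse_mac(const char *s, uint8_t mac[6])
{
    unsigned int b[6];
    char extra;

    if (sscanf(s, "%2x:%2x:%2x:%2x:%2x:%2x%c",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &extra) != 6)
        return -1;  /* wrong format or trailing characters */
    for (int i = 0; i < 6; i++)
        mac[i] = (uint8_t)b[i];
    return 0;
}

/* Every lease becomes a target, because wifi and ethernet clients cannot be
 * told apart on the gateway. Rows with an unparsable MAC or IPv4 address
 * are skipped. Returns the number of targets written to out. */
static size_t build_targets(const struct lease_row *rows, size_t n_rows,
                            struct arping_target *out, size_t max_out)
{
    size_t n = 0;

    assert((rows != NULL || n_rows == 0) && (out != NULL || max_out == 0));
    for (size_t i = 0; i < n_rows && n < max_out; i++) {
        if (parse_mac(rows[i].hwaddr, out[n].mac) != 0)
            continue;
        if (inet_pton(AF_INET, rows[i].inet_addr, &out[n].ip) != 1)
            continue;
        n++;
    }
    return n;
}
```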

Arping

Now that we have the client MAC and IP address, we need to send some traffic that the client is willing to respond to. Without a response from the client, its FDB packet age is not reset.
The simplest and shortest option seems to be the ARP protocol: sending an ARP Request who-has <IP>.
Arping from BusyBox can do that.

Here is how this looks on the ethernet client:

A normal ICMP ping could be used instead of arping; however, ICMP pings are quite often dropped by firewalls on the receiving side, even when they are not broadcast. A client, in contrast, generally has to answer ARP requests for its own IP address to stay reachable over IPv4 at all. Arping is therefore preferred over ICMP ping.
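For reference, an ARP Request who-has <IP> is a 42-byte frame: a 14-byte Ethernet header followed by a 28-byte ARP payload laid out as in RFC 826. The C sketch below fills such a frame byte by byte. It assumes that the request is sent unicast to the client's MAC address, which the gateway already knows, instead of the usual broadcast; arp_request_build is a hypothetical name.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ARP_FRAME_LEN 42  /* 14-byte Ethernet header + 28-byte ARP payload */

/* Build an ARP Request "who-has dst_ip tell src_ip" into frame.
 * Assumption: the Ethernet destination is the client's own MAC (unicast);
 * the ARP target hardware address stays zero, as in a normal request. */
static void arp_request_build(uint8_t *frame,
                              const uint8_t src_mac[6], const uint8_t src_ip[4],
                              const uint8_t dst_mac[6], const uint8_t dst_ip[4])
{
    uint16_t v;

    assert(frame && src_mac && src_ip && dst_mac && dst_ip);
    memcpy(frame + 0, dst_mac, 6);                 /* Ethernet destination */
    memcpy(frame + 6, src_mac, 6);                 /* Ethernet source */
    v = htons(0x0806); memcpy(frame + 12, &v, 2);  /* EtherType: ARP */
    v = htons(1);      memcpy(frame + 14, &v, 2);  /* HTYPE: Ethernet */
    v = htons(0x0800); memcpy(frame + 16, &v, 2);  /* PTYPE: IPv4 */
    frame[18] = 6;                                 /* HLEN */
    frame[19] = 4;                                 /* PLEN */
    v = htons(1);      memcpy(frame + 20, &v, 2);  /* OPER: request */
    memcpy(frame + 22, src_mac, 6);                /* sender MAC */
    memcpy(frame + 28, src_ip, 4);                 /* sender IP */
    memset(frame + 32, 0, 6);                      /* target MAC: unknown */
    memcpy(frame + 38, dst_ip, 4);                 /* target IP: who-has */
}
```

The client answers with an ARP Reply whose Ethernet source is its own MAC, which is exactly what refreshes its FDB entry along the way.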

Arpinging Efficiently

Having to arping all clients, because ethernet and wifi clients cannot be distinguished on the gateway, is demanding enough by itself, so care must be taken that arpinging disturbs other functionality as little as possible. In other words, arpinging must be done as quickly as possible.

For this reason, calling arping from the system shell is not acceptable. Besides avoiding the shell command overhead, time is also saved by not waiting for the arping responses; the client's reply resets its FDB entry age whether or not the gateway reads it. Moreover, most of the ARP frame content is the same for every client, and only the destination MAC and destination IP address vary. This can be exploited to save time as well, in particular by sending all frames through the same socket. Rough measurements, based on timestamps recorded with OpenSync's clock_mono_usec, showed that QCA platforms send arpings more slowly than BCM platforms. In particular, closing the file descriptor after all arpings are sent can take around 40 ms on QCA and 8 ms on BCM. Altogether, sending 200 arpings should take around 500 ms, i.e. roughly 2.5 ms per client.
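The approach described above (a prebuilt frame template, a single socket, no waiting for replies) could look roughly like this in C on Linux. It is a sketch under several assumptions: the 42-byte template already holds a valid ARP request carrying the gateway's own MAC and IP, requests are unicast to the client MAC, and the process has CAP_NET_RAW. The socket is created with protocol 0, so per packet(7) it receives no frames while sending. arp_template_patch and arping_batch_send are hypothetical names, not OpenSync functions.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <linux/if_ether.h>
#include <netinet/in.h>
#include <netpacket/packet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define ARP_FRAME_LEN 42  /* Ethernet header + ARP request payload */
#define OFF_ETH_DST    0  /* Ethernet destination MAC */
#define OFF_ARP_TPA   38  /* ARP target IPv4 address */

/* Hypothetical per-client target: the only fields that vary per frame. */
struct arping_target {
    uint8_t mac[ETH_ALEN];
    struct in_addr ip;     /* network byte order */
};

/* Overwrite the per-client fields of a prebuilt ARP request template. */
static void arp_template_patch(uint8_t *frame, const struct arping_target *t)
{
    assert(frame != NULL && t != NULL);
    memcpy(frame + OFF_ETH_DST, t->mac, ETH_ALEN);
    memcpy(frame + OFF_ARP_TPA, &t->ip.s_addr, 4);
}

/* Send one request per target through a single AF_PACKET socket and do not
 * wait for replies. Protocol 0: the socket never queues incoming frames.
 * Returns the number of frames sent, or -1 if the socket cannot be opened. */
static int arping_batch_send(int ifindex, uint8_t *tmpl,
                             const struct arping_target *targets, size_t n)
{
    struct sockaddr_ll sll;
    int fd, sent = 0;

    assert(tmpl != NULL && (targets != NULL || n == 0));
    fd = socket(AF_PACKET, SOCK_RAW, 0);
    if (fd < 0)
        return -1;
    memset(&sll, 0, sizeof(sll));
    sll.sll_family = AF_PACKET;
    sll.sll_protocol = htons(ETH_P_ARP);
    sll.sll_ifindex = ifindex;
    sll.sll_halen = ETH_ALEN;
    for (size_t i = 0; i < n; i++) {
        arp_template_patch(tmpl, &targets[i]);
        memcpy(sll.sll_addr, targets[i].mac, ETH_ALEN);
        if (sendto(fd, tmpl, ARP_FRAME_LEN, 0,
                   (const struct sockaddr *)&sll, sizeof(sll)) == ARP_FRAME_LEN)
            sent++;
    }
    close(fd);  /* one close for the whole batch */
    return sent;
}
```

To reproduce rough timings like those quoted above, timestamps can be taken around arping_batch_send with clock_gettime(CLOCK_MONOTONIC, ...), in the same spirit as clock_mono_usec.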

Feature Flag to Switch Arpinging OFF / ON

If desired, arpinging clients can be disabled. The change requires a device reboot. Currently it can only be done in an SSH session; support from the Cloud is pending. The feature flag is called nm2_arping_clients. Here are the commands:

How to Debug

On the gateway node, log_severity TRACE has to be set for the NM manager. For example:

The corresponding log entries can be isolated by grepping for arping. For example, capturing two cycles with the default period of 150 seconds:

Northbound API

No additional changes are required

Southbound API

No additional changes are required

Requirements

No additional requirements for this feature