Cloudless Router Operation - WiFi clients

Design

Cloudless router operation provides the following:

  1. On loss of internet access for any length of time, the node continues to work on the LAN side. The node keeps the network in the same state, which means that any existing leaf connections are kept.

  2. After one or more reboots of a node with no internet access, the node configures itself using a stored subset of the OVSDB config (provisioned by the SDN controller while the node was still connected). This config provides a limited set of functionality. It sets up the default home LAN so that the node continues to work on the LAN side:

    1. Further use of the previously active bridge/router configuration

    2. Home SSID & primary PSK (only a single zone and no other zones)

    3. Existing static DHCP reservations and user-configured subnet

    4. User-configured primary/secondary DNS servers

Features that are not supported (including but not limited to):

  • Port forwarding entries

  • IPv6 LAN support

  • Multi-PSK passwords

  • Onboarding or backhaul SSID bring-up

  • OpenFlow rules besides the default

  • FSM configuration

  • Steering configuration

OpenSync 2.4 supports cloudless router operation for WiFi clients only. Support for Ethernet clients was introduced in OpenSync 3.2.

Implementation Details

When the gw_offline feature flag is enabled and a GW node is connected to the controller as usual, the node monitors a subset of the OpenSync OVSDB and stores it to persistent storage whenever it detects a change in the monitored config. The OpenSync 2.0 osp_ps persistent storage API is used. To save flash write cycles, the config is written to persistent storage only if the new config differs from the saved config. The config is saved in JSON format.
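The "write only if changed" rule can be sketched as follows. This is a minimal illustration in Python, not the osp_ps API: the function name save_if_changed, the file-based storage, and the example keys are all assumptions made for the sketch.

```python
import json
import os
import tempfile


def save_if_changed(path, config):
    """Write config as JSON only when it differs from what is already stored.

    Hypothetical helper: returns True if a write happened, False if skipped.
    """
    # Serialize with sorted keys so that key order alone never causes a write.
    new_blob = json.dumps(config, sort_keys=True)
    try:
        with open(path, "r", encoding="utf-8") as f:
            old_blob = f.read()
    except FileNotFoundError:
        old_blob = None  # nothing stored yet

    if old_blob == new_blob:
        return False  # unchanged: skip the write and save a flash cycle

    # Write to a temporary file and rename it over the target, so a reader
    # never sees a partially written file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(new_blob)
    os.replace(tmp_path, path)
    return True
```

Calling save_if_changed twice with the same values, even in a different key order, writes only once.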

The feature is currently implemented within the Platform Manager (PM).

However, Connectivity Manager (CM) triggers the gw_offline state because CM has information about controller connection availability and stability. CM only triggers the gw_offline state after certain connectivity checks fail within certain timeouts.

CM triggers the gw_offline state in two cases:

  • If the node was (re)booted without controller connectivity. In this case, PM applies the stored config.

  • If the node had normal controller connectivity (is configured by the controller) and connectivity is lost without a reboot. In this case, PM does not apply the stored config because the node already runs in a configured state. However, PM still enters the gw_offline state. In this state, the node handles some of the offline tasks.
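The two trigger cases can be summarized in a small decision function. This is only a sketch of the behavior described above; the Trigger enum and the function name are hypothetical and do not come from the PM or CM sources.

```python
from enum import Enum


class Trigger(Enum):
    # Hypothetical names for the two cases in which CM triggers gw_offline.
    BOOT_WITHOUT_CONTROLLER = "boot_without_controller"
    CONNECTIVITY_LOST_AT_RUNTIME = "connectivity_lost_at_runtime"


def pm_handle_gw_offline(trigger):
    """Return (enters_gw_offline_state, applies_stored_config)."""
    if trigger is Trigger.BOOT_WITHOUT_CONTROLLER:
        # The node came up unconfigured, so PM applies the stored config.
        return (True, True)
    if trigger is Trigger.CONNECTIVITY_LOST_AT_RUNTIME:
        # The node already runs the controller-provided config, so PM keeps
        # it and only enters the gw_offline state.
        return (True, False)
    raise ValueError(f"unknown trigger: {trigger!r}")
```

In both cases PM enters the gw_offline state; the only difference is whether the stored config is applied.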

To restore a saved configuration in OpenSync OVSDB while in gw_offline mode, certain OVSDB tables or rows can simply be replayed into OVSDB.

OVSDB Interface

The gw_offline feature monitors a subset of OpenSync OVSDB config on the GW node, and also configures the OpenSync OVSDB config on the GW node. When in gw_offline mode, the config is applied from the stored config.

The gw_offline feature has an OVSDB interface for:

  • Enabling/disabling the feature

  • Enabling/disabling the monitoring of OVSDB config

  • CM ↔ PM communication. OVSDB holds a flag for CM to trigger the gw_offline mode in PM, and a flag for PM to report:

    • The feature status (on/off)

    • Whether a stored config is available, or whether the config has actively been applied.

Node_Config Table

| Node_Config::module/key | Values | Set by | Description |
|---|---|---|---|
| Node_Config::PM/gw_offline_cfg | [false/true] (persistent) | Controller | The gw_offline on/off flag. PM persists this value and restores it even if the node is rebooted with no internet/controller connectivity. For PM to actually make this value persistent, the cloud should set the persist value to true. |
| Node_Config::PM/gw_offline_mon | [false/true] | Controller | Enables reading and monitoring of a subset of the OVSDB config, and storing the config to persistent storage. It is recommended to set this value after SSID/PSK/LAN/etc. have been configured, but that is not a requirement. It can be set any time after gw_offline_cfg has been set to true. |
| Node_Config::PM/gw_offline | [false/true] | CM | Triggers applying the stored subset of the OVSDB config (valid if Node_Config::PM/gw_offline_cfg==true && Node_State::PM/gw_offline_status==ready). |

Node_State

| Node_State::module/key | Values | Set by | Description |
|---|---|---|---|
| Node_State::PM/gw_offline_cfg | [false/true] | PM | Acknowledges enablement of the feature by PM. |
| Node_State::PM/gw_offline_status | [disabled/enabled/ready/active/error] | PM | Provides the status of the feature: disabled means the feature is off; enabled means it is on but no config is in storage; ready means it is on and has config in storage; active means PM has actively applied its stored config; error means PM failed to load and apply the stored config, failed to read and store the config, or hit another error. |

PM indicates "ready" in Node_State::PM/gw_offline_status when the feature is enabled (Node_Config::PM/gw_offline_cfg==true, persistent) and it has a stored config available.

PM indicates "active" in Node_State::PM/gw_offline_status when the feature is enabled (Node_Config::PM/gw_offline_cfg==true, persistent), CM has triggered the config apply by setting Node_Config::PM/gw_offline:=true, and PM has successfully applied its config.
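The status values can be read as a function of a few conditions. The sketch below is one possible interpretation in Python: the function names, the boolean parameters, and the precedence of "disabled" over "error" are assumptions and are not taken from the PM implementation.

```python
def gw_offline_status(cfg_enabled, has_stored_config, applied=False, error=False):
    """Derive a Node_State::PM/gw_offline_status value (illustrative only)."""
    if not cfg_enabled:
        return "disabled"   # feature is off
    if error:
        return "error"      # load/apply or read/store failed
    if not has_stored_config:
        return "enabled"    # feature is on, but nothing is in storage yet
    if applied:
        return "active"     # stored config has been applied
    return "ready"          # feature is on and stored config is available


def may_trigger_apply(cfg_enabled, status):
    """Validity condition documented for Node_Config::PM/gw_offline."""
    return cfg_enabled and status == "ready"
```

For example, a node with the feature enabled and a stored config reports "ready", which is the only state in which the CM trigger is valid.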

Flow

Node_Config

  1. Cloud enables gw_offline_cfg because a user requires this feature, and enables gw_offline_mon at the same time.

  2. PM immediately collects all the necessary information and performs the initial write to persistent storage.

  3. PM monitors OVSDB for any updates to that config, and updates persistent storage whenever a value changes.

  4. System is power cycled.

  5. PM comes up with gw_offline_cfg == enabled based on previously saved value, but gw_offline_mon == false.

  6. CM waits and gets controller connection, so that gw_offline never gets set to true.

  7. Cloud sets gw_offline_mon = true, then configures SSID/PSK/LAN/etc.

  8. PM gets those updates and checks whether the values differ from the stored config. If they do, PM rewrites the stored config. If they do not, PM leaves it alone.

  9. System is power cycled.

  10. PM comes up with gw_offline_cfg == enabled based on the previously saved value, but gw_offline_mon == false.

  11. CM waits and does not get a controller connection, and sets gw_offline == true.

  12. PM reacts to gw_offline == true by applying the configuration from persistent storage.

  13. Once the controller connection comes back, CM sets gw_offline = false.

  14. PM can either:

    1. Do nothing if the controller properly supports over-writing the OVSDB (which could be the case).

    2. As a backup plan, PM could decide that it's best to trigger a restart of managers. This leaves all the offline logic in one place instead of splitting it between PM and CM.

  15. User disables feature, and cloud sets gw_offline_cfg = false.

  16. PM gets update that gw_offline_cfg = false and will then persist this value as false, so the feature is disabled.

  17. PM has two options:

    1. Leave the persistent config on flash, since the feature is disabled anyway and the config is replaced as soon as the feature is enabled again.

    2. Remove the persistent config altogether so that no remnants of the previous config are left.

Node_State

  • PM/gw_offline_cfg (true/false): This field gets the same value as Node_Config once applied.

  • PM/gw_offline_status (disabled/enabled/ready/active/error): This field provides the status of the feature, such as disabled, enabled but no config in storage, ready meaning enabled and has config in storage, and active meaning it has actively applied its config.

The GW offline configuration is applied when:

  • Configuration for GW offline was created by PM.

  • Each skip of the recovery mechanism increases a counter.

    • When counter == CM2_GW_OFFLINE_RETRY_THRESHOLD (defined as 3), the GW offline configuration is applied.

A threshold is also defined for ending the GW offline state. It protects against keeping a node in an unexpected state indefinitely.

This threshold is CM2_GW_SKIP_RESTART_THRESHOLD (default: 360 iterations). The check usually runs every 2 minutes, so recovery is triggered after about 12 hours (360 × 2 min).
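The two thresholds and the 12-hour figure can be checked with a short sketch. The constant values come from the text. The function names, the 2-minute interval constant, and the reading that both listed conditions must hold before the config is applied are assumptions made for illustration.

```python
CM2_GW_OFFLINE_RETRY_THRESHOLD = 3    # value given in the text
CM2_GW_SKIP_RESTART_THRESHOLD = 360   # default given in the text
CHECK_INTERVAL_MINUTES = 2            # "usually every 2 min" per the text


def should_apply_gw_offline(skip_counter, config_created):
    """True on the iteration where the GW offline config should be applied.

    Assumes both conditions are required: PM has created the config, and
    the skip counter has reached the retry threshold.
    """
    return config_created and skip_counter == CM2_GW_OFFLINE_RETRY_THRESHOLD


def should_force_recovery(offline_iterations):
    """True once the node has stayed in GW offline for the full threshold."""
    return offline_iterations >= CM2_GW_SKIP_RESTART_THRESHOLD


def hours_until_forced_recovery():
    """Time in hours before the protection threshold ends GW offline."""
    return CM2_GW_SKIP_RESTART_THRESHOLD * CHECK_INTERVAL_MINUTES / 60
```

With the defaults, hours_until_forced_recovery() evaluates to 12.0, which matches the 12 h stated above.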

Applying the Config Procedure (Simplified Overview)

  • Start on a prepopulated OVSDB (at managers start-up)

  • Delete all rows of Wifi_VIF_Config.

  • Update Wifi_Inet_Config where if_name==LAN_BRIDGE with the stored config.

  • Update Wifi_Radio_Config with the stored config.

  • Create VIFs for home APs according to stored config:

    • Insert into: Wifi_VIF_Config.

    • Add created VIF uuid to Wifi_Radio_Config (restore/setup references between tables).

    • Add this home VIF interface to the LAN bridge (ovs-vsctl).

  • Add Home AP entries to Wifi_Inet_Config from stored config.

  • Update DHCP_reserved_IP from the stored config.
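The procedure above can be sketched against a simplified in-memory model, where each OVSDB table is a list of row dictionaries. This is an illustration only, not the PM implementation. The layout of the stored config (lan_inet, radios, home_vifs, home_ap_inet, dhcp_reserved), the radio_if_name link, the use of "br-home" for LAN_BRIDGE, and the clearing of old VIF references are assumptions. The returned bridge_ports list stands in for the ovs-vsctl step.

```python
import copy
import uuid

LAN_BRIDGE = "br-home"  # assumed value; the document names br-home as the LAN bridge


def apply_stored_config(ovsdb, stored):
    """Apply a stored config to a dict model of OVSDB; returns (db, bridge_ports)."""
    db = copy.deepcopy(ovsdb)   # leave the caller's model untouched
    bridge_ports = []           # interfaces that would be added with ovs-vsctl

    # Delete all rows of Wifi_VIF_Config (the model also clears old references).
    db["Wifi_VIF_Config"] = []
    for radio in db["Wifi_Radio_Config"]:
        radio["vif_configs"] = []

    # Update Wifi_Inet_Config where if_name==LAN_BRIDGE with the stored config.
    for row in db["Wifi_Inet_Config"]:
        if row["if_name"] == LAN_BRIDGE:
            row.update(stored["lan_inet"])

    # Update Wifi_Radio_Config with the stored config.
    for radio in db["Wifi_Radio_Config"]:
        radio.update(stored["radios"].get(radio["if_name"], {}))

    # Create VIFs for home APs, restore uuid references, add them to the bridge.
    for vif in stored["home_vifs"]:
        row = {k: v for k, v in vif.items() if k != "radio_if_name"}
        row["_uuid"] = str(uuid.uuid4())
        db["Wifi_VIF_Config"].append(row)
        for radio in db["Wifi_Radio_Config"]:
            if radio["if_name"] == vif["radio_if_name"]:
                radio["vif_configs"].append(row["_uuid"])
        bridge_ports.append(vif["if_name"])

    # Add home AP entries to Wifi_Inet_Config and replay DHCP_reserved_IP.
    db["Wifi_Inet_Config"].extend(copy.deepcopy(stored["home_ap_inet"]))
    db["DHCP_reserved_IP"] = copy.deepcopy(stored["dhcp_reserved"])
    return db, bridge_ports
```

The function returns a new model rather than mutating its input, which makes it easy to compare the state before and after the apply step.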

Supported Config Parameters to be Saved and Restored

| Description | Type | OVSDB Tables | Details |
|---|---|---|---|
| WiFi Radios | Replay, Device logic | Wifi_Radio_Config, Wifi_VIF_Config | Wifi_Radio_Config can be replayed. Device logic needs to restore the references between tables: for VIFs configured in Wifi_VIF_Config, uuid references need to be restored/added in Wifi_Radio_Config. If DFS channels were configured, they might require additional device logic because they might not always be available. However, by default the driver handles DFS automatically in the usual way, i.e., it performs CAC, moves to another channel, stops operation if it runs out of channels, etc. |
| Home SSID | Replay, Device logic | Wifi_VIF_Config, OVS tables | HomePass example: Wifi_VIF_Config where bridge==LAN_BRIDGE is restored (replay). Device logic: configured home-ap VIF interfaces need to be added to LAN_BRIDGE (ovs-vsctl). |
| WPA Mode | Replay | | The WPA mode is a setting in the Wifi_VIF_Config table. The device must ensure that the same WPA mode is used when re-creating the SSID. |
| DFS Mode | Not Applicable | | |
| Network Mode: Bridge, Router, Auto | Replay | Wifi_Inet_Config, OVS tables | There is no corresponding setting on the device. Re-applying the network settings will generally restore the last network mode. |
| IP Reservation | Replay | DHCP_reserved_IP | This is a trivial replay of the DHCP_reserved_IP table. |
| LAN Subnet | Replay | Wifi_Inet_Config where if_name==br-home | A replay of the Wifi_Inet_Config entry for br-home. |