Robustness of WAN configuration (VLAN, PPPoE, ...)

General

This feature adds extra validation and a rollover mechanism that protect an OpenSync node from incorrect network configuration on the WAN side.

Specifically, it handles two scenarios:

  • If a newly configured IP or gateway address is wrong and leaves the node unable to reach the gateway, the node rolls back to the previously configured settings (earlier it defaulted to DHCP mode).

  • Validation goes beyond gateway reachability and also checks whether the node can reach the internet.

Previous model and state machine

First, we need to understand three relevant components of the existing WANO architecture: a) plugins, b) configurations, and c) plugin pipeline.

A plugin is an abstraction over the type of network protocol used for interface provisioning. At the time of writing, the following WAN interface plugins are implemented: VLAN, PPPoE, static IPv4, CMTS, DHCPv4, DHCPv6 (listed in order from highest to lowest priority). In addition, there’s also an eth client plugin running at the highest priority, which is tasked with detecting an ethernet client connected to the interface being provisioned.

A configuration is a cached entry from the WAN_Config table which holds information needed by different plugins, e.g. PPPoE credentials or a static IPv4 address. The WAN config cache is kept in sync with the actual WAN_Config table, meaning that WANO considers all existing table entries to be valid configs. Importantly, just like plugins, configs are ordered and considered by priority.

The plugin pipeline is the main state machine driving WAN port provisioning; it provides a common state to the different WANO components. Upon entering the START state, restarted plugins are placed in a waiting queue. If the interface is enabled and a carrier is detected, the state machine enters PLUGIN_SCHED, and the waiting plugin with the highest priority is moved to the running queue and eventually run when the PLUGIN_RUN state is entered.

[Figure: WANO plugin pipeline state machine (wano_ppline-20231109-194705.png)]

If the plugin reports success before a timeout is reached, WANO updates the Connection_Manager_Uplink table to give CM the green light to establish a cloud connection. If the plugin times out or reports a failure, the state machine transitions to PLUGIN_SCHED and eventually runs the plugin with the next highest priority. Importantly, if there is no plugin left to run, the IDLE state is entered, which triggers a configuration update, so that when the START state is entered next, plugins are run from the highest to the lowest priority with the next configuration. Configurations are also selected in order of priority, starting with the highest. If all the plugins have been run with the last configuration, i.e. the one with the lowest priority, and none reports success, a rollover event occurs and the highest-priority configuration is selected next, so the process moves through the priority-sorted list of configurations again in the same order until a plugin succeeds.
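The scheduling order described above can be sketched as nested loops over configurations and plugins. This is only an illustration under simplifying assumptions: the real pipeline is an event-driven state machine, and every name below (`struct attempt`, `try_fn`, `ppline_find_working`, `fail_n_times`) is hypothetical and not part of WANO. Index 0 stands for the highest priority, and `max_rounds` bounds the number of rollovers so the sketch terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of which (config, plugin) pair succeeded. */
struct attempt { size_t config; size_t plugin; };

/* Hypothetical callback: true if `plugin` succeeds with `config`. */
typedef bool (*try_fn)(size_t config, size_t plugin, void *ctx);

/*
 * Try every plugin with every config, both in priority order
 * (index 0 = highest). When the last config is exhausted, roll over
 * to config 0 again, for at most `max_rounds` full passes.
 */
bool ppline_find_working(size_t nconfigs, size_t nplugins, size_t max_rounds,
                         try_fn try_plugin, void *ctx, struct attempt *out)
{
    assert(try_plugin != NULL && out != NULL);

    for (size_t round = 0; round < max_rounds; round++)       /* rollover */
    {
        for (size_t c = 0; c < nconfigs; c++)                 /* IDLE -> next config */
        {
            for (size_t p = 0; p < nplugins; p++)             /* PLUGIN_SCHED -> PLUGIN_RUN */
            {
                if (try_plugin(c, p, ctx))
                {
                    out->config = c;
                    out->plugin = p;
                    return true;
                }
            }
        }
    }
    return false;
}

/* Illustrative stub: fails a fixed number of times, then succeeds. */
struct countdown { size_t failures_left; };

bool fail_n_times(size_t config, size_t plugin, void *ctx)
{
    struct countdown *cd = ctx;
    (void)config;
    (void)plugin;
    if (cd->failures_left == 0) return true;
    cd->failures_left--;
    return false;
}
```

With two configurations and three plugins, a stub that fails seven times exhausts the first pass (six attempts), rolls over, fails once more with the highest-priority pair, and succeeds with configuration 0 and plugin 1.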

Enhancement to the existing model

  1. An immediate and straightforward improvement was to add static checks to the configuration-parsing components of the respective plugins. For example, the PPPoE plugin checks that the strings containing credentials are not empty.

  2. A plugin reports success by dispatching a success message via the wano_ppline_event_dispatch(...) function. The messages are modeled as asynchronous events (libev signals) managed by a callback/handler function wano_ppline_status_async_fn(...). Currently, no additional checks are performed after success is reported, and the plugins are responsible for their own validation. The second proposal is to impose an additional check in the code handling the plugin success signal (see the simplified pseudo-definition below):

    void wano_ppline_status_async_fn(...)
    {
        switch (PPLINE_EVENT)
        {
            /* Plugin reported success */
            case WANP_OK:
            {
                /* Stop the timeout timer for the plugin */
                ev_timer_stop(PLUGIN_TIMEOUT_TIMER);

                /* INSERT WAN PROBE HERE */
                ...

                /* Update Connection_Manager_Uplink table */
                WANO_CONNMGR_UPLINK_UPDATE(
                        update_ifname,
                        .if_type = iftype,
                        .has_L2 = WANO_TRI_TRUE,
                        .has_L3 = WANO_TRI_TRUE,
                        .loop = WANO_TRI_FALSE);
                ...

                /* Notify upper layers */
                wano_ppline_event_dispatch(WANO_PPLINE_OK);
                return;
            }
            ...
        }
    }

A WAN probe could be a simple blocking (synchronous) attempt to ping a known hostname (e.g., executing ping -w 5 -c 2 duckduckgo.com in a child process and observing its exit code). If the probe succeeds, the same path is taken as before. If it fails, we trigger the timeout timer associated with the plugin. As a result, the state machine schedules the next plugin or configuration, treating the plugin as if it had reported a failure. This way we introduce an additional check that is completely invisible to the plugins and the rest of the WANO code, making it as non-intrusive and self-contained as possible.
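A blocking probe of this kind could look roughly like the sketch below, which runs a command in a child process and treats exit status 0 as success. The function names `wan_probe_run` and `wan_probe_ping` are hypothetical and not taken from WANO. The ping arguments are the ones quoted in the text, and their exact meaning depends on the ping implementation installed on the device.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with the given arguments in a child process and block
 * until it exits. Returns true only if the child exited normally
 * with status 0.
 */
bool wan_probe_run(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0) return false;          /* fork failed */

    if (pid == 0)
    {
        execvp(argv[0], argv);
        _exit(127);                     /* only reached if exec failed */
    }

    int status = 0;
    while (waitpid(pid, &status, 0) < 0)
    {
        if (errno != EINTR) return false;
    }

    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* The probe from the text: ping -w 5 -c 2 <host>. */
bool wan_probe_ping(const char *host)
{
    assert(host != NULL);
    char *const argv[] = { "ping", "-w", "5", "-c", "2", (char *)host, NULL };
    return wan_probe_run(argv);
}
```

Because the call blocks until the child exits, a handler using it would stall the event loop for the duration of the probe (up to the ping deadline), which is the trade-off accepted by a "simple blocking" approach.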

  3. Currently, DHCP plugins are run at the end of the pipeline for every configuration. As a result, introducing an invalid configuration to a location with a working non-DHCP configuration will likely cause the intended plugin to fail, followed by a fallback to DHCP instead of to the previously working configuration. This can be solved by running DHCP plugins only at the rollover event, before trying again with the highest-priority configuration. While this will generally lead to a more consistent fallback to a previously working configuration, there are cases where it will not. For example, a location might have two non-DHCP configurations such that WANO failed to provision a WAN port with the highest-priority configuration but succeeded with the second one. If a third configuration is introduced and fails, the formerly highest-priority configuration could succeed before WANO falls back to the previously working one. While this might not follow the requirements to the letter, it is arguably a more correct approach.
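The effect of deferring DHCP can be illustrated by listing the steps of one pass over the configurations, before and after the change. This is a simplified model, not WANO code: `struct wan_step` and `wan_pass_order` are hypothetical names, and all non-DHCP plugins of a configuration are collapsed into a single step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One scheduling step: which config is active, and whether DHCP is run. */
struct wan_step { size_t config; bool dhcp; };

/*
 * Fill `out` with the steps of a single pass over `nconfigs` configs
 * (index 0 = highest priority). If `dhcp_last_only` is false, DHCP
 * follows the non-DHCP plugins of every config (old behavior); if
 * true, DHCP runs only after the last config, just before rollover.
 * Returns the number of steps written (at most `cap`).
 */
size_t wan_pass_order(size_t nconfigs, bool dhcp_last_only,
                      struct wan_step *out, size_t cap)
{
    size_t n = 0;
    assert(out != NULL);

    for (size_t c = 0; c < nconfigs; c++)
    {
        bool last = (c + 1 == nconfigs);

        /* Non-DHCP plugins for this config */
        if (n < cap) out[n++] = (struct wan_step){ c, false };

        /* DHCP plugins, possibly deferred to the end of the pass */
        if ((!dhcp_last_only || last) && n < cap)
            out[n++] = (struct wan_step){ c, true };
    }
    return n;
}
```

For two configurations the old order is config 0, DHCP, config 1, DHCP, while the deferred order is config 0, config 1, DHCP, so a working lower-priority configuration is reached before any DHCP fallback.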

Northbound API

No Northbound API changes envisioned.

Southbound API

Only the internal logic of WANO was modified; the interface with other modules and the rest of the system stays the same.

The following improvements were implemented:

  1. Input validation was added for PPPoE and IPv4 configurations in wano_wan.c :: wano_wan_config_from_schema():

    + uname_len = strlen(username);
    + pword_len = strlen(password);
    +
    + /* Check that credentials' lengths are valid. */
    + if (uname_len < 1 || uname_len > 128 || pword_len < 1 || pword_len > 128)
    + {
    +     LOG(ERR, "wan_config: Invalid PPPoE `username` or `password`.");
    +     return false;
    + }
    ...
    + /* Check if the gateway is reachable with the current settings */
    + int prefix = osn_ip_addr_to_prefix(&netmask);
    + osn_ip_addr_t ipnet = ipaddr;
    + osn_ip_addr_t gwnet = gateway;
    +
    + ipnet.ia_prefix = prefix;
    + gwnet.ia_prefix = prefix;
    +
    + ipnet = osn_ip_addr_subnet(&ipnet);
    + gwnet = osn_ip_addr_subnet(&gwnet);
    +
    + if (osn_ip_addr_cmp(&ipnet, &gwnet) != 0)
    + {
    +     LOG(ERR, "wan_config: Static IPv4 gateway/ipaddr subnet mismatch.");
    +     return false;
    + }

  2. A DNS probe was added to wano_ppline.c :: wano_ppline_status_async_fn. It uses the c-ares library to try to resolve the address from AWLAN_Node::redirector_addr; if resolution fails, WANO refuses to accept the plugin's success report. DNS resolution is performed synchronously, albeit using an asynchronous resolver library (c-ares). The library was chosen because we wanted to be able to bind the probe to a specific interface, which doesn’t seem to be possible with getaddrinfo(3).

  3. Logic was added (wano_wan_is_last_config) to check whether plugins other than DHCP are still pending. The essential change, however, is in wano_ppline.c :: wano_ppline_state_PLUGIN_RUN, where this information is used to run the DHCP plugin only once all other plugins have been exhausted (otherwise, a rollover event occurs and other plugins are tried before DHCP):

    + /* Only upon exhaustion of all plugins for all configs, try with DHCP */
    + if (wano_wan_is_last_config(self->wpl_wan) && wano_ppline_runq_start_dhcp(self))
    + {
    +     break;
    + }
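For readers without access to the osn_ip_addr_* helpers, the input validation from item 1 above can be reproduced with standard POSIX calls. The sketch below is a standalone approximation, not the WANO implementation: `ipv4_same_subnet` and `pppoe_credential_ok` are hypothetical names, addresses are passed as dotted-quad strings, and the 1 to 128 length bound is copied from the diff.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the credential length is within the 1..128 range used in the diff. */
bool pppoe_credential_ok(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    return len >= 1 && len <= 128;
}

/*
 * True if `ipaddr` and `gateway` fall into the same subnet under
 * `netmask`. Returns false if any of the strings is not a valid
 * dotted-quad IPv4 address.
 */
bool ipv4_same_subnet(const char *ipaddr, const char *gateway, const char *netmask)
{
    struct in_addr ip, gw, mask;

    assert(ipaddr != NULL && gateway != NULL && netmask != NULL);

    if (inet_pton(AF_INET, ipaddr, &ip) != 1) return false;
    if (inet_pton(AF_INET, gateway, &gw) != 1) return false;
    if (inet_pton(AF_INET, netmask, &mask) != 1) return false;

    /* Bitwise AND is byte-order independent, so no ntohl() is needed. */
    return (ip.s_addr & mask.s_addr) == (gw.s_addr & mask.s_addr);
}
```

For example, 192.168.1.10 with gateway 192.168.2.1 and netmask 255.255.255.0 is rejected, because the gateway would not be directly reachable from the configured address.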

Requirements