Healthcheck Service

General

The OpenSync Healthcheck Service is a component responsible for monitoring the health and connectivity status of key system components, access points, and other general network functionalities. It ensures that devices remain operational and connected to the cloud by periodically verifying critical functionalities. If one or more of those functionalities is not working as intended for an extended period of time, a vendor-specific fatal action, such as a reboot, is triggered in an attempt to recover the device.

When enabled, the service runs as a daemon that is started along with other processes in init.d/. The entire service consists of shell scripts, so it is completely independent of other OpenSync processes and of cloud interaction. It does, however, verify values in many OVSDB tables, so it will most likely report failures when OpenSync is not running correctly.

The service iterates through the list of these shell scripts, runs each one, and reports a failure if any script exits with a non-zero status code. The scripts are located at the following file system path:

/usr/opensync/scripts/healthcheck.d/
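
The iteration described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual OpenSync runner: the function name run_checks, the HEALTHCHECK_DIR variable, and the choice to stop at the first failing script are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch of a healthcheck runner loop; not the actual
# OpenSync implementation.

# Directory holding the check scripts (path taken from the text above;
# the variable itself is an assumption that makes the sketch testable).
HEALTHCHECK_DIR="${HEALTHCHECK_DIR:-/usr/opensync/scripts/healthcheck.d}"

# run_checks DIR: run every executable file in DIR in glob order.
# Prints the name of the first failing script and returns 1,
# or returns 0 if every script exits with status 0.
# Assumption: the run stops at the first failure.
run_checks() {
    dir="$1"
    for script in "$dir"/*; do
        # Skip anything that is not an executable regular file.
        [ -f "$script" ] && [ -x "$script" ] || continue
        if ! "$script" >/dev/null 2>&1; then
            echo "${script##*/}"
            return 1
        fi
    done
    return 0
}
```

Calling run_checks "$HEALTHCHECK_DIR" would then return a non-zero status as soon as any script in the directory fails, with the failing script's file name on standard output.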

Here are a few specific examples of the shell scripts' functionalities:

  • checking that the system clock has been synchronized with a time server

  • checking that access point interfaces found in Wifi_Radio_Config are up and associated with a radio interface

  • performing a DNS check

  • checking that there is sufficient space left on the device
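
As an illustration of what one such check might look like, here is a minimal sketch of a free-space test. It is not one of the shipped OpenSync scripts; the function name check_free_space, the mount point, and the threshold are hypothetical and chosen only for the example.

```shell
#!/bin/sh
# Hypothetical healthcheck script sketch: fail when free space runs low.
# The function name, mount point, and threshold are illustrative assumptions.

# check_free_space MOUNT MIN_KB: return 0 if MOUNT has at least MIN_KB
# kilobytes available, non-zero otherwise.
check_free_space() {
    # -P forces one POSIX-format line per filesystem; column 4 is "Available".
    avail_kb=$(df -P -k "$1" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$2" ]
}
```

A script built on this sketch could end with a line such as check_free_space /tmp 1024 || exit 1, so that the non-zero exit status signals the failure to the service.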

Test case details are in the shell scripts themselves: a high-level description appears at the top of each script, and further details are given as comments or in the code itself. Healthcheck scripts are divided into three sections (GitHub repositories):

  1. OpenSync core: generic for all platforms (opensync/rootfs/kconfig/SERVICE_HEALTHCHECK/INSTALL_PREFIX/scripts/healthcheck.d on the master branch of plume-design/opensync)

  2. Platform specific (Qualcomm, Broadcom, OpenWRT, RDK, …)

  3. Vendor specific (each ODM can add their own tests)

The Healthcheck Service is controlled by the Kconfig option SERVICE_HEALTHCHECK and is enabled by default.

The service can also be disabled at runtime by calling the init script with the stop parameter:

/etc/init.d/healthcheck stop

It can be reenabled by calling the same script with the start parameter:

/etc/init.d/healthcheck start

Note: Disabling the Healthcheck Service at runtime does not persist across reboots; the service runs again after the device reboots.

The check interval is set to 1 minute. After 10 failed tests, the service writes the reboot reason (the failed test case) and reboots the device. In this release, the interval and the number of failed tests are hardcoded in healthcheck.service.
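
The interval and failure threshold described above could be combined along the following lines. This is a sketch under stated assumptions, not the contents of healthcheck.service: the names CHECK_INTERVAL, MAX_FAILURES, next_fail_count, and healthcheck_loop are hypothetical, and the sketch assumes that a passing round resets the failure counter, which the text above does not specify.

```shell
#!/bin/sh
# Hypothetical sketch of the check loop; the two values mirror the text
# above, everything else is assumed and not taken from healthcheck.service.

CHECK_INTERVAL=60   # seconds between check rounds (1 minute)
MAX_FAILURES=10     # failed rounds tolerated before the fatal action

# next_fail_count COUNT STATUS: print the failure count after one round.
# Assumption: a passing round (STATUS 0) resets the counter to 0.
next_fail_count() {
    if [ "$2" -eq 0 ]; then
        echo 0
    else
        echo $(( $1 + 1 ))
    fi
}

# healthcheck_loop CHECK_CMD FATAL_CMD: run CHECK_CMD once per interval and
# call FATAL_CMD with the failure count once MAX_FAILURES is reached.
healthcheck_loop() {
    fails=0
    while :; do
        if "$1"; then rc=0; else rc=$?; fi
        fails=$(next_fail_count "$fails" "$rc")
        if [ "$fails" -ge "$MAX_FAILURES" ]; then
            "$2" "$fails"
            return 1
        fi
        sleep "$CHECK_INTERVAL"
    done
}
```

Here CHECK_CMD stands for whatever runs one round of checks, and FATAL_CMD stands for the vendor-specific fatal action, such as recording the reboot reason and rebooting.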

Northbound API

The service reuses the reboot reason feature from OpenSync 2.0: in OVSDB, it sets the type to HEALTH_CHECK and populates the reason with the failed test case.

Southbound API

The Healthcheck Service uses no platform-specific low-level functions, since it consists entirely of shell scripts.

Requirements

This feature requires the ovsh tool on the device, as well as common BusyBox commands such as arp, ping, timeout, and others.
