sunMAX - Local Administration & Monitoring of Solar Gateway

Overview

This article describes how to directly access, configure, and retrieve data from a Solar Gateway.

In the current product, the Solar Gateway does two things in parallel:

  1. It repeatedly reads the current state of every microinverter and BLE-capable end-run connector (SM-EC-EU) it knows about, every 60 seconds, via Bluetooth Low Energy (BLE). Each of these status updates is called a sample.
  2. It attempts to connect to the UBNT sunMAX cloud backend and forward all the samples it has received. Additionally, the Solar Gateway receives its configuration from the cloud: a whitelist of microinverters and end-runs, and country-specific settings to program onto them.

The top-level Ubiquiti firmware daemon responsible for the above is called solar_agent. When it has no cloud connection (due to interrupted internet service, or when the cloud functionality is explicitly turned off), solar_agent stores samples on the internal 1 GB flash drive, writing one file per day. When the connection comes back up, the stored samples are streamed to the cloud and the files are deleted. If the connection stays down and the volume fills up, the oldest files are deleted as necessary to make room for new samples. (Some BLE activity, including panel metrics, is also written to non-persistent log files in memory.)
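To make the "delete the oldest files first" behavior concrete, here is a minimal Python sketch of that kind of retention policy. It is not solar_agent's actual code: the function name prune_oldest, the use of file modification time to decide which file is oldest, and the byte budget are all assumptions made for illustration.

```python
import os

def prune_oldest(directory, max_total_bytes):
    """Delete the oldest files (by modification time) until the directory
    fits within max_total_bytes. Returns the list of removed paths.

    Hypothetical helper: solar_agent's real storage layout is not documented here.
    """
    entries = []
    for entry in os.scandir(directory):
        if entry.is_file():
            info = entry.stat()
            entries.append((info.st_mtime, entry.path, info.st_size))
    entries.sort()  # oldest modification time first

    total = sum(size for _, _, size in entries)
    removed = []
    for _, path, size in entries:
        if total <= max_total_bytes:
            break  # enough room, keep the remaining (newer) files
        os.remove(path)
        total -= size
        removed.append(path)
    return removed
```

Because the newest files are never touched until everything older is gone, a long outage loses the earliest samples first and keeps the most recent history.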

The Solar Gateway also has a basic web UI for configuration and firmware updates. However, the functionality we’ll cover in this article is only available through the command line interface (CLI). The factory default username and password are ubnt/ubnt. Please set a new username and password before proceeding. In the web UI, you can do this on the Configuration page, under System Account.

suntop

In the tradition of the UNIX top command, suntop shows a full-screen view of solar_agent’s current state. Initially, suntop shows a list of metrics from all the sunMAX devices (microinverters and end-runs) the gateway knows about. This gives you a quick view of the system status:

[Screenshot: suntop initial view]

This shows the most recent metrics from the currently whitelisted microinverters. (See the next section if you need to add microinverters to the whitelist.)

You can use the cursor keys to move up and down, and [enter] to open a side panel with a list of all the values corresponding to that device. If your screen isn’t large enough to see everything, you can use the left and right keys to scroll horizontally, and [tab] to move the cursor back and forth between the device grid and the “All metrics” panel. [esc] closes the metrics panel if it is open; otherwise, it exits the application.

[Screenshot: suntop device metrics panel]

The metrics starting with ‘ble.’ are largely self-explanatory: they show the number of times different BLE events occurred (connections, reads, and various kinds of errors) as well as the most recent signal strength of that device (ble.rssi).

The metrics starting with ‘mi.’ come from the actual sample data read from a device. ‘mi.fac’, for example, is the instantaneous AC line frequency at the time of the sample. Most of the metrics are instantaneous, but two of them are historical:

  1. ‘mi.energy_ac_wsec’ is the total accumulated AC (output) energy, measured in watt-seconds.
  2. ‘mi.time_on_sec’ is the total time, in seconds, that a microinverter has been producing AC.
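Since both historical values are in raw SI units, a little arithmetic makes them easier to read. The sketch below converts watt-seconds to kilowatt-hours (1 kWh = 3,600,000 watt-seconds) and derives an average output power from the two values. The function names are ours, not part of the gateway firmware.

```python
WATT_SECONDS_PER_KWH = 3_600_000  # 1 kWh = 1000 W * 3600 s

def wsec_to_kwh(energy_wsec: float) -> float:
    """Convert accumulated energy from watt-seconds to kilowatt-hours."""
    return energy_wsec / WATT_SECONDS_PER_KWH

def average_power_w(energy_wsec: float, time_on_sec: float) -> float:
    """Average AC output power in watts over the time the device was producing."""
    if time_on_sec <= 0:
        return 0.0  # device has not produced yet; avoid dividing by zero
    return energy_wsec / time_on_sec
```

For example, a microinverter reporting 7,200,000 watt-seconds has produced 2 kWh over its lifetime.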

For a full list of metrics, consult the Appendix listed at the bottom of this article.

The first item in the device list is special: it contains the global BLE metrics -- total numbers of reads, writes, errors, etc. since the start of the day.

[Screenshot: suntop BLE daemon metrics]

The console command

If you need to manually whitelist microinverters or BLE-capable end-runs, ssh to the Solar Gateway and run console. This will open a Python console (it’s equivalent to typing telnet localhost 2020). For security reasons, this console may only be accessed from the gateway itself (port 2020 is not accessible via Ethernet). Let’s dive in:

This console connects you directly to solar_agent. Be careful when executing long-running functions, as they may interfere with normal system operation. This guide concentrates on a handful of useful functions. (Note: you can exit the console at any time by pressing Ctrl-D, or by typing exit() and pressing [return].)

status() – displays the current whitelist and log levels:

whitelist(address) – adds a device to the local whitelist:

Pass whitelist() the full MAC address of the microinverter or end-run (with or without colons between the octets). Within a minute, the Solar Gateway will begin reading samples from the newly whitelisted device.
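If you keep a list of device addresses in a spreadsheet or script, it can help to check each one before pasting it into the console. The following is a small sketch, not part of the gateway's console API: normalize_mac is a hypothetical helper, and the address in the usage note is made up.

```python
import re

def normalize_mac(address: str) -> str:
    """Return a full MAC address as six uppercase hex octets joined by colons.

    Accepts input with or without colons. Raises ValueError if the input
    is not exactly six octets of hex digits.
    """
    digits = address.strip().replace(":", "").upper()
    if not re.fullmatch(r"[0-9A-F]{12}", digits):
        raise ValueError(f"not a full 6-octet MAC address: {address!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))
```

With this helper, both "0418d6a1b2c3" and "04:18:d6:a1:b2:c3" come out as "04:18:D6:A1:B2:C3", and a truncated address is rejected before it reaches whitelist().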

clear_whitelist() – clears the local whitelist:

This won’t affect the whitelist automatically sent by the cloud – it will only remove the devices you have manually whitelisted.

Any changes you make to the whitelist with the above commands will persist across reboots – solar_agent stores its whitelist data in /mnt/stash/solar_agent/whitelist.

Solar Gateway log files

All logging is done to /var/log. There are a few logfiles that contain useful information:

  • /var/log/messages: High-level logs from standard system daemons, solar_agent, and sunstatsd. Log verbosity can be adjusted for the Ubiquiti daemons: see LOG_LEVEL in /etc/solar_agent/solar_agent.conf and /etc/solar_agent/sunstatsd.conf.
  • /var/log/ble: Written to directly by the low-level BLE daemon. Includes a human-readable listing of most of the sample fields.
  • /var/log/solar_agent.crashlog: Reports stack traces and other debug information to diagnose solar_agent crashes/resets.

Each logfile is rotated automatically when it reaches a set size (in firmware 0.9.16, this threshold is 10 MB), and compressed via gzip.
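If you copy logs off the gateway for analysis, you will end up with a mix of plain and gzip-compressed files. Here is a minimal Python sketch for reading either kind. It assumes nothing about how rotated files are named: it checks for the two-byte gzip signature instead. The function name read_log_lines is ours.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip file

def read_log_lines(path):
    """Yield text lines from a log file, whether or not it is gzip-compressed."""
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzip else open
    # errors="replace" keeps going if a log contains stray non-UTF-8 bytes
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\n")
```

You can then filter the lines like any other iterable, for example to pick out the entries that mention a particular device address.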

[Screenshot: microinverter state periodically written to /var/log/ble]

Integrating with an external monitoring system

So far, we’ve discussed ways to look at the data currently on a Solar Gateway. If you want to monitor a sunMAX installation over the long term locally, the Solar Gateway can also be configured to stream data to a statsd server. This is a UDP-based protocol, popular for hooking up to a wide variety of data graphing solutions.

In addition to solar_agent, a Ubiquiti system daemon called sunstatsd runs on the Solar Gateway. Until it is configured to run, it simply sleeps. Its configuration file (/etc/solar_agent/sunstatsd.conf) has a few important settings:

ENABLE_SUNSTATSD:

To turn on metrics exporting, set to True.

STATSD_ADDR:

IP address or hostname to send statsd packets to.

GATEWAY_ID:

ID string to use in place of the device’s hardware address in metric paths.

After you modify the config file, sunstatsd needs to be restarted in order to use the new settings. The slow way to make sure this happens is to hit the reset button or run reboot. A faster way is to directly restart sunstatsd by invoking its init script:

/etc/init.d/sunstatsd restart

sunstatsd acts as a proxy: when it’s enabled, it opens a websocket connection to solar_agent’s metrics endpoint. Whenever it receives metrics, it converts them into the statsd format, then immediately sends them to the configured STATSD_ADDR.
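For readers unfamiliar with statsd, each metric travels as a short text line of the form name:value|type, where the type is g for a gauge and c for a counter. The sketch below shows that wire format and a UDP send in Python. It illustrates the protocol only and is not sunstatsd's code: the metric path in the usage note is invented, and 8125 is statsd's conventional default port, which may not match your server.

```python
import socket

STATSD_TYPES = {"gauge": "g", "counter": "c"}

def format_statsd(path, value, kind):
    """Build one statsd line, e.g. 'some.path:212.5|g'."""
    return f"{path}:{value}|{STATSD_TYPES[kind]}"

def send_statsd(lines, addr, port=8125):
    """Send one or more statsd lines to addr in a single UDP datagram."""
    payload = "\n".join(lines).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (addr, port))
```

For instance, format_statsd("sunmax.gw1.mi.pac", 212.5, "gauge") produces "sunmax.gw1.mi.pac:212.5|g". UDP is fire-and-forget, so a send succeeds even when nothing is listening at the other end.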

If you decide to use sunstatsd, it’s up to you to pick a timeseries database and/or graphing package to connect it to. One way we found to get basic monitoring up and running quickly is this Docker image: kamon.io's StatsD + Graphite + Grafana 2 example.
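Before setting up a full graphing stack, it can be useful to confirm that packets are actually arriving at the machine named in STATSD_ADDR. The following Python sketch is a bare-bones statsd listener that prints whatever it receives. The port (8125, statsd's conventional default) is an assumption, and the function names are ours.

```python
import socket

def parse_statsd_packet(payload: bytes):
    """Split one UDP payload into a list of (name, value, type) tuples."""
    metrics = []
    for line in payload.decode("utf-8", errors="replace").splitlines():
        name, colon, rest = line.partition(":")
        value, bar, kind = rest.partition("|")
        if not (name and colon and bar):
            continue  # skip anything that is not name:value|type
        try:
            number = float(value)
        except ValueError:
            continue  # skip non-numeric values
        # drop any trailing fields such as a sample rate ("|@0.1")
        metrics.append((name, number, kind.split("|")[0]))
    return metrics

def listen(port=8125, host="0.0.0.0"):
    """Print every statsd metric received on the given UDP port (runs until interrupted)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            payload, sender = sock.recvfrom(65535)
            for name, value, kind in parse_statsd_packet(payload):
                print(f"{sender[0]} {name} = {value} ({kind})")
```

Call listen() on the receiving machine, restart sunstatsd on the gateway, and metrics should begin to appear once the next samples are read. Stop the listener before starting a real statsd server on the same port.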

Appendix: all metrics in firmware 0.9.16

There are two basic categories of metric published by solar_agent: the ones beginning with ‘ble.’ refer to Bluetooth events, errors, etc., and the ones beginning with ‘mi.’ are actual sample data retrieved from microinverters.

A note on metric types: gauge values are simply the most recent value of some constantly-changing parameter (like the measurement of DC input voltage). Counters, on the other hand, are treated as monotonically increasing. sunstatsd converts counter values into actual counts: if it receives a value of 56 for a counter metric, then receives a value of 57, it will emit a count of 1. If input values go backwards (likely because the daemon that generated them was restarted), it will count from the new value.
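The counter conversion described above can be sketched in a few lines of Python. This is our reading of the behavior, not sunstatsd's source: the class name is hypothetical, and we assume that the first value seen for a metric, and any value lower than the previous one, emits a count of 0 and becomes the new baseline.

```python
class CounterDelta:
    """Turn monotonically increasing counter readings into per-update counts."""

    def __init__(self):
        self._last = {}  # metric name -> most recent raw value

    def update(self, name, value):
        """Record a raw counter value and return the count to emit."""
        previous = self._last.get(name)
        self._last[name] = value
        if previous is None or value < previous:
            # First reading, or the source daemon restarted:
            # start counting from this value (assumed behavior).
            return 0
        return value - previous
```

Feeding it 56, 57, 3, 5 for the same metric yields 0, 1, 0, 2: the drop from 57 to 3 is treated as a restart rather than a negative count.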

‘mi.*’: microinverter metrics

| Metric Name | Type | Notes |
| --- | --- | --- |
| mi.ble_fw_version | gauge | 16-bit value representing the BLE chip’s firmware version. |
| mi.cert_params_crc | gauge | crc16 of the currently programmed certification parameter data. |
| mi.country_code | gauge | The currently programmed 16-bit country identifier. |
| mi.energy_ac_wsec | gauge | The “odometer”: lifetime accumulated output energy in watt-seconds. |
| mi.error1 | gauge | 32-bit microinverter fault word. |
| mi.error2 | gauge | Data-error word. |
| mi.error3 | gauge | Reserved for future use. |
| mi.fac | gauge | Floating-point value: current AC frequency in Hz. |
| mi.iac | gauge | Floating-point value: current output current in milliamps. |
| mi.pac | gauge | Floating-point value: current output power in watts. |
| mi.st_fw_version | gauge | 16-bit value representing the power-control MCU’s firmware version. |
| mi.state | gauge | 16-bit value describing the power-control MCU’s current operating state. |
| mi.temp | gauge | Floating-point value: current temperature (deg C). |
| mi.time_on_seconds | gauge | Lifetime accumulated running time in seconds. |
| mi.vac | gauge | Floating-point value: AC output (V). |
| mi.vpv | gauge | Floating-point value: DC panel input (V). |

‘ble.*’: BLE metrics

| Metric Name | Type | Notes |
| --- | --- | --- |
| ble.bad_handles | counter | Rare; indicates noise on the serial line between processors or a spurious BLE reset. |
| ble.connect_errors | counter | Number of failed BLE connections (not uncommon in noisy environments). |
| ble.connects | counter | Number of successful BLE connections. |
| ble.daemon_data_errors | counter | Number of microinverter samples that appear incorrect (for example, if the lifetime AC output energy counter is 0 or moves backwards), as seen by the BLE daemon. |
| ble.disconnect_errors | counter | Number of failed BLE disconnections. |
| ble.link_layer_failures | counter | Number of BLE link layer errors. |
| ble.lmp_errors | counter | Number of BLE link manager protocol errors (these result in a reset of the BLE chip). |
| ble.mi_data_errors | counter | Number of microinverter samples that appear incorrect (for example, if the lifetime AC output energy counter is 0 or moves backwards), as seen by the BLE chip on the microinverter. |
| ble.peak_recv_size | gauge | Maximum packet size received over the BLE serial connection. |
| ble.peak_send_size | gauge | Maximum packet size sent over the BLE serial connection. |
| ble.read_errors | counter | Number of failed BLE reads (quite common in noisy environments). |
| ble.reads | counter | Number of successful BLE reads. |
| ble.reboots | counter | Number of resets of the BLE chip. |
| ble.rssi | gauge | Running average of the signal strength of a given BLE device (microinverter or SM-EC-EU). |
| ble.sample_reads | counter | Number of microinverter samples successfully read via BLE. |
| ble.uart_framing_errors | counter | Number of corrupted packets seen on the serial line (very rare). |
| ble.uart_recvs | counter | Number of packets received on the serial line. |
| ble.uart_resends | counter | Number of packets re-sent on the serial line due to corruption or a reset. |
| ble.valid_samples | counter | Number of valid samples read via BLE (samples without data errors). |
| ble.write_errors | counter | Number of failed BLE writes (common when doing firmware upgrades in a noisy environment). |
| ble.writes | counter | Number of successful BLE writes. |
