This article helps users diagnose a Controller running under high load and describes modifications that can be made for optimal performance.
There are a few things to take into consideration when managing large UniFi installs with hundreds of devices and several sites connected to a single UniFi Controller. A Controller running under such a high load might encounter some issues if configured the same way as a Controller managing a much lighter load. There are some symptoms that can be recognized, diagnosed and worked on to improve the controller's performance.
Warning: Before proceeding with the configurations outlined below, make sure to create backups of your machine and Controller configurations. A single typo could break the system.
Symptom: High CPU Usage
One of the most important metrics to monitor is CPU usage on your UniFi Controller. High CPU usage is the first indication that there is an issue. Unfortunately, there is no silver bullet for this, and simply adding CPU is not necessarily the answer.
Before increasing the size of your box, try increasing the `XMX` and `XMS` options, which set the maximum and initial Java heap sizes. By default, a UniFi Controller has both set to 1 GB. They can be increased by making the following entries in the system.properties file (see Related Articles below for more information on the system.properties file).
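As a minimal sketch, assuming the property keys are named `unifi.xmx` and `unifi.xms` (confirm the names against the system.properties documentation for your Controller version), entries raising both limits might look like this:

```properties
# Assumed key names; verify for your Controller version before applying.
# Maximum Java heap size
unifi.xmx=2048
# Initial Java heap size
unifi.xms=2048
```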
Units are in MB.
The changes above would increase the memory the UniFi Controller is allowed to consume from 1 GB to 2 GB. Too little memory can cause high CPU usage because the Java Virtual Machine spends too many CPU cycles collecting garbage to stay within the 1 GB allocated to the Controller. So before moving to a machine with more CPU, it is recommended to max out the available memory on the current machine with the above settings, then observe whether CPU usage decreases.
However, if high CPU usage continues after the memory increase, a larger machine with more CPU cores and more memory may be necessary to handle the workload.
Symptom: Heartbeat Missed or Slow to Provision
Regardless of how many devices there are, every device will try to inform back to the Controller. By default, the Controller can handle 200 simultaneous connections from devices, so missed heartbeats shouldn't be an issue unless a single Controller is managing thousands of devices. If it is only managing a few hundred devices, the adjustment below can be tried, but it may not have the desired result.
The number of simultaneous inform messages that can be processed can be set in system.properties by adjusting the following:
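As a rough sketch, assuming the two settings take an `inform.` prefix in system.properties (the prefix is an assumption, and the keep-alive value is purely illustrative, not a recommendation), the entries might look like this:

```properties
# Assumed key names; verify for your Controller version before applying.
# Simultaneous inform messages that can be processed
inform.num_thread=200
# Illustrative value only; keep this lower than inform.num_thread
inform.max_keep_alive_requests=100
```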
The default value is 200, and max_keep_alive_requests should always be lower than num_thread. Try adjusting up from there: device stability should improve, and devices should become even more stable once the configuration has been pushed out to them.
Database Connection Tuning
When running a large UniFi installation, it may be desirable to run an external MongoDB cluster so that the database can scale independently of the UniFi Controller application. Discussion on that can be found here on our community forum.
If high CPU usage is seen on the mongo process, it can indicate the need for a bigger box or the need to separate the mongodb process as mentioned above. Once that is done, the following can be tuned to see if it results in better application performance.
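A hedged sketch of what such tuning entries might look like, assuming the keys are named `db.mongo.connections_per_host` and `db.mongo.threads_multiplier`. Both names and both values are assumptions for illustration and should be checked against the documentation for your Controller version:

```properties
# Assumed key names and illustrative values; verify before applying.
# Connections the Controller may open to each MongoDB host
db.mongo.connections_per_host=100
# Multiplier for threads allowed to wait for a free connection
db.mongo.threads_multiplier=5
```

Change one value at a time and watch CPU and memory usage on both the Controller and the database host before adjusting further.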
The best way to increase performance and provide stability for large installations is to monitor your system resources. Beyond that, practices such as offloading the database workload and increasing memory can allow the UniFi Controller to serve more clients and devices. The improvement should be reflected in resource usage and in the performance of the UniFi Controller UI.
UniFi now offers a management solution that can take care of all this and more. Find more information about UniFi Elite here.