There are few things more important to a data center than continuous power. Without it, a data center will experience prolonged downtime, significant financial loss, a damaged reputation and other harmful effects. For this reason, data centers devote a great deal of time and energy to power redundancy and to ensuring that they have a properly functioning uninterruptible power supply (UPS). A UPS sits on standby and, should a power failure occur, supplies the power necessary to keep data center infrastructure up and running. UPS systems come in a variety of sizes to accommodate different power loads, and many data centers implement multiple UPS systems to protect against downtime. It is important that a UPS be prepared to function at a moment’s notice so that there is no significant loss of data. The problem is that many data centers experience UPS failure and, most of the time a UPS fails, the cause is a lack of proper maintenance and servicing.
A power failure can take a variety of forms, including a power outage, a power surge or a power sag. Whatever the cause, even a few moments of downtime can bring severe costs. Should any power fluctuation or outage occur, a UPS picks up right where the power supply left off, preventing downtime, data loss, and damage to infrastructure. A UPS is often thought of as a “dependable” power supply in case of emergency but, if it is not properly maintained and serviced, it may not be particularly dependable.
To determine how best to maintain your data center UPS system, you must first understand why UPS systems fail. Just as that 10-year-old battery in your junk drawer may not have much life left in it, UPS batteries diminish over time. Even if you have not needed to use your UPS, its battery will lose capacity over time and will not last as long as originally intended. UPS battery deterioration is often accelerated by the high temperatures common inside data centers. Fans occasionally fail because components such as ball bearings dry out or because the fans wear down from continuous use. Additionally, power surges, such as those caused by lightning or other transient spikes, can degrade a UPS battery. Dust accumulation on UPS components can diminish UPS efficacy. Further, every discharge cycle (each time the battery is discharged and recharged) shortens the overall life of a UPS battery. A typical 3-phase UPS has an average lifespan of 10 years, and without proper maintenance it could be much shorter.
If you think you are doing enough by occasionally checking your UPS battery, you may be leaving your data center exposed to an outage and downtime. Government Technology explains just how many data centers are experiencing downtime due to UPS failure and preventable human errors, “Data center outages remain common and three major factors — uninterruptable power supply (UPS) battery failure, human error and exceeding UPS capacity — are the root causes, according to a new study released earlier this month. Study of Data Center Outages, released by the Ponemon Institute on Sept. 10, and sponsored by Emerson Network Power, revealed that 91 percent of respondents experienced an unplanned data center outage within the last 24 months, a slight dip from the 2010 survey results, when 95 percent of respondents had reported an outage…Fifty-five percent of the survey’s respondents claimed that UPS battery failure was the top root cause for data center outages, while 48 percent felt human error was the root cause.” By correcting human error and properly maintaining your UPS system, you can dramatically decrease your data center’s risk of downtime.
To prevent UPS failure, it is imperative that you regularly maintain and service your UPS as part of your Data Center Infrastructure Management (DCIM) plan. Proper UPS maintenance and service has a few key components, but physical inspection is at the core. If you are not physically checking your UPS system on a regular basis, there is no way to know whether something is visibly wrong and could lead to a failure. The best thing you can do is create a UPS maintenance and service checklist and keep a detailed log of all maintenance and service to ensure that maintenance does not fall behind. Your checklist should include inspecting and testing the UPS battery to ensure it is working, checking the UPS capacitors, measuring the ambient temperature around the UPS, calibrating equipment, performing any service that might be required (checking air filters, cleaning and removing dust), verifying load share and making any necessary adjustments, and more.
If UPS battery failure is one of the most common causes of UPS failure, and thus downtime, it is only logical that the battery should be one of the most important items on your UPS maintenance checklist. Battery discharge should be routinely checked to ensure that the battery has not degraded to the point that it cannot handle the necessary power load in the event of a failure. It is also important to visually inspect the area around the UPS and the battery itself for any obvious obstructions, dust collection or other things that may prevent adequate cooling. If you see a warning that the battery is near discharge, perform the necessary maintenance. Further, the AC input filter capacitors, DC filter capacitors and AC output capacitors should be checked for open fuses, swelling or leakage. Next, you should visually inspect all components for any obvious problems. Inspect the major assemblies, wiring, circuit breakers, contacts, switch gear components, and more. Should you see obvious damage, perform the necessary maintenance and service.
Next, because the heat generated by the infrastructure keeps data centers warm, it is important to check the ambient temperature around the UPS system, since a high temperature can diminish battery capacity. Schneider Electric explains best practices for maintaining ambient temperature around a UPS for maximum battery life: “It is recommended that the UPS be installed in a temperature controlled environment similar to the intended application. The UPS should not be placed near open windows or areas that contain high amounts of moisture; and the environment should be free of excessive dust and corrosive fumes. Do not operate the UPS where the temperature and humidity are outside the specified limits. The ventilation openings at the front, side or rear of the unit must not be blocked… All batteries have a rated capacity which is determined based on specified conditions. The rated capacity of a UPS battery is based on an ambient temperature of 25°C (77°F). Operating the UPS under these conditions will maximize the life of the UPS and result in optimal performance. While a UPS will continue to operate in varying temperatures, it is important to note that this will likely result in diminishing the performance and lifespan of your battery. A general rule to remember is that for every 8.3°C (15°F) above the ambient temperature of 25°C (77°F), the life of the battery will be reduced by 50 percent. Therefore, keeping a UPS at a comfortable temperature is crucial to maximizing UPS life and capabilities.”
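To put numbers on the rule of thumb quoted above, here is a minimal Python sketch. It rests on assumptions that are not in the quoted guidance: that the halving applies smoothly between 8.3°C steps rather than only at whole steps, and that running cooler than 25°C earns no extra life. The function name and the 5-year rated life in the usage note are hypothetical and chosen only for illustration.

```python
RATED_TEMP_C = 25.0    # rated ambient temperature from the quoted guidance
HALVING_STEP_C = 8.3   # each 8.3 degrees C above the rated temperature halves battery life


def estimated_battery_life_years(rated_life_years: float, ambient_c: float) -> float:
    """Estimate battery life at a given ambient temperature using the rule of thumb.

    Assumption: the 50 percent reduction is interpolated exponentially between
    steps, and temperatures at or below 25 degrees C give the full rated life.
    """
    excess_c = max(0.0, ambient_c - RATED_TEMP_C)
    return rated_life_years * 0.5 ** (excess_c / HALVING_STEP_C)
```

Under these assumptions, a battery rated for 5 years would last roughly 2.5 years at 33.3°C and roughly 1.25 years at 41.6°C, which shows how quickly a warm room erodes battery life.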
Visual inspection should include removing dust and dirt from the UPS system. A UPS system will accumulate dust as it sits, and because dust can interfere with proper heat transfer, it should be promptly removed to ensure the UPS system will function properly when needed. Further, check all air filters for dust accumulation, which can lead to inefficiency and even overheating. Clean and replace filters as needed to properly maintain your UPS. Capacitors are also an integral component of UPS systems. Capacitors aid in the transition of power in the event of an outage, so if they fail, the UPS will likely fail. Capacitors need to be routinely checked because they dry out with wear and tear, and they need to be replaced every few years to ensure proper UPS function.
Though much of the suggested UPS maintenance and service strategy may sound basic, even obvious, the fact of the matter is that UPS failure remains a primary source of data center downtime. When you couple that with human error, it is easy to see that many data centers simply are not maintaining their UPS systems well enough to prevent failure. Not all of these tasks need to be completed every day or even every week; certain tasks can be performed weekly while others can be performed monthly, quarterly, semi-annually or annually. By breaking the work up, you ensure that your UPS system is checked frequently and routinely while making maintenance a far more achievable task. Additionally, by maintaining a detailed log, you can see whether UPS maintenance and service has fallen behind and immediately address any concerns. When data center technicians routinely check the UPS system, they become familiar with what looks normal and what looks concerning so that, should anything look problematic, it can be addressed and remedied immediately. The result is peace of mind that your UPS will be there when you need it to prevent costly downtime.
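The idea of pairing a tiered schedule with a maintenance log can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the article does not say which task belongs at which frequency, so the task-to-frequency mapping, the day counts used for each interval and the function name are all hypothetical.

```python
from datetime import date, timedelta

# Approximate interval lengths in days (an assumption; adjust to your own plan).
INTERVAL_DAYS = {
    "weekly": 7,
    "monthly": 30,
    "quarterly": 91,
    "semi-annually": 182,
    "annually": 365,
}

# Hypothetical assignment of checklist tasks to frequencies.
SCHEDULE = {
    "visual inspection": "weekly",
    "ambient temperature check": "weekly",
    "air filter check": "monthly",
    "battery test": "quarterly",
    "capacitor inspection": "semi-annually",
    "calibration": "annually",
}


def overdue_tasks(log: dict, today: date) -> list:
    """Return, in alphabetical order, tasks that were never logged or whose
    last logged date is older than the task's interval."""
    overdue = []
    for task, frequency in SCHEDULE.items():
        last_done = log.get(task)
        allowed_gap = timedelta(days=INTERVAL_DAYS[frequency])
        if last_done is None or today - last_done > allowed_gap:
            overdue.append(task)
    return sorted(overdue)
```

Here `log` maps each task name to the date it was last completed. Calling `overdue_tasks(log, date.today())` at the start of a shift gives technicians a short list of what has fallen behind, which is the purpose of the detailed log described above.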