A data center is not very useful if it cannot maintain uptime. Maximizing uptime is easier said than done. But, make no mistake, to stay competitive in today’s marketplace, it must be done. While unplanned outages affect just about every data center, minimizing the frequency and duration of those outages must be prioritized by data center managers.
What is Causing Data Center Outages?
It is a common mistake to believe that the majority of unplanned data center outages are caused by uncontrollable factors like weather-related problems. While weather certainly does cause outages, many outages stem from other factors, and those factors can often be addressed through better data center infrastructure management.
You cannot prevent outages in your facility if you do not first understand what causes them. Every data center and its pain points are unique, so it will require digging in a bit to look at what has historically caused outages, as well as where you anticipate vulnerability moving forward.
Human Error Commonly Causes Data Center Outages
One of the most common causes of data center outages, believe it or not, is human error. Human error may be the result of negligence or simply an honest mistake. There are cost-effective, easy-to-implement ways to reduce it: properly train your employees, and correct deviations from established processes whenever they occur. Uptime Institute routinely collects downtime information from data centers, and its research also shows human error to be a major, preventable problem, but one that management must address with improved systems and procedures: “Some industry experts report numbers as high as 75%, but Uptime Institute generally reports about 70% based on the wealth of data we gather continuously…a quick survey of the issues suggests that management failure — not human error — is the main reason that outages persist. By under-investing in training, failing to enforce policies, allowing procedures to grow outdated, and underestimating the importance of qualified staff, management sets the stage for a cascade of circumstances that leads to downtime.”
Many studies have found that data center outages were, in many cases, completely preventable. Considering that outages can cause significant financial loss as well as jeopardize data security, this is a major problem that can and should be addressed. Many human errors are seemingly small but can have significant consequences. Something as simple as labeling and properly protecting emergency power off buttons, or not allowing food and drink near electronics, can be the difference that prevents a data center outage.
Backup Power Failure is Another Common Cause of Data Center Outages
Your UPS system and any other backup power systems you have in place are susceptible to failure, particularly when not routinely maintained and tested. That risk also points to an important strategy some data centers have yet to take advantage of: redundant power supplies. Data centers grow over time and, in the last few years, very rapidly. When this happens, you may surpass your UPS’s capacity without even realizing it if you are not routinely maintaining and testing your backup power supply. It is critically important to confirm that your UPS can manage your current power demands (ideally with room to spare), so that you can scale as needed and be prepared to support your full IT load in the event of a power failure.
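As a rough illustration of the capacity check described above, the short Python sketch below compares a measured IT load against a UPS's rated capacity and flags when the load exceeds a chosen utilization target. The function name, the field names, and the 80% default target are assumptions made for this example, not figures from any standard or vendor; use the thresholds your own equipment and policies call for.

```python
from dataclasses import dataclass


@dataclass
class UpsCheck:
    """Result of comparing an IT load against UPS capacity (hypothetical type)."""
    load_kw: float
    capacity_kw: float
    utilization: float      # load divided by capacity, e.g. 0.7 for 70%
    within_target: bool     # True if utilization is at or below the target


def check_ups_headroom(it_load_kw: float,
                       ups_capacity_kw: float,
                       max_utilization: float = 0.8) -> UpsCheck:
    """Report whether the UPS still has room to spare.

    max_utilization is an assumed target (80% by default) chosen only
    for illustration; it is not a recommendation from the article.
    """
    if ups_capacity_kw <= 0:
        raise ValueError("ups_capacity_kw must be positive")
    if it_load_kw < 0:
        raise ValueError("it_load_kw cannot be negative")

    utilization = it_load_kw / ups_capacity_kw
    return UpsCheck(
        load_kw=it_load_kw,
        capacity_kw=ups_capacity_kw,
        utilization=utilization,
        within_target=utilization <= max_utilization,
    )


if __name__ == "__main__":
    # Example figures are made up: a 100 kW UPS carrying a 90 kW IT load.
    result = check_ups_headroom(it_load_kw=90, ups_capacity_kw=100)
    status = "OK" if result.within_target else "over target"
    print(f"UPS utilization: {result.utilization:.0%} ({status})")
```

Running a check like this whenever equipment is added, rather than only during scheduled maintenance, is one way to notice that growth has eaten into your backup power headroom before an outage reveals it.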
Data Center Security Issues May Also Lead to Power Outages
With threats that are ever-evolving, varied, and difficult to detect, cybersecurity is more important than ever before. Data centers are an obvious target for cyber attacks, so it is no wonder that attacks are a common cause of unplanned outages. DDoS (distributed denial of service) attacks are a commonly used method of attacking data center systems, which is why having DDoS security solutions in place can help defend against even some of the most sophisticated attacks. Data Center Knowledge explains the complexity of cyber attacks against power systems and how catastrophically they can impact a data center if proper protection, as well as systems and procedures, are not in place prior to an attack: “’The majority of power equipment in the data center can be remotely controlled and configured,’ Bob Pruett, security field solutions executive at SHI International…told Data Center Knowledge in an interview. ‘So, a malicious bad actor could take control of these devices and interrupt the power to a data center or a specific device on your network’…Some of these control systems could fall into the category of the Internet of Things…Attacks against IoT devices increased by 100 percent last year, according to a report by San Francisco-based cybersecurity vendor Darktrace…In most types of attacks, cybersecurity teams can isolate traffic or even entire compromised systems. But industrial controls are a special case…If the devices and computers controlling a data center’s power supply have been compromised, taking them down could turn off power to the entire facility…There’s another reason to be particularly careful about protecting access to power systems. Attackers who get control over a data center’s power supply can shut down a data center – but they can also cause a power surge that destroys equipment.”
There are many reasons a data center could experience a power outage and the aforementioned factors are certainly not the only ones posing a risk to data centers. For this reason, it is imperative that data center managers implement robust DCIM strategies that are consistent and adhered to by all staff. In doing so, you drastically reduce the risk your data center will experience a costly and concerning power outage, thereby maximizing uptime, protecting sensitive information, and saving significant money in the long run.