Data centers operate with the continuous goal of maximizing uptime. It is important to avoid outages at all costs while constantly trying to improve energy efficiency and maximize data storage and speed. A variety of factors influence data center outages, but the bottom line is that, from time to time, they do happen. When outages occur, they are not only frustrating; they can result in data loss and significant financial loss. So, what is a data center to do? Are these outages simply unavoidable, aggravating occurrences? No. In fact, Emerson Network Power notes just how preventable these outages can be: “According to the 2013 Study on Data Center Outages by the Ponemon Institute, sponsored by Emerson Network Power, 71% of survey respondents said some or all of unplanned outages experienced within the last 24 months were preventable.” Below, we discuss two common types of data center outages that are, by and large, preventable.
- Human Error
- Human error is, unfortunately, one of the most highly cited reasons for data center outages. This can be avoided with simple measures such as shielding “emergency off” buttons. Emergency Power Off buttons are often not labeled correctly or protected properly and by simply shielding and labeling them, data center outages can be avoided. Additionally, well-communicated operating instructions and procedure methods can help reduce errors that occur from lack of information or knowledge. Finally, what may seem like a no-brainer – strict food and drink policies. Even a small liquid or food spill on critical equipment could lead to an outage so it is important to have strict regulations in place.
- UPS/Battery Failure
- Power supplies can fail for a number of reasons: age, local power outages, storms, surges, and more. For this reason it is critical that an uninterruptible power supply be used but, perhaps even more importantly, it is necessary to have redundancy. Have a power supply that is adequately sized for your entire capacity and power load, as well as a backup power supply that is also adequate, and be certain to perform proper UPS and battery maintenance routinely. Green House Data describes the importance of a proper DCIM platform: “As data centers become more and more dense, they are drawing more power at each rack. Don’t allow your UPS design to fall below your average IT load. A Data Center Infrastructure Management (DCIM) platform can help you evaluate power draw throughout a given period. Redundant UPS systems are also a necessity to achieve the goal of 100% uptime.”
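The sizing and redundancy advice above can be sketched as a simple check. This is a minimal illustration, not a sizing tool: the `UpsUnit` class, the function names, and the kilowatt figures are all hypothetical, and a real design would also account for battery runtime, power factor, and load growth.

```python
from dataclasses import dataclass


@dataclass
class UpsUnit:
    name: str            # hypothetical label for the unit
    capacity_kw: float   # rated output the unit can carry


def headroom_kw(units, it_load_kw):
    """Spare UPS capacity with every unit healthy (negative means undersized)."""
    return sum(u.capacity_kw for u in units) - it_load_kw


def survives_single_failure(units, it_load_kw):
    """True if the IT load is still covered after losing the largest unit."""
    if not units:
        return False
    total = sum(u.capacity_kw for u in units)
    largest = max(u.capacity_kw for u in units)
    return total - largest >= it_load_kw


# Two 100 kW units: an 80 kW load survives the loss of either unit,
# while a 120 kW load is covered only when both units are healthy.
ups_bank = [UpsUnit("ups-a", 100.0), UpsUnit("ups-b", 100.0)]
redundant_at_80 = survives_single_failure(ups_bank, 80.0)
redundant_at_120 = survives_single_failure(ups_bank, 120.0)
```

Feeding the check with the power draw reported by a DCIM platform over a representative period, rather than a single reading, is closer to what the quoted advice describes.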
What will the data center look like in 5 or even 10 years? It may sound impossible to predict, but experts are weighing in with their predictions for the future of data centers. The storage systems and servers of today will be a distant memory. Cloud computing will take on a whole new life. While 5 or 10 years may sound far off, it is important for the data centers of today to start anticipating these changes and preparing for the future so that they can stay ahead of the game and not fall by the wayside. Storage needs are changing daily, so it is easy to understand that the changes will be significant in the future. Many experts see data centers making the switch to scale data centers by 2025. Data Center Knowledge elaborates on what “scale data centers” are: “Scale data centers are data centers designed the same way web giants like Google, Microsoft, and Facebook design their facilities and IT systems today. Intel isn’t saying most data centers will be the size of Google or Facebook data centers, but it is saying that most of them will be designed using the same principles, to deliver computing at scale.”
Delivering computing at scale is not a simple concept or an easily achievable task but it is necessary to meet the expected demands of technology and users of the future. Data Center Knowledge goes on to explain the future demands that will necessitate scale data centers, “Things like the three major forms of cloud computing (IT infrastructure, platform, or software delivered as subscription services), connected cars, personalized healthcare, and so on, all require large scale. “If you’re doing a connected-car type of solution, that’s not a small-scale type of deployment,” Waxman said. “If you’re doing healthcare and you’re trying to do personalized medicine, that’s a large-scale deployment.’” As data volumes increase, data centers must be able to scale non-disruptively. For data centers, infrastructure must be carefully managed to be capable of scaling up on demand. The costs to meet these demands can be managed more easily by gradually scaling up data centers. Schneider Electric also notes that scale will be the future of data centers, ““We’ll see a dominance of at scale wholesale data centers, with a movement towards at scale cloud providers and the verticalization and specialization of the smaller providers in between,” he says. “There will also be a secondary movement to the edge.” He defines “at scale” as at least 15MW or more, a size needed to support cost effective IoT and big data deployments — two of the drivers changing the market according to Doug. “Big data, derived in large from the IoT, is helping shape the way companies develop, improve and bring products to market and serve consumers and customers,” said Doug, “Ultimately, all that data resides in a data center where there must be enough power to process and analyze it.”
Data center cooling is a topic that could be discussed endlessly. What works best for one data center may not work well for another, depending on a variety of factors including location, size, and type of building. Cooling with water is an eco-friendly and exceptionally effective means of cooling, and many are finding that chilled water may be even more effective. It remains the goal of most data centers to cool effectively while also being efficient and eco-friendly. In a chilled water system, a water chiller produces chilled water, which is pumped into the CRAH (computer room air handler), where it circulates through chilled coils and cools the air in the computer room by removing heat from it. The water then circulates out, is chilled again, and is sent back through the system, making it a very efficient means of cooling a data center.
In the event of an outage, air cooled chillers can actually return to operation more quickly, making redundancy easier to achieve as well. Additionally, chilled water cooling is easily scalable and adaptable to the ever-changing needs of a data center. In an effort to improve efficiency, many data centers are more closely examining just how cool the chilled water cooling system needs to be. If it can be adjusted by even a degree or two, a significant improvement in energy efficiency can be made. Schneider Electric further examines the advantage of opting to adjust chilled water cooling temperatures in data centers, “In a nutshell, that means many data centers don’t need to be as cool as they used to. Most data centers will find temperatures of 24°-25°C (75°-77°F) will suffer no difference in reliability vs. cooler temperatures… If temperatures inside the data center are higher than in the past, that means the temperature of the chilled water used to cool it – known as the set point for the chillers – can also be higher. As it turns out, that has a profound effect on cooling system efficiency. Raising the chilled water set point from the usual values of 7° to 10°C used in comfort cooling chilled water plants up to 18° to 20°C or higher can result in an operational expense savings of about 40%. That’s because less energy is required to cool the water year-round. In summer, higher evaporating temperatures mean compressors don’t have to work as hard, resulting in improved efficiency. In cooler months, users benefit from many more hours of economizer or “free cooling” operation. 
A higher set point also results in a capital expense savings of some 30% because chillers don’t have to be as large as at traditional temperatures.” With a re-examination of what temperature your data center needs to maintain to maximize uptime, data centers may be able to adjust their chilled water cooling temperature to save a significant amount of expense and dramatically improve data center energy efficiency.
In the wake of many high-profile data breaches, from government institutions to retailers, the data management world is evolving into an environment that requires more active security policy in order to reduce the amount of time that sensitive data is unknowingly exposed to malicious sources. A strictly preventive security policy, although at times effective, opens the door to a flood of destructive malware: relaxed policy on monitoring data movement can leave compromised systems unpatched for indeterminate periods of time, unnecessarily exposing data systems.
Active Monitoring Systems
Although preventative maintenance is an essential part of security, actively monitoring data systems can result in quicker detection of penetration by malicious software. These breaches may go unnoticed for long periods of time if only preventative security measures are taken in the data center. It’s a given that systems should be monitored on a daily basis, but dealing with a large flood of data and knowing how to prioritize it is nearly impossible for a large data center, especially given remote access by authorized staff from various locations, some of whom may unknowingly bring security risks into the operational environment. With so much data coming in so quickly from a variety of secure and insecure networks, the only answer to monitoring data transfer and the accompanying network activity at this scale is software-based analytics. Analytics algorithms can sort and actively monitor big data in a meaningful manner, ranking malicious activity by potential risk to filter out false positives and non-factor threats that preventative security systems will block anyway, which gives security personnel a more focused view of malicious activity in the network. All detections can be logged for future reference to increase the efficiency of automated detection systems. Software-based analysis and monitoring of network activity can help identify issues as they stream in, and with events sorted by priority and potential risk, security personnel are able to catch threats immediately. This reduces liability because security breaches are detected on the fly, reducing both the exposure of sensitive data and the time that malicious software has access to data systems.
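As a rough illustration of sorting activity by potential risk, the sketch below scores alerts and drops the ones a preventative control has already handled. Everything in it is an assumption made for the example: the `Alert` fields, the 1 to 5 severity scale, the score weights, and the threshold are hypothetical and do not come from any particular security product.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    source_ip: str
    signature: str
    severity: int          # hypothetical scale: 1 (low) to 5 (critical)
    blocked: bool          # True if a preventative control already stopped it
    remote_session: bool   # True if it arrived over a remote-access session


def risk_score(alert):
    """Combine severity and context into one number (weights are illustrative)."""
    score = alert.severity * 10
    if alert.remote_session:
        score += 5     # remote access widens the exposure
    if alert.blocked:
        score -= 30    # already stopped, so it is mostly noise for the analyst
    return max(score, 0)


def prioritize(alerts, threshold=20):
    """Return the alerts worth a human's attention, highest risk first."""
    scored = [(risk_score(a), a) for a in alerts]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [alert for _, alert in kept]


incoming = [
    Alert("10.0.0.7", "port-scan", 2, blocked=True, remote_session=False),
    Alert("10.0.0.9", "odd-login-hours", 3, blocked=False, remote_session=False),
    Alert("203.0.113.5", "data-exfiltration", 5, blocked=False, remote_session=True),
]
queue = prioritize(incoming)
```

In this example the blocked port scan falls below the threshold and never reaches the analyst, while the unblocked exfiltration attempt over a remote session is placed first.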
Winter is fast approaching, and with it comes the concern of managing not just the drop in temperature but also the low humidity that comes with it. Ensuring temperature and humidity control systems are set to the industry-recommended ranges and receive routine maintenance in the coming winter months will help prevent any unexpected impairment from these environmental conditions. Below are some considerations for preventing downtime or loss of data in the winter months.
Temperature Control Systems and Maintenance
Problems arising from improper preparation for these yearly temperature lows can be minor or, in some cases, catastrophic. Staying one step ahead of the incoming chill is essential to ensuring data center uptime. Maintaining thermal management systems is critically important because low temperatures and the accompanying low humidity produce static electricity, which can damage sensitive electronic systems and make indiscriminate data loss a real possibility. However, maintaining environmental conditions within industry standard guidelines is not as simple as running systems until the desired effects are achieved. With the large fluctuations in temperature and humidity that come during the winter months, air conditioning units and humidity control systems will be under duress as they attempt to keep conditions within the data center stable. If these systems don’t receive proper servicing, failed internal components can leak coolant or water into the data center, creating a multitude of issues: not only will the flooding or leakage have to be dealt with, but thermal conditions will also go unregulated during the repair. Routine maintenance to check these systems for faulty hardware is essential as always, but a pre-winter diagnostic of thermal management systems can help prevent a potential disaster and ensure operations continue unhindered.
Automated Thermal Management Systems
Keeping temperature and humidity in a narrow range is not an easily accomplished task. Luckily, there are automated control systems that can act as a hive mind for a network of thermal management systems, ensuring the ideal temperature and humidity zones are reached. These systems bring a big-picture view to temperature and humidity control by ensuring individual systems in the building don’t counteract each other’s purpose, and they can give proper warning of inefficiency, such as one unit futilely humidifying while another is dehumidifying. This expansive view into the cooling systems within a data center is a great diagnostic tool and helps reduce the workload and the potential for human error, in turn reducing costs through better overall efficiency of the system.
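The humidifying-versus-dehumidifying case is easy to express in code. The sketch below is only an illustration of the idea: the `UnitStatus` record, the mode names, and the notion of a "zone" are assumptions for the example and are not taken from any specific building management system.

```python
from dataclasses import dataclass


@dataclass
class UnitStatus:
    unit_id: str
    zone: str   # hypothetical grouping of units that share the same air space
    mode: str   # "humidify", "dehumidify", or "idle"


def find_conflicts(statuses):
    """Report zones where one unit humidifies while another dehumidifies."""
    by_zone = {}
    for status in statuses:
        by_zone.setdefault(status.zone, []).append(status)

    conflicts = []
    for zone, units in sorted(by_zone.items()):
        humidifying = sorted(u.unit_id for u in units if u.mode == "humidify")
        dehumidifying = sorted(u.unit_id for u in units if u.mode == "dehumidify")
        if humidifying and dehumidifying:
            conflicts.append((zone, humidifying, dehumidifying))
    return conflicts


readings = [
    UnitStatus("crac-1", "hall-a", "humidify"),
    UnitStatus("crac-2", "hall-a", "dehumidify"),
    UnitStatus("crac-3", "hall-b", "idle"),
    UnitStatus("crac-4", "hall-b", "humidify"),
]
fighting_units = find_conflicts(readings)
```

Here only `hall-a` is flagged, since `hall-b` has a single active unit and nothing working against it.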
The most underrated force that leads to downtime or inefficiency in the data center is personnel. Even when systems are functioning optimally, human error can lead to unexpected consequences due to carelessness or forgetfulness. As systems become more automated and self-reliant, the human factor looms larger than ever among the causes of downtime. Below are some considerations for managing human error in the data center and reducing the downtime associated with it.
Proper documentation in the form of a task-oriented, step-by-step checklist is a great way to reduce the risks associated with routine tasks. Even the most experienced IT worker may fall out of step if procedures become too instinctive, leaving room for mistakes that result in downtime. This is why it’s critical that guidelines are written for all tasks, so that there is a reference for anyone who needs it, especially in emergency situations or when rectifying a mistake. All equipment should be labeled properly and diagrams drawn up so that procedures can be followed without wasting time finding the items or areas referenced in the documentation. At a bare minimum, critical items such as the emergency off and switching devices should be labeled.
Training and Consistent Policy
Training personnel to follow a standard set of practices within the business is essential for those with access to the facility or data systems. All personnel should be familiar with essential equipment in order to avoid an unexpected shutdown; even with proper documentation, carelessness or a lack of understanding of a system’s importance can lead to mistakes. Security should be tight, with a sign-in policy that requires observation of non-essential personnel, not just to protect the equipment but to ensure nothing is inadvertently damaged. As with any electronics, liquids and food are a huge daily risk and should be kept out of any room with critical equipment. Proper signage should be in place and this policy should be enforced thoroughly.
Providing energy to servers is a substantial part of a data center’s costs. In many cases this is because servers run constantly at peak performance in preparation for peak demand. This creates a lot of unnecessary expenditure, as these systems do not always need to run at such high performance levels. It also inflates the minimum power requirements for maintaining critical systems, forcing unnecessary expansion of power management systems and further increasing costs.
Dynamic or Scheduled Performance
Protecting functionality while ensuring peak performance is a huge challenge, but one that should not be ignored, as adjustable performance in these systems offers potentially substantial savings. Servers waste a tremendous amount of power by running at peak performance during times of low demand, especially if the suite of applications is technically demanding and requires powerful systems. Deactivating servers or resizing clusters, either on a schedule of known usage or under dynamic control systems that can detect shifts in usage and the need for more capacity, can help dramatically reduce power consumption. Reducing idle power consumption is a significant way to cut costs, and even the largest business can benefit from dynamic management of performance. These methods reduce servers’ drain on power while ensuring there is no downside on the user end, even when there are unusually large loads of traffic.
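A minimal sketch of resizing a cluster to demand might look like the following. The function name, the per-server capacity, the 25% headroom, and the floor of two servers are all hypothetical values chosen for the example; a real system would measure them and would also smooth out short spikes before scaling down.

```python
import math


def servers_needed(requests_per_sec, capacity_per_server,
                   min_servers=2, headroom=0.25):
    """Size the cluster for current demand plus headroom, never below a floor."""
    required = requests_per_sec * (1 + headroom) / capacity_per_server
    return max(min_servers, math.ceil(required))


# Scheduled sizing: expected load (requests per second) by hour of day.
expected_load = {3: 50, 9: 800, 14: 1000}
schedule = {hour: servers_needed(load, capacity_per_server=100)
            for hour, load in expected_load.items()}
```

The same function can serve a dynamic controller: call it with the measured request rate instead of a scheduled estimate, and the headroom keeps some spare capacity active for unexpected bursts of traffic.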
Load Balancing with Multiple Data Centers
As a business grows and its pool of users expands, it may become beneficial to have data centers strategically located so that applications run only in regions that are currently in off-peak hours. Off-peak hours bring significantly reduced power prices due to lower demand outside of typical business hours. Concerns with latency might make this an issue for some businesses, but for the majority of computational tasks, running applications with a few hundred milliseconds of latency is not a concern. In certain cases, the ideal approach would be to direct traffic that needs low latency to a first tier of high-cost data centers and traffic with no latency concerns to areas with cheap power. Integrating functionality across multiple data centers allows capacity and latency to be shifted as user demand dictates, saving capital in the process. Such varied locations can also offer more stability, as systems aren’t tied to a single power supplier and servers draw power independently, increasing the reliability of these platforms.
With the increase in remotely performed operations and the need for fewer on-site staff due to automated procedures, there is a wide range of locations that can be selected for construction of your data center. Below are some considerations for choosing a site for future construction.
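The two-tier routing idea can be sketched as a small selection function. This is an illustration under stated assumptions: the `DataCenter` fields, the site names, the latency figures, and the prices are invented for the example, and a production load balancer would also weigh capacity, health checks, and data locality.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    latency_ms: float      # round-trip latency from the user to the site
    price_per_kwh: float   # current power price at the site


def choose_site(sites, max_latency_ms=None):
    """Pick the cheapest site that meets the latency bound.

    A bound of None means the workload is not latency sensitive. If no site
    meets the bound, fall back to the lowest-latency site available.
    """
    eligible = [s for s in sites
                if max_latency_ms is None or s.latency_ms <= max_latency_ms]
    if not eligible:
        return min(sites, key=lambda s: s.latency_ms)
    return min(eligible, key=lambda s: s.price_per_kwh)


sites = [
    DataCenter("metro-east", latency_ms=20, price_per_kwh=0.15),
    DataCenter("rural-west", latency_ms=250, price_per_kwh=0.06),
]
interactive_site = choose_site(sites, max_latency_ms=50)
batch_site = choose_site(sites)
```

Interactive traffic lands on the nearby, more expensive site, while batch work with no latency requirement goes wherever power is cheapest at the moment.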
Cheap Primary Power Source
The most crucial part of any data center is its primary power source. Being connected to a large power grid and the accompanying infrastructure can help reduce the cost per kilowatt-hour. The downside is that searching for a cheap power source may constrain a multitude of other factors that should be considered when choosing a site for construction, but under certain conditions the benefits of cheap power may make up for other potential detriments. The extra working capital from a cheap power source is of primary concern for data centers that have an unexpected rate of growth and may require substantially more power in the future.
Low Population Areas
Typically, areas outside of urban environments allow for easier construction and expansion due to more relaxed zoning laws. Lower population density also allows for easier security monitoring and better isolation of the data center, providing tighter risk management because fewer human factors are introduced. These areas may also have lower living expenses and, in turn, potentially provide cheaper labor.
Damage To Operations From Climate
It’s ideal to find a consistent climate when selecting a site for your data center. Being on the lower side of the temperature range without being in the extremes is also a great way to allow for passive cooling, which affords the opportunity to cut costs further. Avoiding areas prone to devastating acts of nature is a crucial concern, as these events can destroy not just your data center but also the surrounding infrastructure. Excessively rainy climates prone to flooding should therefore be avoided, and other environmental hazards unique to particular regions should be considered. In most cases, arid, cool environments in low-population areas with access to proper roadways are the preferred choice for a data center.
According to Gartner, Inc., 20.8 billion objects will have connectivity by 2020. The Internet of Things has been widely touted as the force behind this predicted growth in devices and data, but quietly behind it the Industrial Internet of Things has churned along, waiting to revolutionize the world of industry through the cloud.
The Industrial Internet of Things
As various sectors of business attempt to automate or enhance productivity through embedded electronics, the integration of physical devices into cloud-based computation and storage environments is becoming more common every day. The Industrial Internet of Things brings the brains of embedded electronics and their cloud-based systems to the largest machines in a way not seen before. An article on the GE blog titled “Industrial Internet Success: Union Pacific Railroad” references a particularly interesting use of the industrial internet by Union Pacific, which installed systems on its trains to provide real-time sensor information that was in turn coupled with weather data to evaluate train line safety, giving warnings of potential derailments weeks before they happen.
Big Data Reduces Costs
The case examined above doesn’t necessarily show the entire scope of the Industrial Internet of Things, but it’s an indication that quantifiable data from hardware systems in an industrial setting is potentially revolutionary. Analysis of big data from industrial systems has huge implications for capital investment, because such systems reduce working costs by identifying inefficiency and eliminate potential risks by objectively forecasting environmental hazards. By combining data from industrial machines with cloud-based storage and computation, the Industrial Internet of Things offers great potential for optimizing business capabilities, as the Union Pacific example suggests that industrial data can cut liabilities dramatically.
These cloud based systems could prove a pivotal moment for large scale industry as issues in the operating environment that have gone unnoticed for extended periods of time, at great cost, can be identified by such big data analytics. It’s only a matter of time before the Industrial Internet of Things becomes a dominant force in the cloud computing world as businesses operating industrial machines begin seeking efficiency provided by analytics of industrial machines operating in cloud based environments.
Ensuring hardware is protected from unexpected physical consequences is an essential and often overlooked part of maintaining a data center. The strategies below will reduce the risk of downtime and potential damage to hardware.
The Uninterruptible Power Supply (UPS) provides power when utility services fail, keeping essential equipment functional at all times. To guarantee uptime and adequate power in a prolonged outage, backup UPS systems should also be installed on essential systems. Any remote monitoring equipment should also be fitted with a UPS to allow for continued monitoring of data center operations. A Power Distribution Unit (PDU) should also be installed on the main power source to protect critical loads. Power distribution to all systems should run through a UPS connected to a PDU. Together the UPS and PDU ensure equipment is protected and active throughout any power fluctuations.
Cooling equipment needs depend on the particular environment at hand. Although equipment may handle a large range of temperatures, such fluctuation should be avoided. To ensure components are adequately cooled or heated, keep temperatures in the range of the high 60s to low 70s (°F). Finding and eliminating hotspots in racks with tray fans is a great way to cut down on unnecessary use of air conditioning, but with proper organization and management of equipment this should not be needed. Airflow should be considered around all equipment, but particular devices such as the UPS are more prone to degradation if exposed to excess heat. If the room is too compact to allow cool airflow to the UPS, it should be kept outside the area.
Organization and setup are an essential part of protecting critical equipment and avoiding long-term problems. If you are pressed for time and proper organization during setup isn’t possible, then arrangements should be made for a fully configured system. Poorly arranged systems can lead to overcooling situations in which spot fans are unable to maintain ideal temperatures and overuse of air conditioning is needed. Consideration should also be given to the potential for water damage from the environment. Avoiding any room with water pipes or any area that could potentially flood, such as a basement, is a must. In a rainy climate, installing an umbrella over server racks can protect against unexpected leaks from the roof.
In short, well prepared power management and cooling systems are the best insurance against any unexpected downtime.