Server Room Fire Suppression Best Practices

Data centers must delicately balance the need for infrastructure that runs around the clock and maximizes uptime with the need to manage the heat and fire risk inherent in electronic equipment.  This is particularly true in server rooms.  Server rooms are the heart of a data center, the hub of information, and if a server room experiences a disaster of any kind, it typically leads to downtime.  Server rooms must have proper air conditioning, but that is not enough; they must also have appropriate fire suppression measures in place to reduce the risk of damage, injury, and downtime.  There are many threats to data center operations, but perhaps the most significant is fire.  Other threats are likely to cause only moments of downtime.  Fire, on the other hand, can cause permanent damage to equipment, injury to personnel, and prolonged downtime as a result.  When it comes to fire suppression in data centers, neglecting to implement suppression is simply unacceptable – a true recipe for disaster.

The statistics surrounding fires in data centers may not sound all that scary at first glance – according to NetworksAsia, only about 6% of infrastructure failures are caused by fires.  That kind of statistic may make you feel comfortable, as if you do not really need to worry much about the risk of fire since you take appropriate precautions to ensure that your server rooms are cooled correctly.  But, make no mistake, the risk is real, and if it happens to you, it may not just lead to downtime but to your data center closing its doors.  Data Center Knowledge provides a wake-up call to all data centers about the very real risk of data center fires, “A small data center in Green Bay, Wisconsin was wiped out by a fire earlier this month, leaving a number of local business web sites offline. The March 19 fire destroyed 75 servers, routers and switches in the data center at Camera Corner/Connecting Point, a Green Bay business offering IT services and web site hosting…But it took 10 days to get customer web sites back online, indicating the company had no live backup plan…While the company discussed the usefulness of its fire alarms, it didn’t address whether the data center had a fire suppression system. But it doesn’t sound like it. The Green Bay Press Gazette describes “racks of blackened, melted plastic and steel.” We’ve previously looked at data center fire suppression tools and how they have evolved with the industry’s recent focus on environmental considerations.”


Image of Server Room Fire via: Bangkok Post

Fire prevention and fire suppression should be a part of any data center disaster recovery plan.  It is important to consider what types of fire your data center is most at risk of, as well as the size of your data center, to determine the appropriate fire suppression system for your disaster recovery plan.  Your data center’s form of backup and the specific strategies in your disaster recovery plan will heavily influence the type of fire suppression system that you use.  If you have a minimal or “bare bones” disaster recovery plan, you may want the most elaborate and effective fire suppression system because you need it to work as effectively and quickly as possible.  If you have a comprehensive disaster recovery plan and robust backup/redundancy, uptime is less dependent on your fire suppression system.  But, in the end, every single server room must have a fire suppression system that is more effective and comprehensive than “calling 911.”

To understand fire suppression needs and make an informed decision when choosing a fire suppression method, it is important that you understand what types of fires can occur in a server room or data center.  TechTarget explains the types of fires data centers are at risk of:

“In North America, there are five fire classes:

  • Class A: Fire with combustible materials as its fuel source, such as wood, cloth, paper, rubber and many plastics
  • Class B: Fire in flammable liquids, oils, greases, tars, oil-base paints, lacquers and flammable gases
  • Class C: Fire that involves electrical equipment
  • Class D: Fire with ignitable metals as its fuel source
  • Class K: Fire with cooking materials such as oil and fat as its fuel source

No matter where your data center is located, fire can be considered a potential disaster. Data center environments are typically at risk to Class A, B or C fires.”

There are two primary types of fire suppression systems: water sprinklers and gaseous agent fire suppression solutions.  Water sprinklers are a very traditional type of fire suppression system and they are the most common type.  They are particularly popular because they are low cost, may already exist in the server room in the first place, and are effective.  Once activated, they will continue to expel water until they have been shut off.  The main problem with water sprinklers is that they can cause significant damage to equipment.  With the goal of remaining operational and maximizing uptime while preventing catastrophic fire, dramatic water damage could still lead to downtime.  Additionally, water sprinklers can be activated accidentally and cause unnecessary damage.  And, while sprinkler systems are inexpensive, the water damage that they cause is not.

For this reason, many data centers and server rooms implement pre-action water sprinklers.  Pre-action water sprinklers work in a similar way but take extra steps to prevent accidental activation and the ensuing damage.  In traditional water sprinklers, the water is kept in the pipes, right at the nozzle, awaiting activation.  With pre-action water sprinklers, the water is not kept in the pipes all the way to the nozzle.  Pre-action systems require two events/alarms to activate, rather than one, significantly reducing the risk of accidental activation.  The upside is that a pre-action system is still low cost, and traditional water sprinklers can be converted to pre-action systems.
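The two-event requirement can be pictured as a simple condition. This sketch is purely illustrative – the input names are invented, and a real pre-action panel is far more involved:

```python
def preaction_release(detector_alarm: bool, head_fused: bool) -> bool:
    """Pre-action sketch: a detection alarm lets water into the piping,
    but water only discharges if a sprinkler head has also fused --
    two independent events, not one."""
    return detector_alarm and head_fused

# A false detector alarm alone releases no water:
dry = preaction_release(True, False)   # False -> no discharge
wet = preaction_release(True, True)    # True  -> discharge
```

A traditional wet-pipe system, by contrast, would discharge on the fused head alone.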

Gaseous agent fire suppressant solutions are a newer technology and are effective in suppressing a wider and more significant range of fires.  Gaseous agents are delivered in a similar fashion to water sprinklers – the agent is stored in a tank, piped into overhead nozzles, and administered when activated.  This is the preferred method of fire suppression for server rooms because it is more effective when electrical equipment is involved.  The Data Center Journal describes exactly how gaseous agent fire suppressant systems work, “The Inert Gas Fire Suppression System (IGFSS) is comprised of Argon (Ar) or Nitrogen (N) gas or a blend of those gases. Argon is an inert gas, and nitrogen is also unreactive. These gases present no danger to electronics, hardware or human occupants. The systems extinguish a fire by quickly flooding the area to be protected and effectively diluting the oxygen level to about 13–15%. Combustion requires at least 16% oxygen. The reduced oxygen level is still sufficient for personnel to function and safely evacuate the area. Since their debut in the mid 1990s, these systems have proven to be safe for information technology equipment application.”  In essence, they are able to suppress fires while minimizing risk to electronic equipment.  One early gaseous agent, Halon, is no longer in production because it poses health and environmental risks, but Halon replacement agents are available that work in a similar fashion without those risks.  Though more effective than sprinkler systems for certain types of fires, and though they carry less risk of damage, gaseous systems are more expensive and cannot run continuously until shut off.  They will only run as long as the gaseous agent is available; once the tank is empty, fire suppression stops.
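To get a feel for the oxygen-dilution figures in the quote above, a simplified well-mixed, free-venting model estimates how much inert gas a room needs. This is an illustration only, not NFPA design math, and the 250 m³ room volume is made up:

```python
import math

AMBIENT_O2_PCT = 20.9  # oxygen concentration of normal air

def inert_gas_volume(room_volume_m3: float, target_o2_pct: float) -> float:
    """Inert gas volume (at room conditions) needed to dilute oxygen
    from ambient to target, assuming perfect mixing with free venting:
    O2_final = 20.9 * exp(-V_gas / V_room)."""
    return room_volume_m3 * math.log(AMBIENT_O2_PCT / target_o2_pct)

# Flooding a hypothetical 250 m^3 server room down to 14% oxygen:
needed = inert_gas_volume(250.0, 14.0)  # roughly 100 m^3 of agent
```

The logarithm reflects that agent discharged later partly displaces agent already in the room, which is why total-flooding calculations are not a simple proportion.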

Server rooms pose the most significant fire risk in a data center because they typically have the highest concentration of electrical equipment and combustible materials.  It is absolutely imperative that, should a fire be sensed, fire suppression begins immediately and alarms sound, alerting personnel that it is time to evacuate and take disaster recovery action.  A server room is, at its core, the heart of a company’s information structure.  If the server room experiences a fire, downtime is highly likely.  But, if suppression methods are activated effectively and efficiently, downtime and damage may be avoidable.


Flywheel vs. Battery UPS

Every data center utilizes a UPS – uninterruptible power supply – to ensure that power is always available, even if there is a power interruption.  Minimizing downtime while maximizing energy efficiency is a primary goal of any data center or enterprise, which is why choosing the right UPS is so important.  The UPS begins supplying power immediately upon sensing that the primary power source has stopped functioning.  This is important because it maximizes uptime, which helps prevent frustration and financial loss, as well as the loss of data.  The UPS stores power and sits waiting until it is needed, but it requires maintenance and testing to ensure it is ready to be used when needed.  There are two primary types of UPS – flywheel and battery – and there are pros and cons to each that a data center must carefully weigh.

A flywheel UPS (sometimes referred to as a “rotary” UPS) is an older type of UPS but is still a viable option for modern data centers.  Flywheel and battery UPS systems provide the same essential function, but the way that function is achieved – the way energy is stored – is different.  Flywheels store kinetic energy in a spinning mass that remains waiting for when it is needed.  Flywheel systems pack a large energy density into a small package.

Flywheel UPS systems tend to be significantly smaller than battery UPS systems.  This can be an advantage when data center square footage is at a premium.  Further, flywheel UPS systems are easier to house – they do not need as much ventilation, require less maintenance, and do not need special disposal arrangements when their lifespan is complete.  Flywheel UPS systems can literally last decades with a minimal amount of maintenance, which is a stark contrast to battery UPS systems.

One of the most significant drawbacks of a flywheel UPS system is its power output capacity when compared with battery UPS systems.  TechTarget explains this key difference, “The UPS reserve energy source must support the UPS output load, while UPS input power is unavailable or substandard. This situation normally occurs after the electrical utility has failed and before the standby power system is online. As you determine whether flywheels are appropriate for a project, the amount of time that the reserve energy must supply the UPS output is key. For comparable installed cost, a flywheel will provide about 15 seconds of reserve energy at full UPS output load, while a storage battery will provide at least 10 minutes. Given 15 seconds of flywheel reserve energy, the UPS capacity must be limited to what one standby generator can supply.”  Though flywheels cannot deliver the same length of power output that battery UPS systems can, multiple flywheels can be installed in parallel so that they all supply backup power in the event that they are needed.
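The runtime gap described above is simply stored energy divided by load. In this back-of-the-envelope sketch, the kWh and kW figures are invented to line up with the 15-second and 10-minute examples in the quote:

```python
def runtime_seconds(stored_kwh: float, load_kw: float) -> float:
    """Ideal full-load runtime, ignoring conversion and standby losses."""
    return stored_kwh / load_kw * 3600.0

# Hypothetical 300 kW load:
flywheel = runtime_seconds(1.25, 300.0)  # -> 15.0 seconds
battery = runtime_seconds(50.0, 300.0)   # -> 600.0 seconds (10 minutes)
```

The same arithmetic shows why parallel flywheels help: doubling stored energy at the same load doubles the ride-through time.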

Something important to consider is the type of data center.  If your data center is part of a larger network of data centers then, if a power failure occurs, another data center could take over the data load and support your data center for a short time until you are back online.  Many data centers are employing this network structure as a better means of maximizing uptime and efficiency.  If this is the case, something like a flywheel UPS system may be ideal because you do not need a prolonged power supply in the event of an emergency; a shorter UPS runtime is all that is needed.  But, make no mistake, many data center managers still want the maximum amount of time possible when it comes to UPS capacity.  Further, some data centers are opting for a hybrid UPS system that employs both battery and flywheel.  While the initial investment in a hybrid UPS system may be more, it should pay for itself in a matter of a few years.

Another important consideration is energy efficiency since many data centers are trying to become more “green.”  Though flywheel UPS systems are often thought of as the green option, Schneider Electric points out that this common assumption may be incorrect, “The results may come as a surprise to many. In almost all cases, VRLA batteries had a lower overall carbon footprint, primarily because the energy consumed to operate the flywheel over its lifetime is greater than that of the equivalent VRLA battery solution, and the carbon emissions from this energy outweighs any carbon emissions savings in raw materials or cooling. Of course, the tool lets users conduct their own comparison to see for themselves. This analysis and tool are a good reminder that decisions around energy storage needs to factor in a number of variables.”  It is more apparent than ever that every data center must evaluate its unique, individual needs, as well as its energy goals and uptime goals, when choosing which type of UPS system is best.

A battery UPS system supplies electrical power through a chemical reaction that happens within the battery, unlike a flywheel system that uses kinetic energy.  Battery UPS systems are often favored by data centers because they can provide a much longer supply of power than a flywheel UPS.  The exact length of time available will depend heavily on the battery’s age, how well it has been maintained, etc. but for reference, a battery UPS may be able to provide 5+ minutes of power (and sometimes much more depending on a variety of factors as mentioned above) vs. a flywheel UPS that may only be able to provide less than a minute of backup power.

Though a battery UPS provides a longer power supply when it is needed, it is not without its drawbacks.  UPS batteries must be routinely maintained.  This includes visual inspection, ensuring adequate cooling and ventilation, cleaning, and more to ensure that they will work properly in the event that they are needed.  Additionally, UPS batteries have a shorter lifespan than flywheel UPS systems.  This is because the chemicals within the batteries diminish over time and ultimately lead to battery failure.  For this reason, UPS batteries must be not just routinely maintained but frequently checked to ensure that they are still working and capable of supplying power.

Further, a UPS battery has a limited number of discharge cycles.  Though it can recharge, frequent discharging and recharging will diminish its “expected” capacity and lifespan over time.  For flywheel UPS systems, this is not a problem (though it should be noted that flywheels can only discharge a limited number of times in a short time frame; multiple discharges spread over a long period are not problematic).  Additionally, UPS batteries contain hazardous materials that must be safely and correctly disposed of when no longer needed.  This means that UPS batteries require special disposal methods that flywheel UPS systems do not.
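A crude way to picture cycle wear is an exponential fade model, where each full discharge/recharge cycle removes a small fixed fraction of remaining capacity. The 0.05%-per-cycle rate below is an assumption for illustration, not real battery chemistry:

```python
def remaining_capacity_ah(rated_ah: float, cycles: int,
                          fade_per_cycle: float = 0.0005) -> float:
    """Remaining amp-hour capacity after a number of full cycles,
    under an assumed constant per-cycle fade fraction."""
    return rated_ah * (1.0 - fade_per_cycle) ** cycles

# A nominal 100 Ah battery after 500 full cycles:
left = remaining_capacity_ah(100.0, 500)  # roughly 78 Ah remaining
```

Even a tiny per-cycle loss compounds, which is why frequently cycled UPS batteries fall short of their rated life.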

As we discussed earlier, because there are advantages and drawbacks to both flywheel and battery UPS systems, many data centers are opting for a hybrid approach.  Data Center Knowledge explains the advantages of having a hybrid system that employs the use of both flywheel and battery power, “According to Kiehn, while the general trend is toward lower-cost systems with shorter runtimes, the size of the market that still wants 5 minutes or more shouldn’t be underestimated. “A lot of customers are still asking for 5 minutes,” he said. They include colocation providers, financial services companies, as well as some enterprises…There are also reliability and TCO benefits to having both flywheel and batteries in the data center power backup chain. When utility power drops, the flywheel will react first and in most cases will never transfer the load to batteries, since the flywheel’s runtime is enough for a typical generator set to kick into gear, Anderson Hungria, senior UPS product manager at Active Power, explained. Because the batteries are rarely used, initial and replacement battery costs are lower. Theoretically, it may also extend the life of the battery, but the vendor has not yet tested for that. As two alternative energy storage solutions, the flywheel and the batteries act as backup for each other, making the overall system more reliable.”

In the technology world, processes and products that are the “old” way of doing things tend to go away quickly in favor of the latest and greatest advancements.  But, when it comes to flywheel UPS systems, they are getting a new life, particularly in the form of hybrid UPS systems.  Flywheels are not an alternative to UPS batteries when it comes to energy efficiency or length of power supply – but that does not mean they are not a viable option for many data centers.  Depending on unique data center needs, they should be considered both as a standalone option and as part of a hybrid UPS system to ensure backup power that maximizes uptime and efficiency.



Proper Maintenance and Service of a UPS System Is Critical to Preventing Failure


There are few things more important to a data center than continuous power.  Without it, a data center will experience prolonged downtime, significant financial loss, a damaged reputation, and other damaging effects.  It is for this reason that data centers focus much of their time and energy on power redundancy and on ensuring that there is a properly functioning uninterruptible power supply (UPS).  A UPS will sit waiting and, should it be needed due to a power failure, will supply the necessary power to keep data center infrastructure up and running.  There are a variety of UPS sizes to accommodate assorted power loads, and many data centers implement multiple UPS systems to protect against downtime.  It is important that a UPS be prepared to function at a moment’s notice so that there is not significant loss of data.  The problem is that many data centers experience UPS failure, and the majority of the time a UPS fails, it is due to a lack of proper maintenance and servicing.

A power failure can occur for a variety of reasons – power outage, power surge, power sag and more.  Whatever causes a power fluctuation or outage, even a few moments of downtime can bring with it severe costs.  Should any power fluctuation or outage occur, a UPS will pick up right where the power supply left off, eliminating downtime, data loss, and damage to infrastructure.  A UPS is often thought of as a “dependable” power supply in case of emergency but, if it is not properly maintained and serviced, it may not be particularly dependable.

To be able to determine how to best maintain your data center UPS system, you must first understand why UPS systems fail from time to time.  Just like that 10-year-old battery in your junk drawer may not have much life left in it, UPS batteries diminish over time.  Even if you have not needed to use your UPS, the battery that powers it will lose capacity and not have as much life as originally intended.  UPS battery deterioration is often accelerated by the high temperatures common inside data centers.  Fans occasionally fail because components such as ball bearings dry out, or because fans wear down from continuous use.  Additionally, power surges such as those caused by lightning or other transient spikes can diminish a UPS battery.  Dust accumulation on UPS components can diminish UPS efficacy.  Further, the UPS battery discharge cycle (how many times the battery has been discharged and recharged) will shorten the overall life of a UPS battery.  A typical 3-phase UPS has an average lifespan of 10 years, and without proper maintenance it could be much shorter.

If you think you are doing enough by occasionally checking your UPS battery, you may be leaving your data center exposed to an outage and downtime.  Government Technology explains just how many data centers are experiencing downtime due to UPS failure and preventable human error, “Data center outages remain common and three major factors — uninterruptable power supply (UPS) battery failure, human error and exceeding UPS capacity — are the root causes, according to a new study released earlier this month. Study of Data Center Outages, released by the Ponemon Institute on Sept. 10, and sponsored by Emerson Network Power, revealed that 91 percent of respondents experienced an unplanned data center outage within the last 24 months, a slight dip from the 2010 survey results, when 95 percent of respondents had reported an outage…Fifty-five percent of the survey’s respondents claimed that UPS battery failure was the top root cause for data center outages, while 48 percent felt human error was the root cause.”  By correcting human error and properly maintaining your UPS system, you can dramatically decrease your data center’s risk of downtime.

To prevent UPS failure, it is imperative that you regularly maintain and service your UPS as part of your Data Center Infrastructure Management (DCIM) plan.  There are a few key components of proper UPS maintenance and service, but physical inspection is at the core.  If you are not physically checking on your UPS system on a regular basis, there is no way to know if there is something visibly wrong or problematic that could lead to a failure.  The best thing you can do is create a UPS maintenance and service checklist and keep a detailed log of all maintenance and service to ensure that maintenance does not fall behind.  Your checklist should include:

  • checking and testing the UPS battery to ensure it is working
  • inspecting the UPS capacitors
  • checking the ambient temperature around the UPS
  • calibrating equipment
  • performing any service that might be required (check air filters, clean and remove dust)
  • verifying load share and making any necessary adjustments
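The checklist-and-log idea can be kept honest with even a tiny script. In this sketch the task names and inspection intervals are illustrative, not a standard; it flags any task whose interval has lapsed since it was last logged:

```python
from datetime import date, timedelta

# Hypothetical checklist: task -> inspection interval in days
CHECKLIST = {
    "battery discharge test": 90,
    "visual inspection": 7,
    "capacitor check": 180,
    "air filter cleaning": 30,
    "load share verification": 180,
}

def overdue_tasks(last_done: dict, today: date) -> list:
    """Tasks whose interval has elapsed since their last logged date."""
    return sorted(task for task, interval in CHECKLIST.items()
                  if today - last_done[task] > timedelta(days=interval))

# Everything logged on Jan 1; 45 days later the short-interval
# tasks are overdue while the 90- and 180-day tasks are not.
log = {task: date(2024, 1, 1) for task in CHECKLIST}
late = overdue_tasks(log, date(2024, 2, 15))
```

A real DCIM tool does far more, but the point stands: a dated log plus intervals is enough to catch maintenance that has fallen behind.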

If UPS battery failure is one of the most common causes of UPS failure and thus downtime, it is only logical that the battery should be one of the most important parts of your UPS maintenance checklist.  Battery discharge should be routinely checked to ensure that the battery is not diminished and incapable of handling the necessary power load in the event of a failure.  It is also important to visually inspect the area around the UPS and the battery itself for any obvious obstructions, dust collection, or other things that may prevent adequate cooling.  If you see a warning that the battery is near discharge, perform the necessary maintenance.  Further, the AC input filter capacitors should be checked, along with the DC filter capacitors and AC output capacitors, for open fuses, swelling, or leakage.  Next, you should visually inspect all components for any obvious problems.  Inspect the major assemblies, wiring, circuit breakers, contacts, switchgear components, and more.  Should you see obvious damage, perform the necessary maintenance and service.

Next, because data centers operate at a high temperature due to the energy output of the infrastructure, it is important to check the ambient temperature around the UPS system because a high temperature can diminish the battery capacity.  Schneider Electric explains best practices for maintaining ambient temperature around UPS for maximum battery life, “It is recommended that the UPS be installed in a temperature controlled environment similar to the intended application.  The UPS should not be placed near open windows or areas that contain high amounts of moisture; and the environment should be free of excessive dust and corrosive fumes.  Do not operate the UPS where the temperature and humidity are outside the specified limits.  The ventilation openings at the front, side or rear of the unit must not be blocked… All batteries have a rated capacity which is determined based on specified conditions.  The rated capacity of a UPS battery is based on an ambient temperature of 25°C (77°F).  Operating the UPS under these conditions will maximize the life of the UPS and result in optimal performance.  While a UPS will continue to operate in varying temperatures, it is important to note that this will likely result in diminishing the performance and lifespan of your battery.  A general rule to remember is that for every 8.3°C (15°F) above the ambient temperature of 25°C (77°F), the life of the battery will be reduced by 50 percent.  Therefore, keeping a UPS at a comfortable temperature is crucial to maximizing UPS life and capabilities.”
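The rule of thumb in the quote above – battery life halves for every 8.3°C (15°F) above a 25°C ambient – translates directly into a formula. The 10-year rated life in the example is illustrative:

```python
def battery_life_years(rated_life_years: float, ambient_c: float) -> float:
    """Expected battery life under the halve-per-8.3-degC rule of thumb.
    Temperatures at or below the 25 degC rating return the rated life."""
    excess_c = max(0.0, ambient_c - 25.0)
    return rated_life_years * 0.5 ** (excess_c / 8.3)

battery_life_years(10.0, 25.0)   # -> 10.0 years at rated ambient
battery_life_years(10.0, 33.3)   # -> 5.0 years, one 8.3 degC step hotter
```

The exponential form makes the stakes clear: a room running just two steps above rating cuts battery life to a quarter.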

Visual inspection should include dust and dirt removal on the UPS system.  A UPS system will sit and accumulate dust over time, and dust can interfere with proper heat transfer, so it should be promptly removed to ensure the UPS system will function properly when needed.  Further, check all air filters for dust accumulation.  Dust accumulation on filters can lead to inefficiency and even overheating.  Clean and replace filters as needed to properly maintain your UPS.  Capacitors are also an integral component of UPS systems.  Capacitors aid in the transition of power in the event of an outage, so if they fail, the UPS will likely fail.  Capacitors need to be routinely checked because they dry out from wear and tear, and they need to be replaced every few years to ensure proper UPS function.

Though much of the suggested UPS maintenance and service strategy may sound basic, even obvious, the fact of the matter is that UPS failure remains a primary source of data center downtime.  And, when you couple that with human error, it is easy to see that many data centers simply are not properly maintaining their UPS systems.  All of these tasks do not need to be completed every day or even every week; certain tasks can be performed weekly while others can be monthly, quarterly, semi-annually, or annually.  By breaking the work up, you ensure that your UPS system is being frequently and routinely checked while making routine maintenance a far more achievable task.  Additionally, by maintaining a detailed log you can see if UPS maintenance and service has fallen behind and immediately address any concerns.  When data center technicians routinely check the UPS system, they become familiar with what looks normal and what looks concerning, so anything problematic can be addressed and remedied immediately – for peace of mind that your UPS will be there when you need it and will prevent costly downtime.


The Convergence of IT & OT in Data Centers

Though IT and OT are two different things, the previous tendency to “divide and conquer” when it came to strategy, management, and solutions is going away.  When it comes to IT and OT, their worlds are colliding inside data centers.  Operating as two separate entities without communication and collaboration is not effective, efficient, or ideal.  Though not all data centers are operating with IT/OT convergence, the transition has begun – IT/OT convergence is already happening in healthcare, energy, aviation, manufacturing, transportation, defense, mining, oil and gas, utilities, natural resources, and more – and it is only a matter of time until it is simply the data center industry standard.

OT (operational technology) has a few primary focuses – maximizing uptime, ensuring the proper function of equipment and infrastructure, and securing the availability of operational assets and processes.  OT is a blend of hardware and software that allows environmental maintenance to occur.  Though some are not even familiar with the name “OT,” it is essential to the day-to-day operations of a data center.  The convergence of IT and OT is happening because the specific technology involved in operational technology (such as communications, software, and security) is evolving, and OT is integrating more information technology (IT) into its operations.

IT (information technology) focuses on the use and integrity of data and intellectual property.  Its focus is on things like storage, networking devices, computers, and infrastructure that facilitate improved information storage and security.  In contrast to OT, IT’s security focus is the protection and preservation of confidential information.  Though they are two different things, they are not mutually exclusive, and what data centers are finding is that there is more than just overlap – a convergence is happening.  Schneider Electric elaborates on why the IT and OT worlds are colliding, “Security systems are needed to protect facilities. IT is needed to run security systems. Apply a bit of basic math theory to these statements, and it is easy to conclude that IT is then needed to protect facilities. If you are thinking this sounds like OT and IT convergence, you’re right; but security requirements push the boundaries even further to compel departmental collaboration between OT and IT. At the core, lies the need for reliable delivery of clean and continuous power.”

To maintain uptime and maximize security, IT and OT must work together. Think about factors that could lead to downtime or a security breach – problems with infrastructure management, equipment overheating, fire, flood, problems with lighting, problems with the security system, a physical breach of security, a cyber-attack, and more.  Many of these things fall under the OT umbrella but some fall under the IT umbrella.  And, in reality – managing and mitigating them involves both IT and OT.  In order to properly remote-manage a data center, and maintain RTOI (real-time operational intelligence), a proper DCIM must be in place and IT must be able to communicate with monitoring systems so that proper and accurate information is received.  As we have previously discussed, when this information is received in real time, downtime can be significantly reduced.  TechTarget elaborates on why IT and OT are converging in the way that they are now, and how it will improve efficiency and maximize data center operations, “While IT inherently covers communications as a part of its information scope, OT has not traditionally been networked technology. Many devices for monitoring or adjustment were not computerized and those with compute resources generally used closed, proprietary protocols and programmable logic controllers (PLC) rather than technologies that afford full computer control. The systems involved often relied on air gapping for security. Increasingly, sensors and connected systems like wireless sensor and actuator networks (WSANs) are being integrated into the management of industrial environments, such as those for water treatment, electric power and factories. The integration of automation, communications and networking in industrial environments is an integral part of the growing Internet of Things (IOT). IT/OT convergence enables more direct control and more complete monitoring, with easier analysis of data from these complex systems from anywhere in the world.”
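As a toy illustration of IT consuming OT telemetry in real time, the sketch below checks a batch of facility readings against alert thresholds. The sensor names and limits are invented for the example; a real DCIM platform would poll actual sensors and feed an alerting pipeline:

```python
# Hypothetical alert thresholds for facility (OT) readings that an
# IT-side monitoring service might watch.
THRESHOLDS = {
    "inlet_temp_c": 27.0,
    "humidity_pct": 60.0,
    "ups_load_pct": 80.0,
}

def check_readings(readings: dict) -> list:
    """Return alert strings for any known reading above its threshold."""
    return [f"{name}={value} exceeds limit {THRESHOLDS[name]}"
            for name, value in sorted(readings.items())
            if name in THRESHOLDS and value > THRESHOLDS[name]]

# One reading out of range, one healthy:
alerts = check_readings({"inlet_temp_c": 30.5, "ups_load_pct": 72.0})
```

The value of convergence is exactly this loop: OT produces the readings, IT evaluates and routes them, and operators act before a threshold breach becomes downtime.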

When you integrate infrastructure management systems, your data center information will flow between departments with ease.  Data from IT can and should be an indispensable tool in providing the information OT needs to formulate strategy and make decisions.  The result will be increased productivity, improved efficiency, decreased downtime, and enhanced security.  With integration, you will know what your data center needs in a timely, accurate way, making effective maintenance far easier.  Your RTOI will be accurate so, should you need to make a quick adjustment – whether large or small – you will hopefully know before you experience any problems or catastrophic events.

So, it seems like a simple solution, right?  And, given the advantages of working together, surely any data center would jump all over it?  Though IT/OT convergence is certainly the future of data centers, it is not necessarily an easy task to bring the two together.  GE elaborates on the challenges of IT/OT convergence, “Many cultural and technological impediments make IT/OT convergence challenging. From the perspective of culture, IT and OT have traditionally been well-separated domains. When smart assets and infrastructure are introduced, it becomes necessary to figure out new ways to divide ownership and responsibility for the management and maintenance of that infrastructure. This can potentially lead to turf wars and blame games. On top of that, OT standards have generally been proprietary and vendor specific, optimized exclusively for specialized tasks. Unifying IT and OT requires implementing well-defined standards that scale all the way from assets to data centers and back. These standards also need to account for enhanced security, since operational assets that were previously disconnected from widespread communication networks could now be vulnerable. It’s all about the enterprise. All that daunting work can be made easier, however, by the concept of “enterprise architecture.” Enterprise architecture is a top-down methodology for developing architecture by focusing first on organizational goals, strategy, vision, and business before delving into the technological specifics. This approach could keep IT/OT deployment aligned with achieving Industrial Internet goals. Going through the process of integrating IT and OT might require some initial effort, but the payoffs are worth it.”

With any changes in data centers, there are growing pains: logistical intricacies to fine tune, security challenges, and a list of other hurdles to implementing change.  But the convergence of information technology and operational technology is a value-added change.  The specific value will vary among industries but, make no mistake, convergence will have a payoff.  Though there will be challenges in converging IT and OT, success is very achievable with thorough planning, proper execution and full implementation of an IT/OT strategy.  All data center team members must be fully educated and on board to be properly prepared for the change.  Make no mistake; IT and OT are not the same.  Though they are converging, they remain distinct yet jointly operating structures.  If harmony and alignment of strategies can be found, IT/OT convergence can be a stunning success.

By converging IT and OT, there will be overlapping technology, and this overlap will allow the two to work together synergistically.  This will be beneficial in a variety of ways but one of the most prominent is cost savings – not only because costly downtime will be reduced but because IT and OT teams can, in some ways, be combined and redundant roles pruned for efficiency.  In addition, convergence will provide risk reduction because overlapping security issues will be able to be addressed simultaneously.  And, perhaps most significantly, data centers will enjoy enhanced performance from IT/OT integration: bad redundancies (such as similar but separate operations that could be brought under one umbrella) can be eliminated and good redundancies (such as ways in which IT and OT can synergistically back each other up) can be strengthened.  Further, convergence will improve performance in the form of enhanced system availability – better performance that means more uptime because of a reduced risk of things like cyber-attack, poor infrastructure management, power failure and more.  Through a collaborative effort, a focus on future technologies, a drive toward maximizing uptime and minimizing security risk, and a desire for improved efficiency, data centers will successfully achieve IT/OT convergence and step into the future of data centers.









Data Center RTOI

Technology is evolving minute by minute and data centers must work to keep up with the lightning-paced evolution.  We have discussed the Internet of Things (IoT) before – the world is becoming increasingly dependent on the internet and everyday processes are becoming digitized for efficiency and savings.  But, as more and more of the world becomes digitized, technology advances and data grows, and that data must be effectively and efficiently stored.  Data centers make investments in infrastructure, backup power, security and more so that they can adequately store that growing and evolving data but, when things move so quickly, constant monitoring must happen to ensure that data is stored not just properly but safely and efficiently.  Old methods of collecting and analyzing data are archaic and simply not practical.  Analyzing what went wrong after the fact, or realizing something is about to go wrong when there is not enough time to fix the problem, is useless.  And, ultimately, these traditional methods are responsible for a lot of downtime in data centers.  Accurate, actionable information in real time is the only way data centers can effectively operate moving forward.

Data centers are notoriously energy-inefficient but most data centers today are making efforts to improve and be more energy efficient.  The undertaking is not simple or straightforward because every data center is different and has unique needs.  Data centers cannot run at capacity because, should capacity change, data centers will be ill-equipped.  But, at the same time, data centers should not run way beyond what is necessary because that is a waste of energy.  More and more data center managers are realizing the need for Real Time Operational Intelligence (RTOI).  Having access to current, accurate information is the only way to make intelligent and informed decisions about how to best manage the infrastructure of a data center.  What does RTOI look like in a data center?  TechTarget provides a brief explanation of what RTOI is in a practical sense, “Real-time operational intelligence (RtOI) is an emerging discipline that allows businesses to intelligently transform vast amounts of operational data into actionable information that is accessible anywhere, anytime and across many devices, including tablets and smartphones. RtOI products turn immense amounts of raw data into simple, actionable knowledge. RtOI pulls together existing infrastructure to manage all of the data that is being pulled from a variety of sources, such as HMI/SCADA, Lab, Historian, MES, and other servers/systems. It then has the ability to organize and connect the data so it’s meaningful to the users. By integrating directly with existing systems and workflows, it can help assets perform better and help workers share more information.”  As more and more people, businesses and data centers utilize the cloud, and the cloud’s complexity continues to change, data management needs change and data centers struggle just to keep pace.

RTOI can greatly reduce waste and improve energy efficiency by helping identify what is in use and what is not so that things can be turned off strategically for energy savings.  Just think of all of the infrastructure in a data center that is consuming power even though it is not mission critical or even in use.  For example, determining which servers are in use and which servers can be, at least temporarily, powered down will yield significant energy savings.
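As a concrete illustration, here is a minimal sketch of how real-time utilization telemetry could flag power-down candidates. The telemetry format, idle threshold, and per-server wattage are illustrative assumptions, not figures from any particular DCIM product.

```python
# Sketch: flag servers whose recent CPU utilization suggests they could be
# temporarily powered down. Thresholds and wattage are assumed values.

IDLE_THRESHOLD = 0.05   # assumption: under 5% average CPU counts as idle
WATTS_IDLE = 150        # assumed idle draw per server, for a rough savings estimate

def powerdown_candidates(telemetry):
    """telemetry: {server_id: [cpu utilization samples, 0.0-1.0]}"""
    candidates = []
    for server, samples in telemetry.items():
        if samples and sum(samples) / len(samples) < IDLE_THRESHOLD:
            candidates.append(server)
    return candidates

def estimated_savings_watts(candidates):
    return len(candidates) * WATTS_IDLE

telemetry = {
    "rack1-srv01": [0.62, 0.55, 0.71],   # busy
    "rack1-srv02": [0.01, 0.02, 0.00],   # idle
    "rack2-srv09": [0.03, 0.01, 0.02],   # idle
}
idle = powerdown_candidates(telemetry)   # rack1-srv02 and rack2-srv09 qualify
```

A real deployment would feed this from the DCIM monitoring layer rather than a static dict, but the decision logic is the same.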

One of the most significant advantages of well-executed RTOI is immediate knowledge of potential threats and the ability to deal with them before they cause downtime.  As we have often discussed, downtime is incredibly costly (costing, on average, thousands of dollars per minute).  No data center wants to experience downtime but, unfortunately, the vast majority will face it at one point or another.  Data centers can significantly reduce their risk of downtime with current, accurate, actionable information about what is happening in the data center.  As we have seen, anticipation of problems can only go so far.  Data centers simply cannot properly manage what they do not see or have knowledge about.  That is where RTOI comes in.
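A minimal sketch of what that real-time awareness could look like in code: each incoming reading is checked against a limit as it arrives, so a developing problem is surfaced while there is still time to act. The sensor names and thresholds below are illustrative assumptions, not standards values.

```python
# Sketch: check streaming sensor readings against thresholds in real time.
# Sensor names and limits are assumed for illustration.

THRESHOLDS = {
    "inlet_temp_c": 27.0,
    "humidity_pct": 60.0,
    "ups_load_pct": 90.0,
}

def check_reading(sensor, value):
    """Return an alert string when a reading crosses its threshold, else None."""
    limit = THRESHOLDS.get(sensor)
    if limit is not None and value > limit:
        return f"ALERT: {sensor}={value} exceeds {limit}"
    return None
```

In practice a DCIM platform would evaluate thousands of such checks per second and route alerts to on-call staff; the point is that the check happens on arrival, not in an after-the-fact report.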

RTOI not only aggregates data but measures it, tracks it, and, if well-executed, puts it in easy-to-understand terms and statistics so that you can use the information to make informed decisions as well as to properly manage assets going forward.  RTOI can assist data centers in improving capacity planning, anticipating asset lifecycles and planning their management, maintaining continuous regulatory compliance, optimizing energy efficiency and more.

Planning for data center capacity is far easier at the building stage but, once a data center has been built and is in operation, anticipating capacity needs, particularly as new technology means big data storage, is very challenging.  In fact, it is one of the biggest challenges data centers face today.  Panduit explains why capacity management is such a challenge in data centers, “Proactive capacity management ensures optimal availability of four critical data center resources: rack space, power, cooling and network connectivity. All four of these must be in balance for the data center to function most efficiently in terms of operations, resources and associated costs. Putting in place a holistic capacity plan prior to building a data center is a best practice that goes far to ensure optimal operations. Unfortunately, once the data center is in operation, it is all too common for it to fall out of balance over time due to organic growth and ad hoc decisions on factors like power, cooling or network management, or equipment selection and placement. The result is inefficiency and in the worst-case scenario, data center downtime. For example, carrying out asset moves, adds and changes (MACs) without full insight into the impact of asset power consumption, heat dissipation and network connectivity changes can create an imbalance that can seriously compromise the data center’s overall resilience and, in turn, its stability and uptime…Leveraging real-time infrastructure data and analytics provided by DCIM software helps maximize capacity utilization (whether for a greenfield or existing data center) and reduce fragmentation, saving the cost of retrofitting a data center or building a new one. Automating data collection via sensors and instrumentation throughout the data center generates positive return on investment (ROI) when combined with DCIM software to yield insights for better decision making.”
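The balance Panduit describes across rack space, power, cooling, and network connectivity can be sketched as a simple utilization check, where the most-constrained resource sets the effective capacity ceiling. All of the figures below are made up for illustration.

```python
# Sketch: compute utilization per resource and identify the bottleneck.
# Resource names follow the four critical resources quoted above; the
# numbers are illustrative assumptions.

def capacity_report(used, total):
    """used/total: dicts keyed by resource name.
    Returns (utilization per resource, bottleneck resource)."""
    util = {r: used[r] / total[r] for r in total}
    bottleneck = max(util, key=util.get)
    return util, bottleneck

used  = {"rack_space_u": 3400, "power_kw": 720, "cooling_kw": 800, "network_ports": 5100}
total = {"rack_space_u": 4200, "power_kw": 800, "cooling_kw": 1000, "network_ports": 9600}
util, bottleneck = capacity_report(used, total)   # power_kw is the constraint here
```

Tracking this continuously (rather than at build time only) is exactly where DCIM-fed real-time data earns its keep: a MAC that would push the bottleneck resource past its ceiling can be caught before it happens.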

With accurate information in real time you can manage capacity needs and, at a moment’s notice, add capacity so that there are no problems.  Additionally, that kind of historical information is useful for predicting the need for data center expansion going forward.  For example, data centers often have orphan servers that are sitting doing nothing but collecting dust and sucking up resources like cooling and power.  Without careful and accurate management, these orphan servers could sit like this for weeks, months or even years, wasting resources that could be better allocated.  With real-time statistics about what exactly is going on in your data center, you can find these orphan servers and clean them out, freeing up capacity for other infrastructure.  In fact, carefully managing your data center’s capacity needs and more accurately anticipating future needs can mean saving millions of dollars in the long run.

DCIM and RTOI go hand in hand.  Without a proper plan for data center infrastructure management, and sophisticated monitoring software, RTOI is not achievable.  DCIM tools are necessary to measure, monitor, and manage data center operations including energy consumption and all IT equipment as well as the facility infrastructure.  Fortunately, there are sophisticated DCIM software products available that will track detailed information all the way down to the rack level so that monitoring is made easy, even remotely.  As mentioned, it is critical to leave behind old and archaic forms of DCIM; with them, there is simply no way to really keep up.  Data centers, regardless of size, must focus on real-time operational intelligence as a means of maintaining accuracy.  TechTarget explains why it is critical to focus on RTOI as a way of staying ahead of potential problems, “Taking a new big data approach to IT analytics can provide insights not readily achievable with traditional monitoring and management tools, Volk said…For example, particularly with cloud resources, it can be difficult to anticipate how applications and data movement will affect each other. Cloud Physics allows cross-checking of logs and other indicators in real time to achieve that. This new approach is “leading edge, not bleeding edge,” Volk said. Its value to an organization will depend on the maturity and complexity of a given data center. Small and medium-sized businesses and organizations without much complexity will benefit, he said, “but companies with large and heterogeneous data centers will benefit even more.”  RTOI helps data centers provide better service to their customers, minimize downtime, improve efficiency, maximize reputation, and ultimately, save money through vastly improved operations.


Data Center Business Continuity

Whether you operate a data center or any other business, business continuity is incredibly important.  We all think we are immune to disaster but the reality is, if you have not formed a business continuity plan for disasters, you are leaving your data center at severe risk.  Imagine what it would be like if a disaster struck (flood, fire, etc.) and you could not get into your data center for a few hours – problematic, right?  What if that disaster was really bad and you could not get into your data center for a few days or weeks – a huge problem.  Business cannot come to a screeching halt, so a strategy for maintaining business continuity is a must.  A strategically formed, well-thought-through business continuity plan should be a part of any data center’s disaster recovery program.  A disaster recovery plan will be the big umbrella under which we will talk about business continuity because the two are inextricably related.  This is because disaster recovery focuses heavily on data recovery and management but, beyond maintaining and protecting data in the event of a disaster, a data center business and the businesses it serves must be able to continue to meet their most basic objectives.  During a disaster a data center may experience downtime in which all business operations come to a halt.  This is not a small problem – downtime may cost as much as $7,900 per minute.  A disaster recovery plan, along with a business continuity plan, will help a data center reduce downtime in the event of a disaster as well as operate continuously to meet business objectives.

To formulate a business continuity plan we must first outline what makes a successful one.  A data center’s business continuity plan will function as a roadmap.  If a disaster strikes, you will hopefully be able to find the type of disaster in your business continuity plan and then begin following the “map” to get to the solution and restore your data center to business as usual.  First and foremost, a proper business continuity plan will focus on what can be done to prevent disasters so that business continuity is never interrupted in the first place.  Data centers must consider what their unique needs are because there is no such thing as a generic data center business continuity plan – it would never work.  Data centers must identify and assess all mission critical assets and risks.  Once they have been identified it will be far easier to formulate a business continuity plan with specific goals in mind.  You can prioritize your most problematic risks by focusing on the risk they pose to mission critical assets.  In considering individual needs it is imperative that data centers determine what applications and processes are mission critical.  For example, can your mission critical systems be maintained remotely?  Additionally, in today’s data center world where security is a top concern, maintaining data security should be an important part of your business continuity plan.

Disaster prevention is a central part of your data center’s business continuity plan.  Identifying business continuity goals and potential problem areas will help you lay out a proper disaster prevention plan.  Depending on your unique data center, certain measures may be beneficial such as increased inspections of infrastructure, better surveillance, enhanced security in various areas including data center grounds security and rack-based security, increased redundancy, and more.  Think in terms of real problems and real consequences; be specific so that you can make specific business continuity plans and strategies.

Some data centers may want to relocate operations if a disaster is incredibly large, but the logistics of this are far from simple.  Relocating for a disaster safely, rapidly, and securely is no simple task.  And, beyond that, it is expensive, which is why many data centers – even large enterprise data centers – do not do this.  To do this properly as part of a business continuity plan, a detailed data center migration plan must accompany the business continuity plan.  Some enterprises may want to utilize regionally diverse data centers that mirror each other but this is also expensive and exceptionally complex to implement – though it can be very effective at maintaining uptime, maximizing security, and optimizing business continuity.

As mentioned, redundancy is an important part of maximizing uptime and maintaining business continuity in a data center.  As part of your data center’s business continuity plan, you may want to implement load balancing and link load balancing.  Server load balancing and link load balancing are two strategies that may be used to help prevent the loss of data from an overload or outage in a data center.  Continuity Central Archive explains how these two strategies can be used in data centers, “Server load balancing ensures application availability, facilitates tighter application integration, and intelligently and adaptively load balances user traffic based on a suite of application metrics and health checks. It also load balances IPS/IDS devices and composite IP-based applications, and distributes HTTP(S) traffic based on headers and SSL certificate fields. The primary function of server load balancing is to provide availability for applications running within traditional data centers, public cloud infrastructure or a private cloud. Should a server or other networking device become over-utilized or cease to function properly, the server load balancer redistributes traffic to healthy systems based on IT-defined parameters to ensure a seamless experience for end users…Link load balancing addresses WAN reliability by directing traffic to the best performing links. Should one link become inaccessible due to a bottleneck or outage, the ADC takes that link out of service, automatically directing traffic to other functioning links. Where server load balancing provides availability and business continuity for applications and infrastructure running within the data center, link load balancing ensures uninterrupted connectivity from the data center to the Internet and telecommunications networks. Link load balancing may be used to send traffic over whichever link or links prove to be most cost-effective for a given time period. What’s more, link load balancing may be used to direct select user groups and applications to specific links to ensure bandwidth and availability for business critical functions.”
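A rough sketch of the server-load-balancing behavior described in the quote: requests rotate across backends, and a backend that fails its health check leaves the rotation until it recovers. This is a toy model for illustration; a real ADC probes servers itself and applies far richer metrics.

```python
# Sketch: round-robin load balancer that skips backends marked unhealthy.
# Health states are set externally here; a real balancer would probe them.

class LoadBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(servers)

    def mark(self, server, is_healthy):
        """Record a health-check result for a backend."""
        (self.healthy.add if is_healthy else self.healthy.discard)(server)

    def route(self):
        """Return the next healthy backend, skipping failed ones."""
        pool = [s for s in self.servers if s in self.healthy]
        if not pool:
            raise RuntimeError("no healthy backends")
        server = pool[0]
        # rotate so the next request goes to a different backend
        self.servers.append(self.servers.pop(self.servers.index(server)))
        return server

lb = LoadBalancer(["web1", "web2", "web3"])
lb.mark("web2", False)    # health check failed: web2 leaves rotation
```

Link load balancing follows the same skip-the-failed-path idea, applied to WAN links instead of servers.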

Data centers are also utilizing the cloud for their business continuity plans because it is cost-efficient and highly effective.  The cloud platform is exceptionally effective for business continuity, particularly as data centers move more and more towards virtualization.  A cloud service with a proper SLA (service level agreement) can ensure that data will be continuously saved and protected even in the event of a disaster.  This is where identifying mission critical applications and information is important.  The entirety of the data center’s workload does not need to be recovered in an instant, only that which has been determined mission critical.

In addition to the cloud, many data centers opt to implement image-based backup for continuity.  Data Center Knowledge provides a helpful description of what image-based backup is and how it can be used uniquely in data centers, “Hybrid, image-based backup is at the core of successful business continuity solutions today. A hybrid solution combines the quick restoration benefits of local backup with the off-site, economic advantages of a cloud resource. Data is first copied and stored on a local device, so that enterprises can do fast and easy restores from that device. At the same time, the data is replicated in the cloud, creating off-site copies that don’t have to be moved physically. Channel partners are also helping enterprises make a critical shift from file-based backup to image-based. With file-based backup, the IT team chooses which files to back up, and only those files are saved. If the team overlooks an essential file and a disaster occurs, that file is gone. With image-based backup, the enterprise can capture an image of the data in its environment. You can get exact replications of what is stored on a server — including the operating system, configurations and settings, and preferences. Make sure to look for a solution that automatically saves each image-based backup as a virtual machine disk (VMDK), both in the local device and the cloud. This will ensure a faster virtualization process.”
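The hybrid flow in the quote can be sketched as: write each backup image locally first for fast restores, then replicate it off-site. The dict-backed stores below are stand-ins for a local backup appliance and cloud object storage; names and the image identifier are hypothetical.

```python
# Sketch: hybrid (local + off-site) image backup with local-first restore.
# Storage backends are simulated with dicts for illustration.

local_store, cloud_store = {}, {}

def hybrid_backup(image_id, image_bytes):
    """Store the image locally first (fast restore), then replicate off-site."""
    local_store[image_id] = image_bytes
    cloud_store[image_id] = image_bytes      # off-site replica
    return image_id

def restore(image_id):
    """Prefer the fast local copy; fall back to the cloud replica."""
    if image_id in local_store:
        return local_store[image_id]
    return cloud_store[image_id]

hybrid_backup("vm42-2024-01-01", b"\x00disk-image\x00")
```

The fallback path is the business-continuity point: if the local device is lost in the same disaster, the off-site copy still allows a (slower) restore.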

While not every data center will experience a “major” disaster where they cannot get into their facility for weeks, many data centers will experience some type of disaster.  And, as mentioned, mere minutes can cost tens of thousands of dollars.  Beyond the bottom line, the inability to continuously maintain data center business may damage your reputation irreparably.  An effective business continuity plan is capable of pivoting around both people and processes depending on the specific circumstances.  Rapidly restoring data and operations is the goal and data centers should take that goal and work backwards from there to determine the best path to maintaining business continuity.


Controlling Rack Access for Data Center Security

Stringent security protocols are one of the most important aspects of properly running any data center.  With constant, round-the-clock advancements in technology, the focus of security protocols is often on things like cloud/cyber security, particularly because there have been many significant security breaches recently.  Cyber security is certainly important and nothing to ignore, but it is also important not to forget about physical security.  To provide the optimal and industry-acceptable level of security, data centers must provide security on multiple levels.  This will help dramatically reduce the risk of a security breach, allow data centers to remain compliant with certain industry regulations, and will provide peace of mind to customers that everything is being done to protect data integrity.  Ensuring proper physical security compliance will help data centers avoid costly data breaches, and the resulting penalties that may arise as well.

So often, physical security efforts are focused on access to data center grounds and to the facility itself.  These efforts, while valuable and necessary, are not where physical security measures should stop.  Once inside the data center facility itself there should not be unrestricted access to server racks.  A wide variety of individuals must pass through a data center on a daily basis, including internal engineers, external engineers, data center staff, cleaning staff and more.  Unfortunately, many data breaches are actually “inside jobs” and therefore security at the rack level is vitally important.

Colocation data centers must be particularly vigilant with rack level security because they often house multiple businesses’ equipment and data within the same facility, and some of those businesses may even be in competition.  It may sound like there is a simple solution – locked doors or cages for server racks – right?  Unfortunately, wrong.  Traditional locks can only be so complex and, if a threat is able to gain access to data center grounds or get inside a facility, they can likely defeat those locks.  To meet industry standards and comply with federal regulations, security simply must go beyond that, as Schneider Electric points out, “Further increasing the pressure on those managing IT loads in such locations, regulations concerning the way data is stored and accessed extends beyond cyber credentialing, and into the physical world. In the US, where electronic health records (EHR) have become heavily incentivized, the Healthcare Insurance Portability & Accountability Act (HIPAA) demands safeguards, including “physical measures, policies, and procedures to protect a covered entity’s electronic information systems and related buildings and equipment, from natural and environmental hazards, and unauthorized intrusion.” Similar measures are also demanded, e.g., by the Sarbanes-Oxley Act and Payment Card Industry Data Security Standard (PCI DSS) for finance and credit card encryption IT equipment. In addition to building and room security, it has become vital to control rack-level security so you know who is accessing your IT cabinets and what they’re doing there.”

For best security, custom rack enclosures provide peace of mind because they are far harder to access than standard, “off the shelf” enclosures.  Additionally, many data centers are opting for biometric security, pin pads (where codes are changed frequently) or keycards.  Biometric locks do not use traditional keys; rather, they scan things like fingerprints or handprints.  Biometric locking systems have grown significantly in popularity because they provide truly unique access.  Keycards can get lost and pin codes can be shared but a fingerprint or handprint cannot be easily shared or duplicated, so it is a far more sophisticated security measure.  Many worry about the consistency, accuracy and performance of biometric security but it has become incredibly advanced, as Data Center Knowledge notes, “The time taken to verify a fingerprint at the scanner is now down to a second. This is because the templates – which can be updated / polled to / from a centralized server on a regular basis – are maintained locally, and the verification process can take place whether or not a network connection is present. The enrollment process is similarly enhanced with a typical enroll involving three sample fingerprints being taken on a terminal, with the user then able to authenticate themselves from that point onwards. This level of efficiency, cost effectiveness and all round reliability of fingerprint security means that a growing number of clients are now securing their IT resources at the cabinet level and integrating the data feed from the scanner to other forms of security such as video surveillance.”

These electronic locks that restrict rack access provide multiple levels of enhanced security.  For example, with electronic locks, when a user scans a fingerprint or inputs a code, a central server validates authenticity and then allows or restricts access.  An additional advantage of using this method is that the electronic system will automatically generate a log that details who has accessed what, and when.  This electronic tracking is far more convenient, as well as far more accurate, than manual tracking of access.  These electronic systems can be directly connected to data center facility security systems so that, should there be a problem, systems can go into automatic lockdown and alarms can be sounded in an instant.  Also, there are video surveillance options that come along with electronic-based security and monitoring.  Video surveillance can be programmed to turn on when biometric scanning is being performed, when pin codes are being entered, when security cards are being swiped and more.  Additionally, video surveillance can be programmed so that, when someone is accessing a rack, it automatically captures an image of who is accessing the rack and sends it to the data center manager.  The data center manager can then choose to watch the surveillance as it happens for an enhanced level of security.  This level of security also may reduce the cost of and need for a physical security guard, particularly when each rack is monitored by video surveillance.  With this sort of security implemented at the rack level, there will be a detailed log of who is accessing what server and when, and should a problem arise, it will be immediately apparent at which server there has been a security breach.  Further, advanced electronic locking systems can be pre-set to allow access only at certain times.  For example, if there should never be “after hours” access to certain racks, they can be set to allow access only during pre-determined times.
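Pulling those pieces together, here is a minimal sketch of the rack access flow: central validation of a credential, a pre-set time window, and an automatic audit log entry for every attempt. The credential format, names, and hours are all hypothetical.

```python
# Sketch: rack-level access control with central validation, time-window
# restriction, and automatic audit logging. Enrollment data is illustrative.

audit_log = []

# assumed enrollment data: credential -> (name, allowed racks, allowed hours)
CREDENTIALS = {
    "fp:9f3a": ("j.doe", {"rack-07"}, range(8, 18)),   # business hours only
}

def request_access(credential, rack, hour):
    entry = CREDENTIALS.get(credential)
    granted = (
        entry is not None
        and rack in entry[1]
        and hour in entry[2]
    )
    # automatic log: who, where, when, and the outcome
    audit_log.append((credential, rack, hour, "granted" if granted else "denied"))
    return granted
```

Note that the log records denied attempts too; in the scenarios described above, a string of after-hours denials on one rack is exactly the signal that should trigger surveillance review or lockdown.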

Another advantage of advanced electronic locking mechanisms is that they can be easily and effectively monitored remotely.  Having on-site security staff is beneficial but is not always possible and, as previously discussed, it is advantageous to have multiple levels of security, which is why remote monitoring is important.  Many government and industry regulations now have strict security parameters that data centers must remain in compliance with or face strong penalties.  These security standards are set to help protect financial, health and other sensitive information, and they require multiple levels of security, including rack level security.  Failing to protect at the rack level means that many data centers will not be in compliance – a major (and costly!) problem.

While the cost of implementation may seem prohibitive to some, many are now recognizing that the cost of a breach will likely be far higher.  The same level of security used for facility access points should also be used at the rack level when optimizing data center security protocols.  Whether you are retrofitting an existing data center or building a new one, and whether your data center has 1 rack or 100 racks, each rack should be secured separately.  Cyber security is a growing and complex arena, easily grabbing the attention of both the customer and the data center facility manager, but it is critically important that physical security not be neglected.  In an age where many businesses are foregoing their enterprise data center in favor of colocation, colocation providers must be stringent in their protection of their customers’ data – not just for peace of mind and best practices, but to remain compliant with federal regulations.  If you think you are immune to a data breach, IBM Security’s most recent study will not put you at ease: it found the global risk of a data breach in the next 24 months to be 26 percent.  And, the cost will not be small!  The average consolidated total cost of a data breach is $4 million.  While the cost to implement state-of-the-art rack level security will not be small, it will continually pay for itself over time and will likely be far less than the cost of a security breach.


Posted in computer room construction, Computer Room Design, Data Center Construction, Data Center Design, data center equipment, Data Center Infrastructure Management, Data Center Security, DCIM | Tagged , , , , | Comments Off

Strategies For Monitoring UPS Batteries & Preventing Failure

Aside from security, maximizing uptime is likely the top priority of just about any data center, regardless of size, industry or other factors.  Most businesses today run on data, and that data is facilitated by a data center.  Businesses, and their employees and customers, depend on data being available at all times so that business processes are not interrupted.  Every second a data center experiences downtime, its clients experience downtime as well.  Data center managers and personnel are on a constant mission to prevent downtime, and they must be vigilant because downtime can occur for a variety of reasons, but one has been and remains the #1 threat: UPS battery failure.

A UPS (Uninterruptible Power Supply) is the redundant power source that backs up a data center in the event of an energy problem such as a power failure or a catastrophic emergency.  An uninterruptible power supply is necessary in a data center of any size because even the most observant and effective data center managers cannot prevent every power failure.  The UPS contains a battery that kicks in should the primary power source fail so that the data center (and its clients) can experience continuous operation.  Unfortunately, the very thing that is supposed to provide backup power – the UPS – can sometimes fail as well.  Emerson Network Power conducted a 2016 study to determine the cost of and root causes of unplanned data center outages, “The average total cost per minute of an unplanned outage increased from $5,617 in 2010 to $7,908 in 2013 to $8,851 in this report… The average cost of a data center outage rose from $505,502 in 2010 to $690,204 in 2013 to $740,357 in the latest study. This represents a 38 percent increase in the cost of downtime since the first study in 2010…UPS system failure, including UPS and batteries, is the No. 1 cause of unplanned data center outages, accounting for one-quarter of all such events.”
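It is worth noticing what those two Emerson figures imply when taken together. Dividing the average outage cost by the average per-minute cost gives a rough sense of how long a typical outage lasts:

```python
# Back-of-envelope check on the Emerson Network Power figures quoted above:
# average outage cost divided by cost per minute implies the average
# outage lasts roughly an hour and a half.

cost_per_minute = 8_851     # USD per minute of downtime, latest study
avg_outage_cost = 740_357   # USD average cost of an outage, latest study

implied_duration_min = avg_outage_cost / cost_per_minute
print(f"Implied average outage duration: {implied_duration_min:.0f} minutes")
```

Roughly 84 minutes of downtime per incident, on average – a long time for clients to be offline.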


Batteries lose capacity as they age, justifying the need for a preventive maintenance program. Image Via: Emerson Network Power

In order to properly form a strategy for UPS failure prevention, it is important to look at why UPS failures occur in the first place.  At the heart of the UPS system is its battery, which powers its operation.  UPS batteries cannot simply be installed and then left alone until an emergency occurs.  Even if a brand-new battery is installed and the UPS system is never needed, the battery has a built-in lifespan and it will, over time, die.  So even if you think you are safe with your UPS system and your unused battery, if you are not keeping an eye on it, you may be in trouble when a power outage occurs.

Beyond basic life expectancy in ideal conditions, UPS battery effectiveness may be reduced, or batteries may fail, for other reasons.  Ambient temperatures around the UPS battery, if too warm, may damage it.  Another reason a battery may fail is what is called “over-cycling” – when a battery is discharged and recharged so many times that its capacity is reduced over time.  Further, UPS batteries may fail due to incorrect float voltage.  Every battery brand is manufactured differently and has a specific acceptable charge voltage range.  If a battery is constantly charged outside the recommended charge voltage range – whether undercharging or overcharging – it will reduce the battery’s capacity and may lead to battery failure during a power emergency.
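The float voltage check described above lends itself to simple automation. Here is a minimal sketch; the voltage band shown is a hypothetical VRLA example, and you should always substitute the manufacturer's specified range for your battery model:

```python
# Minimal sketch of a per-cell float-voltage sanity check. The acceptable
# range below is a hypothetical example -- use the manufacturer's specified
# charge voltage range for your specific battery model.

FLOAT_V_MIN = 2.25   # volts per cell (hypothetical example range)
FLOAT_V_MAX = 2.30

def check_float_voltage(cell_id: str, measured_volts: float) -> str:
    """Flag cells being charged outside the recommended float voltage range."""
    if measured_volts < FLOAT_V_MIN:
        return f"{cell_id}: UNDERCHARGING at {measured_volts:.2f} V"
    if measured_volts > FLOAT_V_MAX:
        return f"{cell_id}: OVERCHARGING at {measured_volts:.2f} V"
    return f"{cell_id}: OK at {measured_volts:.2f} V"

for cell, volts in [("cell-01", 2.27), ("cell-02", 2.21), ("cell-03", 2.35)]:
    print(check_float_voltage(cell, volts))
```

A check like this, run against each cell on every maintenance visit, catches chronic under- or overcharging before it shortens battery life.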

Fortunately, many of these UPS failures can be traced back to human errors that are preventable.  This means that data centers looking to prevent UPS failures and maximize uptime can do so by implementing and vigilantly following a UPS failure prevention strategy.  First, it is important to develop a maintenance schedule, complete with checklists for consistency, and actually stick to it.  Don’t let routine battery maintenance fall off of your priority list; while it may not seem urgent now, it will feel very urgent if the power fails.

One of the first and most important things that a data center should implement in its strategy is proper monitoring of batteries.  Every battery will have an estimated battery life determined by the manufacturer; some even boast life cycles as long as 10 years!  But, as any data center manager knows, UPS batteries do not last as long as their estimated life cycle because of a variety of factors. Just how long they will actually last will vary, which is why monitoring is incredibly important. Batteries must be monitored at the cell level on a routine schedule, either quarterly or semi-annually, and it is important to also check each string of batteries.  By doing this on a routine schedule, you can determine if a battery is near, or has already reached, the end of its life cycle and make any necessary repairs or replacements.  If it appears a battery is nearing the end of its life cycle, it may be best to simply replace it so as not to risk a potential failure.  In addition to physically checking and monitoring UPS batteries, there are battery monitoring systems that can be used.  While physical checks are still critical, battery monitoring systems can provide helpful additional support that may prevent a UPS failure.  Schneider Electric describes how battery monitoring systems can be a useful tool, “A second option is to have a battery monitoring system connected to each battery cell, to provide daily automated performance measurements. Although there are many battery monitoring systems available on the market today, the number of battery parameters they monitor can vary significantly from one system to another.

A good battery monitoring system will monitor the battery parameters that IEEE 1491 recommends be measured. The 17 criteria it outlines include:

- String and cell float voltages, string and cell charge voltages, string and cell discharge voltages, AC ripple voltage

- String charge current, string discharge current, AC ripple current

- Ambient and cell temperatures

- Cell internal resistance

- Cycles

With such a system, users can set thresholds so they get alerted when a battery is about to fail. While this is clearly a step up from the scheduled maintenance in that the alerts are more timely, they are still reactive – you only get an alert after a problem crops up.”  Further, as you monitor your batteries it is important to collect and analyze the data so that you can make informed decisions about how to best maximize battery life.
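The threshold-based alerting Schneider Electric describes can be sketched in a few lines. The parameters below are the kinds of measurements IEEE 1491 recommends monitoring, but the specific threshold values are illustrative assumptions, not limits mandated by the standard:

```python
# Sketch of threshold-based alerting over the kinds of parameters IEEE 1491
# recommends monitoring. Threshold values are illustrative assumptions.

THRESHOLDS = {
    "cell_float_voltage_v": (2.20, 2.33),      # (min, max), hypothetical band
    "cell_temperature_c": (15.0, 30.0),        # ambient/cell temperature band
    "internal_resistance_mohm": (0.0, 0.35),   # rising resistance signals aging
}

def evaluate_reading(reading: dict) -> list:
    """Return alert messages for any parameter outside its threshold band."""
    alerts = []
    for param, (lo, hi) in THRESHOLDS.items():
        value = reading.get(param)
        if value is not None and not (lo <= value <= hi):
            alerts.append(f"ALERT {reading['cell_id']}: {param}={value} "
                          f"outside [{lo}, {hi}]")
    return alerts

reading = {"cell_id": "string2-cell14", "cell_float_voltage_v": 2.27,
           "cell_temperature_c": 33.5, "internal_resistance_mohm": 0.41}
for alert in evaluate_reading(reading):
    print(alert)
```

Logging each reading alongside any alerts also gives you the historical data needed to spot gradual degradation, per the advice above about collecting and analyzing monitoring data.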

Next, it is important to properly store your battery when not in use to maximize its lifespan which will help it function properly in the event of use.  A UPS battery must be charged every few months while in storage or its lifespan will be diminished.  If you cannot periodically charge your UPS battery while in storage, most experts recommend storing your battery in cooler temperatures – 50°F (10°C) or less – which will help slow down the degradation of your battery.

To keep your UPS battery functioning in optimal conditions, ambient temperature should not exceed 77 degrees Fahrenheit and should stay, generally, as close to that as possible.  It is important to not just prevent temperatures from exceeding that but prevent temperatures from frequently fluctuating because it will greatly tax UPS batteries and reduce their life expectancy.  It is important that your UPS is stored in an area of your data center where temperatures are carefully monitored and maintained to help promote proper function of your UPS in the event of an emergency.  Ideally, your UPS would be maintained in an enclosure with temperature and humidity control.
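Because both sustained heat and frequent swings shorten battery life, ambient monitoring should flag each separately. A small sketch of that logic, using the 77°F ceiling mentioned above (the swing threshold is an assumption for illustration):

```python
# Sketch: review ambient temperature samples near the UPS and flag both
# excursions above the 77 F ceiling and rapid fluctuations between samples.
# The 5 F swing threshold is an illustrative assumption.

TEMP_CEILING_F = 77.0
MAX_SWING_F = 5.0

def review_samples(samples_f: list) -> list:
    """Return a list of issues found in a sequence of temperature samples."""
    issues = []
    for i, temp in enumerate(samples_f):
        if temp > TEMP_CEILING_F:
            issues.append(f"sample {i}: {temp:.1f} F exceeds ceiling")
        if i > 0 and abs(temp - samples_f[i - 1]) > MAX_SWING_F:
            issues.append(f"sample {i}: swing of {abs(temp - samples_f[i - 1]):.1f} F")
    return issues

for issue in review_samples([74.0, 75.5, 82.0, 76.0]):
    print(issue)
```

Note that the 82°F reading triggers two separate issues here: the excursion itself and the swing that produced it, which is exactly the kind of fluctuation the paragraph above warns against.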


An increase in the number of annual preventive maintenance visits increases UPS reliability. Image Via: Emerson Network Power

While routine maintenance will require attention and dedication, it is not without merit.  In fact, Data Center Knowledge notes that there are statistics that back up the argument that routine maintenance really does prevent UPS failure, “In one study of more than 5,000 three-phase UPS units and more than 24,000 strings of batteries, the impact of regular preventive maintenance on UPS reliability was clear. This study revealed that the Mean Time Between Failure (MTBF) for units that received two preventive maintenance (PM) service visits a year is 23 times better than a UPS with no PM visits. According to the study, reliability continued to steadily increase with additional visits completed by skilled service providers with very low error rates.” Data centers must implement their own unique UPS maintenance strategy, tailored specifically to individual needs, and remain vigilant in their follow through.  Implementing UPS maintenance best practices, including maintaining proper temperatures, maintaining proper float voltage, avoiding over-cycling, properly storing batteries, utilizing UPS battery monitoring systems, and performing routine visual inspections, will help significantly decrease the risk of UPS failure.

Posted in Back-up Power Industry, computer room maintenance, Data Center Battery, data center equipment, Data Center Infrastructure Management, data center maintenance, DCIM, Uninterruptible Power Supply, UPS Maintenance | Tagged , , , , , , | Comments Off

Private Cloud vs. Public Cloud vs. Hybrid Cloud

Cloud computing, in one form or another, is here and it is not going anywhere, and for good reason – it provides easy scalability, is less expensive than expanding infrastructure to add storage, is less expensive to maintain because it does not require additional power and cooling, makes project deployment easier and quicker, and makes it easy to create redundancy and reliability.  While these benefits are significant, they only scratch the surface of the advantages of utilizing the cloud, and there is debate over which type of cloud is best – public, private or hybrid.

Both the public cloud and private cloud offer a variety of advantages and drawbacks and what will work best for a data center will have to be decided on a case by case basis.  Cisco’s Global Cloud Index, which forecasts cloud usage for the years 2015-2020, provides some interesting insights into how cloud usage will transform data centers and enterprises going forward, “By 2020, 68 percent of the cloud workloads will be in public cloud data centers, up from 49 percent in 2015 (CAGR of 35 percent from 2015 to 2020)… By 2020, 32 percent of the cloud workloads will be in private cloud data centers, down from 51 percent in 2015 (CAGR of 15 percent from 2015 to 2020).”

Private cloud is essentially an internal, enterprise cloud that is privately managed and maintained.  The data center is responsible for hosting the private cloud within its own intranet, protected by the data center’s firewall.  The private cloud provides all of the efficiency, agility, and scalability of the cloud but also provides better control and security.  This is a great option for a data center that already has a robust infrastructure and enterprise set up, but it does demand more than the public cloud.  If a data center employs the private cloud, all management, maintenance and security falls squarely on the data center’s personnel.

One of the distinct advantages of the private cloud is how much control you have over how it works for your unique needs.  Private clouds can be configured to meet your needs, rather than you configuring your applications and infrastructure to meet the needs of the public cloud.  Many data centers have legacy applications that cannot always adapt well to the needs of the public cloud, but with the customizability of the private cloud, a data center can easily adapt the private cloud to meet the needs of the enterprise.

If your data center or enterprise prioritizes control, security, privacy and management visibility and worries about the security and privacy risks of shared resources in a public cloud, the private cloud may be the right fit for you because it will provide peace of mind that you know exactly where your data is and how it is being managed and protected at all times.  However, it is important to note that while having control of cloud management is seen as an advantage by many enterprises, the challenge of adequately managing the cloud can be significant, as noted by RightScale who conducted a study and survey of cloud computing trends for 2016, “26 percent of respondents identify cloud cost management as a significant challenge, a steady increase each year from 18 percent in 2013…Cloud cost management provides a significant opportunity for savings, since few companies are taking critical actions to optimize cloud costs, such as shutting down unused workloads or selecting lower-cost cloud or regions.”

The level of security provided by utilizing the private cloud may be particularly important for those enterprises involved in healthcare or banking/finance because of the strict regulations and requirements placed on security and privacy.  If you possess and work with data that is restricted by security and privacy mandates like HIPAA, Sarbanes-Oxley, or PCI, you cannot rely on the public cloud to secure your data.  For such highly-sensitive information, you are required to store your data on the private cloud to remain compliant or else face high penalties.

The public cloud also provides the fundamental benefits of cloud computing, as well as its own advantages, but offers less control over maintenance, management and security.  Some enterprises may see the requirement of less management and maintenance as an advantage because they simply do not have the resources or personnel to manage the cloud themselves.  By opting for the public cloud, your data is stored in a data center and that data center is responsible for the management and maintenance of the cloud data.  For enterprises that do not have extremely sensitive data, the trade-off of security and control for less management and maintenance may be completely acceptable.  While you do not have as much control of management and security, data does remain separate from other enterprises in the cloud.

The public cloud does save on hardware and maintenance costs that would typically be incurred by your business.  You pay for the public cloud to use storage capacity and processor power so that you do not have to manage or pay for that capacity or power on your own.  Because you are paying for a service, it is easy to scale up or down quickly without much preparation or change on your end. The public cloud often functions on a “pay-per-use” model so you can quickly make changes, scaling up or down in literally a matter of minutes. For small businesses that do not work with highly sensitive data, the public cloud may be ideal.  But ultimately, it all comes down to how much control you need over the management and security of your data.

It is important to not forget that there is actually a third option – the hybrid cloud. The hybrid cloud is a blend of both private and public cloud, offering enterprises a solution that may provide the best of both worlds.  With the use of the hybrid cloud, enterprises can leverage the advantages of both the private and public cloud in partial ways that best suit the needs and resources of the enterprise.  By doing this, all sensitive data can be managed with the private cloud, and the private cloud can be customized to suit any less-flexible applications.  Likewise, the public cloud can be used for information that is not as sensitive or governed by privacy and security mandates, and can also be used for on-demand scalability.
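The placement logic described above can be sketched as a simple routing rule. This is a hypothetical illustration of the decision, not a real cloud API; the category names are assumptions:

```python
# Illustrative sketch of hybrid-cloud workload placement: regulated or
# sensitive workloads stay on the private cloud, everything else can go
# to the public cloud. Tag names are hypothetical.

REGULATED = {"hipaa", "pci", "sox"}   # mandates requiring private handling

def place_workload(name: str, tags: set) -> str:
    """Route a workload to the private or public side of a hybrid cloud."""
    if tags & REGULATED or "sensitive" in tags:
        return f"{name} -> private cloud"
    return f"{name} -> public cloud"

print(place_workload("patient-records-db", {"hipaa", "sensitive"}))
print(place_workload("marketing-site", {"stateless", "scalable"}))
```

In practice the decision involves more factors (latency, legacy application constraints, cost), but the core split between regulated and unregulated workloads is the backbone of most hybrid strategies.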

Hybrid cloud is a mix and match solution of the best elements of both private and public clouds for those enterprises with diverse needs.  This diversity is what will help many enterprises evolve and be flexible as IT innovations emerge.  What is interesting about a hybrid cloud solution is that it serves both large and small organizations well because it offers flexibility, scalability, and security on an as-needed basis.  It allows organizations to slowly “dip their toes” in the public cloud pool while maintaining control, via the private cloud, over sensitive data that they are not yet ready to put in the public cloud. RightScale’s survey of cloud computing trends for 2016 notes that hybrid cloud usage is on the rise, “In the twelve months since the last State of the Cloud Survey, we’ve seen strong growth in hybrid cloud adoption as public cloud users added private cloud resource pools. 77 percent of respondents are now adopting private cloud up from 63 percent last year. As a result, use of hybrid cloud environments has grown to 71 percent. In total, 95 percent of respondents are now using cloud up from 93 percent in 2015.”  Additionally, the hybrid cloud may be a very cost-effective solution, allowing enterprises to assign available resources to private cloud needs without having to retain the vast additional resources that might be necessary if only using private cloud.

How enterprises use the cloud will depend heavily on resources, security and control needs, privacy restrictions, and scalability needs.  If you are struggling to decide what the right fit is for your enterprise, consider carefully what applications you intend to move to the cloud, how you currently use those applications, any regulatory concerns, scalability needs and your ability to adequately manage whatever your choice ultimately is.  If you have the infrastructure and resources to manage your cloud well, along with significant security concerns, the private cloud may be the best option for your needs.  However, if you are a smaller organization with lower security concerns, offloading management responsibility by utilizing the public cloud may relieve a lot of the strain that a private cloud might place on your resources. Whether using public cloud, private cloud or a hybrid of the two, one thing is certain – almost everyone is using the cloud.  And, if they are not yet, they will likely be using it soon.


Posted in Cloud Computing, data center equipment, Data Center Infrastructure Management, Data Center Security | Tagged , , , , | Comments Off

Do You Have a DCOI Compliance Strategy?


The recently established Data Center Optimization Initiative (DCOI) is an important mandate for federal data centers that promotes the sharing of information to optimize infrastructure and reduce inefficiency in data centers.  Nothing is more of a hot topic in data centers than the need to improve efficiency at all levels to remain sustainable and effective.  The White House describes the requirements of DCOI as follows, “The DCOI, as described in this memorandum, requires agencies to develop and report on data center strategies to consolidate inefficient infrastructure, optimize existing facilities, improve security posture, achieve cost savings, and transition to more efficient infrastructure, such as cloud services and inter-agency shared services.”

This initiative focuses on data center consolidation and optimization of existing data centers to reduce redundancy.  These measures will help make data centers more eco-friendly, which benefits the environment and, by improving efficiency, also provides significant cost savings.  Further, DCOI recognizes and encourages the utilization of the cloud to improve efficiency and scale operations without expanding physical footprint.  Undoubtedly, plans and goals will have to be established to meet the demands of DCOI, so early adoption is the best approach. Schneider Electric explains the importance of complying with DCOI, “One of the key requirements for existing data centers is to “achieve and maintain” a PUE score of 1.5 or less…new, proposed data centers must be designed and operated at 1.4 or less with 1.2 being “encouraged”.  Another key requirement is deployment of data center infrastructure management tools (DCIM) in all Federal data centers since manual collection of PUE data will no longer be acceptable.  If Agency CIOs fail to achieve these scores and implement DCIM by September 30th, 2018, “Agency CIOs shall evaluate options for consolidation or closure…”. In other words, comply or be assimilated. Fortunately for these CIOs, legacy data centers often have plenty of room to improve infrastructure efficiency by reducing power and cooling energy losses to bring PUE scores within these limits.  In addition, DCOI targets are expected to result in the closure of approximately 52% of the overall Federal Data Center inventory1.  So it’s important to try to make as many improvements as is feasible even if you’re meeting the required 1.5 (or 1.4 for new) …i.e., increase your odds of survival by being as good as you can be. Agencies should start with an efficiency assessment of the site in question.  Find out where you’re at now and identify areas for improvement.”  DCIM technology should be implemented to monitor energy usage and improve energy consumption, and every effort should be made by federal data centers to comply with DCOI going forward.
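Since DCOI compliance hinges on PUE, it helps to remember what the metric actually measures: total facility energy divided by IT equipment energy, so the 1.5 ceiling means no more than 0.5 watts of overhead (cooling, power losses, lighting) for every watt delivered to IT. The kW figures below are made-up examples:

```python
# PUE (Power Usage Effectiveness) = total facility energy / IT equipment
# energy. The DCOI ceiling for existing facilities is 1.5 (1.4 for new).
# The facility load figures below are hypothetical examples.

DCOI_EXISTING_CEILING = 1.5

def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    return total_facility_kw / it_equipment_kw

legacy_site = pue(total_facility_kw=1800.0, it_equipment_kw=1000.0)
after_fixes = pue(total_facility_kw=1450.0, it_equipment_kw=1000.0)

print(f"Legacy PUE: {legacy_site:.2f} "
      f"({'fails' if legacy_site > DCOI_EXISTING_CEILING else 'meets'} the 1.5 ceiling)")
print(f"After efficiency work: {after_fixes:.2f} "
      f"({'fails' if after_fixes > DCOI_EXISTING_CEILING else 'meets'} the 1.5 ceiling)")
```

This is also why DCIM tooling is mandated: computing PUE continuously requires metered facility and IT loads rather than occasional manual readings.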

Posted in Cloud Computing, computer room construction, Computer Room Design, Data Center Build, Data Center Construction, Data Center Infrastructure Management, DCIM, Power Management | Tagged , , , , | Comments Off