How to Know When It is Time to Upgrade Your UPS & How to Do So Effectively


One of the most important components of any data center is its Uninterruptible Power Supply (UPS).  The UPS is tasked with maintaining uptime in a data center should there ever be a power interruption.  It literally provides an uninterruptible power supply to mission critical infrastructure within a data center.  With proper maintenance, a UPS system can save a data center from downtime that is not only incredibly costly but also frustrating and very problematic.  Data center UPS systems are not new; they have been used for decades.  And, just like any technology that has been around for that long, the technology has evolved and improved over time.  A data center UPS is a long-term investment and transitioning to a new UPS system could mean potential downtime (among other things).  For this reason, many data centers avoid upgrading their UPS system, if for no other reason than to “avoid the headache.”  But eventually, all data centers must upgrade their UPS system – so how do you know when it is time to upgrade and how do you do it in a smooth and successful way?

For data centers, it can be tempting to “leave good enough alone” with their UPS system.  The UPS seems to be doing an adequate job, it is still working, still performing its essential function, so what is the harm in leaving it alone?  Well, a primary reason that many data centers decide to upgrade their UPS system is that it will give them increased power capacity when they need it the most.  Data centers are increasing their infrastructure and rack density to accommodate growing server demands and various other needs.  If a power interruption occurs and the UPS kicks in to provide backup assistance, it has to be able to actually provide adequate power.  The UPS system you had in place 5 years ago may have been more than enough for your needs at the time, but is it really adequate now?  Have you really evaluated your power needs and what your current UPS system can supply?  If not, now is the time to do so.

As you likely did in the past, you need to determine what your current power needs are and anticipate what your future needs may be when choosing to upgrade your UPS.  TechTarget provides some helpful insight for determining power capacity needs for your new UPS system, ““Increased resiliency and MGE’s un-paralleled load protection will benefit our clients the most,” said Yaeger, “while increased power capacity and improved energy help us the most.” Beware of using the nameplate. This is a legality rating, and will usually give a much higher volt-ampere rating than the unit will ever draw. For example, consider a unit with a nameplate that reads 90 – 240 volts at 4 – 8 amps with a 500 watt (W) power supply. First, the numbers are backward. The larger amperage goes with the lower voltage. If you assume a nominal 120 volts at 8 amps, you get 960 VA. A PF of 0.95 would yield 912 watts. No power supply is that inefficient, and a power supply almost never runs at full power. Therefore, it is highly unlikely that this device will ever draw more than 500 watts of power, but if you want to be really conservative, multiply by 1.1 and figure 550 Ws of input power…Once you have a realistic load estimate, plan to run a UPS around 80% of actual rated capacity. That provides headroom for peak operating conditions, gives you capacity to install a duplicate system before you decommission an old one, or lets you absorb a little growth before you outgrow the unit.”
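The arithmetic in the quoted passage can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not a sizing tool: the function names are hypothetical, and the 1.1 margin and 80% target utilization are simply the figures from the quote.

```python
def nameplate_va(volts: float, amps: float) -> float:
    """Apparent power (VA) implied by a nameplate voltage/current pair."""
    return volts * amps


def realistic_input_watts(psu_rated_watts: float, margin: float = 1.1) -> float:
    """Conservative load estimate: power supply rating times a safety margin."""
    return psu_rated_watts * margin


def required_ups_watts(load_watts: float, target_utilization: float = 0.8) -> float:
    """UPS capacity needed so the load sits at the target utilization."""
    return load_watts / target_utilization


# Figures from the quoted example: 120 V at 8 A, PF 0.95, 500 W power supply.
va = nameplate_va(120, 8)                 # 960 VA from the nameplate
nameplate_watts = va * 0.95               # about 912 W, an overestimate
load = realistic_input_watts(500)         # about 550 W, the conservative figure
capacity = required_ups_watts(load)       # about 687.5 W of UPS capacity for this one device
print(f"nameplate: {nameplate_watts:.0f} W, realistic: {load:.0f} W, UPS: {capacity:.1f} W")
```

In practice the same calculation would be summed over every device the UPS protects before applying the 80% target.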

In addition to increasing UPS capacity to meet the power demands of the data center, many data centers opt to upgrade their UPS system in an effort to improve energy efficiency.  UPS technology has evolved to be far more intelligent than it was even a few years ago.  Today’s UPS systems have more sophisticated monitoring capabilities that can be integrated with your data center infrastructure management and monitoring for a more comprehensive picture of what is going on in your data center.  Data centers are all looking for ways to improve energy efficiency. Better monitoring allows data center managers to make more accurate and timely decisions about power in their data center, which can meaningfully improve energy efficiency. Even small changes to improve energy efficiency can lead to significant savings over time.  EnergyStar reports that small improvements in energy efficiency lead to big savings, “DOE estimates that a 15,000-square-foot data center operating at 100W/square foot would save $90,000 by increasing UPS efficiency from 90% to 95%.”
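To see where savings of that magnitude come from, here is a rough sketch of the calculation. The electricity rate of $0.10 per kWh, the assumption of a constant load all year, and the function names are all assumptions made for illustration. The quoted DOE figure does not state its rate, its time period, or whether reduced cooling load is included, so this sketch is not expected to reproduce $90,000 exactly.

```python
HOURS_PER_YEAR = 8760


def ups_input_kw(it_load_kw: float, efficiency: float) -> float:
    """Power drawn at the UPS input to deliver it_load_kw at the output."""
    return it_load_kw / efficiency


def annual_savings_usd(area_sqft: float, watts_per_sqft: float,
                       eff_old: float, eff_new: float,
                       usd_per_kwh: float) -> float:
    """Yearly cost difference between two UPS efficiencies at constant load."""
    it_load_kw = area_sqft * watts_per_sqft / 1000
    saved_kw = ups_input_kw(it_load_kw, eff_old) - ups_input_kw(it_load_kw, eff_new)
    return saved_kw * HOURS_PER_YEAR * usd_per_kwh


# 15,000 sq ft at 100 W/sq ft is a 1,500 kW load; the rate below is assumed.
savings = annual_savings_usd(15_000, 100, 0.90, 0.95, usd_per_kwh=0.10)
print(f"Estimated UPS loss savings: ${savings:,.0f} per year")
```

At the assumed rate the UPS losses alone account for roughly $77,000 per year, which is the same order of magnitude as the quoted estimate.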

Once you have determined what the best UPS system is for your needs, you need to determine how to safely and effectively carry out the upgrade so that you do not experience downtime, or so that any downtime you do experience is anticipated and well planned for.  Schneider Electric describes how to successfully transition to a new UPS system, “Replacing an older UPS system with a new one may be more complex and time consuming than upgrading especially if the UPS to be upgraded is already modular in design. Careful planning and execution are required in order to minimize UPS downtime during the swap. Some vendors offer a service to do this work as a turnkey project. If the owner’s operations team does not have the availability or expertise, ask the UPS OEM vendor if they can perform every task associated with this effort including remove/dispose the old system, install the new, startup and commission the system, as well as transition an existing service contract (if one exists) all under one order…At the electrical input of the UPS, verification that the feeder breakers and conductors powering the UPS will support a specific replacement UPS is essential…Verification at a minimum includes: visual inspection of breakers and conductors, confirmation of breaker maintenance, as well as a review of electrical system studies (load flow, short circuit analysis, protection coordination, and arc flash) using electrical characteristics of the replacement UPS as a basis for the study. Operational interaction of the replacement UPS with standby generator(s) should also be included in this analysis.”

When it is time to upgrade your data center’s UPS system, preparation is the name of the game.  You will want to assemble a diverse team of people including the data center manager, facilities manager, electrical contractor, other relevant engineers, and more.  By doing so, you have representatives with various knowledge bases who can provide critical assistance and information during the transition.  If you cannot have any downtime whatsoever, you will need a temporary UPS while you transition to the new UPS.  If a short outage is acceptable, you can plan exactly when it will occur and prepare accordingly.  You can install as many of the components of the new UPS system as possible ahead of the outage, leaving only the last-minute parts of the installation for the outage itself.  Whatever is best for your data center, plan for a smooth transition, anticipate the problems that could arise, and arrange contingency plans just in case.

If your data center’s UPS is outdated, inefficient, or incapable of managing your data center’s power needs, it is a clear sign that it is time to do something about your UPS.  The longer you wait, the more money you waste on inefficiency and the higher chance that your data center will experience downtime due to a power interruption.  By anticipating current needs, as well as anticipating the desire to scale to requirements in the future, you can choose the right modern UPS system for your unique data center needs. Making the transition to a new UPS can dramatically improve efficiency and the investment will more than pay for itself over time.




Hyperconvergence Solutions in the Data Center

Hyperconvergence is not just a buzzword; it is the future of data center operations.  First there were converged data centers and now there are hyperconverged data center infrastructures that address silos that previously posed a significant problem. With hyperconverged data centers, storage, compute, and network components are optimized for better collaboration that bypasses silos.  Scaling infrastructure as needed is far easier to achieve with a hyperconverged data center.  For this reason, data centers with their eyes on future needs and scalability must be focused on hyperconvergence.


What is Hyperconvergence

Because technology and needs are constantly changing, data centers traditionally have a mixture of components and infrastructure.  Though each component of infrastructure served its purpose, the components did not communicate well with each other, silos were created, and data center operations became more complex and over-wrought than necessary.  BizTech expands on how hyperconvergence is changing data centers for the better, “ Gartner reports that by 2018 hyperconverged integrated systems will represent as much as 35 percent of total converged infrastructure shipments by revenue, up from a low-single-digit base in 2015…Hyperconverged solutions are clusters that typically leverage commodity components to support software-defined environments and related functions,” King observes. “Hyperconvergence delivers a radical simplification of the IT infrastructure,” says Jeff Ready, chief executive officer of Scale Computing, a virtualization and convergence provider. The approach consolidates all required functionality into a single infrastructure stack running on an efficient, elastic pool of processing resources, leading to a data center that is largely software-defined with tightly-integrated computing, storage, networking and virtualization resources. Hyperconvergence stands in contrast to a traditional converged infrastructure, where each of these resources is typically handled by a discrete component that serves only a single purpose. “Hyperconvergence takes the headache out of managing the infrastructure so that IT can focus on running the applications,” Ready says.”

For data center managers, hyperconverged infrastructure may sound like a major change, and maybe even overkill, but with the majority of data centers making the switch, it is becoming the industry standard.  And, as soon as data center managers implement hyperconverged infrastructure in their data centers, they quickly realize not just that resistance was futile but that they should have made the switch sooner!  In fact, once hyperconvergence is in place, configuration and deployment become less of a headache and operations are streamlined.


Advantages of Hyperconvergence

No data center would be making the change to a hyperconverged infrastructure if it were without advantages.  Fortunately, the advantages are many!  Though the upfront cost may seem significant, the investment will more than pay for itself.  One of the most significant advantages is scalability.  Data center technology and storage needs are constantly changing and growing.  Many data centers are scrambling just to figure out how to not “outgrow” their current facility and infrastructure.  With the old infrastructure, it is very, very difficult to do so.  Each component or piece of infrastructure added may serve as a temporary Band-Aid but make no mistake; it is not a permanent solution.  Hyperconverged infrastructure makes it far easier to scale.  EdTech underscores just how beneficial it is to make it easy for data centers to scale storage and infrastructure, “For many institutions, a chief benefit of hyperconverged solutions is the high degree of scalability they offer, a benefit that has led some tech observers to compare hyperconvergence with the public cloud. “We can add nodes to the initial cluster and leverage all of those resources across the cluster,” Ferguson says. “I can add another node and add more storage. Or, if I really want a compute-focused piece, I can add a very compute-centric node. You can mix and match all of those.” Western Washington University deployed a Nutanix hyperconverged solution in 2015. Jon Junell, assistant director of enterprise infrastructure services at WWU, agrees that the ability to easily add capacity is a valuable time-saver. “A few clicks, and it sees the other nodes, and away you go,” he says. ‘The management gets out of the way, so you can go back to doing the true value-add work.’”
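The “mix and match” scaling described in the quote can be pictured as a pool whose capacity is simply the sum of its nodes. The sketch below is a toy model only: the `Node` and `Cluster` classes and the node sizes are hypothetical and do not represent any vendor's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A hypothetical hyperconverged node contributing compute and storage."""
    name: str
    cpu_cores: int
    storage_tb: float


@dataclass
class Cluster:
    """Toy resource pool: capacity is the sum of whatever nodes are added."""
    nodes: list = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes.append(node)

    @property
    def total_cores(self) -> int:
        return sum(n.cpu_cores for n in self.nodes)

    @property
    def total_storage_tb(self) -> float:
        return sum(n.storage_tb for n in self.nodes)


cluster = Cluster()
cluster.add_node(Node("general-1", cpu_cores=32, storage_tb=20.0))
cluster.add_node(Node("storage-heavy-1", cpu_cores=16, storage_tb=80.0))
cluster.add_node(Node("compute-heavy-1", cpu_cores=64, storage_tb=4.0))
print(cluster.total_cores, cluster.total_storage_tb)
```

The point of the model is that growth is additive: a storage-heavy or compute-heavy node raises only the dimension that is running short.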

In addition to scalability, another advantage is the ability to have everything under one umbrella.  Because everything is in “one box,” interoperability is a breeze.  Without hyperconverged infrastructure, components not only have a harder time working together, but the potential that something will break down or go wrong also increases dramatically.  With hyperconverged infrastructure, this is not the case.  Having everything streamlined will reduce operating costs and dramatically improve data center management through ease of operation.

Further, a major advantage of hyperconvergence is how easy it is to deploy.  As mentioned, many data center managers may bristle at the implementation of something so significant and new, but deployment really is outstandingly easy.  Rather than trying to connect and communicate with multiple subsystems, there are hyperconverged options that are literally “plug and play.”  This means that they come in a single box and can literally be deployed by plugging everything in and going.  And, the plug and play doesn’t stop there.  Adding storage is as simple as plugging more in.  The amount of time and money that it would typically take to deploy or expand infrastructure is dramatically decreased by hyperconvergence.


Different Hyperconvergence Solutions

Hyperconvergence is being deployed in different ways depending on data center size and specific needs.  There are different hyperconvergence solutions to meet needs and each solution can be used in a variety of ways and then scaled as needed.  Data Center Knowledge takes a closer look at how data centers are using hyperconverged infrastructure, “Today, it’s used primarily to deploy general-purpose workloads, virtual desktop infrastructure, analytics (Hadoop clusters for example), and for remote or branch office workloads. In fewer cases, companies use it to run mission critical applications, server virtualization, or high-performance storage. In yet fewer instances, hyperconverged infrastructure underlies private or hybrid cloud or those agile environments that support rapid software-release cycles.”  Though this may be the case for now, industry experts believe that this will change as more and more data centers adopt hyperconverged infrastructure.

Though there are similarities between converged data centers and hyperconverged data centers, there are some important differences.  While both converge critical resources and allow data centers to increase density, they are managed in a very different way.  Hyperconverged infrastructure employs one important difference that is a significant game changer for data centers – everything is managed virtually which makes cloud setup easy and reduces the complexity of systems for a more streamlined and efficient management of operations.  No data center hyperconvergence solution is necessarily “better” than another, but there are certain solutions that are better suited to your specific data center. There are a variety of data center hyperconvergence solutions and Data Center Knowledge emphasizes the importance of choosing carefully for your unique needs, “Still, make sure you do your research and know which hyperconverged infrastructure technology you’re deploying. Each hyperconverged infrastructure system is unique and has its benefits and drawbacks. For example, make sure your hyperconverged infrastructure system doesn’t only work with one hypervisor. A converged system that only supports a single hypervisor, such as VMware vSphere, unnecessarily locks an organization into a single vendor and associated polices, including licensing fees and migration complexities. Similarly, if you deploy a hyperconverged infrastructure solution which comes with in-line deduplication and compression enabled (and you can’t turn it off), you need to make sure the workloads you host on that infrastructure can work well in that kind of environment.”

Hyperconvergence is not just a compelling notion.  By all industry expert predictions, it is the future of data centers. A hyperconverged infrastructure is not just something that is beneficial for massive data centers like those of Yahoo or Apple; hyperconverged infrastructure works for data centers of all sizes.  Not only does hyperconvergence change how data centers can scale and streamline operations management, but it also increases profit and return on investment by increasing capabilities and decreasing waste.  For small to mid-size organizations, it is a cost-effective and efficient way to leverage IT investments and get the most bang for your buck in the long term.  The way the size and volume of data are growing, and the way that data is used, are a clear indication that easy scalability is not a luxury but a necessity.  Hyperconvergence will make growing and changing data needs manageable and improve overall data center operations going forward.




Server Room Fire Suppression Best Practices

Data centers must delicately balance the need for infrastructure and equipment that runs all day and maximizes uptime with the need to manage heat and fire risk associated with electronic equipment.  This is particularly true in server rooms.  Server rooms are the heart of a data center, the hub of information.  If a server room experiences a disaster of any kind, it typically leads to downtime. Server rooms must have proper air conditioning, but that is not enough; they must also have appropriate fire suppression measures in place to reduce the risk of damage, injury, and downtime.  There are many threats to data center operations but perhaps one of the most significant is fire.  Other threats may pose a risk of downtime but are likely to result in only moments of it.  Fire, on the other hand, can cause permanent damage to equipment, injury to personnel, and prolonged downtime as a result.  When it comes to fire suppression in data centers, neglecting to implement suppression is simply unacceptable – a true recipe for disaster.

The statistics surrounding fires in data centers may not sound all that scary at first glance – according to NetworksAsia, only about 6% of infrastructure failures are caused by fires.  That kind of statistic may make you feel comfortable, like you do not really need to worry much about the risk of fire since you take appropriate precautions to ensure that your server rooms are cooled correctly.  But, make no mistake, the risk is real and if it happens to you, it may not just lead to downtime, but to your data center closing its doors.  Data Center Knowledge provides a wakeup call to all data centers about the very real risk of data center fires, “A small data center in Green Bay, Wisconsin was wiped out by a fire earlier this month, leaving a number of local business web sites offline. The March 19 fire destroyed 75 servers, routers and switches in the data center at Camera Corner/Connecting Point, a Green Bay business offering IT services and web site hosting…But it took 10 days to get customer web sites back online, indicating the company had no live backup plan…While the company discussed the usefulness of its fire alarms, it didn’t address whether the data center had a fire suppression system. But it doesn’t sound like it. The Green Bay Press Gazette describes “racks of blackened, melted plastic and steel.” We’ve previously looked at data center fire suppression tools and how they have evolved with the industry’s recent focus on environmental considerations.”

[Image: server room fire, via Bangkok Post]

Fire prevention and fire suppression should be a part of any data center disaster recovery plan.  It is important to consider what types of fire your data center is most at risk of, as well as the size of your data center, to determine the appropriate fire suppression system for your disaster recovery plan.  Your data center’s form of backup and the specific strategies for your disaster recovery plan will heavily influence the type of fire suppression system that you use.  If you have a minimal or “bare bones” disaster recovery plan, you may want the most elaborate and effective fire suppression system because you need it to work as effectively and quickly as possible.  If you have a comprehensive disaster recovery plan and robust backup/redundancy, uptime is less dependent on your fire suppression system.  But, in the end, every single server room must have a fire suppression system that is more effective and comprehensive than “calling 911.”

To understand fire suppression needs and make an informed decision when choosing a fire suppression method, it is important that you understand what types of fires can occur in a server room or data center.  TechTarget explains the types of fires data centers are at risk of:

“In North America, there are five fire classes:

  • Class A: Fire with combustible materials as its fuel source, such as wood, cloth, paper, rubber and many plastics
  • Class B: Fire in flammable liquids, oils, greases, tars, oil-base paints, lacquers and flammable gases
  • Class C: Fire that involves electrical equipment
  • Class D: Fire with ignitable metals as its fuel source
  • Class K: Fire with cooking materials such as oil and fat at its fuel source

No matter where your data center is located, fire can be considered a potential disaster. Data center environments are typically at risk to Class A, B or C fires.”

There are two primary types of fire suppression systems: water sprinklers and gaseous agent fire suppression solutions.  Water sprinklers are a very traditional type of fire suppression system and they are the most common type.  They are particularly popular because they are low cost, may already exist in the server room in the first place, and they are effective.  Once they have been activated they will continue to expel water until they have been shut off. The main problem with water sprinklers is that they can cause significant damage to the equipment.  With the goal of remaining operational and maximizing uptime while preventing catastrophic fire, dramatic water damage could still lead to downtime.  Additionally, water sprinklers could accidentally become activated and cause unnecessary damage. And, while sprinkler systems are inexpensive, the water damage that they cause is not.

For this reason, many data centers and server rooms implement pre-action water sprinklers.  Pre-action water sprinklers work in a similar way but take extra steps to prevent accidental activation and the ensuing damage.  In traditional water sprinklers, the water is kept in the pipes, right at the nozzle awaiting activation.  With pre-action water sprinklers, the water is not kept in the pipes all the way to the nozzle.  The upside is that it is still a low cost system and traditional water sprinklers can be converted to pre-action systems.  Pre-action systems require two events/alarms to activate the system, rather than one, significantly reducing the risk of accidental activation.
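The two-event requirement can be expressed as simple interlock logic. The sketch below assumes one common arrangement, in which a detector alarm lets water into the pipes and a heat-opened sprinkler head then releases it; the class and field names are hypothetical and real systems vary by design and local code.

```python
from dataclasses import dataclass


@dataclass
class PreActionZone:
    """Toy model of a pre-action sprinkler zone (assumed two-event interlock)."""
    detector_alarm: bool = False       # event 1: smoke or heat detection trips
    sprinkler_head_open: bool = False  # event 2: heat opens an individual head

    def pipes_charged(self) -> bool:
        """Water is admitted to the pipes only after the detector alarm."""
        return self.detector_alarm

    def discharging(self) -> bool:
        """Water reaches the room only when both events have occurred."""
        return self.detector_alarm and self.sprinkler_head_open


# A damaged head with no alarm releases no water.
print(PreActionZone(sprinkler_head_open=True).discharging())   # False
# A false alarm with intact heads charges the pipes but releases no water.
print(PreActionZone(detector_alarm=True).discharging())        # False
# Both events together discharge water.
print(PreActionZone(True, True).discharging())                 # True
```

Either single fault leaves the equipment dry, which is the protection against accidental activation described above.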

Gaseous agent fire suppressant solutions are a newer technology and are more effective in suppressing a wider and more significant range of fires. Gaseous agents are delivered in a similar fashion to water sprinklers – the agent is stored in a gas tank and then piped into overhead nozzles and administered when activated.  This is the preferred method of fire suppression for server rooms because it is more effective at fire suppression when electrical equipment is involved.  The Data Center Journal describes exactly how gaseous agent fire suppressant systems work, “The Inert Gas Fire Suppression System (IGFSS) is comprised of Argon (Ar) or Nitrogen (N) gas or a blend of those gases. Argon is an inert gas, and nitrogen is also unreactive. These gases present no danger to electronics, hardware or human occupants. The systems extinguish a fire by quickly flooding the area to be protected and effectively diluting the oxygen level to about 13–15%. Combustion requires at least 16% oxygen. The reduced oxygen level is still sufficient for personnel to function and safely evacuate the area. Since their debut in the mid 1990s, these systems have proven to be safe for information technology equipment application.”  In essence, they are able to suppress fires while minimizing risk to electronic equipment.  Halon, a once-common gaseous agent, is no longer in production because it poses a health risk and an environmental danger.  But, there are Halon replacement agents available that work in a similar fashion without the risk to health or environment.  Though gaseous agent systems are more effective than sprinkler systems for certain types of fire suppression, and though they carry less risk of damage, they are more expensive and cannot run continuously until shut off.  They will only run as long as the gaseous agent is available. Once the tank is empty, fire suppression will stop.
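The dilution figures in the quote can be checked with a simple mixing calculation. This is a back-of-the-envelope sketch only, assuming normal air at about 20.9% oxygen and perfect mixing; it is not a design method, and actual agent quantities are set by the applicable fire protection standards and the system vendor. The function names are hypothetical.

```python
AMBIENT_O2_PERCENT = 20.9        # approximate oxygen content of normal air
COMBUSTION_MIN_O2_PERCENT = 16.0  # threshold given in the quoted passage


def o2_after_flooding(inert_fraction: float) -> float:
    """Oxygen percentage when inert gas makes up inert_fraction of the room."""
    return AMBIENT_O2_PERCENT * (1.0 - inert_fraction)


def inert_fraction_for(target_o2_percent: float) -> float:
    """Share of the room atmosphere that must be inert gas to hit the target."""
    return 1.0 - target_o2_percent / AMBIENT_O2_PERCENT


def supports_combustion(o2_percent: float) -> bool:
    return o2_percent >= COMBUSTION_MIN_O2_PERCENT


for target in (15.0, 13.0):
    fraction = inert_fraction_for(target)
    print(f"{target:.0f}% oxygen needs about {fraction:.0%} inert gas in the room")
```

Under these assumptions, roughly 28% to 38% of the room atmosphere has to be inert gas to reach the 13–15% oxygen range described in the quote.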

Server rooms pose the most significant risk of fire in a data center because they typically have the highest concentration of electricity and contain combustible materials.  It is absolutely imperative that, should a fire be sensed, fire suppression begins immediately and alarms sound, alerting personnel that it is time to evacuate and take disaster recovery action.  A server room is, at its core, the heart of a company’s information structure.  If the server room experiences a fire, downtime is highly likely.  But, if suppression methods are effectively and efficiently activated, downtime and damage may be avoidable.


Flywheel vs. Battery UPS

Every data center utilizes a UPS – Uninterruptible Power Supply – to ensure that power is always available, even if there is a power interruption.  Minimizing downtime while maximizing energy efficiency is a primary goal of any data center or enterprise, which is why choosing the right UPS is so important.  The UPS begins supplying power immediately upon sensing that the primary power source has stopped functioning.  This is important because it maximizes uptime, which helps prevent frustration and financial loss as well as the loss of data.  The UPS stores power and sits in waiting until it is needed, but it requires maintenance and testing to ensure it is ready to be used when needed.  There are two primary types of UPS, flywheel and battery, and there are pros and cons to each that a data center must carefully weigh.

A flywheel UPS (sometimes referred to as a “rotary” UPS) is an older type of UPS but is still a viable option for modern data centers.  Flywheel UPS and battery UPS provide the same essential function, but the way that function is achieved, the way energy is stored, is different.  Flywheels store energy as kinetic energy in a spinning rotor, where it remains until it is needed.  Flywheel systems pack a large energy density in a small package.

Flywheel UPS systems tend to be significantly smaller than battery UPS systems.  This can be an advantage when data center square footage is at a premium.  Further, flywheel UPS systems are easier to house – they do not need as much ventilation, require less maintenance, and do not need special disposal arrangements to be made when their lifespan is complete. Flywheel UPS systems can literally last decades with a minimal amount of maintenance, which is a stark contrast to battery UPS systems.

One of the most significant drawbacks of a flywheel UPS system is how briefly it can sustain its output when compared with battery UPS systems.  TechTarget explains this key difference, “The UPS reserve energy source must support the UPS output load, while UPS input power is unavailable or substandard. This situation normally occurs after the electrical utility has failed and before the standby power system is online. As you determine whether flywheels are appropriate for a project, the amount of time that the reserve energy must supply the UPS output is key. For comparable installed cost, a flywheel will provide about 15 seconds of reserve energy at full UPS output load, while a storage battery will provide at least 10 minutes. Given 15 seconds of flywheel reserve energy, the UPS capacity must be limited to what one standby generator can supply.”  Though flywheels cannot deliver power for as long as battery UPS systems can, multiple parallel flywheels can be installed so that they all supply backup power in the event that they are needed.
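A quick way to reason about whether a flywheel's reserve is enough is to compare its ride-through time with the time the standby generator needs to come online. The sketch below assumes stored energy is fixed, so runtime scales inversely with load and linearly with the number of parallel units; it ignores per-unit power limits. The 15 seconds comes from the quote, while the 500 kW rating, 10 second generator start, and function names are illustrative assumptions.

```python
def ride_through_seconds(rated_seconds: float, rated_kw: float,
                         load_kw: float, units: int = 1) -> float:
    """Runtime if stored energy is fixed: energy / load (simplifying assumption)."""
    return rated_seconds * rated_kw * units / load_kw


def covers_generator_start(runtime_s: float, generator_start_s: float,
                           margin_s: float = 0.0) -> bool:
    """True if the reserve lasts until the generator is online, plus a margin."""
    return runtime_s >= generator_start_s + margin_s


# Hypothetical 500 kW flywheel rated for 15 s at full load.
full_load = ride_through_seconds(15, 500, load_kw=500)            # 15.0 s
half_load = ride_through_seconds(15, 500, load_kw=250)            # 30.0 s
two_units = ride_through_seconds(15, 500, load_kw=500, units=2)   # 30.0 s

# Assumed 10 s generator start with a 2 s safety margin.
print(covers_generator_start(full_load, generator_start_s=10, margin_s=2))  # True
```

If the generator in question needs longer than the available ride-through, that is the point at which more parallel units or battery reserve become necessary.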

Something important to consider is the type of data center.  If your data center is part of a larger network of data centers then if power failure occurs, another data center could take over the data load and support your data center for a short time until you are back online.  Many data centers are employing this network structure as a better means of maximizing uptime and efficiency.  If this is the case, something like a flywheel UPS system may be ideal because you do not need a prolonged power supply in the event of an emergency.  A shorter UPS runtime is all that is needed.  But, make no mistake; many data center managers still want the maximum amount of time possible when it comes to UPS capacity.  Further, some data centers are opting for a hybrid UPS system that employs both battery and flywheel. While the initial investment in a hybrid UPS system may be more, it should pay for itself in a matter of a few years.

Another important consideration is energy efficiency since many data centers are trying to become more “green.”  Though flywheel UPS systems are often thought of as the green option, Schneider Electric points out that this common assumption may be incorrect, “The results may come as a surprise to many. In almost all cases, VRLA batteries had a lower overall carbon footprint, primarily because the energy consumed to operate the flywheel over its lifetime is greater than that of the equivalent VRLA battery solution, and the carbon emissions from this energy outweighs any carbon emissions savings in raw materials or cooling. Of course, the tool lets users conduct their own comparison to see for themselves. This analysis and tool are a good reminder that decisions around energy storage needs to factor in a number of variables.”  It is more apparent than ever before that every data center must evaluate its unique, individual needs, as well as its energy goals and uptime goals, when choosing which type of UPS system is best.

A battery UPS system supplies electrical power through a chemical reaction that happens within the battery, unlike a flywheel system that uses kinetic energy.  Battery UPS systems are often favored by data centers because they can provide a much longer supply of power than a flywheel UPS.  The exact length of time available will depend heavily on the battery’s age, how well it has been maintained, etc. but for reference, a battery UPS may be able to provide 5+ minutes of power (and sometimes much more depending on a variety of factors as mentioned above) vs. a flywheel UPS that may only be able to provide less than a minute of backup power.

Though a battery UPS provides longer power supply when it is needed, it is not without its drawbacks.  UPS batteries must be routinely maintained.  This includes visual inspection, ensuring adequate cooling and ventilation, cleaning and more to ensure that they will work properly in the event that they are needed.  Additionally, UPS batteries have a shorter lifespan than flywheel UPS systems.  This is because the chemicals within the batteries diminish over time and ultimately lead to battery failure.  For this reason, UPS batteries must be not just routinely maintained but frequently checked to ensure that they are still working and capable of supplying power.

Further, a UPS battery has a limited number of discharge cycles.  Though it can recharge, frequent discharging and recharging will diminish its “expected” capacity and lifespan over time.  For flywheel UPS systems, this is not a problem (though it should be noted that flywheels can only discharge a limited number of times in a short time frame, multiple discharges over a long period of time are not problematic).  Additionally, UPS batteries contain hazardous materials that must be safely and correctly disposed of when no longer needed.  This means that UPS batteries require special disposal methods that flywheel UPS systems do not require.

As we discussed earlier, because there are advantages and drawbacks to both flywheel and battery UPS systems, many data centers are opting for a hybrid approach.  Data Center Knowledge explains the advantages of having a hybrid system that employs the use of both flywheel and battery power, “According to Kiehn, while the general trend is toward lower-cost systems with shorter runtimes, the size of the market that still wants 5 minutes or more shouldn’t be underestimated. “A lot of customers are still asking for 5 minutes,” he said. They include colocation providers, financial services companies, as well as some enterprises…There are also reliability and TCO benefits to having both flywheel and batteries in the data center power backup chain. When utility power drops, the flywheel will react first and in most cases will never transfer the load to batteries, since the flywheel’s runtime is enough for a typical generator set to kick into gear, Anderson Hungria, senior UPS product manager at Active Power, explained. Because the batteries are rarely used, initial and replacement battery costs are lower. Theoretically, it may also extend the life of the battery, but the vendor has not yet tested for that. As two alternative energy storage solutions, the flywheel and the batteries act as backup for each other, making the overall system more reliable.”

In the technology world, processes and products that are the “old” way of doing things tend to go away quickly in favor of the latest and greatest advancements.  But, when it comes to flywheel UPS systems, they are getting a new life, particularly in the form of hybrid UPS systems.  Flywheels are not an alternative to UPS batteries when it comes to energy efficiency or length of power supply – but that does not mean they are not a viable option for many data centers.  Depending on unique data center needs, they should be considered either from a standalone perspective or as part of a hybrid UPS system to ensure better backup power supply that maximizes uptime and efficiency.



Proper Maintenance and Service of UPS System is Critical to Preventing Failure


There are few things more important to a data center than continuous power.  Without it, a data center will experience prolonged downtime, significant financial loss, a damaged reputation and other damaging effects.  It is for this reason that data centers focus a lot of their time and energy on power redundancy and ensuring that there is a properly functioning uninterruptible power supply (UPS).  A UPS will sit waiting and, should it be needed due to a power failure, will supply necessary power to keep data center infrastructure up and running.  There are a variety of UPS sizes to accommodate assorted power loads and many data centers implement multiple UPS systems to ensure they are protecting against downtime.  It is important that a UPS be prepared to function at a moment’s notice so that there is not significant loss of data.  The problem is, many data centers experience UPS failure and, the majority of times a UPS fails, it is due to lack of proper maintenance and servicing.

A power failure can occur for a variety of reasons – power outage, power surge, power sag and more.  Whatever causes a power fluctuation or outage, even a few moments of downtime can bring with it severe costs.  Should any power fluctuation or outage occur, a UPS will pick up right where the power supply left off, eliminating downtime, data loss, and damage to infrastructure.  A UPS is often thought of as a “dependable” power supply in case of emergency but, if it is not properly maintained and serviced, it may not be particularly dependable.

To be able to determine how to best maintain your data center UPS system, you must first understand why UPS systems fail from time to time.  Just like that 10-year-old battery in your junk drawer may not have very much life left in it, UPS batteries diminish over time.  Even if you have not needed to use your UPS, the battery that powers it will lose capacity over time and not have as much life as originally intended.  UPS battery deterioration is often further expedited by the high temperatures often found inside data centers.  Fans occasionally fail because certain components such as ball bearings dry out or fans lose power from continuous use.  Additionally, power surges such as those caused by lightning or other transient spikes can diminish a UPS battery.  Dust accumulation on UPS components can diminish UPS efficacy.  Further, the UPS battery discharge cycle (how many times the battery has been discharged and recharged) will shorten the overall life of a UPS battery.  A typical 3-phase UPS has an average lifespan of 10 years and without proper maintenance it could be much shorter.

If you think you are doing enough by occasionally checking your UPS battery, you may be leaving your data center exposed to an outage and downtime.  Government Technology explains just how many data centers are experiencing downtime due to UPS failure and preventable human errors, “Data center outages remain common and three major factors – uninterruptable power supply (UPS) battery failure, human error and exceeding UPS capacity – are the root causes, according to a new study released earlier this month. Study of Data Center Outages, released by the Ponemon Institute on Sept. 10, and sponsored by Emerson Network Power, revealed that 91 percent of respondents experienced an unplanned data center outage within the last 24 months, a slight dip from the 2010 survey results, when 95 percent of respondents had reported an outage…Fifty-five percent of the survey’s respondents claimed that UPS battery failure was the top root cause for data center outages, while 48 percent felt human error was the root cause.”  By correcting human error and properly maintaining your UPS system, you can dramatically decrease your data center’s risk of downtime.

To prevent UPS failure, it is imperative that you regularly maintain and service your UPS as part of your Data Center Infrastructure Management (DCIM) plan.  There are a few key components of proper UPS maintenance and service but physical inspection is at the core.  If you are not physically checking on your UPS system on a regular basis, there is no way to know if there is something visibly wrong or problematic that could lead to a failure.  The best thing you can do is create a UPS maintenance and service checklist and keep a detailed log of all maintenance and service to ensure that maintenance does not fall behind. Your checklist should include checking and testing the UPS battery to ensure it is working, checking the UPS capacitors, checking the ambient temperature around the UPS, calibrating equipment, performing any service that might be required (checking air filters, cleaning and removing dust), verifying load share and making any necessary adjustments, and more.

If UPS battery failure is one of the most common causes of UPS failure and thus downtime, it is only logical that this should be one of the most important parts of your UPS maintenance checklist.  Battery discharge should be routinely checked to ensure that it is not diminished and incapable of handling the necessary power load in the event of a failure.  It is also important to visually inspect the area around the UPS and the battery itself for any obvious obstructions, dust collection or other things that may prevent adequate cooling.  If you are seeing a warning that the battery is near discharge, perform necessary maintenance.  Further, the AC input filter capacitors should be checked, along with the DC filter capacitors and AC output capacitors, for open fuses, swelling or leakage.  Next, you should visually inspect all components for any obvious problems.  Inspect the major assemblies, wiring, circuit breakers, contacts, switch gear components, and more.  Should you see obvious damage, perform necessary maintenance and service.

Next, because data centers operate at a high temperature due to the energy output of the infrastructure, it is important to check the ambient temperature around the UPS system because a high temperature can diminish the battery capacity.  Schneider Electric explains best practices for maintaining ambient temperature around UPS for maximum battery life, “It is recommended that the UPS be installed in a temperature controlled environment similar to the intended application.  The UPS should not be placed near open windows or areas that contain high amounts of moisture; and the environment should be free of excessive dust and corrosive fumes.  Do not operate the UPS where the temperature and humidity are outside the specified limits.  The ventilation openings at the front, side or rear of the unit must not be blocked… All batteries have a rated capacity which is determined based on specified conditions.  The rated capacity of a UPS battery is based on an ambient temperature of 25°C (77°F).  Operating the UPS under these conditions will maximize the life of the UPS and result in optimal performance.  While a UPS will continue to operate in varying temperatures, it is important to note that this will likely result in diminishing the performance and lifespan of your battery.  A general rule to remember is that for every 8.3°C (15°F) above the ambient temperature of 25°C (77°F), the life of the battery will be reduced by 50 percent.  Therefore, keeping a UPS at a comfortable temperature is crucial to maximizing UPS life and capabilities.”
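The rule of thumb in the Schneider Electric quote above lends itself to a quick calculation.  The following is a minimal sketch, not a vendor formula: it assumes the halving rule can be applied smoothly between the 8.3°C steps and that no extra life is gained below 25°C.  The function and constant names are made up for illustration.

```python
REFERENCE_TEMP_C = 25.0  # ambient temperature at which capacity is rated (from the quote)
HALVING_STEP_C = 8.3     # each 8.3 C above the reference halves battery life (from the quote)


def battery_life_fraction(ambient_temp_c: float) -> float:
    """Estimate the fraction of rated battery life left at a given ambient temperature."""
    excess = ambient_temp_c - REFERENCE_TEMP_C
    if excess <= 0:
        # Assumption: temperatures at or below 25 C give the full rated life.
        return 1.0
    # Assumption: the halving rule is interpolated smoothly between steps.
    return 0.5 ** (excess / HALVING_STEP_C)


if __name__ == "__main__":
    for temp in (25.0, 30.0, 33.3, 41.6):
        print(f"{temp:.1f} C -> {battery_life_fraction(temp):.0%} of rated life")
```

Under these assumptions, a battery room that sits at 33.3°C would be expected to get roughly half of the rated battery life, and one at 41.6°C roughly a quarter.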

Visual inspection should include dust and dirt removal on the UPS system.  A UPS system will sit and accumulate dust over time, but dust can interfere with proper heat transfer, so it should be promptly removed to ensure the UPS system will function properly when needed.  Further, check all air filters for dust accumulation.  Dust accumulation on filters could lead to inefficiency and even overheating.  Clean and replace filters as needed to properly maintain your UPS.  Capacitors are also an integral component of UPS systems.  Capacitors aid in the transition of power in the event of an outage so if they fail, the UPS will likely fail.  Capacitors need to be routinely checked because they will dry out from wear and tear, so they need to be replaced every few years to ensure proper UPS function.

Though much of the suggested UPS maintenance and service strategy may sound basic, even obvious, the fact of the matter is that UPS failure continually remains a primary source of data center downtime.  And, when you couple that with human error, it is easy to see that many data centers simply are not properly maintaining their UPS systems to prevent failure.  Not all of these tasks need to be completed every day or even every week; certain tasks can be performed weekly while others can be monthly, quarterly, semi-annually, or annually.  By breaking it up you ensure that your UPS system is being frequently and routinely checked while making routine maintenance a far more achievable task.  Additionally, by maintaining a detailed log you can see if UPS maintenance and service has fallen behind and immediately address any concerns.  When data center technicians routinely check the UPS system, they will become familiar with what looks normal and what looks concerning so that, should anything look problematic, it can be addressed and remedied immediately for peace of mind that your UPS will be there when you need it and prevent costly downtime.
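To make the idea of a tiered schedule plus a maintenance log concrete, here is a small sketch of how overdue tasks might be flagged.  The task names and the interval assigned to each one are hypothetical examples, not recommendations from any vendor; your own checklist and manufacturer guidance should set the real intervals.

```python
from datetime import date, timedelta

# Hypothetical assignment of checklist items to intervals, in days.
SCHEDULE_DAYS = {
    "visual inspection": 7,
    "check ambient temperature": 7,
    "clean dust and check air filters": 30,
    "battery test": 90,
    "capacitor inspection": 180,
    "verify load share and calibration": 365,
}


def overdue_tasks(last_done: dict[str, date], today: date) -> list[str]:
    """Return tasks whose interval has elapsed or that were never logged."""
    overdue = []
    for task, interval in SCHEDULE_DAYS.items():
        performed = last_done.get(task)
        if performed is None or today - performed > timedelta(days=interval):
            overdue.append(task)
    return overdue


if __name__ == "__main__":
    log = {"visual inspection": date(2024, 1, 1), "battery test": date(2024, 1, 1)}
    print(overdue_tasks(log, today=date(2024, 1, 10)))
```

A log kept this way makes it obvious at a glance when maintenance has fallen behind, which is the point of keeping it in the first place.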


The Convergence of IT & OT in Data Centers

IT and OT – though they are two different things, the previous tendency to “divide and conquer” when it came to strategy, management and solutions is going away.  When it comes to IT and OT, their worlds are colliding inside data centers.  Operating as two separate entities without communication and collaboration is not effective, efficient or ideal. Though not all data centers are operating with IT/OT convergence, the transition has begun – IT/OT convergence is already happening in healthcare, energy, aviation, manufacturing, transportation, defense, mining, oil and gas, utilities, natural resources sectors, and more – and it is only a matter of time until it is simply the data center industry standard.

OT (operational technology) has a few primary focuses – maximizing uptime, ensuring the proper function of equipment and infrastructure, and the security and availability of operational assets and processes.  OT is a blend of both hardware and software so that environmental maintenance can occur.  Though some are not even familiar with the name “OT,” OT is essential to the day-to-day operations of a data center. The convergence of IT and OT is happening because the specific technology involved in operational technology (such as communications, software and security) is evolving and OT is integrating more information technology (IT) into its operations.

IT focuses on the use and integrity of data and intellectual property.  Its focus is on things like storage, networking devices, computers, and infrastructure that facilitate improved information storing and security. In contrast to OT, IT (information technology)’s security focus is the protection and preservation of confidential information.  Though they are two different things, they are not mutually exclusive and what data centers are finding is that there is more than just overlap, a convergence is happening.  Schneider Electric elaborates on why IT and OT worlds are colliding, “Security systems are needed to protect facilities. IT is needed to run security systems. Apply a bit of basic math theory to these statements, and it is easy to conclude that IT is then needed to protect facilities. If you are thinking this sounds like OT and IT convergence, you’re right; but security requirements push the boundaries even further to compel departmental collaboration between OT and IT. At the core, lies the need for reliable delivery of clean and continuous power.”

To maintain uptime and maximize security, IT and OT must work together. Think about factors that could lead to downtime or a security breach – problems with infrastructure management, equipment overheating, fire, flood, problems with lighting, problems with the security system, a physical breach of security, a cyber-attack, and more.  Many of these things fall under the OT umbrella but some fall under the IT umbrella.  And, in reality – managing and mitigating them involves both IT and OT.  In order to properly remote-manage a data center, and maintain RTOI (real-time operational intelligence), a proper DCIM must be in place and IT must be able to communicate with monitoring systems so that proper and accurate information is received.  As we have previously discussed, when this information is received in real time, downtime can be significantly reduced.  TechTarget elaborates on why IT and OT are converging in the way that they are now, and how it will improve efficiency and maximize data center operations, “While IT inherently covers communications as a part of its information scope, OT has not traditionally been networked technology. Many devices for monitoring or adjustment were not computerized and those with compute resources generally used closed, proprietary protocols and programmable logic controllers (PLC) rather than technologies that afford full computer control. The systems involved often relied on air gapping for security. Increasingly, sensors and connected systems like wireless sensor and actuator networks (WSANs) are being integrated into the management of industrial environments, such as those for water treatment, electric power and factories. The integration of automation, communications and networking in industrial environments is an integral part of the growing Internet of Things (IOT). IT/OT convergence enables more direct control and more complete monitoring, with easier analysis of data from these complex systems from anywhere in the world.”

When you integrate infrastructure management systems, your data center information will be able to flow between departments with ease.  Data from IT can and should be an indispensable tool in providing the information OT needs to formulate strategy and make decisions. The result will be increased productivity, improved efficiency, decreased downtime, and enhanced security.  With integration, knowing what your data center needs will be timely and accurate, making effective maintenance far easier.  Your RTOI will be accurate so, should you need to make a quick adjustment – whether large or small – you will hopefully know before you experience any problems or catastrophic events.

So, it seems like a simple solution, right? And, clearly, based on the advantages of working together, any data center would jump all over it?  Though IT/OT convergence is certainly the future of data centers, it is not necessarily an easy task to bring the two together.  GE elaborates on the challenges of IT/OT convergence, “Many cultural and technological impediments make IT/OT convergence challenging. From the perspective of culture, IT and OT have traditionally been well-separated domains. When smart assets and infrastructure are introduced, it becomes necessary to figure out new ways to divide ownership and responsibility for the management and maintenance of that infrastructure. This can potentially lead to turf wars and blame games. On top of that, OT standards have generally been proprietary and vendor specific, optimized exclusively for specialized tasks. Unifying IT and OT requires implementing well-defined standards that scale all the way from assets to data centers and back. These standards also need to account for enhanced security, since operational assets that were previously disconnected from widespread communication networks could now be vulnerable. It’s all about the enterprise. All that daunting work can be made easier, however, by the concept of “enterprise architecture.” Enterprise architecture is a top-down methodology for developing architecture by focusing first on organizational goals, strategy, vision, and business before delving into the technological specifics. This approach could keep IT/OT deployment aligned with achieving Industrial Internet goals. Going through the process of integrating IT and OT might require some initial effort, but the payoffs are worth it.”

With any changes in data centers, there are growing pains.  Logistical intricacies to fine tune.  Security challenges.  There will always be a list of challenges in implementing change.  But, the convergence of information technology and operational technology is a value-added change.  The specific values will vary amongst industries but, make no mistake, convergence will have a payoff. Though there will be challenges in converging IT and OT, success is very achievable with thorough planning, proper execution and full implementation of an IT/OT strategy.  All data center team members must be fully educated and on board to be properly prepared for the change.  Make no mistake; IT and OT are not the same.  Though they are converging, they remain different and separate, yet joint, structures.  If a harmony and alignment of strategies can be found, IT and OT convergence can be a stunning success.

By converging IT and OT, there will be similar technology and this overlap of sorts will allow the two to work together synergistically.  This will be beneficial in a variety of ways but one of the most prominent ways is that it will be cost-saving. Not only because costly downtime will be reduced but because IT and OT teams can, in some ways, be combined and redundant team members pruned for efficiency. In addition to this, convergence will provide risk reduction because there will be an overlap of security issues and those issues will be able to be simultaneously addressed.  And, perhaps most significantly, data centers will enjoy enhanced performance from IT/OT integration.  Bad redundancies (such as similar but separate operations that could be under one umbrella) will be eliminated and good redundancies (such as finding ways in which IT and OT can synergistically work together) will be enhanced.  Further, convergence will improve performance in the form of enhanced system availability. Better performance will mean more uptime because of a reduced risk of things like cyber-attack, poor infrastructure management, power failure and more.  Through a collaborative effort, a focus on future technologies, a drive toward maximizing uptime and minimizing security risk, and a desire for improved efficiency, data centers will successfully achieve IT/OT convergence and step into the future of data centers.









Data Center RTOI

Technology is evolving minute by minute and data centers must work to keep up with the lightning-paced evolution.  We have discussed the Internet of Things (IOT) before – the world is becoming increasingly dependent on the internet and everyday processes are becoming digitized for efficiency and savings.  But, as more and more of the world becomes digitized, technology advances and data grows, and that data must be effectively and efficiently stored.  Data centers make investments in infrastructure, backup power, security and more so that they can adequately store that growing and evolving data but, when things move so quickly, constant monitoring is needed to ensure that data is not just stored properly but safely and efficiently.  Old methods of collecting and analyzing data are archaic and simply not practical.  Analyzing what went wrong after the fact, or realizing something is about to go wrong when there is not enough time to fix the problem, is useless.  And, ultimately, these traditional methods are responsible for a lot of downtime in data centers.  Accurate, actionable information in real time is the only way data centers can effectively operate moving forward.

Data centers are notoriously energy-inefficient but most data centers today are making efforts to improve and be more energy efficient.  The undertaking is not simple or straightforward because every data center is different and has unique needs.  Data centers cannot run at capacity because, should capacity change, data centers will be ill-equipped.  But, at the same time, data centers should not run way beyond what is necessary because that is a waste of energy.  More and more data center managers are realizing the need for Real Time Operational Intelligence (RTOI).  Having access to current, accurate information is the only way to make intelligent and informed decisions about how to best manage the infrastructure of a data center.  What does RTOI look like in a data center? TechTarget provides a brief explanation of what RTOI is in a practical sense, “Real-time operational intelligence (RtOI) is an emerging discipline that allows businesses to intelligently transform vast amounts of operational data into actionable information that is accessible anywhere, anytime and across many devices, including tablets and smartphones. RtOI products turn immense amounts of raw data into simple, actionable knowledge. RtOI pulls together existing infrastructure to manage all of the data that is being pulling from a variety of sources, such as HMI/SCADA, Lab, Historian, MES, and other servers/systems. It then has the ability to organize and connect the data so it’s meaningful to the users. By integrating directly with existing systems and workflows, it can help assets perform better and help workers share more information.”  As more and more people, businesses and data centers are utilizing the cloud, and the cloud’s complexity continues to change, data management needs change and data centers struggle just to keep pace.

RTOI can greatly reduce waste and improve energy efficiency by helping identify what is in use and what is not so that things can be turned off strategically for energy savings.  Just think of all of the infrastructure in a data center that is consuming power even though it is not mission critical or even in use. For example, determining which servers are in use and which servers can be, at least temporarily, powered down will yield significant energy savings.

One of the most significant advantages of well-executed RTOI is immediate knowledge of potential threats and the ability to deal with them before they cause downtime.  As we have often discussed, downtime is incredibly costly (costing, on average, thousands of dollars per minute).  No data center wants to experience downtime but, unfortunately, the vast majority will face it at one point or another.  Data centers can significantly reduce their risk of downtime with current, accurate, actionable information about what is happening in the data center.  As we have seen, anticipation of problems can only go so far.  Data centers simply cannot properly manage what they do not see or have knowledge about.  That is where RTOI comes in.

RTOI not only aggregates data but measures it, tracks it, and, if well-executed, puts it into easy-to-understand terms and statistics so that you can use the information to make informed decisions as well as to properly manage assets going forward.  RTOI can assist data centers in improving capacity planning, anticipating asset lifecycle and properly planning management, maintaining and continuously meeting regulatory compliance, optimizing energy efficiency and more.

Planning for data center capacity is far easier at the building stage but, once a data center has been built and is in operation, anticipating capacity needs, particularly as new technology means big data storage, is very challenging.  In fact, it is one of the biggest challenges data centers face today.  Panduit explains why capacity management is such a challenge in data centers, “Proactive capacity management ensures optimal availability of four critical data center resources: rack space, power, cooling and network connectivity. All four of these must be in balance for the data center to function most efficiently in terms of operations, resources and associated costs. Putting in place a holistic capacity plan prior to building a data center is a best practice that goes far to ensure optimal operations. Unfortunately, once the data center is in operation, it is all too common for it to fall out of balance over time due to organic growth and ad hoc decisions on factors like power, cooling or network management, or equipment selection and placement. The result is inefficiency and in the worst-case scenario, data center downtime. For example, carrying out asset moves, adds and changes (MACs) without full insight into the impact of asset power consumption, heat dissipation and network connectivity changes can create an imbalance that can seriously compromise the data center’s overall resilience and, in turn, its stability and uptime…Leveraging real-time infrastructure data and analytics provided by DCIM software helps maximize capacity utilization (whether for a greenfield or existing data center) and reduce fragmentation, saving the cost of retrofitting a data center or building a new one. Automating data collection via sensors and instrumentation throughout the data center generates positive return on investment (ROI) when combined with DCIM software to yield insights for better decision making.”

With accurate information in real time you can manage capacity needs and, in a moment’s notice, add capacity so that there are no problems.  Additionally, that kind of historical information is useful for predicting the need for data center expansion going forward.  For example, data centers often have orphan servers that are sitting doing nothing but collecting dust and sucking up resources like cooling and power.  Without careful and accurate management, these orphan servers could sit like this for weeks, months or even years, wasting resources that could be better allocated.  With real-time statistics about what exactly is going on in your data center, you can find these orphan servers and clean them out, freeing up capacity for other infrastructure.  In fact, carefully managing your data center’s capacity needs and more accurately anticipating future needs can mean saving millions of dollars in the long run.
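As a rough illustration of how real-time statistics could surface orphan servers, the sketch below flags any server whose average CPU utilization stays under a cutoff.  Everything here is an assumption made for the example: the server names, the sample data, the 2 percent threshold, and the use of CPU alone as the signal.  A real DCIM tool would likely weigh network, storage and power data as well.

```python
from statistics import mean


def find_orphan_servers(
    cpu_samples: dict[str, list[float]],
    idle_threshold_pct: float = 2.0,  # hypothetical cutoff, tune for your environment
) -> list[str]:
    """Return servers whose average CPU utilization is below the threshold.

    Servers with no samples are skipped rather than flagged, since missing
    data is not evidence that a machine is idle.
    """
    return sorted(
        name
        for name, samples in cpu_samples.items()
        if samples and mean(samples) < idle_threshold_pct
    )


if __name__ == "__main__":
    samples = {
        "rack1-a": [0.5, 1.0, 0.8],  # nearly idle
        "rack1-b": [35.0, 40.2],     # clearly in use
        "rack2-c": [],               # no monitoring data yet
    }
    print(find_orphan_servers(samples))
```

Any server flagged this way is only a candidate; someone should still confirm that it is not serving a purpose before it is powered down or removed.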

DCIM and RTOI go hand in hand.  Without a proper plan for data center infrastructure management, and sophisticated monitoring software, RTOI is not achievable.  DCIM tools are necessary to measure, monitor, and manage data center operations including energy consumption and all IT equipment as well as the facility infrastructure.  Fortunately, there are sophisticated DCIM software products available that will track detailed information all the way down to the rack level so that monitoring is made easy, even remotely. As mentioned, it is critical to leave behind old and archaic forms of DCIM; with them, there is simply no way to really keep up.  Data centers, regardless of size, must focus on real-time operational intelligence as a means of accuracy. TechTarget explains why it is critical to focus on RTOI as a way of staying ahead of potential problems, “Taking a new big data approach to IT analytics can provide insights not readily achievable with traditional monitoring and management tools, Volk said…For example, particularly with cloud resources, it can be difficult to anticipate how applications and data movement will affect each other. Cloud Physics allows cross-checking of logs and other indicators in real time to achieve that. This new approach is “leading edge, not bleeding edge,” Volk said. Its value to an organization will depend on the maturity and complexity of a given data center. Small and medium-sized businesses and organizations without much complexity will benefit, he said, “but companies with large and heterogeneous data centers will benefit even more.”  RTOI helps data centers provide better service to their customers, minimize downtime, improve efficiency, maximize reputation, and ultimately, save money through vastly improved operations.


Data Center Business Continuity

Whether you operate a data center or any other business, business continuity is incredibly important.  We all think we are immune to disaster but the reality is, if you have not formed a business continuity plan for disasters, you are leaving your data center at severe risk.  Imagine what it would be like if a disaster struck (flood, fire, etc.) and you could not get into your data center for a few hours – problematic, right?  What if that disaster was really bad and you could not get into your data center for a few days or weeks – huge problem.  Business cannot come to a screeching halt, so a strategy for maintaining business continuity is a must.  A strategically formed, well-thought-through business continuity plan should be a part of any data center’s disaster recovery program.  A disaster recovery plan will be the big umbrella under which we will talk about business continuity because the two are inextricably related.  Disaster recovery focuses heavily on data recovery and management but, beyond maintaining and protecting data in the event of a disaster, a data center and the businesses it serves must be able to continue to meet their most basic objectives.  During a disaster a data center may experience downtime in which all business operations come to a halt.  This is not a small problem – downtime may cost as much as $7,900 per minute.  A disaster recovery plan, along with a business continuity plan, will help a data center reduce downtime in the event of a disaster as well as operate continuously to meet business objectives.
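To put the per-minute figure in perspective, here is a minimal arithmetic sketch. It treats the $7,900 per minute quoted above as a flat upper-bound rate, which is a simplifying assumption; real costs vary by facility and by how long the outage lasts. The function name is hypothetical.

```python
def downtime_cost(minutes, cost_per_minute=7_900):
    """Estimate outage cost in dollars at a flat per-minute rate.

    The default rate is the upper-bound figure quoted in the text;
    a flat rate is an assumption made only for illustration.
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


three_hours = downtime_cost(3 * 60)      # a "few hours" scenario
two_days = downtime_cost(2 * 24 * 60)    # a "few days" scenario
```

At that rate a three-hour lockout comes to $1,422,000 and a two-day lockout to $22,752,000, which is why even a modest reduction in recovery time can justify the planning effort.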

To formulate a business continuity plan we must first outline what makes a successful business continuity plan.  A data center’s business continuity plan will function as a roadmap.  If a disaster strikes, you will hopefully be able to find the type of disaster in your business continuity plan and then begin following the “map” to get to the solution and restore your data center to business as usual.  First and foremost, a proper business continuity plan will focus on what can be done to prevent disasters so that business continuity is never interrupted in the first place.  Data centers must consider what their unique needs are because there is no such thing as a generic data center business continuity plan – it would never work.  Data centers must identify and assess all mission critical assets and risks.  Once they have been identified it will be far easier to formulate a business continuity plan with specific goals in mind.  You can prioritize your most problematic risks by focusing on the risk they pose to mission critical assets.  In considering individual needs it is imperative that data centers determine what applications and processes are mission critical.  For example, can your mission critical systems be maintained remotely?  Additionally, in today’s data center world where security is a top concern, maintaining data security should be an important part of your business continuity plan.

Disaster prevention is a central part of your data center’s business continuity plan.  Identifying business continuity goals and potential problem areas will help you lay out a proper disaster prevention plan.  Depending on your unique data center, certain measures may be beneficial, such as more frequent infrastructure inspections, better surveillance, enhanced security in various areas including data center grounds security and rack-based security, increased redundancy, and more.  Think in terms of real problems and real consequences; be specific so that you can make specific business continuity plans and strategies.

Some data centers may want to relocate their data center if a disaster is incredibly large but the logistics of this are far from simple.  Relocating for a disaster safely, rapidly, and securely is no simple task.  And, beyond that, it is expensive which is why many data centers – even large enterprise data centers – do not do this.  To do this properly as part of a business continuity plan, a detailed data center migration plan must accompany the business continuity plan.  Some enterprises may want to utilize regionally diverse data centers that mirror each other but this is also expensive and exceptionally complex to implement – though it can be very effective at maintaining uptime, maximizing security, and optimizing business continuity.

As mentioned, redundancy is an important part of maximizing uptime and maintaining business continuity in a data center. As part of your data center’s business continuity plan, you may want to implement load balancing and link load balancing.  Server load balancing and link load balancing are two strategies that may be used to help prevent the loss of data from an overload or outage in a data center. Continuity Central Archive explains how these two strategies can be used in data centers, “Server load balancing ensures application availability, facilitates tighter application integration, and intelligently and adaptively load balances user traffic based on a suite of application metrics and health checks. It also load balances IPS/IDS devices and composite IP-based applications, and distributes HTTP(S) traffic based on headers and SSL certificate fields. The primary function of server load balancing is to provide availability for applications running within traditional data centers, public cloud infrastructure or a private cloud. Should a server or other networking device become over-utilized or cease to function properly, the server load balancer redistributes traffic to healthy systems based on IT-defined parameters to ensure a seamless experience for end users…Link load balancing addresses WAN reliability by directing traffic to the best performing links. Should one link become inaccessible due to a bottleneck or outage, the ADC takes that link out of service, automatically directing traffic to other functioning links. Where server load balancing provides availability and business continuity for applications and infrastructure running within the data center, link load balancing ensures uninterrupted connectivity from the data center to the Internet and telecommunications networks. Link load balancing may be used to send traffic over whichever link or links prove to be most cost-effective for a given time period. 
What’s more, link load balancing may be used to direct select user groups and applications to specific links to ensure bandwidth and availability for business critical functions.”
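The failover behavior described in the quote can be sketched in a few lines. The class below is a deliberately simplified round-robin balancer that skips any backend marked unhealthy; it is an illustration of the idea only, not how any particular ADC or load balancer product works, and the class and method names are hypothetical. The same pattern applies to WAN links if the "backends" are link names.

```python
class RoundRobinBalancer:
    """Toy round-robin balancer that routes only to healthy backends."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0  # index of the next backend to try

    def mark_down(self, backend):
        """Take a backend out of service (e.g. after a failed health check)."""
        self.healthy.discard(backend)

    def mark_up(self, backend):
        """Return a known backend to service."""
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        """Return the next healthy backend, skipping unhealthy ones."""
        count = len(self.backends)
        for _ in range(count):
            candidate = self.backends[self._next % count]
            self._next = (self._next + 1) % count
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


lb = RoundRobinBalancer(["srv-a", "srv-b", "srv-c"])
first_round = [lb.pick() for _ in range(3)]
lb.mark_down("srv-b")                      # simulate a failed server
after_failure = [lb.pick() for _ in range(3)]
```

Production balancers add health probes, weighting, and session persistence on top of this; the point here is only that traffic keeps flowing to the remaining systems when one drops out.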

Data centers are also utilizing the cloud for their business continuity plans because it is cost-efficient and highly effective.  The cloud platform is exceptionally effective for business continuity, particularly as data centers move more and more towards virtualization.  A cloud service with a proper SLA (service level agreement) can ensure that data will be continuously saved and protected even in the event of a disaster.  This is where identifying mission critical applications and information is important.  The entirety of the data center’s workload does not need to be recovered in an instant, only that which has been determined mission critical.

In addition to the cloud, many data centers opt to implement image-based backup for continuity.  Data Center Knowledge provides a helpful description of what image-based backup is and how it can be used uniquely in data centers, “Hybrid, image-based backup is at the core of successful business continuity solutions today. A hybrid solution combines the quick restoration benefits of local backup with the off-site, economic advantages of a cloud resource. Data is first copied and stored on a local device, so that enterprises can do fast and easy restores from that device. At the same time, the data is replicated in the cloud, creating off-site copies that don’t have to be moved physically. Channel partners are also helping enterprises make a critical shift from file-based backup to image-based. With file-based backup, the IT team chooses which files to back up, and only those files are saved. If the team overlooks an essential file and a disaster occurs, that file is gone. With image-based backup, the enterprise can capture an image of the data in its environment. You can get exact replications of what is stored on a server — including the operating system, configurations and settings, and preferences. Make sure to look for a solution that automatically saves each image-based backup as a virtual machine disk (VMDK), both in the local device and the cloud. This will ensure a faster virtualization process.”

While not every data center will experience a “major” disaster where they cannot get into their facility for weeks, many data centers will experience some type of disaster.  And, as mentioned, mere minutes can cost tens of thousands of dollars.  Beyond the bottom line, the inability to continuously maintain data center business may damage your reputation irreparably.  An effective business continuity plan is capable of pivoting around both people and processes depending on the specific circumstances.  Rapidly restoring data and operations is the goal and data centers should take that goal and work backwards from there to determine the best path to maintaining business continuity.


Controlling Rack Access for Data Center Security

Stringent security protocols are one of the most important aspects of properly running any data center.  With constant, round-the-clock advancements in technology, the focus of security protocols is often on things like cloud/cyber security, particularly because there have been many significant security breaches recently.  Cyber security is certainly important and nothing to ignore, but it is also important to not forget about physical security.  To provide the optimal and industry-acceptable level of security, data centers must provide security on multiple levels.  This will help dramatically reduce the risk of a security breach, allow data centers to remain compliant with industry regulations, and provide peace of mind to customers that everything is being done to protect data integrity.  Ensuring proper physical security compliance will also help data centers avoid costly data breaches and the penalties that may result.

So often, physical security efforts are focused on access to data center grounds and to the facility itself.  These efforts, while valuable and necessary, are not where physical security measures should stop.  Once inside the data center facility itself there should not be unrestricted access to server racks.  A wide variety of individuals must pass through a data center on a daily basis, including internal engineers, external engineers, data center staff, cleaning staff and more.  Unfortunately, many data breaches are actually “inside jobs” and therefore security at the rack level is vitally important.

Colocation data centers must be particularly vigilant with rack level security because they often house multiple businesses’ data within the same data center and some of those businesses may even be in competition.  It may sound like there is a simple solution – locked doors or cages for server racks – right?  Unfortunately, wrong.  Traditional locks can only be so complex and if a threat is able to gain access to data center grounds or get inside a facility, they can likely handle those locks.  To meet industry standards and comply with federal regulations, rack security simply must go beyond that, as Schneider Electric points out, “Further increasing the pressure on those managing IT loads in such locations, regulations concerning the way data is stored and accessed extends beyond cyber credentialing, and into the physical world. In the US, where electronic health records (EHR) have become heavily incentivized, the Healthcare Insurance Portability & Accountability Act (HIPAA) demands safeguards, including “physical measures, policies, and procedures to protect a covered entity’s electronic information systems and related buildings and equipment, from natural and environmental hazards, and unauthorized intrusion.” Similar measures are also demanded, e.g., by the Sarbanes-Oxley Act and Payment Card Industry Data Security Standard (PCI DSS) for finance and credit card encryption IT equipment. In addition to building and room security, it has become vital to control rack-level security so you know who is accessing your IT cabinets and what they’re doing there.”

For the best security, custom rack enclosures provide peace of mind because they are far harder to access than standard, “off the shelf” enclosures.  Additionally, many data centers are opting for biometric security, pin pads (where codes are changed frequently) or keycards.  Biometric locks do not use traditional keys; rather, they scan things like fingerprints or handprints.  Biometric locking systems have grown significantly in popularity because they provide truly unique access.  Keycards can get lost and pin codes can be shared but a fingerprint or handprint cannot be easily shared or duplicated so it is a far more sophisticated security measure.  Many worry about the consistency, accuracy and performance of biometric security but it has become incredibly advanced, as Data Center Knowledge notes, “The time taken to verify a fingerprint at the scanner is now down to a second. This is because the templates – which can be updated / polled to / from a centralized server on a regular basis – are maintained locally, and the verification process can take place whether or not a network connection is present. The enrollment process is similarly enhanced with a typical enroll involving three sample fingerprints being taken on a terminal, with the user then able to authenticate themselves from that point onwards. This level of efficiency, cost effectiveness and all round reliability of fingerprint security means that a growing number of clients are now securing their IT resources at the cabinet level and integrating the data feed from the scanner to other forms of security such as video surveillance.”

These electronic locks that restrict rack access provide multiple levels of enhanced security.  For example, with electronic locks, when a user scans a fingerprint or inputs a code, a central server validates authenticity and then allows or restricts access.  An additional advantage of using this method is that the electronic system will automatically generate a log that details who has accessed what, and when.  This electronic tracking is far more convenient, as well as far more accurate, than manual tracking of access.  These electronic systems can be directly connected to data center facility security systems so that, should there be a problem, systems can go into automatic lockdown and alarms can be sounded in an instant.  Also, there are video surveillance options that come along with electronic-based security and monitoring.  Video surveillance can be programmed to turn on when biometric scanning is being performed, when pin codes are being entered, when security cards are being swiped, and more.  Additionally, video surveillance can be programmed so that, when someone is accessing a rack, it automatically captures an image of that person and sends it to the data center manager.  The data center manager can then choose to watch the surveillance as it happens for an enhanced level of security.  This level of security may also reduce the cost of and need for a physical security guard, particularly when each rack is monitored by video surveillance.  With this sort of security implemented at the rack level, there will be a detailed log of who is accessing what server and when, and should a problem arise, it will be immediately apparent at which server there has been a security breach.  Further, advanced electronic locking systems can be pre-set to only allow access at certain times.  For example, if there should never be access “after hours” to certain racks, they can be set to only allow access during pre-determined times.
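The combination of central validation, time-restricted access, and automatic logging can be sketched as follows. This is a minimal model under stated assumptions: the `RackPolicy` and `AccessController` names, the rack and user identifiers, and the log format are all hypothetical, and a real system would authenticate a biometric template or card rather than trust a user name.

```python
from dataclasses import dataclass, field
from datetime import datetime, time


@dataclass
class RackPolicy:
    """Who may open a rack and during which hours (hypothetical schema)."""
    allowed_users: set
    open_from: time
    open_until: time


@dataclass
class AccessController:
    """Central validator that records every attempt, granted or denied."""
    policies: dict
    log: list = field(default_factory=list)

    def request(self, user, rack, when):
        policy = self.policies.get(rack)
        granted = (
            policy is not None
            and user in policy.allowed_users
            and policy.open_from <= when.time() <= policy.open_until
        )
        # Every attempt is logged so there is a record of who tried what, and when.
        self.log.append(
            (when.isoformat(), user, rack, "granted" if granted else "denied")
        )
        return granted


controller = AccessController(
    policies={"rack-12": RackPolicy({"alice"}, time(8, 0), time(18, 0))}
)
daytime = controller.request("alice", "rack-12", datetime(2024, 1, 15, 10, 30))
after_hours = controller.request("alice", "rack-12", datetime(2024, 1, 15, 23, 0))
stranger = controller.request("mallory", "rack-12", datetime(2024, 1, 15, 10, 45))
```

Denied attempts are logged alongside granted ones on purpose, since a run of denials at one rack is exactly the kind of event that should trigger an alarm or a camera capture.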

Another advantage of advanced electronic locking mechanisms is that they can be easily and effectively remotely monitored.  Having on-site security staff is beneficial but is not always possible and, as previously discussed, it is advantageous to have multiple levels of security which is why remote monitoring is important.  Many government and industry regulations now have strict security parameters that data centers must remain in compliance with or face strong penalties.  These security standards are set to help protect secure financial, health and other sensitive information and they require multiple levels of security and that includes rack level security.  To not protect rack level security means that many data centers will not be in compliance – a major (and costly!) problem.

While the cost of implementation may seem prohibitive to some, many are now recognizing that the cost of a breach will likely be far higher.  The same level of security used for facility access points should also be used at the rack level when optimizing data center security protocols.  Whether you are retrofitting an existing data center or building a new data center, and whether your data center has 1 rack or 100 racks, they should each be secured separately at the rack level.  Cyber security is a growing and complex arena, easily grabbing the attention of both the customer and the data center facility manager, but it is critically important that physical security not be neglected.  In an age where many businesses are forgoing their enterprise data center in favor of colocation, colocation providers must be stringent in their protection of their customers’ data – not just for peace of mind and best practices, but to remain compliant with federal regulations.  If you think you are immune to a data breach, IBM Security’s most recent study will not put you at ease because it found that the global risk of a data breach in the next 24 months is 26 percent.  And, the cost will not be small!  The average consolidated total cost of a data breach is $4 million.  While the cost to implement state-of-the-art rack level security will not be small, it will continually pay for itself over time and will likely be far less than the cost of a security breach.



Strategies For Monitoring UPS Batteries & Preventing Failure

Aside from security, maximizing uptime is likely the top priority of just about any data center, regardless of size, industry or any other factors.  Most businesses today run on data and that data is being facilitated by a data center.  Businesses, and their employees and customers, depend on data being available at all times so that business processes are not interrupted.  Every second a data center experiences downtime, their clients experience downtime as well.  Data center managers and personnel are on a constant mission to prevent downtime and they must be vigilant because downtime can occur for a variety of reasons but one has been and remains the #1 threat – UPS battery failure.

UPS (Uninterruptible Power Supply) is the redundant power supply that is supposed to back up a data center in the event of an energy problem such as power failure, or a catastrophic emergency.  Having an uninterruptible power supply is necessary in any size data center because no batteries last forever and, unfortunately, even the most observant and effective data center managers cannot prevent some power failures.  The UPS also contains a battery that will kick in should the primary power source fail so that a data center (and its clients) can experience continuous operation.  Unfortunately, the very thing that is supposed to provide backup power – the UPS – can sometimes fail as well.  Emerson Network Power conducted a 2016 study to determine the cost of and root causes of unplanned data center outages, “The average total cost per minute of an unplanned outage increased from $5,617 in 2010 to $7,908 in 2013 to $8,851 in this report… The average cost of a data center outage rose from $505,502 in 2010 to $690,204 in 2013 to $740,357 in the latest study. This represents a 38 percent increase in the cost of downtime since the first study in 2010…UPS system failure, including UPS and batteries, is the No. 1 cause of unplanned data center outages, accounting for one-quarter of all such events.”


Batteries lose capacity as they age justifying the need for a preventive maintenance program. Image Via: Emerson Network Power

In order to properly form a strategy for UPS failure prevention, it is important to look at why UPS failure occurs in the first place.  At the heart of the UPS system is its battery, which powers its operation.  UPS batteries cannot simply be installed and then left alone until an emergency occurs.  Even if a brand-new battery is installed and the UPS system is never needed, the battery has a built-in lifespan and it will, over time, die.  So even if you think you are safe with your UPS system and your unused battery, if you are not keeping an eye on it, you may be in trouble when a power outage occurs.

Beyond basic life-expectancy in ideal conditions, UPS battery effectiveness may be reduced or batteries may fail for other reasons.  Ambient temperatures around the UPS battery, if too warm, may damage the UPS battery.  Another reason a battery may fail is what is called “over-cycling” – when a battery is discharged and recharged so many times that it reduces capacity of the battery over time.  Further, UPS batteries may fail due to incorrect float voltage.  Every battery brand is manufactured differently and has a specific charge voltage range that is acceptable.  If a battery is constantly charged outside the recommended charge voltage range – whether undercharging or overcharging – it will reduce the battery’s capacity and may lead to battery failure during a power emergency.
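A float voltage check of the kind described above amounts to comparing each reading against the manufacturer's acceptable range. The sketch below shows that comparison; the function name is hypothetical, and the 13.5 V to 13.8 V range in the usage lines is an illustrative placeholder only, since the correct range must come from the datasheet for your specific battery.

```python
def classify_float_voltage(measured_volts, min_volts, max_volts):
    """Classify a float voltage reading against a manufacturer's range.

    min_volts and max_volts must come from the battery datasheet;
    nothing here supplies real limits.
    """
    if min_volts > max_volts:
        raise ValueError("min_volts must not exceed max_volts")
    if measured_volts < min_volts:
        return "undercharging"
    if measured_volts > max_volts:
        return "overcharging"
    return "ok"


# Placeholder range for illustration only; check your own datasheet.
readings = {"block-1": 13.62, "block-2": 13.31, "block-3": 13.95}
status = {
    name: classify_float_voltage(volts, 13.5, 13.8)
    for name, volts in readings.items()
}
```

Running the same check on every scheduled inspection makes a charger that has drifted out of range visible long before the battery's capacity has been noticeably reduced.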

Fortunately, many of these UPS failures can be traced back to human errors that are preventable.  This means that data centers looking to prevent UPS failures and maximize uptime can do so by implementing and vigilantly following a UPS failure prevention strategy.  First, it is important to develop a maintenance schedule, complete with checklists for consistency, and actually stick to it.  Don’t let routine battery maintenance fall off your priority list.  While it may not seem urgent, it will feel very urgent if the power fails.

One of the first and most important things that a data center should implement in their strategy is proper monitoring of batteries.  Every battery will have an estimated battery life determined by the manufacturer, some even boast as long of a life cycle as 10 years!  But, as any data center manager knows, UPS batteries do not last as long as their estimated life cycle because of a variety of factors. Just how long they will actually last will vary which is why monitoring is incredibly important. Batteries must be monitored at the cell level on a routine schedule, either quarterly or semi-annually and it is important to also check each string of batteries.  By doing this on a routine schedule, you can determine if a battery is near its end of life cycle or has already reached its end of life cycle and make any necessary repairs or replacements.  If it appears a battery is nearing the end of its life cycle it may be best to simply replace it so as not to risk a potential failure.  In addition to physically checking and monitoring UPS batteries, there are battery monitoring systems that can be used.  While physical checks are still critical, battery monitoring systems can provide helpful additional support that may prevent a UPS failure. Schneider Electric describes how battery monitoring systems can be a useful tool, “A second option is to have a battery monitoring system connected to each battery cell, to provide daily automated performance measurements. Although there are many battery monitoring systems available on the market today, the number of battery parameters they monitor can vary significantly from one system to another.

- A good battery monitoring system will monitor the battery parameters that IEEE 1491 recommends be measured. The 17 criteria it outlines include:

- String and cell float voltages, string and cell charge voltages, string and cell discharge voltages, AC ripple voltage

- String charge current, string discharge current, AC ripple current

- Ambient and cell temperatures

- Cell internal resistance

- Cycles

With such a system, users can set thresholds so they get alerted when a battery is about to fail. While this is clearly a step up from the scheduled maintenance in that the alerts are more timely, they are still reactive – you only get an alert after a problem crops up.”  Further, as you monitor your batteries, it is important to collect and analyze the data so that you can make informed decisions about how best to maximize battery life.
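One way to act on collected data, rather than waiting for a hard failure alert, is to compare each cell's latest internal resistance with the baseline recorded when it was installed. The sketch below does that. The function name and the 20 percent "watch" and 50 percent "replace" ratios are assumptions chosen for illustration; they are not taken from IEEE 1491 or from any vendor, so substitute the limits your battery manufacturer recommends.

```python
def resistance_alerts(baseline_mohm, latest_mohm, warn_ratio=1.2, fail_ratio=1.5):
    """Flag cells whose internal resistance has risen relative to baseline.

    Both arguments map cell id -> resistance in milliohms.
    The ratio thresholds are illustrative assumptions, not standard values.
    """
    alerts = {}
    for cell, base in baseline_mohm.items():
        current = latest_mohm.get(cell)
        if current is None or base <= 0:
            continue  # no usable reading for this cell
        ratio = current / base
        if ratio >= fail_ratio:
            alerts[cell] = "replace"
        elif ratio >= warn_ratio:
            alerts[cell] = "watch"
    return alerts


baseline = {"cell-1": 4.0, "cell-2": 4.0, "cell-3": 4.0}
latest = {"cell-1": 4.1, "cell-2": 5.0, "cell-3": 6.4}
alerts = resistance_alerts(baseline, latest)
```

Because the comparison is against each cell's own baseline, a cell that is drifting shows up as "watch" well before it reaches the point of failure, which gives time to schedule a replacement.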

Next, it is important to properly store your battery when not in use to maximize its lifespan which will help it function properly in the event of use.  A UPS battery must be charged every few months while in storage or its lifespan will be diminished.  If you cannot periodically charge your UPS battery while in storage, most experts recommend storing your battery in cooler temperatures – 50°F (10°C) or less – which will help slow down the degradation of your battery.

To keep your UPS battery functioning in optimal conditions, ambient temperature should not exceed 77°F (25°C) and should stay, generally, as close to that as possible.  It is important not only to keep temperatures from exceeding that limit but also to keep them from fluctuating frequently, because fluctuation greatly taxes UPS batteries and reduces their life expectancy.  Your UPS should be housed in an area of your data center where temperatures are carefully monitored and maintained to help promote proper function of your UPS in the event of an emergency.  Ideally, your UPS would be maintained in an enclosure with temperature and humidity control.
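To get a feel for how much heat can shorten battery life, the sketch below applies a commonly cited rule of thumb for lead-acid batteries: expected life is roughly halved for every 15°F of sustained operation above 77°F. That rule is an assumption brought in for illustration, not a figure from this article, and it is only an approximation; the function name is hypothetical and your manufacturer's derating data should take precedence.

```python
def estimated_life_years(rated_years, avg_ambient_f,
                         reference_f=77.0, halving_step_f=15.0):
    """Rough battery life estimate using a halving rule of thumb.

    Assumes life halves for every `halving_step_f` degrees Fahrenheit of
    sustained temperature above `reference_f`. No credit is given for
    running cooler than the reference temperature.
    """
    excess = max(0.0, avg_ambient_f - reference_f)
    return rated_years * 0.5 ** (excess / halving_step_f)


at_reference = estimated_life_years(10, 77)   # rated conditions
warm_room = estimated_life_years(10, 92)      # 15°F above reference
cool_room = estimated_life_years(10, 70)      # below reference
```

Under this approximation a battery rated for 10 years would be expected to last about 5 years in a room that averages 92°F, which illustrates why temperature control around the UPS deserves attention.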


An increase in the number of annual preventive maintenance visits increases MTBF. Image Via: Emerson Network Power

While routine maintenance will require attention and dedication, it is not without merit.  In fact, Data Center Knowledge notes that there are statistics that back up the argument that routine maintenance really does prevent UPS failure, “In one study of more than 5,000 three-phase UPS units and more than 24,000 strings of batteries, the impact of regular preventive maintenance on UPS reliability was clear. This study revealed that the Mean Time Between Failure (MTBF) for units that received two preventive maintenance (PM) service visits a year is 23 times better than a UPS with no PM visits. According to the study, reliability continued to steadily increase with additional visits completed by skilled service providers with very low error rates.” Data centers must implement their own unique UPS maintenance strategy, tailored specifically to individual needs, and remain vigilant in their follow through.  Implementing UPS maintenance best practices, including maintaining proper temperatures, maintaining proper float voltage, avoiding over-cycling, properly storing batteries, utilizing UPS battery monitoring systems, and performing routine visual inspections, will help significantly decrease the risk of UPS failure.
