Modius Data Center Blog

The Water Cooler as a Critical Facility Infrastructure

Posted by Jay Hartley, PhD on Mon, May 02, 2011 @ 04:31 PM

Any data center manager can rattle off the standard list of critical facility equipment in the data center: generator, transfer switch, UPS, PDU, CRAC, fire system, etc. At times, however, one must take a step back and broaden one's view when determining what is critical. Unfortunately, too often we don't realize we're missing something important until after disaster strikes. In the hopes of heading off some future disasters, I share with you the following cautionary tale. I'll give you the take-away message in advance: "Look up!"

Scene:  A corporate office tower in Anytown, USA. A data center consumes the bulk of one floor. It is an efficient, well-maintained data center, with dual, dedicated utility feeds supplying a 2N-redundant power system, backup generator, and redundant chillers. It also boasts a years-long history of non-stop 100% reliable operation.

The office floors above the data center all have essentially identical layouts, consisting of conference rooms, cube farms, and the occasional honest-to-goodness office. Centrally located on each floor is an efficient, well-maintained kitchenette. In each kitchenette is a water cooler. Like many of its kind where the tap water is potable, this water cooler is plumbed directly to the sink. The ¼-inch white plastic tubing is anchored in place with small brass ferrules. This system has been doing yeoman's work for years, reliably delivering chilled, filtered drinking water to the employees with better than 99% up time, allowing for scheduled maintenance.

Action:  Disaster strikes, in accordance with Murphy's Law, late one weekend night. The water cooler’s plastic plumbing finally succumbs to age and stress. Water streams onto the floor unchecked, quickly covering the linoleum surface and finding its way into the wall. There it heads in water's favorite direction, down, passing easily through the matching kitchenette walls in the identical floor plans below.

The water continues until reaching a floor with a dramatically different layout. Temporarily stopped in its pursuit of gravity, the water gathers its forces, soaking into the obstruction until eventually, like the plastic tube, the ceiling tile succumbs. The next obstruction happens to be a PDU and a couple of neighboring server racks in the data center. They too succumb, we assume rather spectacularly.

Meanwhile, back in the kitchenette, the leak is discovered during a security sweep and the flow is cut off, but human intervention has come too late for the electronics down below. Power redundancy saved all servers that were not directly water-damaged, so only a few internal business applications took an uptime hit, along with the kitchenette. Over $100,000 of damage, thanks to the failure of a few pennies of plastic tubing in a “non-critical” part of the facility.

 

Solution:  One could easily focus on the data center itself and protecting its equipment:  Place catch basins in the ceiling and extend the raised-floor leak detection system into them. That would help, and perhaps give a bit more warning. Not a bad idea in any case, if you have the time and money. Better solution? Inexpensive, off-the-shelf floor leak detectors come in kits with automatic shut-off valves. They are available online or in your local hardware store for home use in laundry rooms. An audible alarm is nice, but does an alarm make a noise if no one is there to hear it? Definitely get one with a second, normally-closed contact closure to link into your monitoring system. (You do have one, don’t you? Consider OpenData ME, SE, or EE!) Stop the leak early, and get advance notice.

While you're at it, pick one up for that efficient, well-maintained, and oh-so-convenient second-floor laundry room in your home!

I hope you've enjoyed this tale. In the coming weeks, I'll share additional stories from the field as well as my musings on monitoring, instrumentation, and metrics. Visit my blog next week for insights on metering total energy for PUE—and a tip shared about the ATS.

Topics: Data-Center-Best-Practices, critical facility, leak detection, Dr-Jay, Data-Collection-and-Analysis, Sensors-Meters-and-Monitoring, Uptime-Assurance, monitoring

Measuring Available Redundant Capacity (ARC) in the Data Center

Posted by Jay Hartley, PhD on Fri, Dec 18, 2009 @ 07:00 AM

One of the key power usage metrics that our customers often request is Available Redundant Capacity (ARC). This metric can mean different things to different people, but in simple terms, we at Modius like to define it as the amount of IT load that can be added to a data center system as a whole without sacrificing redundancy.

When viewed from the rack, row, room, or building level (or even across a network of data centers at the enterprise level), ARC provides a simple way to answer the question: “Where can I safely add new IT equipment without overloading and potentially bringing down my facility?”

Most data centers don’t calculate ARC. Instead, operators set a simple alarm threshold on the Actual Load of each device. For example, if the power load reaches 50% on a device (or more often 40% when de-rating), then the device or the monitoring system will throw an alarm.

However, this simple approach to thresholding based on device power usage doesn’t effectively capture all the conditions of the broader power distribution system. There can be hidden capacity that allows for safe failover, even though simple device-level thresholding suggests otherwise.

The goal of system ARC is to identify where you can handle additional load without sacrificing system redundancy. For the power of a device in a dual-feed situation, the ARC calculation is simply:

ARC = {Device Capacity}/2 – {Actual Load}

In most cases, the Device Capacity will be de-rated to allow for some margin. In the case of power capacity, it is common to de-rate apparent power (kVA) capacity to 80% of the nameplate rating, which is where the 40% alarm threshold for a dual-feed device comes from. ARC can also be expressed in real power (kW) if you know or can estimate the power factor of the load. It is even more important to de-rate the capacity in the case of kW measurements to allow for potential load problems that could degrade power factor.
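As a rough illustration of the formula with de-rating applied, here is a minimal Python sketch. The function names, the 0.8 default de-rating factor, and the 100 kVA example UPS are assumptions made for this example, not part of any Modius product.

```python
def device_arc(capacity_kva, actual_load_kva, derate=0.8):
    """ARC for one device in a dual-feed pair.

    ARC = (de-rated capacity) / 2 - actual load.
    The 0.8 default de-rating factor is an assumption for this sketch.
    """
    usable_capacity_kva = capacity_kva * derate
    return usable_capacity_kva / 2 - actual_load_kva


def kva_to_kw(kva, power_factor):
    """Convert apparent power (kVA) to real power (kW) for a given power factor."""
    return kva * power_factor


# Hypothetical 100 kVA UPS currently carrying 30 kVA.
arc_kva = device_arc(100.0, 30.0)    # 100 * 0.8 / 2 - 30 = 10 kVA
arc_kw = kva_to_kw(arc_kva, 0.9)     # about 9 kW at an assumed power factor of 0.9

print(f"ARC: {arc_kva:.1f} kVA ({arc_kw:.1f} kW)")
```

A negative result means the device is already carrying more than half of its de-rated capacity.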

Below is an ARC-based dashboard in action:

Here, the top panel shows how ARC has been calculated for 6 different data centers, along with a measure of cooling overhead. The lower panel shows the drill down for one of the sites.

When calculating the overall ARC for devices in parallel, you can add the ARCs of the individual units. For instance:

UPS A has 10 kVA ARC
UPS B has 8 kVA ARC
Together, they have 18 kVA ARC

Interestingly, it is possible to have a safely redundant system even though one of the individual devices has a negative ARC. For example:

UPS A has 3 kVA ARC
UPS B has −2 kVA ARC
The net ARC of the system is a small but safely positive 1 kVA

In this case, even though one UPS is nominally overloaded according to the simple one-device threshold, either UPS can fail without dropping any load.
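To see why the sum works, note that for two devices of equal de-rated capacity C, the two ARCs add up to C minus the total load, which is exactly the headroom left on the surviving device after a failure. The sketch below checks this with hypothetical numbers (20 kVA of de-rated capacity per UPS, loads of 7 and 12 kVA) chosen to reproduce the 3 and −2 kVA example.

```python
def device_arc(derated_capacity_kva, load_kva):
    """ARC of one device in a dual-feed pair: half its de-rated capacity minus its load."""
    return derated_capacity_kva / 2 - load_kva


def system_arc(device_arcs):
    """Net ARC of equal-capacity devices in parallel: the sum of the device ARCs."""
    return sum(device_arcs)


# Hypothetical pair of UPSs, each with 20 kVA of de-rated capacity.
CAPACITY_KVA = 20.0
loads_kva = {"UPS A": 7.0, "UPS B": 12.0}

arcs = {name: device_arc(CAPACITY_KVA, load) for name, load in loads_kva.items()}
net_arc = system_arc(arcs.values())

# If either UPS fails, the survivor must carry the combined load.
total_load = sum(loads_kva.values())
survivor_headroom = CAPACITY_KVA - total_load

print(arcs)               # UPS A: 3.0, UPS B: -2.0
print(net_arc)            # 1.0
print(survivor_headroom)  # 1.0, the same number as the net ARC
```

A per-device 50% threshold would flag UPS B here, while the net ARC shows that the pair as a whole is still redundant.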

Calculating system ARC from the individual device ARCs in this way assumes that the capacities of both parallel components are the same. This is most often the case, but in the rare instance that it is not, then you have to total the actual load across the devices, and compare it to the (de-rated) capacity of the smaller device. This ensures that the most-limited device can handle the entire load.
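For the unequal-capacity case, a minimal sketch might look like the following. The function name, the 0.8 de-rating factor, and the example capacities and loads are all hypothetical.

```python
def system_arc_unequal(capacities_kva, loads_kva, derate=0.8):
    """System ARC when the parallel devices have different capacities.

    The total load is compared against the de-rated capacity of the
    smallest device, since that device may have to carry everything.
    """
    smallest_usable_kva = min(capacities_kva) * derate
    return smallest_usable_kva - sum(loads_kva)


# Hypothetical pair: a 100 kVA UPS and an 80 kVA UPS, carrying 30 and 25 kVA.
arc = system_arc_unequal([100.0, 80.0], [30.0, 25.0])
print(f"System ARC: {arc:.1f} kVA")  # 80 * 0.8 - 55 = 9.0 kVA
```

When the capacities are equal, this gives the same result as adding the individual device ARCs.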

Some questions may arise when the load is imbalanced, as in the examples above. Such imbalances may arise because some of the load is not configured redundantly. Some loads also do not balance themselves between the two power paths. The ARC calculation doesn’t depend on knowing such details. Of course, any non-redundant load will be dropped if it loses its power source; however, as long as the system ARC is positive you know that any redundant load will be protected regardless of which power source is lost.

In summary, the goal of system ARC is to identify where you can handle additional load without sacrificing system redundancy. With parallel equipment, you can total the ARC of all components if they have the same capacity rating. When looking at ARC along the power chain, the correct system value will be the minimum ARC of any one set of components.
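The rule for the power chain can be sketched in a few lines of Python. The stage names and ARC values below are made up for illustration; each value stands for the system ARC already computed for one set of parallel components.

```python
def chain_arc(stage_arcs):
    """Return the limiting stage and its ARC for a power chain.

    stage_arcs maps a stage name to the system ARC of that set of
    parallel components; the chain's ARC is the minimum of these.
    """
    limiting_stage = min(stage_arcs, key=stage_arcs.get)
    return limiting_stage, stage_arcs[limiting_stage]


# Hypothetical system ARCs (kVA) at three points along one power chain.
stages = {"UPS pair": 18.0, "PDU pair": 6.5, "Panel pair": 12.0}

stage, arc = chain_arc(stages)
print(f"Chain ARC is {arc} kVA, limited by the {stage}")
```

Reporting the limiting stage alongside the value shows where capacity would need to be added first.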

Kind regards,

Jay H. Hartley, PhD
Director of Professional Services
Jay.Hartley@Modius.com

Topics: Data-Center-Best-Practices, data center monitoring, Dr-Jay, data center capacity, data center energy efficiency, Measurements-Metrics, Capacity-Management
