Modius Data Center Blog

Visualize Data Center Site Performance

Posted by Jay Hartley, PhD on Wed, Jul 06, 2011 @ 07:19 PM

There has been plenty of discussion of PUE and related efficiency/effectiveness metrics of late (Modius PUE Blog posts: 1, 2, 3): how to measure them, where and when to measure, and how to indicate which variation was used. Improved efficiency can reduce both the energy costs and the environmental impact of a data center. Both are excellent goals, but it seems to me that the most common driver for improving efficiency is a capacity problem. Efficiency initiatives are often started, or certainly accelerated, when a facility is approaching its power and/or cooling limits and the organization is facing a capital expenditure to expand capacity.

When managing a multi-site enterprise, understanding the interaction between capacity and efficiency becomes even more important. Which sites are operating most efficiently? Which sites are nearing capacity? Which sites are candidates for decommissioning, efficiency efforts, or capital expansion?

For now, I will gracefully skip past the thorny questions about efficiency metrics that are comparable across sites. Let’s postulate for a moment that a reasonable solution has been achieved. How do I put it to use in making management decisions?

Consider looking at your enterprise sites on a “bubble chart,” as in Figure 1. A bubble chart enables visualization of three numeric parameters in a single plot. In this case, the X axis shows utilized capacity. The Y axis shows PUE. The size of each bubble reflects the total IT power load.
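To make this concrete, here is a minimal sketch of such a bubble chart in Python with matplotlib. The site names and figures are invented for illustration, chosen to roughly match the scenario described below.

```python
import matplotlib.pyplot as plt

# Hypothetical sites: (utilized capacity %, PUE, IT load in kW)
sites = {
    "Fargo":   (88, 2.4, 1200),
    "Boise":   (84, 1.9,  400),
    "Austin":  (82, 2.0,  500),
    "Chicago": (55, 2.6,  800),
    "Orlando": (60, 2.5,  600),
    "Detroit": (35, 1.7,  500),
}

fig, ax = plt.subplots()
for name, (capacity, pue, it_kw) in sites.items():
    ax.scatter(capacity, pue, s=it_kw, alpha=0.5)  # bubble area tracks IT load
    ax.annotate(name, (capacity, pue), ha="center", va="center", fontsize=8)

ax.set_xlabel("Utilized capacity (%)")
ax.set_ylabel("PUE")
ax.set_xlim(0, 100)
ax.set_title("Data Center Site Performance")
plt.show()
```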

Before going into the gory details of the metrics being plotted, just consider in general what this plot tells us about the sites. We can see immediately that three sites are above 80% capacity. Of the three, the Fargo site is clearly the largest, and is operating the most inefficiently. That would be the clear choice for initiating an efficiency program, ahead of even the less-efficient sites at Chicago and Orlando, which are not yet pushing their capacity limits. One might also consider shifting some of the IT load, if possible, to a site with lower PUE and lower utilized capacity, such as Detroit.

[Figure 1: Data center site performance bubble chart of PUE vs. utilized capacity, with bubble size indicating IT load]

In this example, I could have chosen to plot DCiE (Data Center infrastructure Efficiency) vs. available capacity, rather than the complementary metrics PUE vs. utilized capacity. This simply changes the “bad” quadrant from the upper right to the lower left. Which to use is mainly a matter of personal preference.

Efficiency is also generally well-bounded as a numeric parameter, between 0 and 100, while PUE can become arbitrarily large. (Yes, I’m ignoring the theoretical possibility of nominal PUE less than 1 with local renewable generation. Which is more likely in the near future, a solar data center with a DCiE of 200% or a start-up site with a PUE of 20?) Nonetheless, PUE appears to be the metric of choice these days, and it works great for this purpose.
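For reference, the two metrics are simple reciprocals, so converting a chart from one to the other is trivial. A one-liner, with example values picked purely for illustration:

```python
def dcie_from_pue(pue: float) -> float:
    """DCiE (%) is just the reciprocal of PUE: DCiE = 100 / PUE."""
    return 100.0 / pue

print(dcie_from_pue(2.0))   # 50.0: a PUE of 2.0 is a DCiE of 50%
print(dcie_from_pue(1.25))  # 80.0
```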

Whenever presenting capacity as a single number for a given site, one should always present the most-constrained resource. When efficiency is measured by PUE or a similar power-related metric, then capacity should express either the utilized power or cooling capacity, whichever percentage is greater. In a system with redundancy, be sure to take that into account, as in the sketch below.
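Here is a minimal sketch of that rule, assuming equal-sized units and simple N+1-style derating; the loads, capacities, and unit counts are all hypothetical:

```python
def utilization_pct(load_kw: float, nameplate_kw: float,
                    units_total: int = 1, units_redundant: int = 0) -> float:
    """Percent of usable capacity consumed, derating nameplate for redundancy.

    With N+1 redundancy across units_total equal units, only
    (units_total - units_redundant) units' worth of capacity is usable.
    """
    usable_kw = nameplate_kw * (units_total - units_redundant) / units_total
    return 100.0 * load_kw / usable_kw

power = utilization_pct(850, 1200)                    # single utility feed
cooling = utilization_pct(900, 1500, units_total=4,
                          units_redundant=1)          # four CRAH units, N+1
print(max(power, cooling))  # report the tighter constraint: 80.0 (cooling)
```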

The size of the bubble can, of course, also be modified to reflect total power, power cost, carbon footprint, or whatever other metric is helpful in evaluating the importance of each site and the impact of changes.

This visualization isn’t limited to comparing across sites. Rooms or zones within a large data center could also be compared, using a variant of the “partial” PUE (pPUE) metrics suggested by the Green Grid. It can also be used to track and understand the evolution of a single site, as shown in Figure 2.

This plot shows an idealized data-center evolution as would be presented on the site-performance bubble chart. New sites begin with a small IT load, low utilized capacity, and a high PUE. As the data center grows, efficiency improves, but eventually it reaches a limit of some kind. Initiating efficiency efforts will regain capacity, moving the bubble down and left. This leaves room for continued growth, hopefully in concert with continuous efficiency improvements.

Finally, when efficiency efforts are no longer providing benefit, capital expenditure is required to add capacity, pushing the bubble back to the left.

Those of you who took Astronomy 101 might view Figure 2 as almost a Hertzsprung-Russell diagram for data centers!

Whether tracking the evolution of a single data center, or evaluating the status of all data centers across the enterprise, the Data Center Performance bubble chart can help understand and manage the interplay between efficiency and capacity.


Topics: Capacity, PUE, data center capacity, data center management, data center operations, DCIM

Data Center Monitoring in the Cloud

Posted by Jay Hartley, PhD on Tue, Jun 21, 2011 @ 11:24 AM

Modius OpenData has recently reached an intriguing milestone. Over half of our customers are currently running the OpenData® Enterprise Edition server software on virtual machines (VM). Most new installations are starting out virtualized, and a number of existing customers have successfully migrated from a hard server to a virtual one.

In many cases, some or all of the Collector modules are also virtualized “in the cloud,” at least when gathering data from networked equipment and network-connected power and building management systems. It’s of course challenging to implement a serial connection or tie into a relay from a virtual machine. It will be some time before all possible sensor inputs are network-enabled, so 100% virtual data collection is a ways off. Nonetheless, we consider greater than 50% head-end virtualization to be an important achievement.

This does not mean that all those virtual installations are running in the capital-C Cloud, on the capital-I Internet. Modius has hosted trial proof-of-concept systems for prospective customers on public virtual machines, and a small number of customers have chosen to host their servers “in the wild.” The vast majority of our installations, both hardware and virtual, are running inside the corporate firewall.

Many enterprise IT departments are moving to a virtualized environment internally. In many cases, it has been made very difficult for a department to purchase new actual hardware. The internal “cloud” infrastructure allows for more efficient usage of resources such as memory, CPU cycles, and storage. Ultimately, this translates to more efficient use of electrical power and better capacity management. These same goals are a big part of OpenData’s fundamental purpose, so it only makes sense that the software would play well with a virtualized IT infrastructure.

There are two additional benefits of virtualization. One is availability. Whether hardware or virtual, OpenData Collectors can be configured to fail-over to a secondary server. The database can be installed separately as part of the enterprise SAN. If desired, the servers can be clustered through the usual high-availability (HA) configurations. All of these capabilities are only enhanced in a highly distributed virtual environment, where the VM infrastructure may be able to dynamically re-deploy software or activate cluster nodes in a number of possible physical locations, depending on the nature of the outage.

Even without an HA configuration, routine backups can be made of the entire virtual machine, not simply the data and configurations. In the event of an outage or corruption, the backed-up VM can be restored to production operation almost instantly.

The second advantage is scalability. Virtual machines can be incrementally upgraded in CPU, memory, and storage capabilities. With a hardware installation, incremental expansion is a time-consuming, risky, and therefore costly process. It is usually more cost-effective to simply purchase hardware that is already scaled to support the largest planned installation. In the meantime, you have inefficient unused capacity taking up space and power, possibly for years. On a virtual machine, the environment can be “right-sized” for the system in its initial scope.

Overall, the advantages of virtualization apply to OpenData as with any other enterprise software. Lower up-front costs, lower long-term TCO, increased reliability, and reduced environmental impact.  All terms that we at Modius, and our customers, love to hear.

Topics: Energy Efficiency, DCIM, monitoring, optimization, Energy Management, Energy Analysis, instrumentation

Measuring PUE with Shared Resources, Part 2 of 2

Posted by Jay Hartley, PhD on Wed, May 25, 2011 @ 05:02 PM

PUE in an Imperfect World

Last week I started discussing the instrumentation and measurement of PUE when the data center shares resources with other facilities. The most common shared resource is chilled water, such as from a common campus or building mechanical yard. We looked at the simple way to allocate a portion of the power consumed by the mechanical equipment to the overall power consumed by the data center.

The approach there assumed perfect sub-metering of both the power and chilled water, for both the data center and the mechanical yard. Lovely situation if you have it or can afford to quickly achieve it, but not terribly common out in the hard, cold (but not always cold enough for servers) world. Thus, we must turn to estimates and approximations.

Of course, any approximations made will degrade the ability to compare PUEs across facilities--already a tricky task. The primary goal is to provide a metric to measure improvement. Here are a few scenarios that fall short of the ideal, but will give you something to work with:

  • Can’t measure data-center heat load, but have good electrical sub-metering. Use electrical power as a substitute for cooling load. Every watt going in ends up as heat, and there usually aren’t too many people in the space routinely. Works best if you’re also measuring the power to all other non-data-center cooled space. The ratio of the two will get you close to the ratio of their cooling loads. If there are people in a space routinely, add 1 kWh of load per head per 8-hr day of light office work.
  • Water temperature is easy to measure, but you can’t install a flow meter. Many CRAHs control their cooling output through a variable valve, and the reported “Cooling Load” is actually the percentage opening of that valve. Get the valve characteristic curve from the manufacturer; your monitoring system can then convert the reported load to an estimated flow. Add up the flows from all CRAHs to get the total.
  • Have the heat loads, but don’t know the mechanical yard’s electrical power. Use a clamp-on hand meter to take some spot measurements. From these you can calculate a Coefficient of Performance (COP) for the mechanical yard, i.e., the cooling power delivered per unit of electrical power consumed. Try to measure at a couple of different load levels, as the real COP will depend on the % load (see the sketch after this list).
  • I’ve got no information about the mechanical yard. Not true. The control system knows the overall load on the mechanical yard. It knows which pumps are on, how many compressor stages are operating, and whether the cooling-tower fan is running. If you have variable-speed drives, it knows what speed they’re running. You should be able to get from the manufacturer at least a nominal COP curve for the tower and chiller and nominal power curves for pumps and fans. Somebody had all these numbers when they designed the system, after all.
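To make the COP approach from the third bullet concrete, here is a minimal sketch: interpolate a COP curve from a few spot measurements, then estimate the yard’s electrical power at any measured cooling load. The spot readings, loads, and the simple linear interpolation are all illustrative assumptions, not a prescription.

```python
import numpy as np

# Spot measurements: (cooling load in thermal kW, measured electrical kW)
spot = [(400, 130), (800, 210), (1200, 320)]
loads = np.array([s[0] for s in spot], dtype=float)
cops = loads / np.array([s[1] for s in spot])  # COP = cooling out / power in

def yard_power_kw(cooling_load_kw: float) -> float:
    """Estimate the mechanical yard's electrical power at a given load."""
    cop = np.interp(cooling_load_kw, loads, cops)  # COP varies with % load
    return cooling_load_kw / cop

# Allocate the data center's share by its fraction of the chilled-water load:
dc_cooling_kw, total_cooling_kw = 700.0, 1000.0
dc_mech_kw = yard_power_kw(total_cooling_kw) * (dc_cooling_kw / total_cooling_kw)
print(round(dc_mech_kw, 1))  # estimated yard power attributable to the DC
```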

Whatever number you come up with, perform a sanity check against the DOE’s DCPro online tool. Are you in the ballpark? Heads up, DCPro will ask you many questions about your facility that you may or may not be prepared to answer. For that reason alone, it’s an excellent exercise.

It’s interesting to note that even the Perfect World of absolute instrumentation can expose some unexpected inter-dependencies. Since the efficiency of the mechanical yard depends on its overall load level, the value of the data-center PUE can be affected by the load level in the rest of the facility. During off hours, when the overall load drops in the office space, the data center will have a larger share of the chilled-water resource. The chiller and/or cooling-tower efficiency will decline at the same time. The resulting increase in instantaneous data center PUE does not reflect a sudden problem in the data center’s operations, though it might point to opportunities for improving the overall control strategy.
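A toy calculation makes the effect visible. In this sketch the data center’s own load never changes, but its allocated share of a less-efficient, lightly loaded chiller plant grows overnight; every figure here is hypothetical.

```python
IT_KW = 1000.0       # IT load, constant day and night
DC_OTHER_KW = 50.0   # UPS losses, lighting, etc. inside the data center
DC_HEAT_KW = 1200.0  # data-center heat load on the chilled-water loop

def pue(office_heat_kw: float, plant_cop: float) -> float:
    total_cooling = DC_HEAT_KW + office_heat_kw
    plant_kw = total_cooling / plant_cop     # yard electrical power
    dc_share = DC_HEAT_KW / total_cooling    # allocate by share of heat load
    return (IT_KW + DC_OTHER_KW + plant_kw * dc_share) / IT_KW

print(pue(office_heat_kw=800, plant_cop=5.0))  # daytime:   ~1.29
print(pue(office_heat_kw=200, plant_cop=3.5))  # overnight: ~1.39
```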

PUE is a very simple metric, just a ratio of two power measurements, but depending on your specific facility configuration and level of instrumentation, it can be remarkably tricky to “get it right.” Thus, the ever-expanding array of tier levels and partial alternative measurements. Relatively small incremental investments can steadily improve the quality of your estimates. When reporting to management, don’t hide the fact that you are providing an estimated value. You’ll only buy yourself more grief later when the reported PUE changes significantly due to an improvement in the calculation itself, instead of any real operational changes.

The trade-off in coming to a reasonable overall PUE is between investing in instrumentation and investing in a bit of research about your equipment and the associated estimation calculations. In either case, studying the resulting number as it varies over the hours, days, and seasons can provide excellent insight into the operational behavior of your data center.

Topics: BMS, Dr-Jay, PUE, instrumentation, Measurements-Metrics, pPUE
