Modius Data Center Blog

Data Center Cooling has many components...

Posted by Mark Harris on Thu, Jun 03, 2010 @ 03:37 PM

I just read about an innovative new way to address cooling requirements within the data center that is worthy of mention here. As no surprise, the data center energy management challenge has many parts to it, and as we are all seeing, MANY different new solutions will be required and combined over time to fully embrace the REALM OF WHAT'S POSSIBLE. Oh sure, everyone will have their favorite 'energy saver' technology. We saw this happen with Virtualization, and we saw it happen with Variable Frequency Drive controllers for data center fans.

Well, what if we take a look WITHIN the servers themselves and consider the opportunities there? Does the WHOLE server generate heat? NO. Key parts do, like the CPU, chipset, VGA chip, and memory and its controllers. So why do we have to BLOW SO MUCH air across the entire motherboard, using bigger, more expensive-to-operate fans? Wouldn't it be better to SPOT COOL just where the heat is? Reminder: the goal is simply to move heat away from the chips that generate it. We don't need to move large volumes of air just for the thrill of air handling....

I have seen two competing advances in this space. One maturing approach has been adopted in 'trials' by some of the biggest server vendors. They offer versions of some of their commercial server product lines equipped with liquid-based micro heat exchangers. This means these special servers have plumbing/cooling pipes built into the server chassis themselves, and the circulating fluid moves heat away from the server's heat-generating chips. Take a look right next to the LAN port and power plug in the back, and you'll see an inlet/outlet fitting for liquid! Basically, fluid-based heat removal. Hmm, harkens back to the 80's, when big IBM 390s were using water cooling while everyone else went to air. (As a note, liquid cooling is making a resurgence and becoming popular once again...)

So now I see a new approach... 'solid state' air jets. Air jets? Yes, really small air movers that are essentially silent, have no moving parts, and consume tiny amounts of power. It turns out at least one vendor has created really small 'jets' which prove that you can move LOTS of air without any moving parts, and can magically create large amounts of air movement in really small spaces. Using this technology, you can target just the chips that need cooling with relative 'hurricanes', and then simply use small standard fans to carry this (now easily accessible) hot air out of the box.

What savings do the spot jets achieve? In their published test, they reduced the standard high-power fan speed from 9,000 rpm to 6,500 rpm, dropping fan power from 108 watts to only 62 watts. Add back an estimated 10% energy cost for the air jets themselves, and the net savings for the fans inside the box is about 30%. Remember, fans account for nearly 47% of a data center's entire cooling energy consumption, so reducing fan speeds inside AND outside the boxes is critical to long-term power savings.
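
As a quick sanity check on those published numbers (a back-of-the-envelope sketch, assuming the 10% air-jet overhead is charged against the original 108-watt fan draw):

```python
# Back-of-the-envelope check of the published spot-jet test numbers.
baseline_fan_w = 108.0                  # fans at 9,000 rpm
reduced_fan_w = 62.0                    # fans slowed to 6,500 rpm
jet_overhead_w = 0.10 * baseline_fan_w  # assumed: jets cost ~10% of the original fan draw

net_savings_w = (baseline_fan_w - reduced_fan_w) - jet_overhead_w
net_savings_pct = 100.0 * net_savings_w / baseline_fan_w

print(f"Net in-box fan savings: {net_savings_w:.1f} W ({net_savings_pct:.0f}%)")
# -> Net in-box fan savings: 35.2 W (33%), roughly the 'about 30%' quoted above
```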

Lastly, how do you know all your effort has paid off? Monitor FAN speeds! I'll say it a million times: monitoring FAN speeds is very important. The slower they run, the less they consume. Monitor, Monitor, Monitor!!!
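
For servers with a running OS, here is a minimal sketch of what that fan-speed monitoring can look like on Linux, assuming the board's sensors are exposed through the standard hwmon sysfs interface (paths and sensor names vary by platform):

```python
# Minimal in-band fan-speed read via the Linux hwmon sysfs interface.
# Assumes a sensor driver is loaded for the board; paths/names vary by platform.
import glob

def read_fan_speeds():
    speeds = {}
    for path in glob.glob("/sys/class/hwmon/hwmon*/fan*_input"):
        try:
            with open(path) as f:
                speeds[path] = int(f.read().strip())  # value is RPM
        except (OSError, ValueError):
            continue  # sensor missing or momentarily unreadable
    return speeds

if __name__ == "__main__":
    for sensor, rpm in sorted(read_fan_speeds().items()):
        print(f"{sensor}: {rpm} rpm")
```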

Topics: Energy Efficiency, data center monitoring, data center cooling, Cooling-Airflow

Data Center Monitoring: Out-of-Band versus In-Band.

Posted by Mark Harris on Wed, Jun 02, 2010 @ 12:02 PM

There was a time when x86 hardware systems, and the applications and operating systems chosen to be installed upon them, were considered good, but not 'bet your business' great. Reliability was less than ideal. Early deployments saw smaller numbers of servers, and each and every server counted. The applications themselves were not decomposed well enough to share transaction processing, so the failure of any one server impacted actual production. Candidly, I am not sure whether it was the hardware or the software that was mostly at fault, or a combination of both, but server system failures were a very real topic. High Availability or "HA" configurations were considered standard operating procedure for most applications.

The server vendors responded to this challenge by upping their game: designing much more robust server platforms and using higher-quality components, connectors, designs, etc. The operating system vendors rose to the challenge by segmenting their offerings into industrial-strength 'server' distributions and 'certified platform' hardware compatibility programs. This made a huge difference, and TODAY modern servers rarely fail. They run, they run hard, and they are perceived to be rock solid if provisioned properly.

Why the history? Because in those early times for servers, their less-than-favorable reliability characteristics required some form of auxiliary, bare-metal 'out-of-band' access to correct operational failures at the hardware level. Technologies such as Intel's IPMI and HP's iLO became commonplace in discussions about building data center solutions with remote remediation capabilities. This access was provided by an additional small controller chip called a BMC, which required no host operating system or drivers, nothing but standby power, to communicate sensor and status data with the outside world. The ability to reboot a server in the middle of the night over the Internet from the sys admin's house was all the rage. Technologies like serial console and KVM were the starting point, followed by these out-of-band technologies (iLO & IPMI).
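
To illustrate (just a sketch, not any particular vendor's tooling: it assumes the BMC's LAN interface is configured and the open-source ipmitool utility is available, and the host name and credentials are placeholders), that middle-of-the-night remote reboot looks something like this:

```python
# Out-of-band power control through the BMC over the network (IPMI-over-LAN).
# Assumes ipmitool is installed and the BMC's LAN channel is already
# configured; host and credentials below are hypothetical placeholders.
import subprocess

BMC_HOST = "bmc01.example.com"  # hypothetical BMC address
BMC_USER = "admin"              # hypothetical credentials
BMC_PASS = "secret"

def ipmi(*args):
    cmd = ["ipmitool", "-I", "lanplus",
           "-H", BMC_HOST, "-U", BMC_USER, "-P", BMC_PASS, *args]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

print(ipmi("chassis", "power", "status"))   # e.g. "Chassis Power is on"
# ipmi("chassis", "power", "cycle")          # the 3 a.m. remote reboot
```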

Move the clock forward to today, and you'll see that KVM, IPMI, and iLO are still interesting technologies, and still critical for devices considered core to the business, but they are mostly applicable when a server is NOT running any operating system, or has halted and is no longer 'on the net'. At most other times, when the operating system IS running and the servers are on the network and accessible, server makers supply standard drivers that expose all of the sensors and other hardware features of the motherboard, allowing in-band remote access with technologies such as SSH and RDP.
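
For comparison, the same sensor data can be pulled in-band; here is a sketch assuming a Linux host with the local IPMI driver loaded, so ipmitool talks to the BMC through the running OS rather than over the network:

```python
# In-band read of the very same BMC sensor data, but reached through the
# running operating system (local IPMI driver) instead of over the LAN.
import subprocess

def local_sensor_readings():
    # Without "-I lanplus -H <bmc>", ipmitool uses the local system interface.
    out = subprocess.run(["ipmitool", "sdr", "list"],
                         capture_output=True, text=True, check=True).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]

for reading in local_sensor_readings():
    print(reading)   # e.g. "FAN1 | 6500 RPM | ok"
```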

Today, it makes very little difference whether a monitoring system uses operating system calls or out-of-band access tools. The same sensor and status information is available through both sets of technologies, and the choice depends more on how the servers are physically deployed and connected. Remember, a huge percentage of out-of-band ports remain unconnected on the backs of production servers. Many customers consider the second OOB connection to be costly and redundant in all but the worst/extreme failure conditions. (BUT it is critically important for certain types of equipment, such as any in-house DNS servers or perhaps a SAN storage director.)

Topics: data center monitoring, data center temperature sensors, Protocols-Physical-Layer-Interfaces

Fine Corinthian Leather... or Data Center Analysis?

Posted by Mark Harris on Tue, May 25, 2010 @ 09:23 AM


Think back to the last time you purchased a new car. I would bet that within the first 30 minutes of actually looking at the brochures or sitting in the car, your attention turned to the leather seats, body color, stereo system, and electronics package.

By inference, the consumer (you) had already assumed and agreed that the car's foundation was as stated in the data sheet, and that the design engineers had done their job building a functional car. It had a chassis, it had an engine of a certain size, and it was as speedy and efficient as the TV commercial showed. No need to be concerned that the physical layer had any issues. Somehow the car would perform.

Instead, your attention was on the 'soft' details. There you are, buying a $30,000 car, and most of the sales configuration and cost discussion was about the $3,000-$4,000 worth of options. Most people don't even know how big the gas tank is when they drive home in the car!

The Data Center is much the same. The underpinnings for most data centers have, for the most part, been specified by the building design engineers of record, built per spec, and typically installed far away from view. The mechanical and electrical structures were designed and installed based upon equipment resource requirements and assumptions at the time, and at the end of the day, the IT organization ultimately 'inherited' what was installed. How many watts per square foot are really possible? What is the redundant cooling capacity? None of these critical capacities, or their real-time usage, is actually well understood or even visible to the IT organization over time. (And UNTIL LATELY, there was not even much concern about it.) This situation is compounded by the fact that all of the major IT vendors are now selling boxes that consume 2-4 times the power in the same space as the units shipped just two years ago. The data center is a VERY dynamic system, and the most valuable ongoing data center analysis and KPIs must be based upon its real-time aspects.
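
To make 'real-time KPIs' concrete, here is a toy sketch of a capacity-headroom calculation; every number below is a hypothetical placeholder standing in for a live sensor feed, not a measurement from any real facility:

```python
# Toy real-time capacity KPIs: power density and headroom against design limits.
# All values below are hypothetical placeholders standing in for live feeds.
DESIGN_WATTS_PER_SQFT = 150.0   # what the engineers of record designed for
RAISED_FLOOR_SQFT = 10_000.0

measured_it_load_kw = 1_120.0   # would come from live PDU/branch-circuit monitoring

density_w_per_sqft = measured_it_load_kw * 1_000.0 / RAISED_FLOOR_SQFT
headroom_pct = 100.0 * (1.0 - density_w_per_sqft / DESIGN_WATTS_PER_SQFT)

print(f"Current density: {density_w_per_sqft:.0f} W/sq ft "
      f"({headroom_pct:.0f}% headroom against the design limit)")
```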

While IT as a whole has focused for years on its own 'Fine Corinthian Leather' (like virtualization, operating systems, storage, and networks), the real challenge at hand today is to better understand the real-time performance of the chassis: the amount of fuel in the gas tank and its current efficiency, the engine performance, the available redundancy systems, etc.

Don't get me wrong, I am a huge fan of Fine Corinthian Leather, but I think it's prudent to understand the bigger picture before claiming victory...

Topics: data center monitoring, Data Center Metrics, data center analysis
