Artificial intelligence is set to play a bigger role in data-center operations as enterprises begin to adopt machine-learning technologies that have been tried and tested by larger data-center operators and colocation providers.
Today’s hybrid computing environments often span on-premises data centers, cloud and colocation sites, and edge computing deployments. Enterprises are finding that a traditional approach to managing data centers isn’t optimal for that mix. Artificial intelligence, applied through machine learning, has enormous potential to streamline the management of complex computing facilities.
AI in the data center, for now, revolves around using machine learning to monitor and automate the management of facility components such as power and power-distribution elements, cooling infrastructure, rack systems and physical security.
Inside data-center facilities, a growing number of sensors collect data from devices including uninterruptible power supplies (UPS), power distribution units, switchgear and chillers. Data about these devices and their environment is parsed by machine-learning algorithms, which glean insights about performance and capacity, for example, and determine appropriate responses, such as changing a setting or sending an alert. As conditions change, a machine-learning system learns from the changes – it’s essentially trained to self-adjust rather than rely on specific programming instructions to perform its tasks.
The goal is to enable data-center operators to increase the reliability and efficiency of the facilities and, potentially, run them more autonomously. However, getting the data isn’t a trivial task.
A baseline requirement is real-time data from major components, says Steve Carlini, senior director of data-center global solutions at Schneider Electric. That means chillers, cooling towers, air handlers, fans and more. On the IT equipment side, it means metrics such as server utilization rate, temperature and power consumption.
“Metering a data center is not an easy thing,” Carlini says. “There are tons of connection points for power and cooling in data centers that you need to get data from if you want to try to do AI.”
IT pros are accustomed to device monitoring and real-time alerting, but that’s not the case on the facilities side of the house. “The expectation of notification in IT equipment is immediate. On your power systems, it’s not immediate,” Carlini says. “It’s a different world.”
It’s only within the last decade or so that the first data centers were fully instrumented, with meters to monitor power and cooling. And where metering exists, standardization is elusive: Data-center operators rely on building-management systems that use multiple communication protocols – from Modbus and BACnet to LonWorks and Niagara – and have had to be content with devices that don’t share data or can’t be operated via remote control. “TCP/IP, Ethernet connections – those kinds of connections were unheard of on the powertrain side and cooling side,” Carlini says.
The good news is that data-center monitoring is advancing toward the depth that’s required for advanced analytics and machine learning. “The service providers and colocation providers have always been pretty good at monitoring at the cage level or the rack level, and monitoring energy usage. Enterprises are starting to deploy it, depending on the size of the data center,” Carlini says.
Machine learning keeps data centers cool
A Delta Air Lines data-center outage, attributed to electrical-system failure, grounded about 2,000 flights over a three-day period in 2016 and cost the airline a reported $150 million. That’s exactly the sort of scenario that machine learning-based automation could potentially avert. Thanks to advances in data-center metering and the advent of data pools in the cloud, smart systems have the potential to spot vulnerabilities and drive efficiencies in data-center operations in ways that manual processes can’t.
A simple example of machine learning-driven intelligence is condition-based maintenance that’s applied to consumable items in a data center, for example, cooling filters. By monitoring the air flow through multiple filters, a smart system could sense if some of the filters are more clogged than others, and then direct the air to the less clogged units until it’s time to change all the filters, Carlini says.
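To make the filter example concrete, here is a minimal Python sketch of the idea, not any vendor’s implementation. It assumes each filter reports an airflow reading, that a clean filter’s airflow is known, and that a filter change is due once the average clogging passes a threshold. The function names, the units and the 50% threshold are all hypothetical.

```python
def clog_fraction(measured_cfm, clean_cfm):
    """Estimate clogging from airflow loss: 0.0 is clean, 1.0 is fully blocked."""
    return min(1.0, max(0.0, 1.0 - measured_cfm / clean_cfm))


def airflow_shares(readings, clean_cfm):
    """Split total airflow across filters in proportion to how clear each one is."""
    if not readings:
        return {}
    clear = {name: 1.0 - clog_fraction(cfm, clean_cfm)
             for name, cfm in readings.items()}
    total = sum(clear.values())
    if total == 0:
        # Every filter looks blocked: fall back to an even split.
        return {name: 1.0 / len(readings) for name in readings}
    return {name: value / total for name, value in clear.items()}


def filters_due(readings, clean_cfm, threshold=0.5):
    """Flag a filter change once average clogging reaches the (assumed) threshold."""
    fractions = [clog_fraction(cfm, clean_cfm) for cfm in readings.values()]
    return bool(fractions) and sum(fractions) / len(fractions) >= threshold
```

With readings of 1,000, 500 and 500 against a clean baseline of 1,000, the sketch sends half of the air through the clean filter and a quarter through each of the others, and it does not yet call for a filter change.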
Another example is monitoring the temperature and discharge of the batteries in UPS systems. A smart system can identify a UPS system that’s been running in a hotter environment and might have been discharged more often than others, and then designate it as a backup UPS rather than a primary. “It does a little bit of thinking for you. It’s something that could be done manually, but the machines can also do it. That’s the basic stuff,” Carlini says.
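The UPS decision Carlini describes can be sketched as a simple ranking. The Python below is an illustration under stated assumptions: the wear score, its reference temperature of 25 degrees C, the 10-degree scaling and the reference cycle count are invented for the example and do not come from any battery standard or product.

```python
from dataclasses import dataclass


@dataclass
class UpsUnit:
    name: str
    avg_battery_temp_c: float   # average battery temperature, degrees C
    discharge_cycles: int       # number of discharges recorded so far


def wear_score(unit, temp_ref_c=25.0, cycles_ref=100):
    """Hypothetical wear score: heat above a reference plus relative cycling."""
    heat = max(0.0, unit.avg_battery_temp_c - temp_ref_c) / 10.0
    cycling = unit.discharge_cycles / cycles_ref
    return heat + cycling


def assign_roles(units, primaries=1):
    """Make the least-worn units primary and designate the rest as backup."""
    ranked = sorted(units, key=wear_score)
    return {unit.name: ("primary" if index < primaries else "backup")
            for index, unit in enumerate(ranked)}
```

A unit that has run hot and discharged often ends up with a higher score and is designated a backup, which mirrors the manual judgment an operator would otherwise make.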
Taking things up a level is dynamic cooling optimization, which is one of the more common examples of machine learning in the data center today, particularly among larger data-center operators and colocation providers.
With dynamic cooling optimization, data center managers can monitor and control a facility’s cooling infrastructure based on environmental conditions. When equipment is moved or computing traffic spikes, heat loads in the building can change, too. Dynamically adjusting cooling output to shifting heat loads can help eliminate unnecessary cooling capacity and reduce operating costs.
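The feedback loop behind that adjustment can be illustrated in a few lines of Python. This sketch uses a plain proportional rule, not a learned model, so it shows only the control step that a machine-learning system would tune. The setpoint, gain and output limits are assumed values chosen for illustration.

```python
def next_cooling_output(current_pct, inlet_temp_c, setpoint_c=24.0,
                        gain_pct_per_c=5.0, min_pct=20.0, max_pct=100.0):
    """Nudge cooling output (percent of capacity) toward the inlet-temperature setpoint."""
    error_c = inlet_temp_c - setpoint_c           # positive when the aisle is too warm
    proposed = current_pct + gain_pct_per_c * error_c
    return max(min_pct, min(max_pct, proposed))   # stay within the equipment's range
```

At 60% output, a 26-degree inlet reading raises output to 70%, while a 22-degree reading lowers it to 50%, trimming capacity that the heat load no longer requires.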
Colocation providers are big adopters of dynamic cooling optimization, says Rhonda Ascierto, research director for the datacenter technologies and eco-efficient IT channel at 451 Research. “Machine learning isn’t new to the data center,” Ascierto says. “Folks for a long time have tried to better right-size cooling based on capacity and demand, and machine learning enables you to do that in real time.”
Data center operators tend to run much more cooling equipment than they need to, says Cliff Federspiel, founder, president and CTO of Vigilent. “It usually produces a semi-acceptable temperature distribution, but at a really high cost.”
If there’s a hot spot, the typical reaction is to add more cooling capacity. In reality, higher air velocity can produce pressure differences, interfering with the flow of air through equipment or impeding the return of hot air back to cooling equipment. Even though it’s counterintuitive, it might be more effective to decrease fan speeds, for example.
Vigilent’s machine learning-based technology learns which airflow settings optimize each customer’s thermal environment. Delivering the right amount of cooling, exactly where it’s needed, typically results in up to a 40% reduction in cooling-energy bills, the company says.
Beyond automating cooling systems, Vigilent’s software also provides analytics that customers are using to make operational decisions about their facilities.
“Our customers are becoming more and more interested in using that data to help manage their capital expenditures, their capacity planning, their reliability programs,” Federspiel says. “It’s creating opportunities for lots of new kinds of data-dependent decision making in the data center.”
AI makes existing processes better
Looking ahead, data-center operators are working to extend the success of dynamic-cooling optimization to other areas. Generally speaking, areas that are ripe for injecting machine learning are familiar processes that require repetitive tasks.
“New machine learning-based approaches to data centers will most likely be applied to existing business processes because machine learning works best when you understand the business problem and the rules thoroughly,” Ascierto says.
Enterprises have existing monitoring tools, of course. There’s a longstanding category of data-center infrastructure management (DCIM) software that can provide visibility into data center assets, interdependencies, performance and capacity. DCIM software tackles functions including remote equipment monitoring, power and environmental monitoring, IT asset management, data management and reporting. Enterprises use DCIM software to simplify capacity planning and resource allocation as well as ensure that power, equipment and floor space are used as efficiently as possible.
“If you have a basic monitoring and asset management in place, your ability to forecast capacity is vastly improved,” Ascierto says. “Folks are doing that today, using their own data.”
Next up: adding outside data to the DCIM mix. That’s where machine learning plays a key role.
Data-center management as a service, or DMaaS, is a service that’s based on DCIM software. But it’s not simply a SaaS-delivered version of DCIM software. DMaaS takes data collection a step further, aggregating equipment and device data from scores of data centers. That data is then anonymized, pooled and analyzed at scale using machine learning.
Two early players in the DMaaS market are Schneider Electric and Eaton. Both vendors mined a slew of data from their years of experience in the data-center world, which includes designing and building data centers, building management, electrical distribution, and power and cooling services.
“The big, significant change is what Schneider and Eaton are doing, which is having a data lake of many customers’ data. That’s really very interesting for the data-center sector,” Ascierto says.
Access to that kind of data, harvested from a wide range of customers with a wide range of operating environments, enables an enterprise to compare its own data-center performance against global benchmarks. For example, Schneider’s DMaaS offering, called EcoStruxure IT, is tied to a data lake containing benchmarking data from more than 500 customers and 2.2 million sensors.
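Benchmarking against a pooled data set boils down to asking where one facility’s number falls among everyone else’s. The Python sketch below assumes the pool is simply a list of anonymized values for one metric. Power usage effectiveness (PUE) is used as the example metric and is an assumption here, as the article does not say which metrics the data lake holds.

```python
from bisect import bisect_left


def percentile_rank(own_value, pooled_values):
    """Return the percentage of pooled facilities with a strictly lower value."""
    ordered = sorted(pooled_values)
    if not ordered:
        raise ValueError("the benchmark pool is empty")
    below = bisect_left(ordered, own_value)   # count of values less than own_value
    return 100.0 * below / len(ordered)
```

For a lower-is-better metric such as PUE, a low percentile rank is good news: few facilities in the pool are doing better.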
“Not only are you able to understand and solve these issues using your own data. But also, you can use data from thousands of other facilities, including many that are very similar to yours. That’s the big difference,” Ascierto says.
Predictive and preventative maintenance, for example, benefit from deeper intelligence. “Based on other machines, operating in similar environments with similar utilization levels, similar age, similar components, the AI predicts that something is going to go wrong,” Ascierto says.
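One simple way to picture prediction from “other machines” is a nearest-neighbor lookup: find the pooled machines that most resemble yours and see how many of them failed. The Python sketch below is a toy illustration of that idea, not a description of how Schneider’s or Eaton’s services work. It assumes each machine is described by features already scaled to the 0-to-1 range (for example age and utilization) and labeled with whether it failed.

```python
import math


def failure_risk(target, fleet, k=3):
    """Share of the k most similar pooled machines that went on to fail.

    target: tuple of scaled features for the machine in question
    fleet:  list of (features, failed) pairs from the pooled data
    """
    if not fleet:
        raise ValueError("no pooled machines to compare against")
    nearest = sorted(fleet, key=lambda record: math.dist(record[0], target))[:k]
    return sum(1 for _, failed in nearest if failed) / len(nearest)
```

A machine whose closest peers mostly failed gets a high score and becomes a candidate for early maintenance. A production system would use far more features and a properly validated model.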
Scenario planning is another process that will get a boost from machine learning. Companies do scenario planning today, estimating the impact of an equipment move on power consumption, for example. “That’s available without machine learning,” Ascierto says. “But being able to apply machine-learning data, historic data, to specific configurations and different designs – the ability to be able to determine the outcome of a particular configuration or design is much, much greater.”
Risk analysis and risk mitigation planning, too, stand to benefit from more in-depth analytics. “Data centers are so complex, and the scale is so vast today, that it’s really difficult for human beings to pick up patterns, yet it’s quite trivial for machines,” Ascierto says.
In the future, widespread application of machine learning in the data center will give enterprises more insights as they make decisions about where to run certain workloads. “That is tremendously valuable to organizations, particularly if they are making decisions around best execution venue,” Ascierto says. “Should this application run in this data center? Or should we use a colocation data center?”
Looking further into the future, smart systems could take on even more sophisticated tasks, enabling data centers to dynamically adjust workloads based on where they will run the most efficiently or most reliably. “Sophisticated AI is still a little off into the future,” Carlini says.
In the meantime, for companies that are just getting started, he stresses the importance of getting facilities and IT teams to collaborate more.
“It’s very important that you consider all the domains of the data center – the power, the cooling and the IT room,” Carlini says. The industry is working hard to ensure interoperability among the different domains’ technologies. Enterprises need to do the same on the staffing front.
“Technically it’s getting easier, but organizationally you still have silos,” he says.