Along America’s West Coast, the world’s most valuable companies are racing to make artificial intelligence smarter. Google and Facebook have boasted of experiments using billions of photos and thousands of high-powered processors. But late last year, a project in eastern Tennessee quietly exceeded the scale of any corporate AI lab. It was run by the US government.
The record-setting project involved the world’s most powerful supercomputer, Summit, at Oak Ridge National Lab. The machine captured that crown in June last year, reclaiming the title for the US after five years of China topping the list. As part of a climate research project, the giant computer booted up a machine-learning experiment that ran faster than any before.
Summit, which occupies an area equivalent to two tennis courts, used more than 27,000 powerful graphics processors in the project. It tapped their power to train deep-learning algorithms, the technology driving AI’s frontier, chewing through the exercise at a rate of a billion billion operations per second, a pace known in supercomputing circles as an exaflop.
“Deep learning has never been scaled to such levels of performance before,” says Prabhat, who leads a research group at the National Energy Research Scientific Computing Center at Lawrence Berkeley National Lab. (He goes by one name.) His group collaborated with researchers at Summit’s home base, Oak Ridge National Lab.
Fittingly, the AI workout for the world’s most powerful computer focused on one of the world’s largest problems: climate change. Tech companies train algorithms to recognize faces or road signs; the government scientists trained theirs to detect weather patterns like cyclones in the copious output from climate simulations that spool out a century’s worth of three-hour forecasts for Earth’s atmosphere. (It’s unclear how much power the project used or how much carbon that released into the air.)
The Summit experiment has implications for the future of both AI and climate science. The project demonstrates the scientific potential of adapting deep learning to supercomputers, which traditionally simulate physical and chemical processes such as nuclear explosions, black holes, or new materials. It also shows that machine learning can benefit from more computing power—if you can find it—boding well for future breakthroughs.
“We didn’t know until we did it that it could be done at this scale,” says Rajat Monga, an engineering director at Google. He and other Googlers helped the project by adapting the company’s open-source TensorFlow machine-learning software to Summit’s giant scale.
Most work on scaling up deep learning has taken place inside the data centers of internet companies, where servers are connected relatively loosely rather than bound into one giant computer, so they cooperate by splitting problems into pieces. Supercomputers like Summit have a different architecture, with specialized high-speed connections linking their thousands of processors into a single system that can work as a whole. Until recently, there has been relatively little work on adapting machine learning to that kind of hardware.
Monga says working to adapt TensorFlow to Summit’s scale will also inform Google’s efforts to expand its internal AI systems. Engineers from Nvidia also helped out on the project, by making sure the machine’s tens of thousands of Nvidia graphics processors worked together smoothly.
Finding ways to put more computing power behind deep-learning algorithms has played a major part in the technology’s recent ascent. The technology that Siri uses to recognize your voice and Waymo vehicles use to read road signs burst into usefulness in 2012 after researchers adapted it to run on Nvidia graphics processors.
In an analysis published last May, researchers from OpenAI, a San Francisco research institute cofounded by Elon Musk, calculated that the amount of computing power in the largest publicly disclosed machine-learning experiments has doubled roughly every 3.43 months since 2012; that would mean an 11-fold increase each year. That progression has helped bots from Google parent Alphabet defeat champions at tough board games and videogames, and fueled a big jump in the accuracy of Google’s translation service.
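The arithmetic behind that yearly figure can be checked in a few lines. This is a minimal sketch, assuming smooth exponential growth with a constant doubling time; the function name `annual_growth_factor` is made up for illustration and does not come from the OpenAI analysis.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Return how many times a quantity multiplies over 12 months,
    assuming it doubles once every `doubling_months` months."""
    doublings_per_year = 12 / doubling_months
    return 2 ** doublings_per_year


# Doubling time quoted in the article: roughly 3.43 months.
factor = annual_growth_factor(3.43)
print(f"Growth over one year: about {factor:.1f}x")
```

A doubling time of 3.43 months works out to about three and a half doublings a year, which is where the roughly 11-fold annual increase comes from.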
Google and other companies are now creating new kinds of chips customized for AI to continue that trend. Google has said that “pods” tightly integrating 1,000 of its AI chips—dubbed tensor processing units, or TPUs—can provide 100 petaflops of computing power, one-tenth the rate Summit achieved on its AI experiment.
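The comparison rests on unit prefixes: a petaflop is a million billion operations per second, and an exaflop is a thousand times that. A small sketch of the conversion, taking the article's round figures at face value (the variable names are illustrative only):

```python
# Standard SI prefixes, expressed in operations per second.
PETA = 10**15
EXA = 10**18

# Round figures as quoted in the article, not measured values.
tpu_pod_ops = 100 * PETA   # Google's stated figure for a TPU pod
summit_ops = 1 * EXA       # rate reported for Summit's AI experiment

ratio = tpu_pod_ops / summit_ops
print(f"A TPU pod delivers {ratio:.0%} of Summit's reported rate")
```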
The Summit project’s contribution to climate science is to show how giant-scale AI could improve our understanding of future weather patterns. When researchers generate century-long climate predictions, reading the resulting forecast is a challenge. “Imagine you have a YouTube movie that runs for 100 years. There’s no way to find all the cats and dogs in it by hand,” says Prabhat of Lawrence Berkeley. The software typically used to automate the process is imperfect, he says. Summit’s results showed that machine learning can do it better, which should help predict storm impacts such as flooding or physical damage. The Summit results won Oak Ridge, Lawrence Berkeley, and Nvidia researchers the Gordon Bell Prize for boundary-pushing work in supercomputing.
Running deep learning on supercomputers is a new idea that’s come along at a good moment for climate researchers, says Michael Pritchard, a professor at the University of California, Irvine. The slowing pace of improvements to conventional processors had led engineers to stuff supercomputers with growing numbers of graphics chips, where performance has grown more reliably. “There came a point where you couldn’t keep growing computing power in the normal way,” Pritchard says.
That shift posed some challenges to conventional simulations, which had to be adapted. It also opened the door to embracing the power of deep learning, which is a natural fit for graphics chips. That could give us a clearer view of our climate’s future. Pritchard’s group showed last year that deep learning can generate more realistic simulations of clouds inside climate forecasts, which could improve forecasts of changing rainfall patterns.