Mostly though, in the spirit of HPC Gastronomy, a hearty digestive congratulation to the SCC23 Overall Winning team:
The 4-node Piora Swiss Cheese Racklette Fondue (Milan+A100), from ETH Zürich
(kid you not!) Bon HPC appétit!
Hubert, it is notable, I think, that Frontier already gets 10 EF/s in HPL-MxP, so there should certainly be hope for 1 km even today (networking seems OK). Converting SCREAM to mixed precision might be a demanding task, but it should be well worth it in the long run, especially if the resulting methods also apply broadly to other models. From such efforts, NOAA could finally get its 1000x Cactus and Dogwood (#75-76 at 10 PF/s) for 1-km forecasts ( https://www.nextplatform.com/2022/06/28/noaa-gets-3x-more-oomph-for-weather-forecasting-it-needs-3300x/ )!
The new 40-million-core Sunway team is also doing well here, with 5 EF/s in MxP (SC23), described in detail in an Open Access paper ( https://dl.acm.org/doi/10.1145/3581784.3607030 ), which should be good for 1.4-km x-y grids. The EU’s LUMI and Japan’s Fugaku, at 2.4 and 2.0 EF/s MxP respectively, might do nicely at around 2 km of horizontal resolution.
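The back-of-envelope here, if I'm reading it right, is that cost scales with the number of horizontal cells (so with 1/dx^2), anchored at 10 EF/s for 1 km; a minimal sketch of that scaling, where the 10 EF/s = 1 km anchor is the assumption:

    # Assumed scaling: cost ~ number of horizontal cells ~ 1/dx^2,
    # anchored (assumption) at 10 EF/s of HPL-MxP for a 1-km grid.
    def resolution_km(mxp_eflops, anchor_eflops=10.0, anchor_km=1.0):
        return anchor_km * (anchor_eflops / mxp_eflops) ** 0.5

    for name, ef in [("Frontier", 10.0), ("Sunway", 5.0), ("LUMI", 2.4), ("Fugaku", 2.0)]:
        print(f"{name}: ~{resolution_km(ef):.1f} km")
    # -> Frontier ~1.0 km, Sunway ~1.4 km, LUMI ~2.0 km, Fugaku ~2.2 km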
Hopefully, SCREAM’s vertical-Hamiltonian, horizontal-spectral, and temporal IMEX high-CFL RK discretizations provide a bounty of opportunities for accurate, effective, and stable mixed-precision implementations.
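For anyone curious what mixed precision buys in practice, the generic pattern behind HPL-MxP-style speedups is to do the heavy solve in low precision and recover full accuracy with cheap high-precision refinement. A minimal NumPy sketch of that pattern (the solve_mixed helper is purely illustrative, not SCREAM or HPL code, and a real implementation would factor once in low precision and reuse the factors rather than re-solving):

    import numpy as np

    def solve_mixed(A, b, iters=5):
        # Heavy work in float32 (stand-in for an FP16/FP32 factorization)...
        A32 = A.astype(np.float32)
        x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
        for _ in range(iters):
            r = b - A @ x                 # ...residual computed in float64...
            dx = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
            x += dx                       # ...cheap high-precision correction.
        return x

    rng = np.random.default_rng(0)
    A = rng.standard_normal((500, 500)) + 500 * np.eye(500)  # well-conditioned test matrix
    b = rng.standard_normal(500)
    x = solve_mixed(A, b)
    print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))      # relative residual, ~1e-15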
The 10 EF/s 1-km goal also makes me wonder if anything interesting might have come out of last year’s DOE/Oak-Ridge “Advanced Computing Ecosystem” RFI, aimed at that performance range ( https://www.nextplatform.com/2022/06/30/so-you-think-you-can-design-a-20-exaflops-supercomputer/ )? If MxP were to give a 5x to 10x speedup on those, we’d be looking at 100 EF/s of practical effective oomph (one full tentacle of ZettaCthulhu! eh-eh-eh!)!
Keep up the great work, all the way to 1 km at 10 EF/s, and beyond!
I join you in your congratulations, and add to them the 3 new CPU-only systems that are in the Top-25 for both HPL and HPCG:
MareNostrum 5 GPP (Xeon), Shaheen III-CPU (EPYC), and Crossroads (Xeon)
Easy to program, flexible, and powerful: what’s not to like!
Speaking of which, it was ironically mentioned to me recently that the cost of building power infrastructure for a Linpack run could be saved, since we never run the machine that hard again after acceptance.
Speaking of GBP, Sarat of E3SM SCREAM! just commented (back on the GBP TNP piece https://www.nextplatform.com/2023/09/15/chinas-1-5-exaflops-supercomputer-chases-gordon-bell-prize-again/ ) that their very detailed and impressive, physically based (PDEs), whole-Earth, 3.25-km horizontal-resolution cloud-cover prediction presentation (128 vertical levels, 10 billion parameters for physics + dynamics, MI250X ROCm and V100 CUDA) is also on Wednesday morning, Nov. 15 ( https://sc23.conference-program.com/presentation/?id=gbv102&sess=sess298 ), with an Open Access paper ( https://dl.acm.org/doi/10.1145/3581784.3627044 ). A must-see by all accounts!
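For a sense of scale on those numbers, a rough back-of-envelope (surface area divided by cell area, not the exact cubed-sphere counts the paper uses) shows where billions of degrees of freedom come from at 3.25 km with 128 levels:

    # Back-of-envelope grid size at 3.25 km with 128 vertical levels;
    # the actual cubed-sphere column count differs somewhat.
    earth_surface_km2 = 5.1e8
    dx_km, levels = 3.25, 128
    columns = earth_surface_km2 / dx_km**2    # ~48 million columns
    points = columns * levels                 # ~6 billion grid points
    print(f"{columns:.1e} columns, {points:.1e} grid points")
    # A few prognostic variables per point lands the DOF count in the 10^10 range.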
Interestingly, the European equivalent of SCREAM! is the model named EXCLAIM! (go figure! eh-eh-eh!)
Cloud HPC, which on the face of it seemed like a very dumb idea until recently, is apparently not that bad at all, with MS’s Azure Eagle machine hitting a very impressive 561 PF/s on HPL (#3)! Nvidia’s Eos SuperPOD (#9) also looks interesting, as a pre-built 121 PF/s system that you can just buy, plug in, and run (I guess), rather than having to go through months of instrument tuning and qualifier exams (as in that next log10 perf level: Exaflopping)!
And who could have ever expected the dual Spanish Inquisition minds of MareNostrum 5 ACC and GPP (#8 and #19) that convert HPC unfaithfuls (with surprise, ruthless efficiency, and torture by pillow and comfy chair) at a combined rate of 178 PF/s!?
The list is completely different from my expectations, but a great one nonetheless! Congrats to all, and especially the new entries!
They were fat for their time, I suppose. I would like 1,024-bit vectors. HA!
I think staying up without losing a node is important. How much wall time do these exascale HPL computations take anyway?
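A rough answer: HPL does about (2/3)N^3 floating-point operations, and N is usually picked to roughly fill memory (8N^2 bytes for the matrix), so with assumed numbers in the exascale ballpark you land at a few hours:

    # Rough HPL wall-time estimate; N and Rmax below are assumptions,
    # not any machine's actual submission parameters.
    n = 2.4e7                     # assumed matrix order (fills a few PB)
    rmax = 1.2e18                 # assumed sustained rate, ~1.2 EF/s
    flops = (2.0 / 3.0) * n**3    # dominant HPL operation count
    mem_pb = 8 * n**2 / 1e15      # FP64 matrix storage
    print(f"~{mem_pb:.1f} PB matrix, ~{flops / rmax / 3600:.1f} hours")
    # -> ~4.6 PB matrix, ~2.1 hours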
I’m confused why the A64FX is said to be “using special accelerators that put what is in essence a CPU and a fat vector on a single-socket processor.” I thought the Scalable Vector Extension (SVE) units on the A64FX were 512 bits wide. Isn’t that the same width as AVX-512 on the Xeon?
Maybe the difference is the integrated HBM? But now Xeon MAX also has that.
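On paper the per-core peaks do line up the same way, since DP flops per cycle is just FMA pipes x 64-bit lanes x 2 on both sides. A quick check with commonly quoted figures (2 x 512-bit FMA units on the A64FX and on top-bin AVX-512 Xeons; exact SKUs vary):

    # DP flops per cycle = FMA pipes * (vector bits / 64 lanes) * 2 flops per FMA.
    def dp_flops_per_cycle(vector_bits, fma_pipes):
        return fma_pipes * (vector_bits // 64) * 2

    print("A64FX, 2 x 512-bit SVE    :", dp_flops_per_cycle(512, 2))   # 32
    print("Xeon,  2 x 512-bit AVX-512:", dp_flops_per_cycle(512, 2))   # 32
    # Same vector width and per-core peak shape; the A64FX differentiator
    # was the on-package HBM2 (which Xeon MAX now also offers).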
I appreciate your analysis and find it amazing how much fun people have with the high-performance Linpack. Whether a relevant indicator for practical computation or not, HPL is still a good stress test to make sure the hardware works and meets design specifications.
That smells like a typo to me…