I had another look and it appears the 512-bit vector engine does support double precision, which I didn’t see before. There is also another slide that mentions a 10x improvement over Power9 on Linpack.
The same calculation as before, correcting for the differing number of cores,
10/2.5 = 4
suggests a 4x per-core speedup. This is much more competitive.
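The per-core correction used here and in the earlier calculation can be sketched as follows. The figures (10x on Linpack, 3x overall, 30 cores vs 12 cores) come from the comments in this thread, not from any official IBM benchmark, and the helper name is my own:

```python
# Hypothetical helper: separate per-core gains from core-count gains by
# dividing the overall speedup by the ratio of core counts.
def per_core_speedup(overall_speedup: float, new_cores: int, old_cores: int) -> float:
    """Overall speedup corrected for the change in core count."""
    return overall_speedup / (new_cores / old_cores)

# 3x overall with 30 cores vs 12 cores -> 1.2x per core (20 percent)
print(per_core_speedup(3.0, 30, 12))   # 1.2

# 10x on Linpack with the same core ratio -> 4x per core
print(per_core_speedup(10.0, 30, 12))  # 4.0
```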
While a petabyte of shared RAM sounds amazing, entry-level price/performance is crucial for businesses that want to grow, and for independent developers polishing the open-source software needed at all scales.
Hopefully the availability of the Power10 and the end of the pandemic will soon be two good news stories.
They need to learn how to hype for the general public; sadly, we engineers aren’t too good at that 🙂
30 cores/12 cores = 2.5
so most of the 3X speedup comes simply from increasing the core count. Seen another way, since
3.0/2.5 = 1.2
it would appear the per-core speedup is only 20 percent. While that would be fine for a one- or two-year release cycle, it has been four years.
As the single-core performance of Power9 on processor-bound scientific workloads was less than half that of something like the Xeon Gold 6126 a year ago, it is completely unclear whether the apparent 20 percent per-core performance gain of Power10 is much to boast about. I sincerely hope it is, because IBM Power has many other advantages that would be shortchanged if the built-in floating-point arithmetic is slow.
In my opinion, the Fujitsu A64FX used in the Fugaku supercomputer shows that combining high memory bandwidth with scalable vector instructions, all in the CPU core, leads to GPU-like levels of performance that are much easier to write code for. This is important, because research science depends on creating new programs to solve new problems using new algorithms.