I had another look and it appears the 512-bit vector engine does support double precision, which I didn’t see before. There is also another slide that mentions a 10x improvement over Power9 on Linpack.
The same calculation as before, correcting for the differing number of cores,
10/2.5 = 4
suggests a 4x per-core speedup. This is much more competitive.
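The per-core correction used here and in the earlier calculation can be sketched as follows. The figures (10x on Linpack, 3x overall, 30 cores vs 12 cores) come from the comments in this thread, not from any official IBM benchmark, and the helper name is my own:

```python
# Hypothetical helper: separate per-core gains from core-count gains by
# dividing the overall speedup by the ratio of core counts.
def per_core_speedup(overall_speedup: float, new_cores: int, old_cores: int) -> float:
    """Overall speedup corrected for the change in core count."""
    return overall_speedup / (new_cores / old_cores)

# 3x overall with 30 cores vs 12 cores -> 1.2x per core (20 percent)
print(per_core_speedup(3.0, 30, 12))   # 1.2

# 10x on Linpack with the same core ratio -> 4x per core
print(per_core_speedup(10.0, 30, 12))  # 4.0
```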
While a petabyte of shared RAM sounds amazing, entry-level price/performance is crucial for businesses that want to grow, and for independent developers polishing the open-source software needed at all scales.
Hopefully the availability of the Power10 and the end of the pandemic will soon be two good news stories.
They need to learn how to hype for the general public; sadly, we engineers aren’t too good at that 🙂
30 cores/12 cores = 2.5
so most of the 3X speedup comes simply from increasing the core count. Seen another way, since
3.0/2.5 = 1.2
it would appear the per-core speedup is only 20 percent. While that would be fine for a one- or two-year release cycle, it has been four years.
As the single-core performance of Power9 on processor-bound scientific workloads was less than half that of something like the Xeon Gold 6126 a year ago, it is completely unclear whether the apparent 20 percent per-core performance gain of Power10 is much to boast about. I sincerely hope it is, because IBM Power has many other advantages that would be shortchanged if the built-in floating-point arithmetic is slow.
In my opinion, the Fujitsu A64FX used in the Fugaku supercomputer shows that combining high memory bandwidth with scalable vector instructions, all in the CPU core, leads to GPU-like levels of performance that are much easier to write code for. This is important, because research science depends on creating new programs to solve new problems using new algorithms.