
Re: Transistor count



Chris,

On 8 Dec 2020, at 23:18, Jones, Chris C (UK Warton) <chris.c.jones@xxxxxxxxxxxxxx> wrote:

 
Larry,
 
Not all of us have given up exactly, but there are so many barriers to achieving massive parallelism.  I have to pack lots of processing into each core to hide the message passing, and then I need loads of cache RAM to make that amount of work efficient.  That means I cannot efficiently make use of all the cores available on each processor – we run on just half most of the time and simply waste the rest.
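In code, that packing usually takes the shape of the familiar overlap pattern. A minimal sketch, not Chris's actual code: compute_interior() and compute_boundary() are hypothetical stand-ins for the work that does not, and does, depend on remote data.

    #include <mpi.h>

    /* Hypothetical stand-ins: interior work needs no remote data,
       boundary work needs the incoming halo. */
    extern void compute_interior(void);
    extern void compute_boundary(void);

    void exchange_and_compute(double *recv_halo, double *send_halo,
                              int n, int left, int right)
    {
        MPI_Request req[2];
        /* Post the halo exchange first... */
        MPI_Irecv(recv_halo, n, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &req[0]);
        MPI_Isend(send_halo, n, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[1]);
        /* ...then hide its latency behind work that needs no remote data. */
        compute_interior();
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
        compute_boundary();   /* only this part had to wait */
    }

Only the boundary work is exposed to the network; the interior work is what "packing lots of processing into each core" buys you, and it is exactly that interior working set which eats the cache.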

Yes. This is fundamental. To hide latency you need “excess parallelism”. That excess parallelism has to be supported by RAM, and as RAM gets larger it gets slower. In the case of caches, moving from L1 to L2 on a really well-designed processor we see a 4-5x increase in latency, and L3 will be around another 10-20x. And, of course, you need to pay (£/€/$) for that extra memory.
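Those ratios are easy to reproduce with the classic pointer-chasing microbenchmark. A sketch with illustrative choices only (the working-set sizes, iteration count and crude rand() shuffle are all arbitrary): each load depends on the previous one, so the time per hop approximates the load-to-use latency at each working-set size.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static volatile size_t sink;      /* keeps the chase loop alive */

    /* Time dependent loads around a random single cycle of n slots;
       Sattolo's shuffle guarantees one big cycle and defeats the
       hardware prefetcher. */
    static double ns_per_load(size_t n, long iters)
    {
        size_t *next = malloc(n * sizeof *next);
        for (size_t i = 0; i < n; i++)
            next[i] = i;
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }
        struct timespec t0, t1;
        size_t p = 0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < iters; i++)
            p = next[p];
        clock_gettime(CLOCK_MONOTONIC, &t1);
        sink = p;
        free(next);
        return ((t1.tv_sec - t0.tv_sec) * 1e9
                + (t1.tv_nsec - t0.tv_nsec)) / iters;
    }

    int main(void)
    {
        /* Working sets chosen to straddle typical L1/L2/L3 sizes. */
        const size_t kib[] = { 16, 256, 4096, 65536 };
        for (int i = 0; i < 4; i++) {
            size_t n = kib[i] * 1024 / sizeof(size_t);
            printf("%6zu KiB: %5.1f ns per load\n",
                   kib[i], ns_per_load(n, 20000000L));
        }
        return 0;
    }

The jump in nanoseconds per load as the working set outgrows each cache level is the 4-5x and 10-20x described above, paid for in silicon and in money.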

That’s why the message passing latency needs to be brought down.
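That number is simple to measure with the standard ping-pong pattern. A minimal sketch, assuming two ranks and taking half the mean round trip as the one-way latency:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char byte = 0;
        const int reps = 10000;
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {           /* ping... */
                MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {    /* ...pong */
                MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        if (rank == 0)
            printf("one-way latency ~ %.2f us\n",
                   (MPI_Wtime() - t0) / reps / 2 * 1e6);
        MPI_Finalize();
        return 0;
    }

Build with mpicc and run with mpirun -np 2. Even a good interconnect will report on the order of a microsecond, which at today's clock rates is thousands of processor cycles per message.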

Thus, for our types of problem, I can find the sweet spot in terms of the number of cores and processors for a particular case; in most cases it is between 500 and 1800 cores.  Below the sweet spot the code runs linearly more slowly; at the sweet spot a single run might take 10 days or more; above the sweet spot the code runs progressively more slowly, turning over remarkably quickly into not just diminishing returns but a catastrophic slowdown, so that at about 2000 cores we might as well run on a big workstation.
 
This means that I can increase throughput by running more cases simultaneously, provided I have many cases to run.  What I cannot do is run the problems faster, and that is what I need.  I was offered a computer as big as a block of flats the other day (in jest, I should add), but I had to say that it would still not make the code run faster.  That is the tragedy of present-day parallel computing and HPC.  Is this why the term supercomputing seems to have been dropped?

As you say, the challenge is to run one problem quickly. With two problems, use two computers and it goes twice as fast, and so on.
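Chris's turnover is what even the crudest strong-scaling model predicts. A toy sketch with invented constants, not his measurements: W seconds of serial work per step, plus a communication term that grows as L*p.

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        const double W = 100.0;  /* serial work per step, seconds (assumed) */
        const double L = 1e-4;   /* comms cost per core per step (assumed)  */

        /* Per-step time W/p + L*p is minimised at p = sqrt(W/L);
           beyond that, adding cores makes the run slower. */
        printf("sweet spot near p = %.0f cores\n", sqrt(W / L));
        for (int p = 125; p <= 8000; p *= 2)
            printf("p = %5d   T = %7.4f s/step\n", p, W / p + L * p);
        return 0;
    }

With these made-up numbers the minimum sits at 1000 cores; past it each doubling makes the run slower, and by 8000 cores the time is back to what 125 cores gave. Chris's "catastrophic" turnover suggests a communication term that grows faster than linearly in p, which sharpens the cliff but does not change the argument.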

By the way, I am still running an “embarrassingly parallel” code.

But there is no new understanding here. The fundamentals were understood (by some people) and documented in the 1990s.

Roger

 
Regards,
Chris
 
 
 

Prof. Christopher C R Jones BSc. PhD C.Eng. FIET

BAE Systems Engineering Fellow

EMP Fellow of the Summa Foundation

Principal Technologist – Electromagnetics
 
Military Air & Information                                                 ( Direct:   +44 (0) 3300 477425
Electromagnetic Engineering, W423A                           ( Mobile:  +44 (0)7855 393833
Engineering Integrated Solutions                                  7 Fax:       +44 (0)1772 8 55262
Warton Aerodrome                                                           * E-mail:   chris.c.jones@xxxxxxxxxxxxxx 
Preston                                                                              : Web:      www.baesystems.com
PR4 1AX
 
BAE Systems (Operations) Limited
Registered Office: Warwick House, PO Box 87, Farnborough Aerospace Centre, Farnborough, Hants, GU14 6YU, UK
Registered in England & Wales No: 1996687
 
Exported from the United Kingdom under the terms of the UK Export Control Act 2002 (DEAL No 8106)
 
--
Roger Shepherd



