Larry
What do you mean by a 2 GHz transputer? The same configuration (RAM/links) as, e.g., a T800? 100x the performance (i.e. the time to do a multiply is 100x shorter)? Although 100x sounds like it should be easy, there are some things to take into account. The transputer design used 4-phase clocking: much more happened in a single clock cycle than happens in a modern 2-phase design. In other words, you might consider the 20 MHz transputer to be effectively 2x or so faster than a modern 2-phase design clocked at 20 MHz. So we should be thinking about 200x for the circuitry, i.e. about 4 GHz. 4 GHz is fast, but the design is small and simple, so it should work.
In terms of where there might be problems in the microarchitecture, the key thing would be to execute ldnl in 1 ns (2 x 1/(2 GHz)). This is doable; the Apple A14 does better than this: it manages a similar access to a 128KB L1 data cache with 3-cycle latency at 3 GHz (i.e. 1 ns). So I think the basic processor/RAM system can be built.
I think the system clocking and comms system would take some work. A 2 GHz link for local on-chip communication could be doable, assuming a square grid of transputers with local connections. The challenge would be keeping the latency as low, and hence the throughput as high, as it is in the transputer design. Where we communicate between different clock regimes we will incur delays of some number of cycles, and remember that an ack packet is only two (link) cycles long.
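To make the arithmetic above easy to check, here is a rough back-of-envelope sketch. The variable and function names are mine, the 2x factor for 4-phase clocking is only the "2x or so" estimate from the text, and I am reading the factor of 2 in the ldnl figure as two cycles at 2 GHz, which is an assumption.

```python
# Back-of-envelope check of the clock and latency figures in this message.
# All names here are illustrative; the inputs are the numbers quoted above.

BASE_CLOCK_HZ = 20e6    # 20 MHz transputer clock
PHASE_FACTOR = 2        # assumed: 4-phase clocking is worth "2x or so"
TARGET_SPEEDUP = 100    # "100x the performance"


def required_clock_hz(base_hz, phase_factor, speedup):
    """Clock a 2-phase design would need to match the scaled transputer."""
    return base_hz * phase_factor * speedup


def latency_ns(cycles, clock_hz):
    """Latency in nanoseconds of an operation taking `cycles` at `clock_hz`."""
    return cycles / clock_hz * 1e9


circuit_speedup = PHASE_FACTOR * TARGET_SPEEDUP
target_clock_hz = required_clock_hz(BASE_CLOCK_HZ, PHASE_FACTOR, TARGET_SPEEDUP)

# Assumed reading of "2 x 1/(2 GHz)": two cycles at 2 GHz.
ldnl_budget_ns = latency_ns(2, 2e9)

# Apple A14 L1 data cache figure quoted above: 3 cycles at 3 GHz.
a14_l1_ns = latency_ns(3, 3e9)

print(f"circuit speedup needed: {circuit_speedup}x")
print(f"equivalent 2-phase clock: {target_clock_hz / 1e9:.1f} GHz")
print(f"ldnl budget: {ldnl_budget_ns:.2f} ns")
print(f"A14 L1 access: {a14_l1_ns:.2f} ns")
```

Changing PHASE_FACTOR shows how sensitive the 4 GHz figure is to the "2x or so" estimate.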
Speed of light isn’t the issue for circuits on silicon. Signals propagate much slower than this on-chip. [For large systems the problem is that the techniques used to build high-bandwidth systems incur very large latencies - you can build very high bandwidth long-distance interconnects but you face a latency problem.]
I think the problem is that there is little work tackling massively parallel programming and computer architecture together.
Roger