
Re: Transistor count



I told Claus Meber (claus.meder@xxxxxxxxxxxxxx - added to this thread) about this mail. I knew he had some info that might interest you. Here is his response to me:

- - - CLAUS MEBER START - - - 

Hi Øyvind,

Sounds interesting. I did not even know that there is still such a large community out there. If you like you can forward my mail address and the following short summary about my FPGA Transputer hobby to the interested people.

Short summary of my project's current status:

I'm using a Digilent Arty A7-100 Board. The Xilinx Vivado Tool is able to map into the Artix 7-100 chip:
  • 14 fully featured T425C with a limited number of links per transputer
  • 14 caches (16 KB, 2-way set associative), one for each T425C, acting as its port to external memory, which is 16 MB for each node out of the single 256 MB DDR3 chip on the board
  • one Xilinx MIG core for DDR3 access, connected to all the caches, access prioritized by a round-robin arbiter
  • a graphics core for VGA output 1600x1200 32 or 8 bit per pixel
  • an Ethernet MII interface (10/100 Mbit) which can be accessed by the root Transputer only. I ported LWIP to the good old ANSI-C compiler, so the Transputer is running a web server
  • the root Transputer has one link to the host which is fully compatible with the old 20 Mbit links. I connected the very nice LINKUSB which was designed and built by Mike. The rest of the links run at CPU clock speed with the original serial protocol, consuming 11 bits per byte.
  • performance counters for each Transputer, to get more insight into where the bottlenecks of the design are
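
As a rough illustration of the "11 bits per byte" figure in the list above, here is a minimal Python sketch of the peak one-way payload rate of such a link. It is not taken from Claus's design. It assumes that acknowledge packets and idle gaps are ignored and that "CPU clock speed" means one link bit per clock cycle; the 120 MHz value is the clock quoted further down in the mail.

```python
# Hedged sketch: peak one-way payload rate of a serial link that spends
# 11 bit times on the wire per data byte.
# Assumptions (not from the mail): acknowledge packets and idle gaps are
# ignored, and a link at "CPU clock speed" sends one bit per clock cycle.

BITS_PER_BYTE_ON_WIRE = 11  # two start bits + 8 data bits + one stop bit


def link_bytes_per_second(bit_rate_hz: float) -> float:
    """Upper bound on payload bytes per second in one direction."""
    return bit_rate_hz / BITS_PER_BYTE_ON_WIRE


host_link = link_bytes_per_second(20e6)   # classic 20 Mbit/s host link
fast_link = link_bytes_per_second(120e6)  # link clocked at 120 MHz

print(f"20 Mbit/s link: {host_link / 1e6:.2f} MB/s")
print(f"120 MHz link  : {fast_link / 1e6:.2f} MB/s")
```

Real throughput will be lower, since acknowledges share the wire when traffic flows in both directions.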

A little bit of background on how I achieved it: I started my project at the end of 2018, inspired by my discovery of the microcode ROM dump from a T425C on Gavin's web site. I started decoding the meaning of the bits, first in a spreadsheet and later with the help of an emulator written in C. Thus I was able to bring in my ideas about how the brilliant Inmos designers might have solved their problems, following the basic principle of keeping everything as simple as possible. Now I can state they did a really good job. It is really not a very complex design. Many things are solved in a very clever way, avoiding spending too many transistors on the function needed.

By the end of 2019 I had enough insight into how to design the T425C in FPGA technology. Unlike the original design, I decided on a single-clock, fully synchronous design. Implementation took half a year. Testing and bug fixing took until early summer this year. Since then I have done some ANSI-C and OCCAM programming on my small "super computer". Like everyone, I wrote a distributed Mandelbrot calculation with my own router processes. Currently I'm running the old flight simulator, and because the source is available I'm enhancing it.

A big thank you to Mike who supported me with all his great knowledge about Transputers and was a very good partner in discovering some secrets of the design.

I'm sure over the Christmas period I can write some more documentation, which is always a burden for me because I'm fully satisfied once I have understood and solved a problem.

To feed the discussion about resources, here is what my latest Vivado run states (synthesis and implementation set to default strategy).


Interested in clock speeds? The CPU core achieves around 80MHz for the xc7a100tcsg324-1 device. With the FPGA flooded by T425Cs it drops to 70+MHz. Luckily my Arty board can be over-clocked. The design is currently running at a 120MHz clock speed (WNS around -5ns). I tried to ask Xilinx which part I really have but unfortunately they did not grant me access to their 2D Marking Application Lounge :-(
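
The clock figures above can be cross-checked with the usual timing relation: the longest path the tool reports is roughly the requested clock period minus the worst negative slack (WNS). The Python sketch below is only an illustration; the function name is made up here, and since the WNS is given as "around -5ns" the result is approximate.

```python
# Hedged sketch: estimate the clock rate a timing report would sign off on,
# given the requested clock and the worst negative slack (WNS).
# fmax_mhz is a hypothetical helper name, not a Vivado command or API.


def fmax_mhz(clock_mhz: float, wns_ns: float) -> float:
    """Clock rate at which the reported critical path would just meet timing."""
    period_ns = 1000.0 / clock_mhz         # requested clock period
    critical_path_ns = period_ns - wns_ns  # negative WNS lengthens the path
    return 1000.0 / critical_path_ns


estimate = fmax_mhz(120.0, -5.0)
print(f"Estimated timing-clean clock: {estimate:.1f} MHz")
```

With these inputs the estimate comes out in the same range as the "70+MHz" quoted for the fully populated FPGA, so the 120 MHz operation is over-clocking beyond what the tool guarantees.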
Here is the rspy output:
rspy -l
   # Part  rt Link0 Link1 Link2 Link3
   0 T425C120 1736K   ...   10M   ...
   1 T425C120   10M   10M   10M   10M
   2 T425C120   10M   10M   ...   ...
   3 T425C120   10M   10M   ...   ...
   4 T425C120   10M   10M   ...   ...
   5 T425C120   10M   10M   ...   ...
   6 T425C120   10M   10M   ...   ...
   7 T425C120   10M   10M   ...   ...
   8 T425C120   10M   10M   ...   ...
   9 T425C120   10M   10M   10M   ...
  10 T425C120   10M   10M   10M   ...
  11 T425C120   10M   10M   10M   ...
  12 T425C120   10M   ...   10M   ...
  13 T425C120   10M   10M   10M   10M


The configuration is for running the flight simulator, hence the four link connections on the last node, which is the graphics node.

Please feel free to contact me.

   Claus

- - - CLAUS MEBER END - - - 

Øyvind


19. nov. 2020 kl. 20:28 skrev Øyvind Teig <oyvind.teig@xxxxxxxxxxx>:

Guys


I guess the microcode is of little help? http://transputer.net/iset/iset.asp


But good memory plus back-of-the-envelope calculations should get within a factor of... better than 10?

(I added Michael Brüstle at transputer.net to this mail list)

Øyvind 

19. nov. 2020 kl. 19:54 skrev Roger Shepherd <rog@xxxxxxxx>:

If I recall, the T4 was 25% RAM, 25% processor, 25% link and 25% other, by area (things like pads take up a lot of space but not many transistors). The RAM is much more transistor-dense than the other blocks. The link block (4 bidirectional links and the event channel) is significantly less dense: the ‘register’ part is CPU-like, but the actual shift registers and synchronisers are very non-dense. So, perhaps 25% of the density of RAM overall. Now, doing the measurement on my photograph, it looks like the 4 links occupy 2/3 of the area of the RAM, which gives

200k * 25% * 2/3 = 33k transistors for 4 links = 8k per link (which is in line with your estimate below). I suspect your estimate is nearer to the truth than mine.
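
The arithmetic above can be reproduced with a short Python sketch. All inputs are Roger's recalled or photo-measured figures from this mail, not datasheet values; the variable names are chosen here for illustration.

```python
# Hedged sketch of the area/density estimate above.
# Every input is an approximate figure quoted in the mail.

ram_transistors = 200_000   # transistors in the on-chip RAM
link_density_vs_ram = 0.25  # link block density relative to RAM
link_area_vs_ram = 2 / 3    # area of the 4 links relative to the RAM area
num_links = 4

link_block_transistors = ram_transistors * link_density_vs_ram * link_area_vs_ram
per_link_transistors = link_block_transistors / num_links

print(f"4 links : about {link_block_transistors / 1000:.0f}k transistors")
print(f"per link: about {per_link_transistors / 1000:.0f}k transistors")
```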

But in assessing anything you need to consider that the transistor count is affected by word size (two words of buffer, one word of address) and control of the interconnect to route bytes into the buffer etc. 

Roger

On 19 Nov 2020, at 17:55, Larry Dickson <tjoccam@xxxxxxxxxxx> wrote:

Hi Tony, David and all,

Does anyone remember how many transistors are in a link? We are
gathering information on transistor efficiency; now Tony's numbers
indicate floating point costs about 50,000, and David on memory
indicates 4KB costs about 200,000. 25,000 for CPU and 25,000 for
links would indicate 6000 per link, but that is just a guess and I
could be way off.
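
For reference, the budget above can be added up in a few lines of Python. The per-block figures are the guesses quoted in this paragraph; the comparison with the T414 and T800 totals uses Tony's recollection further down the thread, and treating those totals as the source of the 25,000/25,000 split is an assumption.

```python
# Hedged sketch: sum the per-block transistor guesses quoted above.
# None of these are authoritative counts.

budget = {
    "floating point": 50_000,
    "4KB memory": 200_000,
    "CPU": 25_000,
    "links (4)": 25_000,
}

total_with_fpu = sum(budget.values())
total_without_fpu = total_with_fpu - budget["floating point"]
per_link = budget["links (4)"] / 4

print(f"with floating point   : {total_with_fpu:,}")
print(f"without floating point: {total_without_fpu:,}")
print(f"per link              : {per_link:,.0f}")
```

The two totals match the roughly 300,000 (T800) and 250,000 (T414) recalled elsewhere in the thread, and the per-link figure is what gets rounded to "6000" above.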

As you may be guessing, I am imagining an eight-link Transputer!
Long ago, in my PDPTA'96 Roadmap paper, I calculated "burden
bandwidth" for a one-direction link communication using Forrest
Crowell and Neal Elzenga's published measurements, and got
37MB/s, same whether links were running one-way or both-ways,
and per-unidirectional-communication timing bandwidth of 1.2 MB/s
when running both ways. This means by extrapolation that eight
links running full speed both ways would be supportable (reducing
CPU speed by 52% due to DMA burden).
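
The 52% figure can be reproduced as follows. This Python sketch assumes one reading of the paragraph above: that the 37MB/s "burden bandwidth" is the aggregate link traffic that would use up the whole CPU, and that each one-way communication runs at 1.2 MB/s. That interpretation is an assumption, not something stated in the PDPTA'96 paper.

```python
# Hedged sketch: fraction of CPU capacity taken by link DMA when eight
# links run at full speed in both directions.
# Assumption: "burden bandwidth" = link traffic that would consume 100%
# of the CPU; both bandwidth figures are taken from the mail.

burden_bandwidth_mb_s = 37.0  # traffic that would use the entire CPU
per_direction_mb_s = 1.2      # one-way rate with links running both ways
num_links = 8
directions = 2

total_traffic_mb_s = num_links * directions * per_direction_mb_s
cpu_fraction_lost = total_traffic_mb_s / burden_bandwidth_mb_s

print(f"total link traffic: {total_traffic_mb_s:.1f} MB/s")
print(f"CPU slowdown      : {cpu_fraction_lost:.0%}")
```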

Everything can be mapped into modern cores and communications
(e.g. Manchester code lanes); the principle stays the same.

Larry

On Mar 18, 2019, at 9:59 PM, Tony Gore <tony@xxxxxxxxxxxx> wrote:

Hi Larry

As I recall, T414 was about 250,000 and the T800 was 300,000.

Tony Gore

Tony Gore
+44 7768 598570


From: occam-com-request@xxxxxxxxxx <occam-com-request@xxxxxxxxxx> on behalf of Larry Dickson <tjoccam@xxxxxxxxxxx>
Sent: Tuesday, March 19, 2019 1:06:42 AM
To: Occam Family
Subject: Transistor count
 
All,

How many transistors does a Transputer have (e.g. of the T2 or T4 family)? I have heard a wide range of numbers from 27,000 to 200,000, but am having trouble finding an authoritative reference.

Larry





Øyvind TEIG 
+47 959 615 06
oyvind.teig@xxxxxxxxxxx
https://www.teigfam.net/oyvind/home
(iMac)