
AW: Transistor count



Hi all,

 

Maybe someone is interested in the LUT count of a T425C-like FPGA design:

 

(Xilinx - ISE 14.7 – map report) Section 13 - Utilization by Hierarchy (16.02.2020 – XC6SLX45)

                                 -------------------------------------

+-----------------------------------------------------------------------------------------------------------------------------------

| Module                                   | Partition | Slices*       | Slice Reg     | LUTs          | LUTRAM        | BRAM/FIFO |

+-----------------------------------------------------------------------------------------------------------------------------------

| b004_top/                                |           | 12/1316       | 7/1408        | 16/4186       | 0/20          | 8/20      |

| +i_t42_all_top                           |           | 0/1304        | 0/1401        | 0/4170        | 0/20          | 0/12      |

| ++i_t42_cpu_ctrl2data                    |           | 0/550         | 0/354         | 0/1752        | 0/16          | 0/8       |

| +++i_t42_cpu_ctrlpath                    |           | 0/119         | 0/68          | 0/255         | 0/0           | 0/8       |

| ++++i_t42cpu_idecode                     |           | 25/25         | 0/0           | 35/35         | 0/0           | 0/0       |

| ++++i_t42cpu_iptr                        |           | 23/23         | 32/32         | 62/62         | 0/0           | 0/0       |

| ++++i_t42cpu_oreg                        |           | 41/41         | 32/32         | 101/101       | 0/0           | 0/0       |

| ++++i_t42cpu_prefetch                    |           | 30/30         | 4/4           | 57/57         | 0/0           | 0/0       |

| ++++i_t42cpu_ucrom                       |           | 0/0           | 0/0           | 0/0           | 0/0           | 0/8       |

| +++++i_t42cpu_ucode_rom1024x128_spartan3 |           | 0/0           | 0/0           | 0/0           | 0/0           | 8/8       |

| +++i_t42_cpu_datapath                    |           | 3/431         | 0/286         | 3/1497        | 0/16          | 0/0       |

| ++++i_t42cpu_abcdereg                    |           | 225/225       | 161/161       | 557/557       | 0/0           | 0/0       |

| ++++i_t42cpu_alu                         |           | 103/103       | 0/0           | 555/555       | 0/0           | 0/0       |

| ++++i_t42cpu_bytealign                   |           | 76/76         | 60/60         | 304/304       | 0/0           | 0/0       |

| ++++i_t42cpu_constbox                    |           | 0/0           | 0/0           | 0/9           | 0/0           | 0/0       |

| +++++i_t42cpu_constbox_rom32x32          |           | 0/0           | 0/0           | 9/9           | 0/0           | 0/0       |

| ++++i_t42cpu_pipecontrol                 |           | 16/16         | 34/34         | 36/36         | 0/0           | 0/0       |

| ++++i_t42cpu_pointers                    |           | 0/4           | 0/0           | 0/16          | 0/16          | 0/0       |

| +++++i_t42cpu_pointers_spsram32x32       |           | 4/4           | 0/0           | 16/16         | 16/16         | 0/0       |

| ++++i_t42cpu_wptr                        |           | 4/4           | 31/31         | 17/17         | 0/0           | 0/0       |

| ++i_t42_cpu_syspath                      |           | 0/59          | 0/192         | 0/211         | 0/0           | 0/0       |

| +++i_t42cpu_sbits                        |           | 20/20         | 32/32         | 54/54         | 0/0           | 0/0       |

| +++i_t42cpu_syscontrol                   |           | 5/5           | 0/0           | 7/7           | 0/0           | 0/0       |

| +++i_t42cpu_timer                        |           | 34/34         | 160/160       | 150/150       | 0/0           | 0/0       |

| ++i_t42_linkpath                         |           | 3/638         | 0/844         | 7/2017        | 0/0           | 0/0       |

| +++i0_t42link_chanin                     |           | 94/94         | 110/110       | 267/267       | 0/0           | 0/0       |

| +++i0_t42link_chanout                    |           | 63/63         | 100/100       | 199/199       | 0/0           | 0/0       |

| +++i1_t42link_chanin                     |           | 88/88         | 110/110       | 269/269       | 0/0           | 0/0       |

| +++i1_t42link_chanout                    |           | 64/64         | 100/100       | 199/199       | 0/0           | 0/0       |

| +++i2_t42link_chanin                     |           | 81/81         | 110/110       | 270/270       | 0/0           | 0/0       |

| +++i2_t42link_chanout                    |           | 55/55         | 100/100       | 198/198       | 0/0           | 0/0       |

| +++i3_t42link_chanin                     |           | 84/84         | 110/110       | 269/269       | 0/0           | 0/0       |

| +++i3_t42link_chanout                    |           | 53/53         | 100/100       | 196/196       | 0/0           | 0/0       |

| +++i_t42link_chanevent                   |           | 4/4           | 4/4           | 9/9           | 0/0           | 0/0       |

| +++i_t42link_mem_if                      |           | 49/49         | 0/0           | 118/118       | 0/0           | 0/0       |

| +++i_t42link_synclogic                   |           | 0/0           | 0/0           | 16/16         | 0/0           | 0/0       |

| ++i_t42_mempath                          |           | 0/57          | 0/11          | 0/190         | 0/4           | 0/4       |

| +++i_t42mem_decoder                      |           | 10/10         | 0/0           | 22/22         | 0/0           | 0/0       |

| +++i_t42mem_extarbiter                   |           | 21/21         | 0/0           | 57/57         | 0/0           | 0/0       |

| +++i_t42mem_extpocadapt                  |           | 0/5           | 0/7           | 2/20          | 0/4           | 0/0       |

| ++++i_fifo                               |           | 5/5           | 7/7           | 18/18         | 4/4           | 0/0       |

| +++i_t42mem_intarbiter                   |           | 20/20         | 0/0           | 50/50         | 0/0           | 0/0       |

| +++i_t42mem_intern                       |           | 0/0           | 4/4           | 1/1           | 0/0           | 0/4       |

| ++++i_t42mem_intern_dpsram2kx32_spartan3 |           | 0/0           | 0/0           | 0/0           | 0/0           | 4/4       |

| +++i_t42mem_rdddist                      |           | 1/1           | 0/0           | 40/40         | 0/0           | 0/0       |

| +main_pll                                |           | 0/0           | 0/0           | 0/0           | 0/0           | 0/0       |

+-----------------------------------------------------------------------------------------------------------------------------------

* Slices can be packed with basic elements from multiple hierarchies.

  Therefore, a slice will be counted in every hierarchical module

  that each of its packed basic elements belong to.

** For each column, there are two numbers reported <A>/<B>.

   <A> is the number of elements that belong to that specific hierarchical module.

   <B> is the total number of elements from that hierarchical module and any lower level

   hierarchical modules below.

*** The LUTRAM column counts all LUTs used as memory including RAM, ROM, and shift registers.

 

 

One T42 requires about 4000 LUTs: roughly half for the CPU and half for the link DMAs (note: the 4x SerDes are still missing from both tables).

The overall LUT count of a “from scratch” T42 (specification based) design matches quite well with Claus Meder’s T425C design (reengineered from MicroCode).
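The "half for CPU and half for links" split can be checked against the hierarchy table above. The short Python sketch below only adds up LUT totals copied from that map report; counting i_t42_cpu_syspath as part of the CPU is an assumption made here, not something the report states.

```python
# LUT totals (the <B> figures) copied from the XC6SLX45 map report above.
LUTS = {
    "i_t42_all_top": 4170,
    "i_t42_cpu_ctrl2data": 1752,
    "i_t42_cpu_syspath": 211,
    "i_t42_linkpath": 2017,
    "i_t42_mempath": 190,
}

total = LUTS["i_t42_all_top"]
# Assumption: "CPU" means control/data path plus system path.
cpu = LUTS["i_t42_cpu_ctrl2data"] + LUTS["i_t42_cpu_syspath"]
link = LUTS["i_t42_linkpath"]
mem = LUTS["i_t42_mempath"]

# i_t42_all_top has no LUTs of its own (0/4170), so the children add up exactly.
assert cpu + link + mem == total


def percent(part, whole):
    """Share of `whole` taken by `part`, in percent, one decimal place."""
    return round(100.0 * part / whole, 1)


for name, value in (("CPU", cpu), ("links", link), ("memory path", mem)):
    print(f"{name:12s} {value:5d} LUTs  {percent(value, total):5.1f} %")
```

With these figures the CPU and link blocks each take a little under half of the LUTs, and the memory path takes the small remainder.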

 

Here is another table (top part only) from a synthesis trial for 8x T42 cores with a 2 kB cache each (note: byte-parallel connection of links, uCROM synthesized in LUT-RAM):

 

 

(Xilinx - Vivado 2019.2 - synthesis report) Report Instance Areas: (30.08.2020 - XC7A100T)

                                            ---------------------

+------+------------------------------------------------+----------------------------------------+------+

|      |Instance                                        |Module                                  |Cells |

+------+------------------------------------------------+----------------------------------------+------+

|1     |top                                             |                                        | 66235|

|2     |  \gArbitration.i_mem_timeslice_arbiter         |mem_timeslice_arbiter                   |   544|

|3     |    tag_fifo                                    |fifo_cc_got                             |    45|

|4     |  \gCore[0].i_t42_core                          |t42_core__xdcDup__1                     |  8078|

|5     |    i_t42_all_top                               |t42_all_top_1685                        |  5661|

|6     |      i_t42_cpu_ctrl2data                       |t42_cpu_ctrl2data_1899                  |  2241|

|7     |        i_t42_cpu_ctrlpath                      |t42_cpu_ctrlpath_1920                   |  1600|

|8     |          i_t42cpu_iptr                         |t42cpu_iptr_1927                        |    48|

|9     |          i_t42cpu_oreg                         |t42cpu_oreg_1928                        |   194|

|10    |          i_t42cpu_prefetch                     |t42cpu_prefetch_1929                    |   180|

|11    |          i_t42cpu_ucrom                        |t42cpu_ucrom_1930                       |  1178|

|12    |            i_t42cpu_ucode_rom1024x128_spartan3 |t42cpu_ucode_rom1024x128_spartan3_1931  |  1178|

|13    |        i_t42_cpu_datapath                      |t42_cpu_datapath_1921                   |   641|

|14    |          i_t42cpu_abcdereg                     |t42cpu_abcdereg_1922                    |   438|

|15    |          i_t42cpu_alu                          |t42cpu_alu_1923                         |     9|

|16    |          i_t42cpu_bytealign                    |t42cpu_bytealign_1924                   |   126|

|17    |          i_t42cpu_pipecontrol                  |t42cpu_pipecontrol_1925                 |    37|

|18    |          i_t42cpu_wptr                         |t42cpu_wptr_1926                        |    31|

|19    |      i_t42_cpu_syspath                         |t42_cpu_syspath_1900                    |   408|

|20    |        i_t42cpu_sbits                          |t42cpu_sbits_1917                       |    98|

|21    |        i_t42cpu_syscontrol                     |t42cpu_syscontrol_1918                  |     5|

|22    |        i_t42cpu_timer                          |t42cpu_timer_1919                       |   305|

|23    |      i_t42_linkpath                            |t42_linkpath_1901                       |  2142|

|24    |        i0_t42link_chanin                       |t42link_chanin_1907                     |   256|

|25    |        i0_t42link_chanout                      |t42link_chanout_1908                    |   292|

|26    |        i1_t42link_chanin                       |t42link_chanin__parameterized0_1909     |   257|

|27    |        i1_t42link_chanout                      |t42link_chanout__parameterized0_1910    |   242|

|28    |        i2_t42link_chanin                       |t42link_chanin__parameterized1_1911     |   282|

|29    |        i2_t42link_chanout                      |t42link_chanout__parameterized1_1912    |   250|

|30    |        i3_t42link_chanin                       |t42link_chanin__parameterized2_1913     |   289|

|31    |        i3_t42link_chanout                      |t42link_chanout__parameterized2_1914    |   254|

|32    |        i_t42link_chanevent                     |t42link_chanevent_1915                  |    19|

|33    |        i_t42link_mem_if                        |t42link_mem_if_1916                     |     1|

|34    |      i_t42_mempath                             |t42_mempath_1902                        |   145|

|35    |        i_t42mem_extpocadapt                    |t42mem_extpocadapt_1903                 |    24|

|36    |          i_fifo                                |fifo_cc_got_small_1906                  |    24|

|37    |        i_t42mem_intern                         |t42mem_intern_1904                      |   121|

|38    |          i_t42mem_intern_dpsram2kx32_spartan3  |t42mem_intern_dpsram2kx32_spartan3_1905 |   108|

|39    |    \i_t42mem_extern_poc.gCCFIFO.i_ccfifo       |fifo_ic_mem__xdcDup__1                  |   464|

|40    |      \blk1to2.fifo                             |fifo_ic_got__8                          |   253|

|41    |      \blk2to1.fifo                             |fifo_ic_got__parameterized0__8          |   210|

|42    |    \i_t42mem_extern_poc.gCache.i_cache_mem     |cache_mem_1686                          |  1953|

|43    |      cache_cpu_inst                            |cache_cpu_1687                          |  1542|

|44    |        cache_inst                              |cache_par2_1689                         |  1498|

|45    |          TU                                    |cache_tagunit_par_1690                  |  1353|

|254   |      \g2.req_fifo                              |fifo_glue_1688                          |   410|

+------+------------------------------------------------+----------------------------------------+------+

 

Summary: each full link requires about 500 LUTs, plus 50 LUTs for the SerDes (many thanks to Claus for the SerDes LUT count!).
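The figure of about 500 LUTs per link can be reproduced from the ISE map report earlier in this mail. The sketch below uses only numbers from that table plus the 50-LUT SerDes figure; spreading the shared link logic evenly over the four links is an assumption made for illustration.

```python
# LUT totals per link channel, copied from the XC6SLX45 map report above:
# link number -> (chanin LUTs, chanout LUTs)
CHANNEL_LUTS = {0: (267, 199), 1: (269, 199), 2: (270, 198), 3: (269, 196)}

# Link logic shared by all four links (same report).
SHARED_LUTS = {
    "i_t42_linkpath (own)": 7,
    "i_t42link_chanevent": 9,
    "i_t42link_mem_if": 118,
    "i_t42link_synclogic": 16,
}
LINKPATH_TOTAL = 2017
SERDES_LUTS = 50  # per link, the figure credited to Claus in the text

per_link = {n: cin + cout for n, (cin, cout) in CHANNEL_LUTS.items()}
shared = sum(SHARED_LUTS.values())

# The channel and shared figures add up to the reported linkpath total.
assert sum(per_link.values()) + shared == LINKPATH_TOTAL

# Assumption: the shared logic is spread evenly over the four links.
avg_with_shared = LINKPATH_TOTAL / len(per_link)

print("chanin + chanout per link:", per_link)
print("average per link incl. shared logic:", avg_with_shared)
print("average per link incl. SerDes:", avg_with_shared + SERDES_LUTS)
```

The channel pairs alone come to roughly 465 LUTs per link; the shared DMA and memory-interface logic brings the average to about 500.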

 

Best regards,

Uwe

_____________

 

Uwe Mielke

Karl-Marx-Str.55

D-01109 Dresden

Mobil +49 (0)176 6220.4565

Office +49 (0) 351 886.2923

Home +49 (0) 351 8116.184

uwe.mielke@xxxxxxxxxxxx

Skype: longbow57


From: occam-com-request@xxxxxxxxxx [mailto:occam-com-request@xxxxxxxxxx] On behalf of Tony Gore
Sent: Friday, 20 November 2020 20:28
To: Larry Dickson; Øyvind Teig
Cc: Roger Shepherd; David May; occam-com; Michael Bruestle; Transputer TRAM
Subject: RE: Transistor count

 

Hi all

 

The T800 was 100 mm² in 3 micron technology. The current bleeding edge is around 500 times smaller, so you could get approx. 25,000 T800s on a chip of the same size today. Except you couldn’t in reality, because of the interconnect required. Let’s assume more layers of interconnect and some more memory (say 32k), and you would be looking at 1,000 – 10,000. The T800 clocked at 20 MHz for 10 MIPS and 10 MFLOPS. Take a clock rate of 2 GHz and the performance goes up 100x. So, very roughly, a T800 array taking the same size of silicon would have 10,000 x 100 = 1 million times the raw performance, coming in at a meaningless 10 teraFLOPS, and two and a half B042 boards would reach a petaflop. Not that it would do much useful other than ray tracing or the Mandelbrot set.
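The back-of-the-envelope numbers above can be laid out step by step. This Python sketch uses only the figures from the paragraph (the upper estimate of 10,000 cores per die, 20 MHz versus 2 GHz, 10 MFLOPS per T800); the "dies per board" value at the end is simply what the "two and a half boards" remark implies, not a quoted board specification.

```python
# Figures taken from the paragraph above.
T800_FLOPS = 10_000_000        # 10 MFLOPS at 20 MHz
T800_CLOCK_MHZ = 20
MODERN_CLOCK_MHZ = 2_000       # 2 GHz
CORES_PER_DIE = 10_000         # upper end of the 1,000 - 10,000 estimate
BOARDS_FOR_PETAFLOP = 2.5

clock_gain = MODERN_CLOCK_MHZ // T800_CLOCK_MHZ   # 100x from clock rate alone
raw_gain = CORES_PER_DIE * clock_gain             # 1 million times one T800
die_flops = raw_gain * T800_FLOPS                 # FLOPS for one such die

PETAFLOP = 10**15
dies_for_petaflop = PETAFLOP // die_flops

# Implied by the text, not a quoted specification.
dies_per_board = dies_for_petaflop / BOARDS_FOR_PETAFLOP

print(f"clock gain:          {clock_gain}x")
print(f"raw gain:            {raw_gain:,}x")
print(f"per die:             {die_flops / 1e12:.0f} teraFLOPS")
print(f"dies for 1 petaflop: {dies_for_petaflop}")
print(f"implied dies/board:  {dies_per_board:.0f}")
```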

 

People are playing around with huge amounts of simple processing embedded in the memory for some ML and AI applications and getting some great performance. There are also some new devices/building blocks around that bear a passing resemblance to the Inmos A100 as well, or at least on a cursory look there were a few familiar looking concepts in them.

 

Mind blowing really to see how performance has increased in these few decades.

 

 

Tony Gore

 

Aspen Enterprises Limited email  tony@xxxxxxxxxxxx

tel +44-1278-769008  GSM +44-7768-598570 URL:

 

Registered in England and Wales no. 3055963 Reg.Office Aspen House, Burton Row, Brent Knoll, Somerset TA9 4BW.  UK

 

 

 

From: Larry Dickson <tjoccam@xxxxxxxxxxx>
Sent: 20 November 2020 00:57
To: Øyvind Teig <oyvind.teig@xxxxxxxxxxx>
Cc: Roger Shepherd <rog@xxxxxxxx>; Tony Gore <tony@xxxxxxxxxxxx>; David May <David.May@xxxxxxxxxxxxx>; occam-com <occam-com@xxxxxxxxxx>; Michael Bruestle <michael_bruestle@xxxxxxxxx>; Transputer TRAM <claus.meder@xxxxxxxxxxxxxx>
Subject: Re: Transistor count

 

Wow, this is a fantastic response! I had no idea there was still so much interest lurking

around. 6000-8000 seems to be the consensus and I must point out that I was

just guessing when I said 6000 - not going from real knowledge like so many

of you.

 

In any case, all these communications put together really put us in the picture.

Now we turn to the fact that Nvidia has unveiled (last May) the A100 chip with 54

billion transistors and over 10,000 GPU (AI) cores. But with our back-of-the-envelope

dreaming we could imagine 180,000 of Tony Gore's T800s in a chip with the same

transistor count . . . clocked up to modern standards . . . climate models, anyone?
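The 180,000 figure follows from dividing the A100 transistor count by a per-T800 count. A minimal Python check, assuming the 300,000 transistors per T800 (and 250,000 per T414) that Tony Gore recalls in the 2019 message quoted further down:

```python
A100_TRANSISTORS = 54_000_000_000   # Nvidia A100, as stated above

# Per-chip counts as recalled by Tony Gore in the March 2019 message below.
T800_TRANSISTORS = 300_000
T414_TRANSISTORS = 250_000

t800_per_a100 = A100_TRANSISTORS // T800_TRANSISTORS
t414_per_a100 = A100_TRANSISTORS // T414_TRANSISTORS

print(f"T800s per A100 transistor budget: {t800_per_a100:,}")
print(f"T414s per A100 transistor budget: {t414_per_a100:,}")
```

This counts transistors only; it ignores the interconnect and extra memory that Tony Gore's estimate allows for.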

 

Larry

 

On Nov 19, 2020, at 2:47 PM, Øyvind Teig <oyvind.teig@xxxxxxxxxxx> wrote:

 

I told Claus Meder (claus.meder@xxxxxxxxxxxxxx, added to this thread) about this mail. I knew he had some info that might interest you. Here is his response to me:

 

- - - CLAUS MEDER START - - -

 

Hi Øyvind,

 

Sounds interesting. I did not even know that there is still such a large community out there. If you like you can forward my mail address and the following short summary about my FPGA Transputer hobby to the interested people.

 

Short summary of my project's current status:

 

I'm using a Digilent Arty A7-100 Board. The Xilinx Vivado Tool is able to map into the Artix 7-100 chip:

  • 14 fully featured T425Cs, each with a limited number of links
  • a 16 Kbyte 2-way set-associative cache for each of the 14 T425Cs as a port to external memory; each node gets 16 MB of the single 256 MB DDR3 chip on the board
  • one Xilinx MIG core for DDR3 access, connected to all the caches, access prioritized by a round-robin arbiter
  • a graphics core for VGA output at 1600x1200 with 32 or 8 bits per pixel
  • an Ethernet MII interface (10/100 Mbit) which can be accessed by the root transputer only; I ported LWIP to the good old ANSI-C compiler, so the Transputer runs a web server
  • the root Transputer has one link to the host which is fully compatible with the old 20 Mbit links. I connected the very nice LINKUSB which was designed and built by Mike. The rest of the links run at CPU clock speed with the original serial protocol, consuming 11 bits per byte.
  • performance counters for each Transputer to get more insight into the bottlenecks of the design

A little bit of background on how I achieved it: I started my project at the end of 2018, inspired by my discovery of the Microcode ROM dump from a T425C on Gavin's web site. I started decoding the meaning of the bits, first on a spreadsheet and later with the help of an emulator written in C. Thus I was able to bring in my ideas about how the brilliant Inmos designers might have solved their problems, following the basic principle of keeping everything as simple as possible. Now I can state they did a really good job. It is really not a very complex design. Many things are solved in a very clever way that avoids spending too many transistors on the function needed.

At the end of 2019 I had enough insight into how to design the T425C in FPGA technology. Unlike the original design, I decided on a single-clock, fully synchronous design. Implementation took half a year. Testing and bug fixing took until early summer this year. Since then I have done some ANSI-C and OCCAM programming on my small "super computer". Like everyone, I wrote a distributed Mandelbrot calculation with my own router processes. Currently I'm running the old flight simulator and, because the source is available, I'm enhancing it.

A big thank you to Mike who supported me with all his great knowledge about Transputers and was a very good partner in discovering some secrets of the design.

I'm sure over the Christmas period I can write some more documentation, which is always a burden for me because I'm fully satisfied once I have understood and solved a problem.

To feed the discussion about resources, here is what my latest Vivado run states (synthesis and implementation set to default strategy).

<gffedllakccamkff.png>

 

Interested in clock speeds? The CPU core achieves around 80MHz for the xc7a100tcsg324-1 device. With the FPGA flooded by T425Cs it drops to 70+MHz. Luckily my Arty board can be over-clocked. The design is currently running at a 120MHz clock speed (WNS around -5ns). I tried to ask Xilinx which part I really have but unfortunately they did not grant me access to their 2D Marking Application Lounge :-(

Here is the rspy output:

rspy -l
   # Part  rt Link0 Link1 Link2 Link3
   0 T425C120 1736K   ...   10M   ...
   1 T425C120   10M   10M   10M   10M
   2 T425C120   10M   10M   ...   ...
   3 T425C120   10M   10M   ...   ...
   4 T425C120   10M   10M   ...   ...
   5 T425C120   10M   10M   ...   ...
   6 T425C120   10M   10M   ...   ...
   7 T425C120   10M   10M   ...   ...
   8 T425C120   10M   10M   ...   ...
   9 T425C120   10M   10M   10M   ...
  10 T425C120   10M   10M   10M   ...
  11 T425C120   10M   10M   10M   ...
  12 T425C120   10M   ...   10M   ...
  13 T425C120   10M   10M   10M   10M

The configuration is for running the flight simulator, hence the four-link connection on the last node, which is the graphics node.
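As a small illustration, the rspy listing above can be parsed to count the connected links per node. The Python sketch below also estimates the byte rate of a link clocked at 120 MHz with 11 bits per byte. Two assumptions are made here: an entry of "..." means "not connected", and the speed column is in bytes per second. The estimate ignores acknowledge overhead.

```python
# rspy output as quoted above (header line omitted).
RSPY = """\
   0 T425C120 1736K   ...   10M   ...
   1 T425C120   10M   10M   10M   10M
   2 T425C120   10M   10M   ...   ...
   3 T425C120   10M   10M   ...   ...
   4 T425C120   10M   10M   ...   ...
   5 T425C120   10M   10M   ...   ...
   6 T425C120   10M   10M   ...   ...
   7 T425C120   10M   10M   ...   ...
   8 T425C120   10M   10M   ...   ...
   9 T425C120   10M   10M   10M   ...
  10 T425C120   10M   10M   10M   ...
  11 T425C120   10M   10M   10M   ...
  12 T425C120   10M   ...   10M   ...
  13 T425C120   10M   10M   10M   10M
"""


def links_per_node(text):
    """Map node number -> number of link entries that are not '...'."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:      # node, part, and four link columns
            continue
        node, _part, *links = fields
        counts[int(node)] = sum(1 for entry in links if entry != "...")
    return counts


counts = links_per_node(RSPY)
print("connected links per node:", counts)
print("total connected link ends:", sum(counts.values()))

# Links run at CPU clock speed with 11 bits on the wire per data byte.
CLOCK_HZ = 120_000_000
BITS_PER_BYTE_ON_WIRE = 11
bytes_per_second = CLOCK_HZ / BITS_PER_BYTE_ON_WIRE
print(f"upper bound per link: {bytes_per_second / 1e6:.1f} Mbyte/s")
```

If the speed column is indeed in bytes per second, an upper bound of about 10.9 Mbyte/s is in line with the "10M" entries shown for the fast links.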

 

Please feel free to contact me.

 

   Claus

 

- - - CLAUS MEDER END - - -

 

Øyvind

 

 

On 19 Nov 2020, at 20:28, Øyvind Teig <oyvind.teig@xxxxxxxxxxx> wrote:

 

Guys

 

 

I guess the microcode is of little help? http://transputer.net/iset/iset.asp

 

 

But good memory plus back-of-the-envelope calculations should get within a factor of... better than 10?

 

(I added Michael Brüstle at transputer.net to this mail list)

 

Øyvind 

 

On 19 Nov 2020, at 19:54, Roger Shepherd <rog@xxxxxxxx> wrote:

 

If I recall, the T4 was 25% RAM, 25% processor, 25% link and 25% other, by area (things like pads take up a lot of space but not many transistors). The RAM is much more transistor-dense than the other blocks. The link block (4 bidirectional links and the event channel) is significantly less dense: the ‘register’ part is CPU-like, but the actual shift registers and synchronisers are very non-dense. So, perhaps 25% of the density of RAM overall. Now, doing the measurement on my photograph, it looks like the 4 links occupy 2/3 of the area of the RAM, which gives

 

200k * 25% * 2/3 = 33k transistors for 4 links = 8k per link (which is in line with your estimate below). I suspect your estimate is nearer to the truth than mine.
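Written out as a short Python calculation, the estimate looks like this. The inputs are the figures quoted in this message (200k transistors for the RAM, link logic at 25% of RAM density, link area at 2/3 of the RAM area); the comparison value is the 25,000-transistors-for-four-links guess from the message quoted below.

```python
RAM_TRANSISTORS = 200_000     # 4 KB of on-chip RAM
LINK_DENSITY_VS_RAM = 0.25    # link logic is about a quarter as dense as RAM
LINK_AREA_VS_RAM = 2 / 3      # four links occupy about 2/3 of the RAM area
LINKS = 4

four_links = RAM_TRANSISTORS * LINK_DENSITY_VS_RAM * LINK_AREA_VS_RAM
per_link = four_links / LINKS

# The guess from the message quoted below: 25,000 transistors for four links.
guess_per_link = 25_000 / LINKS

print(f"four links:           {four_links:,.0f} transistors")
print(f"per link (area est.): {per_link:,.0f} transistors")
print(f"per link (guess):     {guess_per_link:,.0f} transistors")
```

The two routes give roughly 8,300 and 6,250 transistors per link, which is the 6000-8000 range discussed in the thread.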

 

But in assessing anything you need to consider that the transistor count is affected by word size (two words of buffer, one word of address) and control of the interconnect to route bytes into the buffer etc. 

 

Roger

 

On 19 Nov 2020, at 17:55, Larry Dickson <tjoccam@xxxxxxxxxxx> wrote:

 

Hi Tony, David and all,

 

Does anyone remember how many transistors are in a link? We are

gathering information on transistor efficiency; now Tony's numbers

indicate floating point costs about 50,000, and David on memory

indicates 4KB costs about 200,000. 25,000 for CPU and 25,000 for

links would indicate 6000 per link, but that is just a guess and I

could be way off.

 

As you may be guessing, I am imagining an eight-link Transputer!

Long ago, in my PDPTA'96 Roadmap paper, I calculated "burden

bandwidth" for a one-direction link communication using Forrest

Crowell and Neal Elzenga's published measurements, and got

37MB/s, same whether links were running one-way or both-ways,

and per-unidirectional-communication timing bandwidth of 1.2 MB/s

when running both ways. This means by extrapolation that eight

links running full speed both ways would be supportable (reducing

CPU speed by 52% due to DMA burden).
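The 52% figure can be reproduced from the two bandwidths quoted above. The sketch below is one reading of that paragraph, not the calculation from the paper: it treats the 37 MB/s "burden bandwidth" as the link traffic rate at which DMA would take all of the CPU's time, and charges each unidirectional communication at 1.2 MB/s.

```python
BURDEN_BANDWIDTH_MB_S = 37.0   # traffic rate that would consume the whole CPU
PER_DIRECTION_MB_S = 1.2       # one unidirectional communication, both ways running


def cpu_burden(links, directions=2):
    """Fraction of CPU speed lost to link DMA under the assumptions above."""
    traffic_mb_s = links * directions * PER_DIRECTION_MB_S
    return traffic_mb_s / BURDEN_BANDWIDTH_MB_S


for links in (4, 8):
    print(f"{links} links both ways: {100 * cpu_burden(links):.0f}% of CPU speed")
```

Under this reading, eight links running both ways carry 19.2 MB/s in total, which is about 52% of the burden bandwidth; four links come to about 26%.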

 

Everything can be mapped into modern cores and communications

(e.g. Manchester code lanes); the principle stays the same.

 

Larry

 

On Mar 18, 2019, at 9:59 PM, Tony Gore <tony@xxxxxxxxxxxx> wrote:

 

Hi Larry

As I recall, T414 was about 250,000 and the T800 was 300,000.

Tony Gore

Tony Gore

+44 7768 598570

 


From: occam-com-request@xxxxxxxxxx <occam-com-request@xxxxxxxxxx> on behalf of Larry Dickson <tjoccam@xxxxxxxxxxx>
Sent: Tuesday, March 19, 2019 1:06:42 AM
To: Occam Family
Subject: Transistor count

 

All,

How many transistors does a Transputer have (e.g. of the T2 or T4 family)? I have heard a wide range of numbers, from 27,000 to 200,000, but am having trouble finding an authoritative reference.

Larry

 

 

 

Øyvind TEIG 
+47 959 615 06
oyvind.teig@xxxxxxxxxxx
https://www.teigfam.net/oyvind/home
(iMac)

 

 

 

 

 
