Re: WoTUG
Larry,
> > Subject: Re: WoTUG
> > To: tjoccam@xxxxxxxxxxxxx (Lawrence Dickson)
> > Date: Wed, 27 Oct 1999 21:12:31 +0100 (BST)
> > From: "Roger Peel" <R.Peel@xxxxxxxxxxxxxxx>
> >
> > Larry,
> >
> > > Off the subject: someone mentioned Alpha links. Any details... ?
> >
> > I put this on an OHP slide for my final year Comp Arch class :
> >
> > 1.6GHz Alpha to be fastest Quake chip on planet
> >
> > <snip>
> >
> > The 21364 integrates CPU, level two cache, direct memory controller and a
> > transputer-class multiprocessing connection of each processor to four
> > others at 10 Gbytes/s interprocessor bandwidth,
>
> (a) Does this mean 10 Gbytes/s per LINK, or on all four links combined?
Apparently, all four combined. See
http://www.digital.com/alphaoem/present/sld018.htm
I cannot find many useful references to this processor.
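As a rough check on that reading, here is a minimal sketch of the per-link
figure. It assumes the 10 Gbytes/s total is shared evenly across the four
links; the even split is my assumption, not something the slide states, and
the names are purely illustrative.

```python
# Assumption: the quoted 10 Gbytes/s is the total over all four links
# and is divided evenly between them (not confirmed by the slide).
TOTAL_BANDWIDTH_GBYTES_PER_S = 10.0
LINKS_PER_PROCESSOR = 4


def per_link_bandwidth(total, links):
    """Return the bandwidth share of one link under an even split."""
    return total / links


per_link = per_link_bandwidth(TOTAL_BANDWIDTH_GBYTES_PER_S, LINKS_PER_PROCESSOR)
print(f"{per_link} Gbytes/s per link")  # prints "2.5 Gbytes/s per link"
```

If the figure were per link instead, the aggregate would be four times the
quoted number, which is why the distinction matters.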
> (b) How many pins per link? See my comments, below.
> (c) What interprocessor distance is supported?
I have no information on these.
> (d) What about latency?
15 ns (see Web reference above). There appears to be a network and
router model, too. I can see no obvious mention of any integration of
communication and scheduling instructions.
> > > We need a new Transputer.
> >
> > Barry Cook and I are looking at how much more work needs to be done to
> > our Occam-to-FPGA compiler to build one.
> >
> > Have you seen http://www.ee.surrey.ac.uk/Personal/R.Peel/wotug22.ps ?
>
> Now I have. I downloaded it and it is GREAT! I noticed it pointed toward
> dates already past - I'm out of touch it seems - did you finish the
> Portakit processor? Why this is great, see below...
We ran into a few structural issues within the compiler that provoked a
re-design and re-implementation of some sections. These are now nearly
complete. Along the way, we have understood a number of low-level
FPGA-circuit implementation details a little better, and have worked
through a number of other FPGA designs. In particular, flip-flops and CLBs
were being eaten up too fast as designs grew, and we needed to re-think
what optimisation could be done, and where in the compiler to locate it.
> I got on board after occam 1; my handbooks were the occam 2 reference
> manual (1/4" thick) and the Transputer instruction set compiler writer's
> guide (also 1/4" thick). Also the B008 description which gave a solid
> hardware foundation for extending occam to the PC, never exploited.
>
> Now we need the occam 1 reference manual and the Portakit instruction set
> (39 integer instructions)! With miniaturization, raw CPU power increases
> as the cube. Pin count increases as the square, and edge connections
> increase only linearly. So the hangups are increasingly in connectivity.
> The Portakit moves away from the T9000 in the direction of tremendously
> successful Microchip Technology, whose PIC processors specialize in
> driving individual pins to do a great variety of things. (PICs have
> about 35 instructions.)
Occam 1 (or proto-occam) was a very restricted language for 'real'
purposes - although it was fine for understanding and experimenting
with process concurrency.
Imagine Occam 2 without :
* data types other than INT
* protocols
* multi-dimensional arrays
* FUNCTIONs
* all interactions of these with every other language feature
... and you'll have Occam 1.
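To give a flavour of what that restricted style still allows, here is a
minimal sketch of two concurrent processes exchanging INT values over a
single channel. It is written in Python rather than occam, and the Channel
class is a hypothetical stand-in that only approximates an unbuffered occam
channel (the sender blocks until the receiver has taken the value). None of
this is taken from the Occam 1 definition itself.

```python
import queue
import threading


class Channel:
    """Hypothetical approximation of an occam channel carrying INTs only."""

    def __init__(self):
        self._q = queue.Queue(maxsize=1)

    def send(self, value):
        # Roughly "chan ! value": refuse anything that is not an integer.
        if not isinstance(value, int):
            raise TypeError("this channel carries INT values only")
        self._q.put(value)
        self._q.join()  # block until the receiver has taken the value

    def receive(self):
        # Roughly "chan ? x".
        value = self._q.get()
        self._q.task_done()
        return value


def producer(out):
    for i in range(5):
        out.send(i)
    out.send(-1)  # negative value used as an end-of-stream marker


def squarer(inp, results):
    while True:
        x = inp.receive()
        if x < 0:
            break
        results.append(x * x)


def run_par():
    """Run both processes side by side, in the spirit of an occam PAR."""
    chan = Channel()
    results = []
    threads = [
        threading.Thread(target=producer, args=(chan,)),
        threading.Thread(target=squarer, args=(chan, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_par())  # prints [0, 1, 4, 9, 16]
```

Everything here is an integer passed over a plain channel: no protocols, no
other data types and no FUNCTIONs are needed to express the concurrency.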
The Portakit architecture does not really have the elegance of the
later transputer instruction set. In particular, it does not have an
internal execution stack, but a handful of (non-orthogonally-addressed)
registers. The ALT instruction is rather different, too. We're moving
away from this processor back to more transputer-like designs. This
is why we are looking at further FPGA code optimisation techniques.
> Fitting your separate memory model, PICs mate with tremendously popular
> serial memory (some uses only one wire!) - very slow but it saves those
> valuable pins. Nothing comes anywhere near OS links - fast, 2 pins (so
> it has a big advantage over DS links too). Think of the connectivity
> required for sensors and control of a car or airplane. Timing is not so
> demanding, being human scale, but robustness under complex IO is
> totally required. PICs, as normally programmed, fail THAT test; only
> occam passes it.
The Para-PC architecture, as currently described, is likely to be
deficient in processor-to-memory bandwidth. Using fast, narrow memory
channels (e.g. of RAMBUS performance) and a very fast hub/router might
be more realistic in comparison with current microprocessors. The
Para-PC's clean multiprocessor architecture remains very attractive, of
course.
> Finally there's cost. Transputers might have referred to transistor-like
> arrays but they cost a lot more than transistors. Maybe the Portakit
> could bridge that gap at last. Define printers and cars and washing
> machines using occam and who cares if we have to deal with four bit
> integers.
Transputers may have been sold at prices rather higher than those of
contemporary competing products, but they only comprised tens of
thousands of gates (compared with around ten million gates in some
modern processors). Apart from the packaging cost (which is lower with
modern packages), surely they could be manufactured very cheaply
nowadays ...
What are your particular applications? Would it be possible to
build yourself a dedicated controller on FPGAs?
Roger.