
Re: WoTUG



All,

Sorry to be so slow to reply. Heavy work schedule. There are some
important directions here, I think.

On Mon, 1 Nov 1999, Denis A Nicole wrote:

> On Fri, 29 Oct 1999, Lawrence Dickson wrote:
>
> > Now we need the occam 1 reference manual and the Portakit instruction set
> > (39 integer instructions)! With miniaturization, raw CPU power increases
> > as the cube. Pin count increases as the square, and edge connections
> > increase only linearly. So the hangups are increasingly in connectivity.
> > The Portakit moves away from the T9000 in the direction of tremendously
> > successful Microchip Technology, whose PIC processors specialize in
> > driving individual pins to do a great variety of things. (PICs have
> > about 35 instructions.)
>
> We can generate code for PIC from SPoC.

(a) Which PICs? They come in a great variety, from really stripped-down to
"C-capable" 80186-class heavyweights.  (b) Are link analogues between
different processors defined, and if so how? ALTing too?  (c) What is the
host system (on which the compiling is done)?  (d) How is code loaded
and run?

On Mon, 1 Nov 1999, Roger Peel wrote:

> Transputers may have been sold at prices rather larger than their
> contemporary competitor products, but they only comprised tens of
> thousands of gates (compared with around ten million gates in some
> modern processors).  Apart from the packaging cost (which reduces using
> modern packages), surely they could be manufactured very cheaply
> nowadays ...

That is exactly what I wanted to hear.

> What are your particular applications?  Would it be possible to
> build yourself a dedicated controller on FPGAs?

General answer: whatever is a disaster with current overcentralized
computing techniques.

(1) Last week at work, a vendor representative for a hardware RAID board
manufacturer said that they are moving the control of such boards to "in
band" SCSI because RS232-type serial can't handle the bandwidth needed to
control 100 or more disks, as will be necessary. This despite the fact,
which he admitted, that "in band" is terrible design, since control dies
when the data path dies! Not to mention the ill-documented buffering that
makes it almost impossible to synchronize controlled changes with ongoing
data flow.

Two-wire out-of-band OS links are hundreds of times as fast as RS232.
Transputers are perfectly designed to handle data flows, including error
and partial-reset propagation. I hear that "in band" is totally taking
over in this kind of application. We should be eating them alive (with
fifteen-year-old, mature technology).
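To make the in-band/out-of-band point concrete, here is a minimal Go sketch, with goroutines and channels standing in for occam processes and OS links. All names (controller, dataPath, query) are hypothetical. The data path hits a simulated fault and stops, but because status travels on its own channel, the controller can still be asked what happened. If the status queries were queued behind data on the same path, they would die with it.

```go
package main

import "fmt"

// controller owns the board's control state. It is reached only through
// its own channels (out of band), so it keeps answering queries even after
// the data path has stopped.
func controller(events <-chan string, queries <-chan chan string, done chan<- struct{}) {
	state := "ok"
	for {
		select {
		case e, ok := <-events:
			if !ok {
				events = nil // data path gone; a nil channel is never selected
				continue
			}
			state = e
		case reply, ok := <-queries:
			if !ok {
				close(done)
				return
			}
			reply <- state
		}
	}
}

// dataPath moves blocks until it reaches a simulated fault at index failAt,
// reports the fault on the control channel, and stops.
func dataPath(blocks []int, failAt int, events chan<- string) int {
	moved := 0
	for i := range blocks {
		if i == failAt {
			events <- fmt.Sprintf("data path failed at block %d", i)
			close(events)
			return moved
		}
		moved++
	}
	close(events)
	return moved
}

// query asks the controller for its current state over the control channel.
func query(queries chan<- chan string) string {
	reply := make(chan string)
	queries <- reply
	return <-reply
}

func main() {
	events := make(chan string)
	queries := make(chan chan string)
	done := make(chan struct{})
	go controller(events, queries, done)

	moved := dataPath([]int{10, 11, 12, 13}, 2, events)
	fmt.Println("blocks moved:", moved)             // blocks moved: 2
	fmt.Println("control says:", query(queries)) // control says: data path failed at block 2
	close(queries)
	<-done
}
```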

(2) Software data flow solutions: a whole alphabet soup of RAID (redundant
array of inexpensive disks), NAS (network-attached storage), SAN (storage
area network), etc. Billions of dollars are being spent on these.
They are ALL variations on a single theme: controlled data flow
with delays due to buffering, compression, redundancy parity algorithms,
data stream sharing, and multiple simultaneous consumers/producers. They
are all technically shaky because of the inherent contradiction between
delayed data flow and instant error reporting (or other state change),
which is the primitive default in languages like C, C++ and Java. Result:
massive proliferation not only of code but of standards (SCSI, Fibre
Channel, many Ethernet-based flavors), none of which quite works right. A
new bulky software project every six months, none of which is ever
completed.
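One way to see the contradiction: if a stage buffers data, an error reported "instantly" (a return code or a flag) overtakes the blocks still sitting in the buffers. The occam-style alternative is to send the error down the same channel as the final message, so it arrives in order, behind every block that preceded it. A minimal Go sketch, assuming a hypothetical item type that carries either a block or the terminating error:

```go
package main

import (
	"errors"
	"fmt"
)

// item is either a data block or the end-of-stream marker carrying the
// error (nil on success) that terminated the producer.
type item struct {
	data int
	err  error
	last bool
}

// producer emits n blocks and then fails. The failure travels as the final
// message on the same channel, behind every block already in flight.
func producer(n int, out chan<- item) {
	for i := 0; i < n; i++ {
		out <- item{data: i}
	}
	out <- item{err: errors.New("disk 3 parity error"), last: true}
	close(out)
}

// buffer is a delaying stage (standing in for compression, parity, etc.)
// with capacity k. It forwards items in order, including the error marker.
func buffer(k int, in <-chan item) <-chan item {
	out := make(chan item, k)
	go func() {
		for it := range in {
			out <- it
		}
		close(out)
	}()
	return out
}

// consume returns the number of blocks received before the end marker,
// together with the error that marker carried.
func consume(in <-chan item) (int, error) {
	got := 0
	for it := range in {
		if it.last {
			return got, it.err
		}
		got++
	}
	return got, nil
}

func main() {
	src := make(chan item)
	go producer(5, src)
	got, err := consume(buffer(3, buffer(3, src)))
	fmt.Println(got, err) // 5 disk 3 parity error
}
```

However deep the buffering, the consumer never sees the error before the data that was ahead of it.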

You have to encapsulate the disaster areas and force propagation of state
using provable occam state machines. We have years of research and software
tools to do this, and the hardware too, including the "transputer-class"
Alphas at the top end, if they work. And we can replace the "alphabet soup"
gobbledegook with a single, transparent class of hardware and programming,
scalable in any direction.
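As an illustration of what forced state propagation can look like in code, here is a hedged Go sketch of a process that owns its state and changes it only through an explicit transition table. The states and events are invented for the example. The sketch proves nothing by itself, but every legal move sits in one small table that can be checked exhaustively, and illegal events are simply refused.

```go
package main

import "fmt"

type State int

const (
	Idle State = iota
	Streaming
	Draining
	Failed
)

func (s State) String() string {
	return [...]string{"Idle", "Streaming", "Draining", "Failed"}[s]
}

type Event int

const (
	Start Event = iota
	Stop
	Drained
	Fault
	Reset
)

// transitions lists every legal (state, event) pair. Anything not listed
// is rejected, so the whole machine can be reviewed from this table.
var transitions = map[State]map[Event]State{
	Idle:      {Start: Streaming, Fault: Failed},
	Streaming: {Stop: Draining, Fault: Failed},
	Draining:  {Drained: Idle, Fault: Failed},
	Failed:    {Reset: Idle},
}

// Step applies one event and reports whether it was legal.
func Step(s State, e Event) (State, bool) {
	next, ok := transitions[s][e]
	if !ok {
		return s, false
	}
	return next, true
}

// machine is the process: it owns the state, is driven only by messages
// on its input channel, and reports the state after every event.
func machine(in <-chan Event, out chan<- State) {
	s := Idle
	for e := range in {
		if next, ok := Step(s, e); ok {
			s = next
		}
		out <- s
	}
	close(out)
}

func main() {
	in := make(chan Event)
	out := make(chan State)
	go machine(in, out)
	// Prints Streaming, Failed, Failed, Idle: Start is refused while Failed.
	for _, e := range []Event{Start, Fault, Start, Reset} {
		in <- e
		fmt.Println(<-out)
	}
	close(in)
}
```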

(3) Robotics: sensing and control, with voice I/O if possible. How big is
the automotive market, the home appliance market, etc.? Don't be proud;
just grab a humongous voice processor to be your front end and put your
itty bitty PICs or Portakits behind it in a network of any complexity, to
do the complex control stuff that we can do easily and that everyone else
has given up on because they now "know" it can't be done reliably. The
difficulty is with the little guys in back, not the big talker in front
(which is getting all the attention in development).

(4) A general task that lies behind all of the above: qualifying a
communication link as "link-like", that is, giving it the state
capabilities that support the blocking natural to channels, ALTs, etc.
This includes things like DMA. It also includes the "resetch" equivalent
for when the cable gets pulled out.
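A rough Go sketch of a "link-like" channel, assuming a hypothetical Link type: Send and Recv block until the other side is ready, as an occam channel does, and Reset plays the role of resetch when the cable is pulled, releasing any process blocked on the link with an error instead of leaving it hung.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrReset is returned to any process blocked on a link when it is reset.
var ErrReset = errors.New("link reset")

// Link is a hypothetical point-to-point, unbuffered byte link. A transfer
// either completes on both sides or on neither.
type Link struct {
	data      chan byte
	broken    chan struct{}
	resetOnce sync.Once
}

func NewLink() *Link {
	return &Link{data: make(chan byte), broken: make(chan struct{})}
}

// Send blocks until a receiver takes the byte or the link is reset.
func (l *Link) Send(b byte) error {
	select {
	case l.data <- b:
		return nil
	case <-l.broken:
		return ErrReset
	}
}

// Recv blocks until a sender supplies a byte or the link is reset.
func (l *Link) Recv() (byte, error) {
	select {
	case b := <-l.data:
		return b, nil
	case <-l.broken:
		return 0, ErrReset
	}
}

// Reset ("cable pulled out") wakes every blocked Send/Recv, now and later.
// It is safe to call more than once.
func (l *Link) Reset() {
	l.resetOnce.Do(func() { close(l.broken) })
}

func main() {
	l := NewLink()
	result := make(chan error)
	go func() {
		_, err := l.Recv() // blocks: nobody is sending
		result <- err
	}()
	l.Reset()
	fmt.Println(<-result) // link reset
}
```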

Doing this kills three birds with one stone (quite a feat, you'll agree):
(a) It gets them all talking to each other. (b) It eliminates the
nightmarish hardware hang problems that can bring any system down,
including the "uninterruptible sleep" conditions that plague Unix and
Linux. (c) It forces an occam structure on all code that communicates
using "links"; as you find when doing it, that structure just comes
naturally, even in C. Centralized structures become less and less relevant.

Other notes relating to Roger's posting:

> > > >    Off the subject: someone mentioned Alpha links. Any details... ?
> > >
> > > I put this on an OHP slide for my final year Comp Arch class :
> > >
> > > 1.6GHz Alpha to be fastest Quake chip on planet
> > >
> > > <snip>
> > >
> > > The 21364 integrates CPU, level two cache, direct memory controller and a
> > > transputer-class multiprocessing connection of each processor to four
> > > others at 10 Gbytes/s interprocessor bandwidth,
> >
> > (a) Does this mean 10 Gbytes/s per LINK, or on all four links combined?
>
> Apparently, all four combined.  See
>    http://www.digital.com/alphaoem/present/sld018.htm
> I cannot find many useful references to this processor.
>
> > (b) How many pins per link? See my comments, below.
> > (c) What interprocessor distance is supported?
>
> I have no information on these.
>
> > (d) What about latency?
>
> 15ns (see Web reference above).  There appears to be a network and
> router model, too.  The integration of communication and scheduling
> instructions isn't obviously mentioned.

Surely somebody in the occam world can get hold of the assembly manual,
simulator, or whatever exists for this thing and layer the Transputer
instruction set over it? Think of all the bloated developments we could
leapfrog.

>
> > > >    We need a new Transputer.
> > >
> > > Barry Cook and I are looking at how much more work needs to be done to
> > > our Occam-to-FPGA compiler to build one.
> > >
> > > Have you seen http://www.ee.surrey.ac.uk/Personal/R.Peel/wotug22.ps ?
> >
> > Now I have. I downloaded it and it is GREAT! I noticed it pointed toward
> > dates already past - I'm out of touch it seems - did you finish the
> > Portakit processor? Why this is great, see below...
>
> We ran into a few structural issues within the compiler that provoked a
> re-design and re-implementation of some sections.  These are now nearly
> complete.  Along the way, we have understood a number of low-level
> FPGA-circuit implementation details a little better, and have worked
> through a number of other FPGA designs.  In particular, flip-flops and CLBs
> were being eaten up too fast as designs grew, and we needed to re-think
> what optimisation could be done, and where in the compiler to locate it.

Mike Patrick, an old Transputer developer for General Dynamics, may be the
man who knows what you need to know.

> > I got on board after occam 1; my handbooks were the occam 2 reference
> > manual (1/4" thick) and the Transputer instruction set compiler writer's
> > guide (also 1/4" thick). Also the B008 description which gave a solid
> > hardware foundation for extending occam to the PC, never exploited.
> >
> > Now we need the occam 1 reference manual and the Portakit instruction set
> > (39 integer instructions)! With miniaturization, raw CPU power increases
> > as the cube. Pin count increases as the square, and edge connections
> > increase only linearly. So the hangups are increasingly in connectivity.
> > The Portakit moves away from the T9000 in the direction of tremendously
> > successful Microchip Technology, whose PIC processors specialize in
> > driving individual pins to do a great variety of things. (PICs have
> > about 35 instructions.)
>
> Occam 1 (or proto-occam) was a very restricted language for 'real'
> purposes - although it was fine for understanding and experimenting
> with process concurrency.
>
> Imagine Occam 2 without :
>
> * data types other than INT
> * protocols
> * multi-dimensional arrays
> * FUNCTIONs
> * all interactions of these on every other language feature
>
> ... and you'll have Occam 1.
>
>
> The Portakit architecture does not really have the elegance of the
> later transputer instruction set.  In particular, it does not have an
> internal execution stack, but a handful of (non-orthogonally-addressed)
> registers.  The ALT instruction is rather different, too.  We're moving
> away from this processor back to more transputer-like designs.  These
> are why we are looking at further FPGA code optimisation techniques.

Not a problem, as long as you do not lose the simplicity your paper
showed. Let floating point be a specialist library, for instance. You had
integers of different lengths and ranges, and PLACE commands putting them
on pin arrays, all characteristic of PIC-type programming. It is growing
in a different direction, and none of the "withouts" you listed for
occam 1 sounds like a handicap when growing in that direction. They are a
big plus if they eliminate complexity. ALTs have to work, though;
many-to-one channels would be nice too.

> > Fitting your separate memory model, PICs mate with tremendously popular
> > serial memory (some use only one wire!), very slow but it saves those
> > valuable pins. Nothing comes anywhere near OS links - fast, 2 pins (so
> > it has a big advantage over DS links too). Think of the connectivity
> > required for sensors and control of a car or airplane. Timing is not so
> > demanding, being human scale, but robustness under complex IO is
> > totally required. PICs, as normally programmed, fail THAT test; only
> > occam passes it.
>
> The Para-PC architecture, as currently described, is likely to be
> deficient in processor-to-memory bandwidth.  Using fast, narrow memory
> channels (e.g. of RAMBUS performance) and a very fast hub/router might
> be more realistic in comparison with current microprocessors.  The
> Para-PC's clean multiprocessor architecture remains very attractive, of
> course.

Think about tremendous skeins of slow data channels in an automobile or
airplane. Don't try to compete with "current microprocessors" in bandwidth.
Beat them in robust handling of complexity.
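For the two-pin OS links mentioned above (one wire in each direction), here is a bit-level Go sketch of the framing as commonly described for transputer links: a data packet is a start bit, a one bit, eight data bits and a stop bit; an acknowledge is a start bit followed by a zero bit. The LSB-first bit order and the function names are assumptions, and the sketch ignores timing and flow control.

```go
package main

import (
	"errors"
	"fmt"
)

// EncodeData frames one byte: start 1, flag 1, 8 data bits (LSB first,
// assumed), stop 0. The idle line is 0.
func EncodeData(b byte) []int {
	bits := []int{1, 1}
	for i := 0; i < 8; i++ {
		bits = append(bits, int(b>>i)&1)
	}
	return append(bits, 0)
}

// EncodeAck frames an acknowledge: start 1, flag 0.
func EncodeAck() []int { return []int{1, 0} }

// Decode reads one packet from the front of a bit stream, skipping idle 0s.
// It returns the data byte (if any), whether the packet was an acknowledge,
// and the remaining bits.
func Decode(bits []int) (data byte, isAck bool, rest []int, err error) {
	i := 0
	for i < len(bits) && bits[i] == 0 {
		i++
	}
	if i+1 >= len(bits) {
		return 0, false, nil, errors.New("no packet")
	}
	if bits[i+1] == 0 {
		return 0, true, bits[i+2:], nil
	}
	if i+10 >= len(bits) {
		return 0, false, nil, errors.New("truncated data packet")
	}
	for k := 0; k < 8; k++ {
		data |= byte(bits[i+2+k]) << k
	}
	if bits[i+10] != 0 {
		return 0, false, nil, errors.New("missing stop bit")
	}
	return data, false, bits[i+11:], nil
}

func main() {
	fmt.Println(len(EncodeData(0)), len(EncodeAck())) // 11 2

	// Idle line, a data packet for 'A', one idle bit, then an acknowledge.
	stream := append([]int{0, 0}, EncodeData('A')...)
	stream = append(stream, 0)
	stream = append(stream, EncodeAck()...)
	for {
		d, ack, rest, err := Decode(stream)
		if err != nil {
			break // no further complete packet in the stream
		}
		if ack {
			fmt.Println("ack")
		} else {
			fmt.Printf("data %q\n", d)
		}
		stream = rest
	}
}
```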

Larry Dickson