Re: VCP

To: James Wolffe <James_Wolffe@xxxxxxxxxx>, 1355-association@xxxxxxxxxxxxxx, occam-com@xxxxxxxxx, Paul Walker <paul@xxxxxxxxxxxxxxxxxx>, barry@xxxxxxxxxxxxxx, 106375.1222@xxxxxxxxxxxxxx, 104330.1033@xxxxxxxxxxxxxx, ian.page@xxxxxxxxxxxxxxx
Subject: Re: VCP
From: "info@xxxxxxxxx" <info@xxxxxxxxx>
Date: Sun, 15 Dec 1996 23:51:22 +0100
Dear all,

I think Wolffe's remarks are an excellent contribution to the idea of a new
"VCP". As I already pointed out in a reply to Paul Walker, the VCP is nice
and can provide better performance under certain conditions. My question is
whether it is worthwhile for more general-purpose use. The VCP, at least as
I see it in the now-defunct T9000, implements pure occam channels, i.e.
read-synchronized channels. As such we are talking about bitstreams: the
user himself is responsible for the application-level headers that give the
data streams more semantic content. As a result, when the mapping onto the
network or its topology is changed, the application code often has to be
adapted.
To illustrate this, I would like to explain a bit of the history and the
current implementation of our communication scheme in Virtuoso Classico /VSP.
VSP stands for Virtual Single Processor and provides portability and fully
distributed semantics (transparent parallel processing).

1. History.
-----------
Virtuoso started as a port of a simple single-processor real-time kernel to
the T414/T800 in 1990. This was the first time prioritized pre-emptive
scheduling was made possible on the transputer. It offered tasks,
semaphores, FIFO queues, mailboxes and logical resources. The following year
we developed a first distributed version. The idea was to allow the
programmer to use any of the "kernel objects" without needing to know where
they were located in the network, the routing being done at the system
level. This worked, but we discovered some major problems with the syntax
and semantics of the existing kernel and its API (the same applies to many
RTOSs, including POSIX). Some services could not be distributed without
serious side-effects. E.g. the semaphores were binary, whereas we needed
counting semaphores, and the mailboxes were tied to a receiver task and
passed pointers. We needed a much more symmetric situation, and copies
instead of pointers.
The result is now a set of about 70 services, with synchronous and
asynchronous communication primitives. Through-routing is also done
automatically.   
In order to preserve the hard real-time capabilities, we also adopted a
communication scheme based on dynamic buffers and prioritised packet switching.
Packet switching is needed to prevent a communication medium from being
monopolised for too long, while prioritisation is needed to ensure that,
system-wide (apart from the unavoidable but minimised communication delay),
all higher-priority activities happen first.
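
A minimal sketch of prioritised packet switching on a single link, written in Python for readability. The class name `LinkScheduler`, the 1024-byte packet size and the heap-based queue are illustrative assumptions, not Virtuoso's implementation. The point it shows is that because the link picks the highest-priority pending message before every packet, a long low-priority transfer can delay an urgent message by at most one packet.

```python
import heapq
import itertools

PACKET_SIZE = 1024  # assumed data-packet size in bytes ("typically 1 K or less")


class LinkScheduler:
    """Prioritised packet switching on one link (illustrative sketch)."""

    def __init__(self):
        # Heap entries: (-priority, arrival seq, msg_id, remaining bytes).
        # heapq is a min-heap, so negating the priority pops the highest first.
        self._queue = []
        self._seq = itertools.count()  # FIFO tie-break among equal priorities

    def submit(self, msg_id, priority, size):
        heapq.heappush(self._queue, (-priority, next(self._seq), msg_id, size))

    def send_next_packet(self):
        """Send one packet of the most urgent message; return (msg_id, bytes) or None."""
        if not self._queue:
            return None
        neg_prio, seq, msg_id, remaining = heapq.heappop(self._queue)
        chunk = min(PACKET_SIZE, remaining)
        if remaining > chunk:
            # Re-queue the rest with its original arrival number, so it keeps
            # its place ahead of later messages of the same priority.
            heapq.heappush(self._queue, (neg_prio, seq, msg_id, remaining - chunk))
        return msg_id, chunk


if __name__ == "__main__":
    link = LinkScheduler()
    link.submit("bulk", priority=1, size=3000)
    print(link.send_next_packet())   # ('bulk', 1024)
    link.submit("alarm", priority=10, size=64)
    print(link.send_next_packet())   # ('alarm', 64): waited only one packet
    print(link.send_next_packet())   # ('bulk', 1024)
    print(link.send_next_packet())   # ('bulk', 952)
```

A pure FIFO link would instead send all 3000 bytes of "bulk" before "alarm"; that is the kind of FIFO effect the next paragraph refers to.
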
If you look at the FIFO scheduling in the transputer, you can discover very
serious FIFO effects at run-time (urgent work waiting behind less urgent
work). Hence, when using the T9000, we had to program around the VCP routing
scheme, as it uses a round-robin scheme. The same applies to the resource
queues. For your information, we never released the T9000 version, because
we never saw any stable HW.
We ported this very successfully to the TMS320C40 and the SHARC, replacing
the transputer's high-priority FIFO queue with a software version, our
nanokernel. We get excellent performance, as you can see below. The routing
scheme uses 64-byte command packets and user-defined data packets (typically
1 KB or less). The system also allows "fat links", whereby we split the
packets over multiple links at the same time.
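
The "fat link" idea can be sketched as splitting a message into numbered packets, dealing them round-robin over the available links and restoring the order at the receiver. This is a hypothetical illustration: `split_over_links` and `reassemble` are invented names, round-robin assignment is an assumption, and a real driver would also need per-link DMA set-up and buffering.

```python
def split_over_links(data: bytes, n_links: int, packet_size: int = 1024):
    """Cut data into numbered packets and deal them round-robin over n_links.

    Returns one list of (seq, chunk) packets per link (hypothetical helper).
    """
    if n_links < 1 or packet_size < 1:
        raise ValueError("need at least one link and a positive packet size")
    lanes = [[] for _ in range(n_links)]
    for seq, offset in enumerate(range(0, len(data), packet_size)):
        lanes[seq % n_links].append((seq, data[offset:offset + packet_size]))
    return lanes


def reassemble(lanes):
    """Receiver side: packets may arrive in any order across links, so sort by seq."""
    packets = sorted((p for lane in lanes for p in lane), key=lambda p: p[0])
    return b"".join(chunk for _, chunk in packets)


if __name__ == "__main__":
    message = bytes(range(256)) * 20            # 5120 bytes -> 5 packets of 1 KB
    lanes = split_over_links(message, n_links=3)
    print([len(lane) for lane in lanes])        # [2, 2, 1]
    assert reassemble(lanes) == message
```
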

                          T800 (30 MHz)   C40 (40 MHz)    SHARC (40 MHz)
--------------------------------------------------------------------------
Hop delay                 50 microsec     21 microsec     20 microsec
Throughput (task to task)
  1 link                  1.8 MB/s        19 MB/s         39 MB/s
  3 links                 4.7 MB/s        30 MB/s         100 MB/s
Context switch
  nanokernel              ~1 microsec     1 microsec      580 nanosec
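
To read the table, a rough first-order model of task-to-task latency can help: a fixed cost per hop plus the time to stream the payload at the measured throughput. This additive model is an assumption (it ignores pipelining across hops and packet overheads), and it takes 1 MB/s as 10^6 bytes/s, so that bytes divided by MB/s gives microseconds directly.

```python
# Figures from the table: hop delay in microseconds, throughput in MB/s.
HOP_DELAY_US = {"T800": 50.0, "C40": 21.0, "SHARC": 20.0}
THROUGHPUT_MB_S = {
    ("T800", 1): 1.8, ("T800", 3): 4.7,
    ("C40", 1): 19.0, ("C40", 3): 30.0,
    ("SHARC", 1): 39.0, ("SHARC", 3): 100.0,
}


def estimated_latency_us(cpu: str, size_bytes: int, hops: int = 1, links: int = 1) -> float:
    """First-order estimate: per-hop delay plus streaming time for the payload.

    Assumes 1 MB/s = 1e6 bytes/s, i.e. one byte per microsecond, and ignores
    pipelining between hops, so treat the result as an order-of-magnitude figure.
    """
    return hops * HOP_DELAY_US[cpu] + size_bytes / THROUGHPUT_MB_S[(cpu, links)]


if __name__ == "__main__":
    for cpu in ("T800", "C40", "SHARC"):
        print(cpu, round(estimated_latency_us(cpu, 1024, hops=2), 1))
```

With these figures, a 1 KB packet over two SHARC hops on a single link comes to about 40 + 1024/39, i.e. roughly 66 microseconds, so the fixed hop delay is already comparable to the transfer time itself.
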

Of course the programming effort on the DSP is a lot higher than on the
T800. You have to program a lot more in assembler, you have to set up the
DMAs, etc. It is also crucial to minimise the time interrupts are disabled,
and the interrupt latencies.
The main point is that while HW support like link switches and VCPs can
improve the performance, a more flexible SW implementation is not
necessarily that bad.

2. Other schemes.
----------------

We also have a couple of other products, like Virtuoso Nano /VSP, where the
scheduling (and hence also the communication) is round-robin, and Virtuoso
Synchro (still a prototype), where the scheduling is static. In these cases
no prioritisation is required. For Nano we still need packets, for memory
reasons when buffering.

3. The interest of the T9000 link.
---------------------------------

One of the more difficult tasks in embedded real-time systems is fault
tolerance. And that's a major reason for our interest in the SMCS and T9000
link.
Our VSP model is a good starting point, as the user can write his program
independently of the topology. So when he or the kernel detects a failing
link or processor, the corrective action is largely reduced to updating
routing tables (I simplify things here a bit). An important aspect is that
processor and communication failures can be isolated, reset independently
and prevented from propagating.
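
As an illustration of reducing recovery to a routing-table update, the sketch below recomputes one processor's next-hop table by breadth-first search after a link is removed. Shortest-hop routing and the helper names `next_hop_table` and `fail_link` are assumptions for illustration; the actual Virtuoso routing policy is not described here.

```python
from collections import deque


def next_hop_table(links, source):
    """Return {destination: next_hop} for `source`, by breadth-first search
    over an undirected topology given as a set of (a, b) links."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    table = {}
    frontier = deque()
    for n in sorted(neighbours.get(source, [])):
        table[n] = n            # direct neighbours are reached over their own link
        frontier.append(n)
    visited = {source, *table}
    while frontier:
        node = frontier.popleft()
        for n in sorted(neighbours[node]):
            if n not in visited:
                visited.add(n)
                table[n] = table[node]   # inherit the first hop used to reach node
                frontier.append(n)
    return table


def fail_link(links, a, b):
    """Return the topology with the (undirected) link a-b removed."""
    return {l for l in links if l not in ((a, b), (b, a))}


if __name__ == "__main__":
    ring = {(0, 1), (1, 2), (2, 3), (3, 0)}          # four processors in a ring
    print(next_hop_table(ring, 0))                   # {1: 1, 3: 3, 2: 1}
    print(next_hop_table(fail_link(ring, 0, 1), 0))  # {3: 3, 2: 3, 1: 3}
```

After link 0-1 fails, node 0 simply reaches every destination via node 3; the application code itself does not change.
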
We could do this with the T800 by using a fault-tolerant link driver (using
error-checking bits and time-outs). On the C40 or SHARC this is virtually
impossible unless you accept high overheads, so we gladly contributed to
the design of the SMCS. Note, however, that except for start-up code we will
mostly not use the higher-level SMCS protocols but just the naked data
communication scheme, the rest being done in SW with as few work-arounds as
possible.
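
A fault-tolerant link driver of the kind mentioned for the T800 can be sketched as framing each packet with error-checking bits, retrying on a time-out or a negative acknowledgement, and declaring the link dead after a bounded number of attempts so that the failure stays local. CRC-32 (via `zlib.crc32`), the retry count and the `transmit` callback convention are assumptions; the original driver's check scheme is not specified.

```python
import zlib


class LinkFailure(Exception):
    """Raised when a link is declared dead, so recovery can reroute around it."""


def frame(payload: bytes) -> bytes:
    """Append a CRC-32 (assumed check scheme) as the error-checking bits."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def unframe(packet: bytes):
    """Receiver check: return the payload if the CRC matches, else None."""
    if len(packet) < 4:
        return None
    payload, crc = packet[:-4], int.from_bytes(packet[-4:], "little")
    return payload if zlib.crc32(payload) == crc else None


def send_reliable(transmit, payload: bytes, retries: int = 3) -> int:
    """Send payload via a hypothetical transmit(packet) callback.

    transmit returns True (ACK), False (NACK: receiver saw a bad CRC) or
    None (time-out). Returns the number of attempts used; raises LinkFailure
    after `retries` failed attempts so the fault is confined to this link.
    """
    packet = frame(payload)
    for attempt in range(1, retries + 1):
        if transmit(packet) is True:
            return attempt
    raise LinkFailure(f"no acknowledgement after {retries} attempts")


if __name__ == "__main__":
    # Simulated link: first a time-out, then a corrupted byte, then success.
    faults = iter(["timeout", "corrupt", "ok"])
    delivered = []

    def flaky_link(packet):
        fault = next(faults)
        if fault == "timeout":
            return None
        if fault == "corrupt":
            packet = bytes([packet[0] ^ 0xFF]) + packet[1:]
        data = unframe(packet)
        if data is None:
            return False
        delivered.append(data)
        return True

    print(send_reliable(flaky_link, b"hello"), delivered)  # 3 [b'hello']
```
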
For fault tolerance, however, even if the naked performance of
VCP+linkswitch can be better, I am not so sure it is that usable. If a
failure occurs in the VCP or the linkswitch, how many channels are affected?
How do you recover with the least amount of intrusion on the rest of the
system?
One must be very careful when putting things in HW. Hardware designed with a
restricted application in mind can cost a lot when developers are forced to
program around it for other types of application. A typical example is the
HW stack used in many DSPs. It saves ONE cycle in sequential code, but it
costs a lot more when the kernel has to swap the stack in a multi-tasking
environment. Fortunately, newer DSPs are getting rid of it (though mainly
because nobody can write a C compiler that can use the HW stack).

4. Some first conclusions.
-------------------------

A VCP with the functionality found on the T9000 might work well because
it follows the T9000 programming model, although it was also one of the
complexities that proved too much when trying to get working T9000s. Its
essential function is to implement some automatic routing. If you are
considering a VCP usable with any type of processor, things must be
re-examined. What functionality would I find useful?
Firstly, the "overhead" on traditional processors is different. Even with a
SW nanokernel (the equivalent of the high-priority queue), the routing is
not the overhead (you just look up a table). The overhead is due to the use
of the available communication hardware (e.g. managing the buffers, setting
up the DMA, synchronising the two processors using interrupts, protecting
the shared memory zone, swapping context, etc.). On transputer architectures
this is, to a large extent, part of the instruction set. On a DSP it can
take up to 10 microseconds using tens of instructions. So an ideal VCP would
implement most of this functionality in HW. The interface would then be a
data structure whose main entry is a pointer to a data/command packet. You
would end up with something like the SMCS, except that routing would be
added as well. One of the main problems to solve is how to ensure that no
data is transmitted before the receiver side is ready (i.e. has buffer
space). FIFOs could be useful to reduce the interrupt rates. Etc.
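
One possible answer to the receiver-readiness question is credit-based flow control, sketched below together with a descriptor interface and a table-lookup router. All names (`Descriptor`, `CreditedLink`, `VCPSketch`) are hypothetical, and this is a software model of behaviour an ideal VCP might implement in hardware, not a proposed design: the receiver advertises one credit per free buffer, and the sender only puts a packet on the wire while it holds a credit.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Descriptor:
    """Hypothetical VCP work item: destination node plus a reference to the packet."""
    dest: int
    packet: bytes


class CreditedLink:
    """Sender side of one link under credit-based flow control."""

    def __init__(self, receiver_buffers: int):
        self.credits = receiver_buffers  # free buffers advertised by the receiver
        self.pending = deque()           # FIFO of descriptors waiting for a credit
        self.wire = []                   # descriptors actually transmitted

    def post(self, d: Descriptor):
        self.pending.append(d)
        self._pump()

    def grant(self, n: int = 1):
        """Receiver freed n buffers and returns n credits."""
        self.credits += n
        self._pump()

    def _pump(self):
        # Never transmit without a credit, i.e. without guaranteed buffer space.
        while self.pending and self.credits > 0:
            self.credits -= 1
            self.wire.append(self.pending.popleft())


class VCPSketch:
    def __init__(self, routes, links):
        self.routes = routes  # {dest: link id}; routing is just a table lookup
        self.links = links    # {link id: CreditedLink}

    def post(self, d: Descriptor):
        self.links[self.routes[d.dest]].post(d)


if __name__ == "__main__":
    a = CreditedLink(receiver_buffers=2)
    vcp = VCPSketch({5: "a"}, {"a": a})
    for i in range(3):
        vcp.post(Descriptor(5, bytes([i])))
    print(len(a.wire), len(a.pending))  # 2 1: third packet waits for a credit
    a.grant()
    print(len(a.wire), len(a.pending))  # 3 0
```
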

These are just some thoughts I put down; this is not a detailed functional
specification, so there are probably some holes in it. But if it helps the
discussion, please feel free to comment.

Best regards,

Eric Verhulst    


At 01:44 PM 12/13/96 EST, you wrote:
>     I would like to add to Paul Walker's comments on VCP's, based on some 
>     prior experience with transputer-based packet communication (in T800 
>     software) and studies of systems using the T9000 (unfortunately never 
>     implemented), as well as communications architectures quite distinct 
>     from the occam/transputer heritage.
>     
>     I offer my apologies in regard to possible duplication of this 
>     message. Also, apologies for my ignorance of the prior occam reflector 
>     activity, which I presume related to hardware compilation of an occam 
>     VCP design.
>     
>     The notion of the VCP is quite appealing, to help implement capable 
>     parallel systems exploiting the IEEE1355 link and switch technology. 
>     Such a coprocessor function, closely coupled to the link interface 
>     hardware, is capable of providing excellent performance along with 
>     reasonable cost.
>     
>     Before committing to hardware designs though, it is worthwhile to 
>     consider the range of applications which might be addressed. In 
>     particular, it may be beneficial to consider the separation of 
>     functionality into the categories of mechanism and policy, where the 
>     relatively constant (or performance critical) portions of the 
>     interface are handled with hardware mechanisms, and the policies are 
>     determined in a highly flexible manner by software. Clearly, this fits 
>     well with the "simple PLD" approach, coupled with a processor (most 
>     advantageously an ST20450?- or perhaps a processor supported by 
>     KROC?).
>     
>     Also of interest is the range of packet protocols which might be 
>     supported by the interface. While the synchronous flow-controlled 
>     occam channels directly supported by the T9000's VCP are quite 
>     valuable, other kinds of communication also have their place in 
>     concurrent systems. Examples include client-server systems where all 
>     possible clients may not be known in advance of server software 
>     development, and multimedia applications where unsynchronized 
>     best-effort communication (with dropping of packets when buffers are 
>     full) is the most expedient approach.
>     
>     Along these lines, it would also be a good idea to coordinate packet 
>     protocol support with the new synchronization models being put 
>     together for KROC in order to assure efficient hardware level support.
>     
>     It seems to me that the prospects for market success will be best when 
>     a variety of packet handling protocols are supported by the hardware. 
>     Therefore, the packet format should include a packet protocol 
>     identifier (a single byte should be adequate) which will allow a wide 
>     range of protocols. This identifier could, for example, allow the 
>     performance-critical issue of packet acknowledgement to be handled at 
>     the hardware level.
>     
>     The T9000 scheme where the return route is stored in the VCP by virtue 
>     of channel creation could then be supplemented with other schemes, 
>     such as an "anonymous packet" capability which includes the return 
>     route with the incoming packet. Versions without any acknowledge 
>     support could also be built (packets without buffer space available 
>     for them would need to be discarded to prevent deadlock). There are 
>     probably at least several other protocols that would be of value, in 
>     particular to support synchronization requirements, or perhaps for 
>     fault tolerance.
>     
>     In summary, I think that the "small PLD with programmable processor" 
>     approach to a VCP is the best idea, but not just for economic reasons. 
>     Multiprotocol software-programmable policy is a significant added 
>     value, if only just a little extra flexibility is built into the 
>     hardware.
>     
>     Comments Welcome,
>     
>     Jim
>     
>     *******************************************************************
>     James Wolffe
>     Sr Member Technical Staff
>     Northrop Grumman Norden Systems
>     75 Maxess Rd.
>     Melville NY 11747
>     USA
>     Phone 1 516 845 2220
>     Fax   1 516 845 2906
>     email james_wolffe@xxxxxxxxxx
>
>
>
FROM : Eric Verhulst              - For North America, contact :
Eonic Systems nv   NEW ADDRESS !  - Eonic Systems Inc.
Nieuwlandlaan 9                   - 12210 Plum Orchard Dr.
B-3200 Aarschot Belgium           - Silver Spring, MD 220904-7801, USA
Tel : +32 16 62 15 85             - Tel : +1 301 572 5000
Fax : +32 16 62 15 84             - Fax : +1 301 572 5005
e-mail : Eric.Verhulst@xxxxxxxxx  or info@xxxxxxxxx
Visit us on the WEB :  http://www.eonic.com
----- Virtuoso : the best RTOS for DSP ------------------