Re: Occam-Tau - the natural successor to Occam-Pi

On 3 Oct 2012, at 23:27, Eric Verhulst (OLS) wrote:

The test as described has some serious weaknesses:

1. The test is unrealistic. The ISR (or its equivalent on the XMOS chip) does nothing except set a bit high. It should at least read a value from a peripheral register to be comparable with a real situation. In addition, it would be useful to stress test the CPU with some tasks that continuously exercise context switching (as this requires disabling interrupts).
2. FreeRTOS is free and that says it all. It even disables interrupts over every service call. As far as I can see, the RTOS was not even exercised or at least it is not clearly described.
3. This says little about the behavior when other interrupts are present as well. With one interrupt, one always gets the best time.

The major point, however, is that hard real-time is not about how fast a CPU reacts to an external event/interrupt. That is one aspect, but it is essentially determined by how the peripheral hardware is designed (e.g. how long does it hold the data?). Any reaction time is acceptable as long as it is less than the holding time and less than the interval between two interrupts. Hard real-time is about meeting multiple deadlines in a predictable manner, independently of how many tasks/processes are running. The first step is to carry out a schedulability analysis (typically RMA). Several papers and books since the 1970s describe this (Liu and Stankovic). That is not enough: one must also guarantee that the worst case is strictly bounded, because under realistic conditions the interrupt response time is a histogram, not a single value.
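
To make the schedulability step concrete, here is a minimal sketch of the classic utilization-bound test used in rate monotonic analysis: n periodic tasks with deadlines equal to their periods are schedulable under rate-monotonic priorities if their total utilization does not exceed n(2^(1/n) - 1). The function name and the task set are made up for illustration, and the test is only sufficient: a task set that fails it may still be schedulable and needs a more exact analysis.

```python
from typing import List, Tuple


def rma_utilization_test(tasks: List[Tuple[float, float]]) -> Tuple[float, float, bool]:
    """Sufficient rate-monotonic schedulability test.

    tasks: (worst-case execution time C, period T) pairs in the same time unit.
    Returns (total utilization, utilization bound, passes_test).
    """
    n = len(tasks)
    if n == 0:
        raise ValueError("need at least one task")
    utilization = sum(c / t for c, t in tasks)
    bound = n * (2 ** (1.0 / n) - 1)  # utilization bound for n tasks
    return utilization, bound, utilization <= bound


# Hypothetical task set (C, T), e.g. in milliseconds.
example_tasks = [(1, 4), (1, 5), (2, 10)]
u, bound, ok = rma_utilization_test(example_tasks)
print(f"U = {u:.3f}, bound = {bound:.3f}, passes = {ok}")
```

Passing this test says nothing yet about the worst-case bound on each individual response time, which is the second requirement named above.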

On the XMOS chip this works pretty well until one needs more processes than available hardware threads (8 at 100 MHz by using a kind of timeslicing) or in general hardware resources:
32-bit processor providing up to 700 MIPS (that is: all threads together)
Eight hardware threads and 32 channel ends
Ten timers and six clock blocks
Four XMOS Links
64KBytes SRAM and 8KBytes OTP memory (shared by all threads)
This is because, at heart, the XMOS CPU is still a von Neumann machine. How will it behave with 100 (software) threads, each having its own deadline to meet? How will that be guaranteed when there are asynchronous interrupts, so that the scheduling is not strictly periodic and predictable? And how do you share and protect the shared resources (e.g. the memory)?

FYI, see http://www.altreonic.com/sites/default/files/Transparent%20Programming%20of%20ManyMulti%20Cores%20with%20OpenComRTOS.pdf for a more elaborate comparison.

This being said, we are still waiting for the ideal multicore chip. The final limitation is not so much how many gates one can squeeze onto a chip, but how many I/O pins and how much zero-wait-state (0WS) memory can be made available in a package. In addition, today the market needs are shifting towards safety. Gates are almost free and finally they are being used to make the chips more reliable and capable of handling runtime faults. See e.g. http://www.ti.com/lsds/ti/arm/hercules_arm_cortex_r_safety_microcontrollers/arm_cortex_r4/rm4_arm_cortex_r4/overview.page for this evolution. The XMOS architecture has some potential in this domain, if enabled by the software support.

Best regards,

Eric

From: David May [mailto:dave@xxxxxxxxxxxxx]
Sent: Wednesday, October 03, 2012 8:10 PM
To: Larry Dickson
Cc: eric.verhulst@xxxxxxxxxxxxx; 'Eric Verhulst (OLS)'; 'Occam Family'
Subject: Re: Occam-Tau - the natural successor to Occam-Pi - or is there one already?

Dear all,

I reluctantly added a single priority level to the transputer - at the
time the economics seemed to support this. But I'm still not sure.

Today, I'm sure the economics doesn't support the complexity
and overheads of priority schedulers, interrupts etc. The
hardware event-handling on the XMOS cores is faster than
anything an interrupt-based system can deliver - and
much easier to design with - here's a link:

https://www.xmos.com/download/public/Benchmark-Methods-to-Analyze-Embedded-Processors-and-Systems%28X7638A%29.pdf?support=1

Even for a single low-end core, this approach will easily
out-perform a conventional interrupt system.

With a lot of cores, as Larry has said, the event-handling will be
at the edge of the system. The core of the system will be running
parallel communication structures but - as I said in the
presentation - these have to be designed to ensure
efficient parallel communication flows.

Best wishes

David

On 3 Oct 2012, at 15:05, Larry Dickson wrote:

On Oct 3, 2012, at 1:25 AM, Eric Verhulst (OLS) <eric.verhulst@xxxxxxxxxxxxxxxxxxxxxx> wrote:

The bad news is that these chips are terribly complex to program. Silicon gates are supposed to be free, so the hardware has zillions of options. To understand the issue: the TI chips can route some 1000 interrupts to each core (using a three-layer interrupt controller). Interrupt latency is still good, but slower than on the much simpler ARM M3. The point is that a simple PRI ALT will not do the job. Two priorities are not sufficient. It worked more or less on the transputer because this chip had only one event pin. Impressive as the performance was at the time, it was still lacking for hard real-time when tens of processes were competing for the CPU. 32 to 256 priorities are needed to manage all the on-chip resources more or less satisfactorily.

So, it looks like the occam-restaurant is still at the end of the universe as Douglas Adams would have said. And looking at this discussion, the babelfish isn't helping very much.

Have fun,

Eric

Simple design answer: a chip (like the XMOS or Adapteva?) with a boundary of (say) 28 cores, each serving a single event/interrupt/link. Two-level PRI PAR independently on each core, so that each boundary core is absolutely responsive to its event. Extremely fast internal comms between cores, and hardware FIFOs between the boundary cores and the 36 internal cores, so that the reliably captured hard IO can handle slight delays before soft processing.
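
As a rough sizing sketch for those hardware FIFOs (an illustration under stated assumptions, not part of the design above): if events on a boundary core arrive no closer together than a minimum gap, and the internal core draining the FIFO may stall for at most some bounded time and drains nothing during that stall, then the FIFO has to hold every event that can arrive in that window. The function name and the numbers are hypothetical.

```python
def fifo_depth_needed(max_stall_ns: int, min_gap_ns: int) -> int:
    """Entries needed to absorb a consumer stall without losing events.

    Assumes events are at least min_gap_ns apart and that the consumer
    removes nothing for up to max_stall_ns. Events at times 0, G, 2G, ...
    that fall inside the closed window [0, D] number floor(D / G) + 1.
    """
    if min_gap_ns <= 0:
        raise ValueError("min_gap_ns must be positive")
    if max_stall_ns < 0:
        raise ValueError("max_stall_ns must be non-negative")
    return max_stall_ns // min_gap_ns + 1


# Hypothetical numbers: 10 us worst-case stall, events at least 1.5 us apart.
depth = fifo_depth_needed(max_stall_ns=10_000, min_gap_ns=1_500)
print(depth)
```

The point of the sketch is that "slight delays" only stay harmless while the stall of the soft-processing side is itself bounded; an unbounded stall needs an unbounded FIFO.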

The reason I resist multiple hard priorities is that I think that solution is mostly a chimera. The top guy is OK, but number 2 and below swiftly become subject to occasional bad delays (when the interrupts happen to fall on top of each other). "[T]ens of processes … competing for the CPU" are only a real-time problem if they depend on lots of independent asynchronous stimuli (given a fast CPU). Multicore, which was not available in the time of the Transputer (except in the form of multiple Transputers, which was way expensive) lets you be responsive to all the stimuli independently.
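
The effect on "number 2 and below" can be illustrated with the standard fixed-priority response-time recurrence R = C + sum over higher-priority handlers of ceil(R / T_j) * C_j, iterated to a fixed point. This is a sketch under simplifying assumptions that are not in the discussion above: each stimulus has a known minimum inter-arrival time T and handler cost C, handlers are fully preemptible, the deadline equals T, and blocking from disabled interrupts is ignored. The function name and figures are invented for illustration.

```python
from typing import List, Optional, Tuple


def worst_case_response(handlers: List[Tuple[int, int]]) -> List[Optional[int]]:
    """Worst-case response time per handler, highest priority first.

    handlers: (cost C, minimum inter-arrival time T) in the same integer unit.
    Returns None for a handler whose response time would exceed its own T.
    """
    results: List[Optional[int]] = []
    for i, (c_i, t_i) in enumerate(handlers):
        r = c_i
        while True:
            # Interference: every higher-priority arrival within r preempts us.
            interference = sum(-(-r // t_j) * c_j for c_j, t_j in handlers[:i])
            r_next = c_i + interference
            if r_next > t_i:
                results.append(None)  # misses its deadline in the worst case
                break
            if r_next == r:
                results.append(r)  # fixed point reached
                break
            r = r_next
    return results


# Hypothetical handlers (C, T) in microseconds, highest priority first.
responses = worst_case_response([(2, 10), (3, 15), (4, 40)])
print(responses)
```

With these figures the top handler always finishes in its own 2 us, while the third handler, which needs only 4 us of CPU, can take more than twice that when all three stimuli coincide.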

Larry