
Re: XCore matters



Øyvind, Peter, David and all,

OK, the included emails in this back-and-forth were getting rather onerous, so I trimmed some of them out. Refer to the earlier messages in the thread for them.

Thank you, Øyvind, for the graphic - it helps tremendously. The only thing I wonder about is that it seems to indicate there are only two tiles (=two hardware cores) in the whole CPU. Is this true, or is that just a partial diagram? For some reason I had the impression there were 16. 

Does OTP mean one-time-programmable (microcode) or am I missing the acronym boat?

Correct me if I am wrong - but I understand the staggered cycles are within a tile, so you could have a logical-core-cycle being up to eight staggered-cycle-steps long at maximum usage, but the two tiles operate on top of each other as independent cores. And the switch, I suppose, permits channel communication between one tile and the other even though they have independent memory spaces. Questions: is the switch used when one logical core is communicating with another on the same tile? And is there a speed difference between within-tile communication and between-tiles communication?
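To make the picture above concrete, here is a toy model in Python of the understanding as stated: issue slots rotate round-robin among the active logical cores within a tile, and the two tiles run fully in parallel. The function names and core labels are invented for illustration, and the model assumes pure round-robin with no pipeline-depth minimum; it is not a description of the actual XMOS scheduler.

```python
from itertools import cycle, islice

def tile_schedule(active_cores, n_cycles):
    """Which logical core issues on each clock cycle of ONE tile.

    Assumption: pure round-robin over the active logical cores.
    """
    return list(islice(cycle(active_cores), n_cycles))

def chip_schedule(tiles, n_cycles):
    """Tiles are independent: each tile gets its own issue slot every cycle."""
    per_tile = [tile_schedule(cores, n_cycles) for cores in tiles]
    return list(zip(*per_tile))

def steps_between_issues(active_cores):
    """Clock cycles between two consecutive issues of the same logical core."""
    return len(active_cores)

# Hypothetical labels: tile 0 fully loaded (8 cores), tile 1 with 3 active cores.
tile0 = [f"t0c{i}" for i in range(8)]
tile1 = [f"t1c{i}" for i in range(3)]
```

In this model a fully loaded tile gives each logical core one step in eight, while a logical core on the other tile issues in the very same clock cycle, which is the "on top of each other" case.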

How hard would it be (in theory) to write an occam/occonf compiler that would treat the thing you drew as a network of 16 (not 2) "Transputers" and allow processes inside each logical core using the other devices (combinable and distributable?) you have described? Here I am thinking habitually in the uniprocessor-embedded-program (Arduino or MSP430) mindset of one low-priority (loop) process and possibly several ongoing high-priority (interrupt service) processes, which I have always thought of as a cleaner and more unified design than "the program" plus "drivers and ISRs". Or do you have to dedicate a separate thread to each interrupt?

As for the semicolons, I was thinking from a pure syntax/semantics point of view, where if the context is enough to distinguish, you do not need a separate symbol even though you have dissimilar functions. And looking at the Compiler Writer's Guide (and oc assembly code output) each semicolon is implemented by a new "in" or "out" command (with any necessary addressing support) so it's really just a sequence, but due to PROTOCOL you can say it's all happening with the same channel, instead of having to declare a separate channel for each transmission as in Go.
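A rough Python sketch of that reading of the semicolon: a sequential protocol is modelled as one separate synchronous communication per component, all on the same channel. The Channel class and the helper names are invented for illustration, and the rendezvous is only approximated with the standard library queue; it is not how occam or the transputer implements channels.

```python
import queue
import threading

class Channel:
    """Toy synchronous channel: send() returns only after recv() took the item."""
    def __init__(self):
        self._q = queue.Queue(maxsize=1)
        self.rendezvous = 0          # number of completed communications

    def send(self, value):
        self._q.put(value)
        self._q.join()               # block until the receiver calls task_done()

    def recv(self):
        value = self._q.get()
        self.rendezvous += 1
        self._q.task_done()          # release the blocked sender
        return value

def send_sequential(chan, *components):
    """Model of `chan ! a; b; c`: one output per component, in order."""
    for component in components:
        chan.send(component)

def recv_sequential(chan, count):
    """Model of `chan ? a; b; c`: one input per component, in order."""
    return [chan.recv() for _ in range(count)]

def demo():
    chan = Channel()
    received = []
    receiver = threading.Thread(
        target=lambda: received.extend(recv_sequential(chan, 3)))
    receiver.start()
    send_sequential(chan, 42, b"x", 3.5)   # three components, one channel
    receiver.join()
    return received, chan.rendezvous
```

Calling demo() moves three components of different types over a single channel and counts three rendezvous, one per semicolon-separated component, which is the "it's really just a sequence" view.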

Larry

On Dec 11, 2020, at 1:40 AM, Øyvind Teig <oyvind.teig@xxxxxxxxxxx> wrote:

David and Larry (++, I hope)

I had assumed a "tile" was a core, and that there were a number of cores on an XMOS chip. But, Øyvind, you said (about processes on different cores)

I am sorry, I probably have messed it up. But I have the below picture in my head all the time.

I have not yet looked into the xcore.ai architecture, since I haven’t got a board yet.

The XCore-200 architecture is described at https://www.xmos.ai/download/xCORE-200:-The-XMOS-XS2-Architecture-(ISA)(1.1).pdf (2015/04/01). What am I complaining about? It’s all there for me to read, all 289 pages. I could have known the answer to everything.

I found the above at https://www.xmos.ai/xcore-200/ - where there is much more, including the product brief at https://www.xmos.ai/download/xCORE-200-Product-Brief(1.0).pdf. Here’s the figure:

<PastedGraphic-1.png> 

The left and the right columns are TILES. That term is not in the brief, but it is used in the code, as in: on tile[0]:

Each TILE has 8 LOGICAL CORES and one each of SRAM and OTP, etc.

I never had the impression that occam protocol semicolons were anything fundamental - just the start of a new assembly-level input or output, i.e. they could be put in sequence as separate lines of code, except for the need to group things in a PROTOCOL. In my language investigations using lex and yacc, I found the semicolon could be eliminated entirely (and replaced with a comma). That makes it possible to define a variant on occam that looks like C.

I thought semicolons in the protocol had semantic value, in that they introduced synchronization points. I first discovered this in SPoC (the Southampton Portable occam Compiler), where the generated C code did a complete channel communication between each semicolon. Like two cogwheels. That said, I couldn’t use it for anything, could I? So I thought it had to do with giving other processes more time, by giving communication a smaller granularity, for what it was worth. I feel like this is a kind of storytelling: nice to read, perhaps, but precise?

Øyvind


On Dec 10, 2020, at 23:45, Larry Dickson <tjoccam@xxxxxxxxxxx> wrote:

Peter and Øyvind,

You are right, Peter, about the internal channel (I was confusing it with a link communication, where they do both deschedule). I knew that, and put it into my Fringes and Workshop. But my main point was about the DMA, and I still think I am right. The internal T4 communication can go at 40 MB/s (one word every 2 cycles) but the link is limited to less than 2 MB/s. Thus, dedicating a process to transmitting or receiving on a link could result in a 95% waste of cycles. With DMA, most of this is recaptured (my work on Crowell and Elzenga's results showed the burden bandwidth of a link transmission was 37 MB/s).
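The arithmetic behind the 95% figure can be checked in a few lines of Python. The rates are the ones quoted above; the 4-byte word size and the helper names are assumptions made for illustration, and the clock rate is merely what those figures imply.

```python
BYTES_PER_WORD = 4      # assumption: 32-bit words on the T4
CYCLES_PER_WORD = 2     # "one word every 2 cycles", from the text
INTERNAL_MB_S = 40.0    # internal channel rate, from the text
LINK_MB_S = 2.0         # upper bound on the link rate, from the text

def implied_clock_mhz(internal_mb_s=INTERNAL_MB_S):
    """Clock rate (MHz) implied by the internal rate and cycles per word."""
    return internal_mb_s * CYCLES_PER_WORD / BYTES_PER_WORD

def wasted_fraction(link_mb_s=LINK_MB_S, internal_mb_s=INTERNAL_MB_S):
    """Share of cycles idle if a process is tied to a link instead of memory."""
    return 1.0 - link_mb_s / internal_mb_s
```

With these numbers implied_clock_mhz() gives 20 MHz and wasted_fraction() gives 0.95; since the link is described as below 2 MB/s, 95% is a lower bound on the waste.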

I had assumed a "tile" was a core, and that there were a number of cores on an XMOS chip. But, Øyvind, you said (about processes on different cores)

======
They share. But different slices don’t. 
======

And then you corrected yourself and said "slices" should be tiles, which I had assumed were cores. Now I am confused. By "core" I mean something that can run cycles without pause timewise absolutely on top of another core (not staggered). But what do they share, and what is distributed? Is there an XMOS assembly language?

What matters is which resources exist, and when they are in use. Transputer and occam make it pretty easy to follow that, but I have a hard time with all the distinctions being introduced by the XMOS.

I never had the impression that occam protocol semicolons were anything fundamental - just the start of a new assembly-level input or output, i.e. they could be put in sequence as separate lines of code, except for the need to group things in a PROTOCOL. In my language investigations using lex and yacc, I found the semicolon could be eliminated entirely (and replaced with a comma). That makes it possible to define a variant on occam that looks like C.

Larry

On Dec 10, 2020, at 12:31 PM, Øyvind Teig <oyvind.teig@xxxxxxxxxxx> wrote:

It looks like I meant to say that occam protocol semicolons and some XC interface patterns are the same. They are not, by far. XC has roles (client, server), and tasks would run in between. There is also a data-less synchronization that I think may be compared with occam-pi’s ‘!!’. What I meant to say was that the similarity lies in the fact that there are chunks of communication.

Øyvind