
Re: XCore matters



All

On 11 Dec 2020, at 17:27, Larry Dickson <tjoccam@xxxxxxxxxxx> wrote:

Øyvind, Peter, David and all,

OK, the included emails in this back-and-forth were getting rather onerous, so I trimmed some of them out. Reference them in earlier members of the thread.

Good!

Thank you, Øyvind, for the graphic - it helps tremendously. The only thing I wonder about is that it seems to indicate there are only two tiles (=two hardware cores) in the whole CPU. Is this true, or is that just a partial diagram? For some reason I had the impression there were 16. 

I guess you should think of them as 16, since there is nothing, I think, you could do with «one». If you placed code on only one logical core (called core), it would get all cycles, but then hw for 7 would be wasted.

Does OTP mean one-time-programmable (microcode) or am I missing the acronym boat?

At https://www.xmos.ai/download/xCORE-Architecture-Flyer(3).pdf it says that «Each tile also has a block of one-time programmable memory for secure boot code and encryption keys.»

Correct me if I am wrong - but I understand the staggered cycles are within a tile, so you could have a logical-core-cycle being up to eight staggered-cycle-steps long at maximum usage, but the two tiles operate on top of each other as independent cores.

I think you are right. I don’t have a feeling for «staggered», even with Google Translate, but maybe «interleaved» is better. If all 8 cores need to run, then they get one cycle each in sequence. Not 2-2-2-2-2-2-2-2 but 1-1-1-1-1-1-1-1 cycles. I think some instructions are two cycles long; I don’t know how that is handled. (Maybe I should have researched more to avoid such uncertainties, but then you would have gotten the reply on Sunday.) And if only 5 of the cores are used and need to run, the scheduler has a scheme for that. This is known a priori, so that timing may be calculated. And yes, the tiles are independent of each other.
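
The interleaving described above can be sketched in a few lines of Python. This is only an illustration under the assumption of plain round-robin over the cores that are ready to run; the function name issue_order is made up for this sketch, and the real XMOS scheduler (pipeline depth, two-cycle instructions) may well behave differently.

```python
from itertools import cycle, islice

def issue_order(active_cores, n_cycles):
    """Return which logical core gets each of the next n_cycles issue slots.

    Assumption: plain round-robin, one cycle per ready core (1-1-1-1...),
    not the actual XMOS scheduling rules.
    """
    return list(islice(cycle(active_cores), n_cycles))

# All 8 logical cores of one tile are ready: each gets every 8th cycle.
all_eight = issue_order(range(8), 16)

# Only 5 cores are ready: under this assumption each gets every 5th cycle.
five = issue_order([0, 1, 2, 3, 4], 10)

print(all_eight)
print(five)
```

Because the order is fixed by the set of ready cores, the number of cycles a core gets in any window can be worked out in advance, which is the point about timing being calculable.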

And the switch, I suppose, permits channel communication between one tile and the other even though they have independent memory spaces.

Yes. Plus, there are restrictions on when to use channels and interfaces (interfaces are defined sequences that may have roles).

Questions: is the switch used when one logical core is communicating to another on the same tile? And is there a speed difference between within-tile communication and between-tiles communication?

I don’t think so. I think cores on the same tile communicate via shared RAM only. (Sunday applies)
Speed diffs - don’t know.

How hard would it be (in theory) to write an occam/occonf compiler that would treat the thing you drew as a network of 16 (not 2) "Transputers" and allow processes inside each logical core using the other devices (combinable and distributable?) you have described?

We would have to try. I have done some playing with scheduling and tasks on cores at https://www.teigfam.net/oyvind/home/technology/215-my-xc-code-downloads-page/ It is not «down to earth» - rather the opposite. Just download and unzip and pick out _SideChannelAttack_01.xc. But there may be better examples in the XMOS literature.

Here I am thinking habitually in the uniprocessor-embedded-program (Arduino or MSP430) mindset of one low-priority (loop) process and possibly several ongoing high-priority (interrupt service) processes, which I have always thought of as a cleaner and more unified design than "the program" plus "drivers and ISRs". Or do you have to dedicate a separate thread to each interrupt?

As on the transputer, there is no interrupt scheme as such. Any task can wait in a select on a port. Ports are advanced macrocells with support in XC and with ready-made macros. The exception is distributable tasks, which are only allowed to talk with each other. So there is no concept of an ISR. (But XMOS did supply something close to the underlying hw, like startkit_adc on the startKIT board (now obsolete). I haven’t seen anything like it after that.)

As for the semicolons, I was thinking from a pure syntax/semantics point of view, where if the context is enough to distinguish, you do not need a separate symbol even though you have dissimilar functions. And looking at the Compiler Writer's Guide (and oc assembly code output) each semicolon is implemented by a new "in" or "out" command (with any necessary addressing support) so it's really just a sequence, but due to PROTOCOL you can say it's all happening with the same channel, instead of having to declare a separate channel for each transmission as in Go.
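
The "it's really just a sequence" reading can be sketched in Python. This is a rough illustration, not how any occam compiler works: the Channel class is a hypothetical stand-in for an unbuffered occam channel, and the occam line in the comment assumes a sequential protocol of two INTs.

```python
import queue
import threading

class Channel:
    """Hypothetical unbuffered channel: send() returns only after a recv()."""

    def __init__(self):
        self._q = queue.Queue(maxsize=1)

    def send(self, item):
        self._q.put(item)
        self._q.join()          # wait until the receiver has taken the item

    def recv(self):
        item = self._q.get()
        self._q.task_done()     # releases the sender blocked in join()
        return item

def producer(c):
    # occam (assumed):  c ! 3; 42
    # Each protocol component becomes its own output on the same channel.
    c.send(3)
    c.send(42)

def consumer(c, out):
    # occam (assumed):  c ? a; b
    out.append(c.recv())
    out.append(c.recv())

c = Channel()
received = []
t = threading.Thread(target=producer, args=(c,))
t.start()
consumer(c, received)
t.join()
print(received)
```

Each send() here completes only when the matching recv() has happened, so every semicolon corresponds to one separate communication, all on the one channel c.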

Ok

Øyvind

Larry

On Dec 11, 2020, at 1:40 AM, Øyvind Teig <oyvind.teig@xxxxxxxxxxx> wrote:

David and Larry (++, I hope)

I had assumed a "tile" was a core, and that there were a number of cores on an XMOS chip. But, Øyvind, you said (about processes on different cores)

I am sorry, I probably have messed it up. But I have the below picture in my head all the time.

I have not yet looked into the xcore.ai architecture, since I haven’t got a board yet.

The XCore-200 architecture is described at https://www.xmos.ai/download/xCORE-200:-The-XMOS-XS2-Architecture-(ISA)(1.1).pdf (2015/04/01). What am I complaining about, it’s all there for me to read, 289 pages. I could have known the answer to everything..

I found the above at https://www.xmos.ai/xcore-200/ - where there is much more, including https://www.xmos.ai/download/xCORE-200-Product-Brief(1.0).pdf. Here’s the figure:

<PastedGraphic-1.png> 

The left and the right columns are TILES. That phrase is not in the brief, but it is in the code, like on tile[0]: 

Each TILE has 8 LOGICAL CORES and one each of SRAM and OTP. Etc. 

I never had the impression that occam protocol semicolons were anything fundamental - just the start of a new assembly-level input or output, i.e. they could be put in sequence as separate lines of code, except for the need to group things in a PROTOCOL. In my language investigations using lex and yacc, I found the semicolon could be eliminated entirely (and replaced with a comma). That makes it possible to define a variant on occam that looks like C.

I thought semicolons in the protocol had semantic value, in that they introduced synchronization points. I first discovered this in SPoC (the Southampton Portable occam Compiler), where there was a complete channel communication between each semicolon (in the generated C code). Like two cogwheels. That said, I couldn’t use it for anything, could I? So I thought it had to do with giving other processes more time, giving communication, for what it was worth, a smaller granularity. I feel like this is kind of storytelling: nice to read, perhaps, but precise..?

Øyvind


On 10 Dec 2020, at 23:45, Larry Dickson <tjoccam@xxxxxxxxxxx> wrote:

Peter and Øyvind,

You are right, Peter, about the internal channel (I was confusing it with a link communication, where they do both deschedule). I knew that, and put it into my Fringes and Workshop. But my main point was about the DMA, and I still think I am right. The internal T4 communication can go at 40 MB/s (one word every 2 cycles) but the link is limited to less than 2 MB/s. Thus, dedicating a process to transmitting or receiving on a link could result in a 95% waste of cycles. With DMA, most of this is recaptured (my work on Crowell and Elzenga's results showed the burden bandwidth of a link transmission was 37 MB/s).
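
The 95% figure follows directly from the two rates quoted above. A small worked check, using only those numbers (the variable names are just for this sketch):

```python
internal_mb_s = 40.0   # internal T4 communication rate quoted above
link_mb_s = 2.0        # upper bound on the link rate quoted above

# A process tied to the link can only be doing useful transfer work for
# the fraction of time the link can keep up with the internal rate.
busy_fraction = link_mb_s / internal_mb_s
wasted_fraction = 1.0 - busy_fraction

print(f"busy: {busy_fraction:.0%}, wasted: {wasted_fraction:.0%}")
```

Since the link runs at less than 2 MB/s, 95% is a lower bound on the waste under this simple model.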

I had assumed a "tile" was a core, and that there were a number of cores on an XMOS chip. But, Øyvind, you said (about processes on different cores)

======
They share. But different slices don’t. 
======

And then you corrected yourself and said "slices" should be tiles, which I had assumed were cores. Now I am confused. By "core" I mean something that can run cycles without pause timewise absolutely on top of another core (not staggered). But what do they share, and what is distributed? Is there an XMOS assembly language?

What matters is which resources exist, and when they are in use. Transputer and occam make it pretty easy to follow that, but I have a hard time with all the distinctions being introduced by the XMOS.

I never had the impression that occam protocol semicolons were anything fundamental - just the start of a new assembly-level input or output, i.e. they could be put in sequence as separate lines of code, except for the need to group things in a PROTOCOL. In my language investigations using lex and yacc, I found the semicolon could be eliminated entirely (and replaced with a comma). That makes it possible to define a variant on occam that looks like C.

Larry

On Dec 10, 2020, at 12:31 PM, Øyvind Teig <oyvind.teig@xxxxxxxxxxx> wrote:

It looks like I meant to say that occam protocol semicolons and some XC interface patterns are the same. They are not, by far. XC has roles (client, server) and tasks would run in between. There is also a data-less synchronization, which I think may be compared with occam-pi’s ‘!!’. But the similarity, I meant to say, is that there are chunks of communication.

Øyvind








Øyvind TEIG 
+47 959 615 06
oyvind.teig@xxxxxxxxxxx
https://www.teigfam.net/oyvind/home
(iMac)