RE: Transputer - schools

Thanks David.

Hmm, it is 20 years since I have done much digging into the internals of compilers – I now use APL for my multi-tasking lab control and data collection programs.

Further, my understanding of the ARM is limited to what I have picked up porting these languages to the Raspberry Pi.

So I am afraid I will have to leave this next stage to others. However I am certainly interested in doing the bits I am able to do.

“System interface for I/O” is something I am interested in for my work on porting APL to the Adapteva Epiphany – again, this is missing I/O – but I am fairly out of my depth; there are people who understand this sort of thing far better than I do.

Cheers,

Beau

From: David May [mailto:dave@xxxxxxxxxxxxxxxxxxxxx]
Sent: 29 June 2013 18:41
To: J.B.W.Webber
Cc: David May; Ruth Ivimey-Cook; Larry Dickson; occam-com@xxxxxxxxxx
Subject: Re: Transputer - schools

Beau,

This is great. You now have a compiler running
on an ARM simulator - compiling itself. This is a fairly good
test in itself - the compiler includes a good mix of high and
low level programming techniques.

However, it should now be fairly simple to bootstrap the
compiler so that we don't need the simulator. You are
using an ARM to run a simulator that simulates an ARM.

The changes needed are

1: object code format - probably need to modify the
compiler output to produce ELF or something.

2: system interface for i/o etc.

If you do this, the compiler will produce ARM object
code that will run on any ARM (I think). The code will
be at least 10 times faster than running it on the
simulator.

I am working on putting concurrency into X - this will
be slightly different from occam (reflecting what I have
learned over the last few years).

From an educational viewpoint, X is interesting for
several reasons:

1. It is small and understandable - both language and
compiler

2. It is powerful enough to express its own compiler

3. It has no GOTOs, Breaks, Pointers etc - I think
Floyd-Hoare logic can be used on any program
you can write in X.

Also - my small ARM subset can easily be used to
re-target an occam compiler (constants, static link
and all!). I'm happy to help with this, if someone wants
to take it on.

Best wishes

David

On 29 Jun 2013, at 12:10, J.B.W.Webber wrote:

Dear David,

I have now built the code you attached on a Raspberry Pi.

I followed the script you sent, and it all worked as you said.

You mention example programs – could you attach these so that I can test what I have built further?

If people are interested, I will add this compiler to the compiler set I will make available on an SD card for the Raspberry Pi, for people to experiment with.

Cheers,

Beau Webber

From: Sympa,pkg125 [mailto:sympa@xxxxxxxxxx] On Behalf Of J.B.W.Webber
Sent: 29 June 2013 09:31
To: David May
Cc: Ruth Ivimey-Cook; Larry Dickson; occam-com@xxxxxxxxxx
Subject: Re: Transputer - schools

Dear David, it is good to have your input here. Thanks for attaching your stripped-down ARM code. I have been following your talks on this - really interesting, an important development.

I do now have Occam compiling .occ source on the Raspberry Pi - this is just the interpreted version of the University of Kent KRoC. It works nicely for my investigative purposes, where speed does not matter.

My interest is a follow-up to work I did on getting the concise/agile array-manipulating language APL to run on the Transputer, back in the 1980s.

To be more precise, I want APL nodes running on the processors, joined together by a communication harness. And to me, Occam is the ideal language to build that harness, given its robustness against common multiprocessing pitfalls.

I now have both the aplc APL-to-C compiler and KRoC Occam running on the Raspberry Pi, and am inviting people to join me in experimenting in this very low cost environment.

Cheers, Beau Webber

On 28 Jun 2013, at 23:04, "David May" <dave@xxxxxxxxxxxxxxxxxxxxx> wrote:

I've seen this trail. But I'm not sure what you are trying to do. If it's to
port occam to ARM - good idea.

The best way to do this is to retarget the back-end of the occam
compiler. You'll also need a runtime for the scheduling instructions.

I've recently done a compiler for a simple language that targets
a subset of ARM Thumb instructions - details attached. It has a
codebuffer that deals with all of the constant pool issues. The
instruction subset works on all current ARMs (I think). The language
is the sequential subset of the one I used during the design of the
XMOS architecture.

I did this to teach students about computer architecture and
compilers - but it's probably useful in other contexts! I'm working
on a much simpler architecture than the ARM subset - one that
also supports concurrency with thousands of cores. And in the
meantime it would be great to get occam compiled to ARM,
raspberry-pi etc.

Also - I suspect a carefully written transputer simulator would
run quite fast on a modern processor. The simulator I've attached
(for the ARM subset) runs at over 100MIPS on my ageing
MacBook - so the compiler will bootstrap using the simulator
in under 0.5 seconds.

Best wishes

David

I've attached:

1. A description of the language, thumb subset and
   compiler - xarmnotes.pdf

2. A simulator - armsim.c - that executes a binary in a.bin. This
   uses another file - armexec.c which is generated by another
   program - armsgen.c (you only need this if you want to
   modify the simulator).

3. A compiler binary - xarm.bin

4. A compiler source - xarm.x

5. Another compiler source that generates assembly code (for
   which we have no assembler right now) which is easy to
   read - xarma.x

6. A source produced using a formatter and latex - xarm.pdf.

To make this lot work, you have to compile the simulator using (eg)

gcc armsim.c

Then you can copy xarm.bin to a.bin:

cp xarm.bin a.bin

and run the simulator; the compiler binary expects source from
standard input so the command is:

a.out <xarm.x

This will produce a binary output in a file called sim2.

You can copy sim2 to a.bin and compile again:

cp sim2 a.bin

a.out <xarm.x

You should be able to do this over and over again with the same
result.

To see what is being compiled, you can compile xarma.x:

a.out <xarma.x

Now you can copy sim2 to a.bin and compile again:

cp sim2 a.bin

a.out <xarm.x

This will produce the assembly code for the compiler,
corresponding to the code that is being executed by the
simulator.

I also have versions that produce different loader formats
etc - as the compiler actually produces position-independent
binary in its codebuffer it should be fairly easy to bootstrap it
to produce anything you want.

I have not tried executing the compiler on a real ARM
although I have run some simple programs. It has an
(undocumented) feature - if you use a constant as an
array the compiler will generate memory accesses
relative to the value of the constant - this provides for
memory mapped i/o. Also if you use a constant as
a procedure or function the compiler generates an
SVC - this is used in the simulator to do file i/o.

The compiler has some rough edges especially in
error reporting. I'm gradually tidying it up.

Best wishes

David

On 28 Jun 2013, at 17:02, Ruth Ivimey-Cook wrote:

Larry Dickson wrote:
Can you go into detail about these "literal pools"? I don't remember seeing such a thing (probably it had a different name). I remember the static chain, which sounded complicated.
 
What I'm wondering is whether they can be avoided or simplified by following a coding convention. Then, if the other cases were done inefficiently, it would not matter so much.
 
ARM processors - at least the architecture available then - were limited in where they could load or save, using a single instruction, to addresses IIRC +/- 4096 bytes of the PC. If you needed an address outside of that range, you had to calculate or load the address into a register. Similarly, any non-trivial constant had to be loaded in the same way. Therefore, in ARM code you find, between blocks of opcodes, these pools or blocks of literal values that are used by the code in the surrounding area. This is because ARM opcodes are always one 32-bit word, so you can't have a 32-bit constant in one of them.

The problem I had was that, without knowing more about the context than a single instruction, it was hard to translate transputer-code values and addresses into anything remotely efficient in ARM terms, so these literal pools became big - so big in some cases that it became impossible to use just pools at function boundaries. That was when I started looking around for alternatives.

Below I've included some of the code I wrote to work out whether a constant could be loaded directly or whether it had to be pooled.

Regards
Ruth

/* will v fit in mask m (eg. 0xff) after a left rotate of 2n bits, n = 0..15? */

static inline unsigned int rotate_left_by_2(unsigned int v)
{
    /* assumes a 32-bit unsigned int */
    return (v << 2) | (v >> 30);
}

unsigned int is_arm_immed(unsigned int v, unsigned int m)
{
    int n = 0;
    unsigned int nm = ~m; /* e.g. 0xffffff00 */

    /* rotate until no bits fall outside the mask; 16 even rotations cover the word */
    while ((n < 16) && ((v & nm) != 0))
    {
        v = rotate_left_by_2(v);
        n++;
    }
    return (n != 16);
}

int big (int n, int bits)                     /* deal with big operands (12..32 bits) */
{
    if (bits == 8)
    {
        /* small means it will fit in 8 bits, modified by an even rotate, with sign independent */
        int k = (n >= 0) ? n : -n;

        if (is_arm_immed(k, 0xff))
        {
            mnem("ldimm"); T0; dec(n); COMMENT("big constant"); NL;
            return 1;
        }
    }
    else /* 12 bits excl. sign, straight */
    {
        if (is_arm_immed(n, 0xfff))
        {
            mnem("ldimm"); T0; dec(n); COMMENT("big constant"); NL;
            return 1;
        }
    }

    return 0;
}
-- 
Software Manager & Engineer
Tel: 01223 414180
Blog: http://www.ivimey.org/blog
LinkedIn: http://uk.linkedin.com/in/ruthivimeycook/ 

<xarmnotes.pdf>

<armsim.c>

<armexec.c>

<xarm.bin>

<xarm.x>

<xarma.x>

<xarm.pdf>