Re: Transputer - schools

On 29 Jun 2013, at 12:10, J.B.W.Webber wrote:

Dear David,
I have now built the code you attached on a Raspberry Pi.
I followed the script you sent, and it all worked as you said.
You mention example programs – could you attach these so that I can test what I have built further?

If people are interested, I will add this compiler to the compiler set I will make available on an SD card for the Raspberry Pi, for people to experiment with.
Cheers,
                Beau Webber

From: Sympa,pkg125 [mailto:sympa@xxxxxxxxxx] On Behalf Of J.B.W.Webber
Sent: 29 June 2013 09:31
To: David May
Cc: Ruth Ivimey-Cook; Larry Dickson; occam-com@xxxxxxxxxx
Subject: Re: Transputer - schools

Dear David, it is good to have your input here. Thanks for attaching your stripped down Arm code. I have been following your talks on this - really interesting, an important development.
I do now have Occam compiling .occ source on the Raspberry Pi - this is just the interpreted version of the University of Kent KRoC. It works nicely for my investigative purposes, where speed does not matter.
My interest is a follow-up to work I did on getting the concise/agile array-manipulating language APL to run on the Transputer, back in the 1980s.
To be more precise, I want APL nodes running on the processors, joined together by a communication harness. And to me, Occam is the ideal language to build that harness, given its robustness against common multiprocessing pitfalls.
I now have both the aplc APL-to-C compiler and KRoC Occam running on the Raspberry Pi, and am inviting people to join me in experimenting in this very low-cost environment.
Cheers, Beau Webber


On 28 Jun 2013, at 23:04, "David May" <dave@xxxxxxxxxxxxxxxxxxxxx> wrote:

I've seen this trail. But I'm not sure what you are trying to do. If it's to
port occam to ARM - good idea.

The best way to do this is to retarget the back-end of the occam
compiler. You'll also need a runtime for the scheduling instructions.

I've recently done a compiler for a simple language that targets
a subset of ARM Thumb instructions - details attached. It has a
codebuffer that deals with all of the constant pool issues. The
instruction subset works on all current ARMs (I think). The language
is the sequential subset of the one I used during the design of the
XMOS architecture.

I did this to teach students about computer architecture and
compilers - but it's probably useful in other contexts! I'm working
on a much simpler architecture than the ARM subset - one that
also supports concurrency with thousands of cores. And in the
meantime it would be great to get occam compiled to ARM,
raspberry-pi etc.

Also - I suspect a carefully written transputer simulator would
run quite fast on a modern processor. The simulator I've attached
(for the ARM subset) runs at over 100MIPS on my ageing
MacBook - so the compiler will bootstrap using the simulator
in under 0.5 seconds.

Best wishes

David

I've attached:

1. A description of the language, thumb subset and
   compiler - xarmnotes.pdf

2. A simulator - armsim.c - that executes a binary in a.bin. This
   uses another file - armexec.c which is generated by another
   program - armsgen.c (you only need this if you want to
   modify the simulator).

3. A compiler binary - xarm.bin

4. A compiler source - xarm.x

5. Another compiler source - xarma.x - that generates assembly
   code (for which we have no assembler right now) which is easy
   to read.

6. A source produced using a formatter and latex - xarm.pdf.

To make this lot work, you have to compile the simulator using (eg)

gcc armsim.c

Then you can copy xarm.bin to a.bin:

cp xarm.bin a.bin

and run the simulator; the compiler binary expects source from
standard input so the command is:

a.out <xarm.x

This will produce a binary output in a file called sim2.

You can copy sim2 to a.bin and compile again:

cp sim2 a.bin

a.out <xarm.x

You should be able to do this over and over again with the same
result.
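
One way to confirm the "same result" claim above is to compare the compiler binary that was just executed (a.bin) with the one it produced (sim2). The following is only a minimal sketch of such a check in C; the function name files_identical is hypothetical and is not part of the attached files.

```c
#include <stdio.h>

/* Hypothetical helper, not part of the attached sources.
   Returns 1 if the two files have identical contents, 0 if they differ,
   and -1 if either file cannot be opened. */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    int result = 1;

    if (fa == NULL || fb == NULL) {
        if (fa) fclose(fa);
        if (fb) fclose(fb);
        return -1;
    }
    for (;;) {
        int ca = fgetc(fa);
        int cb = fgetc(fb);
        if (ca != cb) {        /* differing byte, or one file ended early */
            result = 0;
            break;
        }
        if (ca == EOF)         /* both files ended at the same point */
            break;
    }
    fclose(fa);
    fclose(fb);
    return result;
}
```

After "cp sim2 a.bin" and another "a.out <xarm.x", a call such as files_identical("a.bin", "sim2") returning 1 would indicate that the compiler has reproduced itself byte for byte.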

To see what is being compiled, you can compile xarma.x:

a.out <xarma.x

Now you can copy sim2 to a.bin and compile again:

cp sim2 a.bin

a.out <xarm.x

This will produce the assembly code for the compiler,
corresponding to the code that is being executed by the
simulator.

I also have versions that produce different loader formats
etc - as the compiler actually produces position-independent
binary in its codebuffer it should be fairly easy to bootstrap it
to produce anything you want.

I have not tried executing the compiler on a real ARM
although I have run some simple programs. It has an
(undocumented) feature - if you use a constant as an
array the compiler will generate memory accesses
relative to the value of the constant - this provides for
memory mapped i/o. Also if you use a constant as
a procedure or function the compiler generates an
SVC - this is used in the simulator to do file i/o.
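
As a rough C analogue of the constant-as-array feature described above (not the compiler's own language or output), memory-mapped i/o amounts to indexing from a fixed base address. The names mmio_write and mmio_read are hypothetical, and the 32-bit word indexing is an assumption; the notes in xarmnotes.pdf would be the authority on how the compiler scales the index.

```c
#include <stdint.h>

/* Hypothetical illustration: treat `base` as the address of an array of
   32-bit words and store `value` into element `index`. The volatile
   qualifier stops the C compiler from removing or reordering the access,
   which matters when the address is a device register. */
void mmio_write(uintptr_t base, unsigned index, uint32_t value)
{
    ((volatile uint32_t *)base)[index] = value;
}

/* Read element `index` of the word array that starts at address `base`. */
uint32_t mmio_read(uintptr_t base, unsigned index)
{
    return ((volatile uint32_t *)base)[index];
}
```

On real hardware `base` would be a device address taken from the chip's documentation; for experimenting on a desktop machine it can simply be the address of an ordinary array.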

The compiler has some rough edges especially in
error reporting. I'm gradually tidying it up.

Best wishes

David

On 28 Jun 2013, at 17:02, Ruth Ivimey-Cook wrote:

Larry Dickson wrote:
Can you go into detail about these "literal pools"? I don't remember seeing such a thing (probably it had a different name). I remember the static chain, which sounded complicated.

What I'm wondering is whether they can be avoided or simplified by following a coding convention. Then, if the other cases were done inefficiently, it would not matter so much.

ARM processors - at least the architecture available then - were limited in where they could load or save, using a single instruction, to addresses IIRC +/- 4096 bytes of the PC. If you needed an address outside of that range, you had to calculate or load the address into a register. Similarly, any non-trivial constant had to be loaded in the same way. Therefore, in ARM code you find, between blocks of opcodes, these pools or blocks of literal values that are used by the code in the surrounding area. This is because ARM opcodes are always one 32-bit word, so you can't have a 32-bit constant in one of them.

The problem I had was that, without knowing more about the context than a single instruction, it was hard to translate transputer-code values and addresses into anything remotely efficient in ARM terms, so these literal pools became big - so big in some cases that it became impossible to use just pools at function boundaries. That was when I started looking around for alternatives.

Below I've included some of the code I wrote to work out whether a constant could be loaded directly or whether it had to be pooled.

Regards
Ruth

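/* Will v fit in mask m (e.g. 0xff) after a rotate left of 2n bits? */

static inline unsigned int rotate_left_by_2(unsigned int v)
{
    return (v << 2) | (v >> 30);
}

unsigned int is_arm_immed(unsigned int v, int m)
{
    int n = 0;
    unsigned int nm = ~(unsigned int)m; /* e.g. 0xffffff00 */

    /* Keep rotating while v still has bits outside the mask. Only
       rotations n = 0..13 are tried, so a few values that need a larger
       rotation are conservatively reported as not fitting. */
    while ((n < 14) && ((v & nm) != 0))
    {
        v = rotate_left_by_2(v);
        n++;
    }
    return (n != 14);
}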

int big (int n, int bits)                     /* deal with big operands (12..32 bits) */
{
    if (bits == 8)
    {
        /* small means it will fit in 8 bits, modified by an even rotate, with sign independent */
        int k = (n >= 0) ? n : -n;

        if (is_arm_immed(k, 0xff))
        {
            mnem("ldimm"); T0; dec(n); COMMENT("big constant"); NL;
            return 1;
        }
    }
    else /* 12 bits excl. sign, straight */
    {
        if (is_arm_immed(n, 0xfff))
        {
            mnem("ldimm"); T0; dec(n); COMMENT("big constant"); NL;
            return 1;
        }
    }

    return 0;
}
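
For anyone wanting to experiment with the idea behind the check above, here is a self-contained sketch of the usual ARM rule for data-processing immediates: an 8-bit value rotated right by an even amount. It is not the original code; the names rotl2 and fits_arm_imm8 are hypothetical, and it tries all 16 even rotations.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by 2 bits. */
static uint32_t rotl2(uint32_t v)
{
    return (v << 2) | (v >> 30);
}

/* Hypothetical helper: returns 1 if v can be written as an 8-bit value
   rotated right by an even amount (0, 2, ..., 30), and 0 otherwise.
   Rotating v left by 2n undoes a rotate right by 2n, so the loop looks
   for any even rotation that brings v back into the low 8 bits. */
int fits_arm_imm8(uint32_t v)
{
    for (int n = 0; n < 16; n++) {
        if ((v & ~(uint32_t)0xff) == 0)
            return 1;          /* fits after rotating left by 2n bits */
        v = rotl2(v);
    }
    return 0;                  /* no even rotation fits: use a literal pool */
}
```

With this sketch, 0xFF000000 and 0x3FC are accepted, while 0x101 (nine bits wide) and 0x102 (eight bits wide but needing an odd rotation) are rejected and would have to go into a literal pool.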

--
Software Manager & Engineer
Tel: 01223 414180
Blog: http://www.ivimey.org/blog
LinkedIn: http://uk.linkedin.com/in/ruthivimeycook/

<xarmnotes.pdf>
<armsim.c>
<armexec.c>
<xarm.bin>
<xarm.x>
<xarma.x>
<xarm.pdf>