

Denis A Nicole writes:

> The memory management story is also simple.  You calculate at compile time
> the size of an activation record for each different recursively used
> process.  The compiled code then manages a separate list of exact-size
> buffers for each recursively used process. In-line code picks a buffer
> from the list on process activation (or sbrk()'s a new one if the list is
> empty) and returns the buffer to the list on process termination.  No
> buffers are ever returned back from these exact-fit lists. The buffer
> lists just grow out of contiguous free store.
> The total memory required is bounded, and is optimal if there is only one
> recursive process---the typical teaching scenario. It all gets returned at
> program termination.
> Does this look useful for real codes?

Prof Brinch Hansen sent me this paper and a couple of others. I was waiting
to see what other people thought. I too was surprised by this strategy: he
does acknowledge it as a limitation at the end of the paper.

I am probably being dim, but as far as I can see, the algorithm has to
update shared variables and so invoke some synchronisation mechanisms.

For those who haven't seen the paper, here is part of an occamish outline
that I prepared earlier :-)

The memory used by any particular process is never 
released for reuse by other processes. So a single use of a deeply recursive
process early in a program run might permanently consume most of the memory.
He quotes exactly three algorithms: Quicksort, N-body pipeline and Laplace
matrix. None of them seems to me to be very representative of the sort of 
massively parallel and perhaps deeply recursive things that we might want.

Except that the maximum depth of recursion is determined at run time, the
strategy is not unlike (Michael's?) textual replication trick for recursion.
The idea seems to be

top ---> (last address in use)                    
   ---------------                               ------------------
  |      min      |                         1    |                |
   ---------------                               |    (pool)      |
  |               |                              |________________|
  |               |                      index   |                |
  |    (memory)   |                               ----------------
  |               |                              |   pointers     |
  |               |                              |    to old      |
  |               |                              |    released    |
   ---------------                               |   workspaces   |
  |      max      |                      limit   |                |
   ---------------                               |________________|

One pool slot for each process (textual declaration), all initially "empty".
top is a global variable initially pointing to the last memory location in
use, so (top + 1) is free for use. Invariant: top --> last used address.

PROC allocate (VAL INT index, length, INT address)
  SEQ
    address := pool[index]
    IF
      address <> empty
        pool[index] := memory[address]  -- pop head of the free list
      address = empty
        SEQ
          address := top + 1
          top     := top + length
          IF
            top > max
              STOP                      -- out of free store
            TRUE
              SKIP

PROC release (VAL INT index, address)
  SEQ
    memory[address] := pool[index]      -- old head saved in the freed slot
    pool[index]     := address

Each (textual) process is allocated an index and a length: calculated by the
compiler. The length is just that of the workspace. As it stands above, the
lowest workspace slot is used to maintain a linked list of previously used
workspace slots, now free for re-use, but only by instances of the current
process, as identified by the index.
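In case it helps, the scheme can also be written out executably. Here is a
sketch in Python (the names memory, pool, top, empty and max follow the
outline above; the sizes are arbitrary, and it is strictly sequential --
no guards yet):

```python
# Sketch of the per-index, exact-fit workspace allocator outlined above.
# memory is a flat word array; pool[index] heads a free list of released
# workspaces for that (textual) process; the links of the free list are
# threaded through the lowest slot of each released workspace.
EMPTY = -1
MAX = 1000                      # last legal address

memory = [0] * (MAX + 1)
pool = [EMPTY] * 8              # one slot per textually declared process
top = -1                        # invariant: top is the last address in use

def allocate(index, length):
    global top
    address = pool[index]
    if address != EMPTY:
        pool[index] = memory[address]   # pop head of the free list
    else:
        address = top + 1               # grow out of contiguous free store
        top = top + length
        assert top <= MAX               # the STOP in the occam version
    return address

def release(index, address):
    memory[address] = pool[index]       # link old head into the freed slot
    pool[index] = address
```

A quick check shows the exact-fit behaviour: allocate, release, allocate
for the same index hands back the same workspace, and nothing ever returns
to the general store.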


Updating the shared top has to be guarded. But so must updating pool[index],
since a process (like Quicksort) may spawn more than one copy of itself in
parallel, all contending for the same slot in the pool.

I am probably being dim, but it seems to be quite expensive to arrange these
synchronised updates, and having the pool of recycled workspaces available
does not seem to deliver much. Am I missing something ? Probably. :-(
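To make the cost concrete, here is the same sketch with the guards it would
seem to need: one lock for the shared top and one per pool slot. The locking
granularity is my guess, not anything taken from the paper:

```python
# Guarded version of the exact-fit allocator: parallel spawns of the same
# textual process contend for pool[index]; fresh allocations contend for top.
import threading

EMPTY = -1
MAX = 1000
memory = [0] * (MAX + 1)
pool = [EMPTY] * 8
top = -1

top_lock = threading.Lock()                    # guards the shared top
pool_locks = [threading.Lock() for _ in pool]  # one guard per pool slot

def allocate(index, length):
    global top
    with pool_locks[index]:
        address = pool[index]
        if address != EMPTY:
            pool[index] = memory[address]      # reuse a released workspace
            return address
    # Slot looked empty; fall through to fresh store. (Another spawn may
    # release a workspace in between -- we merely miss a chance to reuse.)
    with top_lock:
        address = top + 1
        top = top + length
        assert top <= MAX
        return address

def release(index, address):
    with pool_locks[index]:
        memory[address] = pool[index]
        pool[index] = address
```

Every activation of a recursive process now pays at least one lock
acquisition, which is exactly the expense I am worried about.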


PS. Another of the papers, "SuperPascal - A Publication Language For
    Parallel Scientific Computing", Concurrency - Practice & Experience 6,5
    (Aug 94), 461, is an interesting read and has a whole section on occam.
    What did surprise me was that there was no mention of CSP or any
    of the several semantic models. There is mention of 12 programs:
    they are typical of "scientific" problems, rather than the problems
    of embedded, real time, safety critical, massively parallel and hardware
    systems that we also try to address.
A E Lawrence, MA., DPhil.  	adrian.lawrence@xxxxxxxxxxxxxx
MicroProcessor Unit, 13, Banbury Road, Oxford. OX2 6NN. UK.                
Voice: (+44)-1865-273274,  Fax: (+44)-1865-273275