
Re: Go - deadlocked processes causing memory leaks


> I’ve always thought InputOrFail and brethren were very important,

I think they are

> and that the theory needed to be expanded to try to embrace those cases,

but I think this expansion is tricky.

> for reasons like those in this thread. In my book Crawl-Space Computing, I worked on one case (orchard sensors) that had failing connections, and found I had to carry it a little farther (p 133):
>   bool1 & c1 ? mess1
>     -- code
>   -- more channel guards if needed
>   clock ? AFTER timeout
>     -- code, e.g. timedout := TRUE
>     -- code, e.g. notdone := FALSE
> where c1 ? mess1 is really an InputOrFail and branches to FAILURE if aborted. The reason it works is that once an inputting link channel guard is selected, and before its communication is done, its process’s address remains in its channel word and can be recovered. If the first, “unsolicited” byte has not been sent, or has not finished coming, then the channel’s branch of the ALT is not ready, and the timer branch wins. So this makes the ALT bulletproof.

I think the c1 ? mess1 has to be an InputOrFail. The problem is that if the first byte arrives correctly but something goes wrong before the end of the message, the input may hang. The arrival of the first byte triggers the selection mechanism of the ALT, and an input (c1 ? mess1) is then performed. It is this input which can fail, and by the time the input instruction executes, the rest of the ALT has already been cleaned up, so there is no longer a pending timeout.

> Larry Dickson

I have two concerns. 

The first, which I’ve mentioned before, is that a “failing” communication undermines the programming model. Dealing with failing communication is a bit like dealing with failing variables. How might a variable fail? As far as I can see, the failure of a variable can only be the failure to yield the correct (last written) value when read; this is like a communication occurring but the wrong value being received. I think we have to assume our variables work (although perhaps accessing the value of a failed variable could be mapped into STOP?). In a communication we can take steps to ensure that the data passed is correct (error correction), although there are implementation issues (see next point). This leaves us with the nasty property of the failing communication: a breakdown of synchronisation. Whilst I think InputOrFail etc. provide a fairly neat way of localising failure, it is by no means perfect. For example, consider

    CHAN OF [5]BYTE c :
         InputOrFail.t(c, buffer, t, failedInput)
         OutputOrFail.t(c, "Hello", t, failedOutput)

Nothing forces failedInput and failedOutput to end up with the same value, so one side may believe the communication succeeded while the other believes it failed, which seems a little strange. Perhaps something can be done to make these operations semantically clean, and to make them compose properly, and so on.

Which brings me to my second point: implementation. In occam we chose not to have output guards because of the cost of implementation (notwithstanding whether we could have come up with a “correct” implementation). InputOrFail etc. are significantly more expensive than ? and !. I have a concern that any scheme to make the semantics of InputOrFail etc. “nice” will carry a heavy penalty.

So what can be done? I suspect the way to approach this is to use a language which has a higher level of abstraction and allows interaction between concurrent entities to be treated at a higher level. For example, if the client-server relationship were captured in such a language, then (i) a more efficient implementation of clients and servers might be possible than can be achieved using occam, and (ii) it might be able to deal with “unreliable clients” (as seen by a server) and “unreliable servers” (as seen by clients) in a neat manner. I think the trick with this type of endeavour is to make choices which allow real problems to be solved neatly, while not providing too much generality.



Roger Shepherd