Hi,
I did some work on this some time ago and discovered what Rick refers to: there is no mechanism for concurrent goroutines to join, so you cannot check or wait for the termination of a parallel construct.
I raised a bug with the Go developers and in fact had an exchange with the main scheduler developer. He was really not interested in what I was saying, indicating that this was the way the scheduler was designed and that was the
way it was going to stay. What happens is that goroutines that do input create lots of instances in preparation for input, and keep doing so until you run out of memory. I had a working system until I put a time delay in to slow it down, i.e.
input message
sleep for some short period
output message
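In Go terms, a minimal sketch of that shape might look like the following (the string message type and the 10 ms delay are illustrative guesses on my part, not Jon's actual code):

package main

import "time"

// stage mirrors the input / sleep / output loop described above.
func stage(in <-chan string, out chan<- string) {
    for msg := range in {                 // input message
        time.Sleep(10 * time.Millisecond) // sleep for some short period
        out <- msg                        // output message
    }
    close(out)
}

func main() {
    in := make(chan string)
    out := make(chan string)
    go stage(in, out)
    go func() {
        for i := 0; i < 1000; i++ {
            in <- "message"
        }
        close(in)
    }()
    for range out {
        // drain the pipeline
    }
}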
It worked for a bit, then went into garbage-collect mode, then ran out of memory, and the comment was "that is what it is designed to do". I got frustrated and stopped working with Go.
I did find a paper from CMU on Go scheduling problems, which was very informative and included a discussion of why the Go people should follow the example of the Ritson and Barnes design for occam-pi!
I attach the somewhat unhelpful discussion. The reference in the last email to an on-line document can be found a bit up from the bottom; when I have to resort to gctrace and dumps, I tend to decide that life is too short!
Jon
Professor Jon Kerridge
School of Computing
Edinburgh Napier University
Merchiston Campus
Edinburgh EH10 5DT
0131 455 2777
From: occam-com-request@xxxxxxxxxx [occam-com-request@xxxxxxxxxx] on behalf of Rick Beton [rick.beton@xxxxxxxxx]
Sent: 29 March 2015 17:35
To: Occam Family
Subject: Re: Go - deadlocked processes causing memory leaks

Hi Roger,
see notes inserted below...
On 29 March 2015 at 17:01, Roger Shepherd <rog@xxxxxxxx> wrote:
That is a reasonable interpretation. Go has a quite straightforward memory model, using the stack as much as possible and the garbage-collected heap otherwise. Anything that goes out of scope is reclaimed either by being on the stack (and the stack pointer
is moved), or by having no more references to it. This applies to both the channel and the goroutine in the specific example.
The buffered channel fix works because the sender can write without waiting; it then immediately terminates and its memory can be recovered.
This looks to me like a good representation of the relevant bits of Go. The important difference is overall termination: in the Occam case, the join semantics mean that
the whole block will deadlock in the timeout case. Go is different because the join semantics are different: the Read function can terminate even with the secondary goroutine waiting in its deadlocked state. It's easy for the developer to overlook this,
because there may not appear to be any termination problem.
Quite.
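For concreteness, here is a minimal sketch of the kind of code being discussed (my reconstruction, not the original; the fetch helper and the 100 ms timeout are assumptions). With an unbuffered channel the secondary goroutine blocks forever once Read has taken the timeout branch; giving the channel a one-slot buffer lets the sender complete and be reclaimed:

package main

import (
    "errors"
    "time"
)

// Read starts a secondary goroutine for the slow work and gives up after
// a timeout. The unbuffered channel means the sender stays blocked on
// `ch <- ...` forever once Read has returned: the runtime never reports
// it, and the goroutine, its stack and the channel are never reclaimed.
func Read(url string) (string, error) {
    ch := make(chan string) // unbuffered: leaks the sender on timeout
    go func() {
        ch <- fetch(url) // blocks forever once Read has returned
    }()
    select {
    case result := <-ch:
        return result, nil
    case <-time.After(100 * time.Millisecond):
        return "", errors.New("timeout")
    }
}

// The buffered-channel fix mentioned above: the sender writes into the
// one-slot buffer without waiting, terminates, and both goroutine and
// channel become garbage once Read has returned.
func ReadFixed(url string) (string, error) {
    ch := make(chan string, 1) // one-slot buffer: the send never blocks
    go func() {
        ch <- fetch(url)
    }()
    select {
    case result := <-ch:
        return result, nil
    case <-time.After(100 * time.Millisecond):
        return "", errors.New("timeout")
    }
}

// fetch stands in for the real network request (placeholder only).
func fetch(url string) string {
    time.Sleep(time.Second) // pretend the network is slow
    return "response from " + url
}

func main() {
    _, _ = Read("http://example.com")      // times out and leaks one goroutine
    _, _ = ReadFixed("http://example.com") // times out, but nothing is leaked
}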
Go is like Occam in this: only when every process is deadlocked do you get notified by the runtime that there is a major problem. There is no earlier notification that things are not going well.
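A hypothetical illustration of that point (not code from the thread): the "all goroutines are asleep - deadlock!" report only appears when nothing at all can run, so one stuck goroutine beside a live main provokes no warning:

package main

import "time"

func main() {
    ch := make(chan int)
    go func() { ch <- 1 }() // blocks forever: nobody ever receives

    // main is still runnable, so the runtime stays silent; the goroutine
    // and the channel are simply leaked. Replace the Sleep with
    // `<-make(chan int)` and every goroutine is blocked, at which point
    // the runtime does report the fatal deadlock.
    time.Sleep(time.Second)
}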
You mentioned that Go and Occam are both a bit low level. Perhaps the issue is that the original author of the faulty code was attempting to write an I/O function for fetching something off a network, with a timeout option. Here I suspect that there may
exist alternative solutions in which this network request would be blocking and the timeout would be handled elsewhere.
However, it's a sunny afternoon and I shall leave that thought dangling for now, without exploring it further.
Rick :-)
Attachment:
Issue-7987.docx
Description: Issue-7987.docx