
Re: Go - deadlocked processes causing memory leaks




On 29 Mar 2015, at 15:18, Rick Beton <rick.beton@xxxxxxxxx> wrote:

PS
"Because the channel is non-blocking, ..."
should read
"Because the channel is unbuffered, ..."

(edited version follows below)
Rick





Rick,

I think I understand the explanation in Stack Overflow. Essentially this says that if a one-buffered channel is used, the sending process won't block even if the receiver is dead, and the process and channel will be garbage collected. Presumably in the implementation, a process's memory is reclaimed only when it terminates, whereas dangling channels are garbage collected. I deduce it is not an error to have a program which outputs N items into a channel but inputs only N-1. [I know how you get there, how you end up with buffered channels, and then end up with the nasty consequences. I think the solution is to change the programming model so that the cases that cause problems can be better expressed - occam and Go are too low level.]
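
For concreteness, a minimal sketch of that one-buffered fix in Go (adapted from Rick's example quoted below; Get and Response are assumed from there):

func Read(url string, timeout time.Duration) (res *Response) {
    ch := make(chan *Response, 1) // buffer of one: the send below can always complete
    go func() {
        ch <- Get(url) // never blocks, so this goroutine always terminates
    }()
    select {
    case res = <-ch:
    case <-time.After(timeout):
        res = &Response{"Gateway timeout\n", 504}
    }
    return // after a timeout, ch and any buffered Response simply become garbage
}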

I think the buffered channel "fix" only works if the channel eventually becomes ready - if it doesn't, the process cannot complete.

I can't be bothered to go through the whole of the background to this, and my knowledge of Go is only superficial. However, I did design and implement the occam implementation of InputOrFail etc., so I think I have some insight into this. That said, I don't understand the motivation of this example, i.e. what the programmer is really trying to do. (I suspect that some problems are caused by having these mono-pole processes which fork but do not necessarily join.)

In occam we write an input with timeout as:-

ALT
  InputChannel ? message
    HandleMessage(message)
  TIME ? AFTER timeout
    HandleLackOfMessage()

Of course, this doesn't really match the Go example, which is a bit like:

CHAN c OF Thing :
PAR
  SEQ
    TIME ? t
    TIME ? AFTER t + delay
    c ! someThing
  ALT
    c ? message
      HandleMessage(message)
    TIME ? AFTER timeout
      HandleLackOfMessage()

which makes the deadlock clear in the case where the ALT chooses the timeout path: the sending process is then blocked forever on c ! someThing.

So, we might take the Go example as an example of problems with process termination rather than a problem with channels. But, as I said, I don't really understand the semantics of Go. Is the system supposed to detect and suppress deadlocked processes?
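
For what it's worth, my understanding is that the Go runtime reports deadlock only when every goroutine in the program is blocked; for example, this complete program dies with "fatal error: all goroutines are asleep - deadlock!":

package main

func main() {
    ch := make(chan int)
    <-ch // nothing can ever send on ch: the runtime sees total deadlock and aborts
}

A single goroutine blocked like that inside a larger program is not detected - it just leaks, which is exactly Rick's case.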

The other reason why I think you might want to write this sort of stuff is to deal with failing communication - that is, a failure of the model that the language supports. There are at least two problems with failing communication - the first is defining it and detecting it, the second is what to do about it (recovery). With the transputer, although we were primarily concerned about link* failure (probably due to disconnection), our solution would work on internal channels.

Because we were dealing with a failure of the underlying hardware to support the programming model, we couldn't rely on the use of "input with timeout" (notwithstanding we didn't support "output with timeout"). [One problem is that for a link channel, it is the arrival of the first byte of the communication which causes the ALT to make its selection. A subsequent failure which meant that the message did not complete would fail to trigger the timeout.]

Addressing "the first is defining it and detecting itâ - we defined a failure as a failure to complete an input or output before either a timeout or another message arrived. We provided the functionality by four procedures:

InputOrFail.t(CHAN c, []BYTE mess, TIMER TIME, INT t, BOOL aborted)
 
OutputOrFail.t(CHAN c, VAL []BYTE mess, TIMER TIME, INT t, BOOL aborted) 

InputOrFail.c(CHAN c, []BYTE mess, CHAN kill, BOOL aborted) 

OutputOrFail.c(CHAN c, VAL []BYTE mess, CHAN kill, BOOL aborted) 

I can't remember exactly how they worked, but it must have been something like:

PROC OutputOrFail.c(CHAN c, VAL []BYTE mess, CHAN kill, BOOL aborted)
  CHAN completed :
  INT workspace_pointer_of_first_process_of_PAR, endP_instruction_of_first_process_of_PAR :
  PAR  -- set workspace_pointer_of_first_process_of_PAR to workspace pointer of outputting process
    SEQ  -- outputting process
      c ! mess
      completed ! ANY
      -- LABEL: endP_instruction_of_first_process_of_PAR (the ENDP/join of this branch)
    INT x :
    ALT
      completed ? x
        aborted := FALSE
      kill ? x
        INT c_pid, completed_pid :
        SEQ
          -- at this point (given this sequential process is running) the outputting process may be
          --   waiting for communication on c to complete
          --   on the run queue
          --   waiting to communicate on completed (i.e. the communication on c worked
          --   but we have defined this as a failure)
          aborted := TRUE
          c_pid := resetch(c)  -- reset channel instruction
          completed_pid := resetch(completed)
          -- we will cause the outputting process to execute its join() [ENDP]:
          -- set up the Iptr of the process workspace to be the ENDP
          WriteToMemory(workspace_pointer_of_first_process_of_PAR, Iptr.s, endP_instruction_of_first_process_of_PAR)
          -- add the process to the run queue if it isn't there already
          IF
            (c_pid <> NotProcess.p) OR (completed_pid <> NotProcess.p)
              -- process was waiting on a channel
              RUNP(workspace_pointer_of_first_process_of_PAR)
            TRUE
              SKIP
:

So, returning to the theme, this code works by causing both the outputting process and the watchdog process to perform their join() in the normal way. I suspect that the Go system could do something similar - causing the "deadlocked" process to complete by sleight of hand.
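
Go has no equivalent of the reset channel instruction, so the nearest cooperative sketch I can offer (the names here are mine, not any Go library's) is for the sender itself to select between completing its output and being killed:

// outputOrFail is a hypothetical Go analogue of OutputOrFail.c: the send on c
// is abandoned if the kill channel is closed (or sent to) first.
func outputOrFail(c chan<- *Response, res *Response, kill <-chan struct{}) (aborted bool) {
    select {
    case c <- res: // the output completed normally
        return false
    case <-kill: // the watchdog gave up on us: abandon the output and terminate
        return true
    }
}

The watchdog closes kill when its timeout fires, and the sending goroutine then performs its join (returns) in the normal way, so nothing is left blocked.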

By the way, this code relies on certain atomicity properties of the transputer scheduler (e.g. the operation of the reset channel instruction). Otherwise it can become difficult and/or expensive to deal with possible zombie processes.

Roger 


*A link failure can affect both of the channels that the link implements, and processes at either end of the link.




On 29 March 2015 at 15:15, Rick Beton <rick.beton@xxxxxxxxx> wrote:
Hi all,

Is this a good community to ask a concurrency question specific to Go?

There is an interesting post on StackOverflow asking why memory leaks can occur when a service process doesn't complete due to timeout.


func Read(url string, timeout time.Duration) (res *Response) {
    ch := make(chan *Response)
    go func() {
        time.Sleep(time.Millisecond * 300)
        ch <- Get(url)
    }()
    select {
    case res = <-ch:
    case <-time.After(timeout):
        res = &Response{"Gateway timeout\n", 504}
    }
    return
}
Because the channel is unbuffered, the secondary goroutine blocks trying to send its response and therefore continues to occupy some memory. When the response is consumed, everything is cleaned up correctly. But when a timeout occurs instead, the secondary goroutine is effectively left deadlocked, and the memory it occupies is never recovered.

There is a simple proposed solution of using a buffered channel. The sending goroutine is never blocked and terminates immediately. The receiving goroutine either consumes the buffer's message or skips it. Either way the memory is garbage-collected. No leak occurs.
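
(Concretely, that is the one-line change

ch := make(chan *Response, 1)

in the example above.)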

I have a question: are there other practical solutions to this puzzle that allow unbuffered channels to operate without the deadlock described?

Rick


PS as an aside, it has been my observation that actor systems (in the Erlang and Akka style) can't suffer from deadlocks because they don't allow blocking reads or writes. But do they therefore suffer from subtle memory leaks instead? I think they quite likely do.





--
Roger Shepherd
rog@xxxxxxxx