Priority receives in Erlang



Dear Matt, PLASmids, CRGers, occam-com, java-threads, et al.,

Apologies for multiple sends - we must rationalise our mailing lists!

I'm getting to grips with Erlang.  I'm puzzled by some things so am seeking
feedback on my understanding.  Please respond if anything below seems wrong
or unfair!  I'd be especially grateful for pointers to Erlang literature
that's relevant.  Thanks.

Question 10.15 at:

  http://www.erlang.org/faq/academic.html

asks: "Can a process receive messages with varying priorities?".

The answer is "Sure" ... but what is offered looks a little wobbly:

  receive                                                   %% code A
    {priority_msg, Data1} -> priority (Data1)
  after 0 ->
    receive
      {priority_msg, Data1} -> priority (Data1);
      {normal_msg, Data2} -> normal (Data2)
    end
  end.

Matt: this is exactly the structure of code used in the Erlang Santa
example on our Santa website (with 'reindeer' instead of 'priority_msg'
and 'elves' instead of 'normal_msg').


Race Hazard
===========

I don't like the answer because of a classic race-hazard.

Suppose there's a 'normal_msg' sitting in the mailbox of this process
and no 'priority_msg' and let the code start executing.

The poll (in the opening receive with zero timeout) fails to find a
'priority_msg'.  Control returns to the program code and it heads off
into the timeout branch (another receive).

While this is happening, a 'priority_msg' arrives in the mailbox - this
was the result of parallel activity in another core/machine (or another
process could have been swapped into the core our process was using
and sent the message).

Anyway, our process eventually gets to its nested receive.  'normal_msg'
is ahead of the just-arrived 'priority_msg' in the mailbox queue and
will be the one selected by that receive.

So, we had a 'priority_msg' and a 'normal_msg' in our mailbox and
chose to take the 'normal_msg' :(.  That's not good.
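The interleaving above can be reproduced deterministically: pre-load the
mailbox with a 'normal_msg', let the poll fail, then do the 'priority_msg'
send ourselves inside the timeout branch, before the nested receive runs.
A minimal sketch (the module name 'race_demo' and the message payloads are
mine, not from the FAQ):

```erlang
%% race_demo: deterministic reproduction of the race in code A.
%% A normal_msg is already queued; the poll fails; a priority_msg
%% arrives in the "gap" before the nested receive; the nested receive
%% still takes the older normal_msg, because it is first in the queue.
-module(race_demo).
-export([run/0]).

run() ->
    self() ! {normal_msg, boring},
    receive
        {priority_msg, Data1} -> {took_priority, Data1}
    after 0 ->
        %% the gap between the two primitives: a priority_msg lands here
        self() ! {priority_msg, urgent},
        receive
            {priority_msg, Data1} -> {took_priority, Data1};
            {normal_msg, Data2} -> {took_normal, Data2}
        end
    end.
%% race_demo:run() -> {took_normal, boring}
```

Even though both message kinds are sitting in the mailbox by the time the
nested receive runs, the 'normal_msg' wins on queue order.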

Now, you could argue that the 'priority_msg' was just too late for us,
so that it's OK that we didn't take it - we'll take it next time we loop
back to this code for sure.

I don't like that argument though.  We've used two language primitives
here in *sequence* (assuming the opening poll fails).  Anything can
happen in between those primitives being applied - circumstances can
change between the first ending and the second starting.  We can assume
atomicity in dealing with circumstances by each primitive, but not by
two in sequence.  The "just too late" appeal is weak.  A *long* time
could have passed between the first poll and the second receive (e.g. if
this process was swapped out and there were thousands of other runnable
processes to be scheduled before our process gets back in).  There could
be hundreds of 'priority_msg's in the mailbox (sent by panicking processes
wondering why our process wasn't responding) before that second receive
gets going and they will all be ignored in favour of the 'normal_msg',
because they will all be behind it in the mailbox queue.

To do this securely, Erlang needs a way to bind the sequential use of
the polling receive and the timeout receive into an atomic operation
(unlikely) or a Built-in-Function (BiF) that does a "pri-receive"
atomically (not hard).


Performance Hazard
==================

The cost of this priority-receive pattern (code A, above) seems horrible!

The mailbox is searched twice if the poll fails, which is bad news if
the queue is lengthy.  Maybe the compiler can spot the pattern and do
something clever: like record the first 'normal_msg' it finds in the queue
while searching for a 'priority_msg' ... if it doesn't find the latter,
it can immediately go for the former.

Even if it can do this, the effort of trawling through the whole queue
in the first place seems awful!

If we have a process processing 'normal_msg's all the time but always
having to look out for, and immediately deal with, a 'priority_msg',
the costs look daunting.

Compare this with occam:

  PRI ALT                  -- code B
    priority ? msg
      ...  action
    normal ? msg
      ...  action

In the case when 'priority' is ready, the processing time between the
start of the PRI ALT and start of the 'priority' action is nanoseconds
(no channel gets enabled).

In the case when 'priority' is not ready and 'normal' is ready, the
time to get to the 'normal' action is barely a few more nanoseconds
than the above (again, no channel gets enabled).

In the case when neither channel is ready, both channels fail the
initial check and both then get enabled (assuming nothing arrives).
This takes some tens of nanoseconds - probably much less than 100?
When something arrives on either channel, both channels must be
disabled before the choice is made and relevant action started
- some tens of nanoseconds again.

Such costs rise linearly as the number of channels in the PRI ALT
increases.

The Erlang costs rise super-linearly (from a high base) ... unless
the compiler and run-time are very clever indeed.  The code is not
pleasant to write (I'm extending the pattern from code A, but I'm
not at all sure this is right?):

  receive                                                   %% code C
    {priority_0_msg, Data0} -> priority_0 (Data0)
  after 0 ->
    receive
      {priority_0_msg, Data0} -> priority_0 (Data0);
      {priority_1_msg, Data1} -> priority_1 (Data1)
    after 0 ->
      receive
        {priority_0_msg, Data0} -> priority_0 (Data0);
        {priority_1_msg, Data1} -> priority_1 (Data1);
        {priority_2_msg, Data2} -> priority_2 (Data2)
      after 0 ->
        receive
          {priority_0_msg, Data0} -> priority_0 (Data0);
          {priority_1_msg, Data1} -> priority_1 (Data1);
          {priority_2_msg, Data2} -> priority_2 (Data2);
          {priority_3_msg, Data3} -> priority_3 (Data3)
        end
      end
    end
  end.

In this, we are looking to receive one of four possible message types,
but giving priority to 'priority_0_msg', then 'priority_1_msg' etc.

Race hazards are present.  There are lots of scenarios (similar to,
and more than, those for code fragment A) in which a lower priority
message is selected over a higher one sitting in the mailbox.  Let's
ignore these for now (since we are looking at performance).

The obvious implementation requires four sweeps through the mailbox
queue, with more complex search requests each time.  Maybe a clever
compiler could get the run-time to do this in a single sweep ... but
that would be pretty clever!

In occam:

  PRI ALT                  -- code D
    priority.0 ? msg
      ...  action
    priority.1 ? msg
      ...  action
    priority.2 ? msg
      ...  action
    priority.3 ? msg
      ...  action

The worst case (when nothing is ready) is implemented by a fast initial
check through the four channels, then an enable sweep.  When/if something
arrives, there is the disable sweep.

The number of these sweeps (check/enable/disable) is independent of
the number of (prioritised) message sources.  The "obvious" Erlang
implementation requires one sweep through the process mailbox for each
(prioritised) message type, with increasingly complex checks each time.

What about a replicated prioritised receive?  In occam:

  PRI ALT i = 0 FOR n      -- code E
    priority[i] ? msg
      ...  action

In Erlang, extend the pattern from code C.  Define the message structure:

  {atom, level, data}

where 'atom' is some identifying atom for the messages we want to take,
'level' is a priority level (0 = highest, 1 = next highest, ...) and
'data' is the real data in the message.  Define the functions:

  %% code F

  pri_receive (MyAtom, MaxLevel) -> pri_receive (MyAtom, 0, MaxLevel).

  pri_receive (MyAtom, Level, MaxLevel) ->
    if
      Level < MaxLevel ->
        receive
          {MyAtom, I, Data} when I =< Level -> {I, Data}
        after 0 ->
          pri_receive (MyAtom, Level + 1, MaxLevel)
        end;
      true ->
        receive
          {MyAtom, I, Data} when I =< Level -> {I, Data}
        end
    end.

To wait for 'N' levels of priority and, when one is received, respond with
some 'priority' action:

  priority (pri_receive (my_atom, N - 1))

That's not too bad (if I've got the when-guard syntax and semantics right).
In the worst case, it still needs one sweep through the mailbox for each
priority level ... and it still suffers numerous race hazards.  Implementing
'pri_receive' as a BiF could eliminate those hazards and make it *much*
faster.
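Assuming the guard syntax above is right, the pattern can be exercised
as follows - a self-contained sketch (the module name 'pri_demo' and the
message payloads are illustrative):

```erlang
%% pri_demo: exercise the pri_receive pattern (code F).
%% Level 0 is the highest priority; a level-0 message is taken even
%% though a level-3 message is ahead of it in the mailbox queue.
-module(pri_demo).
-export([run/0]).

pri_receive(MyAtom, MaxLevel) -> pri_receive(MyAtom, 0, MaxLevel).

pri_receive(MyAtom, Level, MaxLevel) ->
    if
        Level < MaxLevel ->
            receive
                {MyAtom, I, Data} when I =< Level -> {I, Data}
            after 0 ->
                pri_receive(MyAtom, Level + 1, MaxLevel)
            end;
        true ->
            receive
                {MyAtom, I, Data} when I =< Level -> {I, Data}
            end
    end.

run() ->
    %% a low-priority message arrives first, so it is ahead in the queue
    self() ! {my_atom, 3, low},
    self() ! {my_atom, 0, high},
    {0, high} = pri_receive(my_atom, 3),  %% higher priority wins anyway
    {3, low}  = pri_receive(my_atom, 3),  %% then the low one is taken
    ok.
```

Note that the first call sweeps the whole mailbox at level 0 before the
guard admits the level-0 message - the per-level sweeps complained about
above are visible even in this tiny example.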

By the way, the when-guards on the receive patterns above are neat.  They
let a process inspect a potential match of a message in its mailbox to see
if the information it contains is acceptable: if not, the match fails and
the message is left in the mailbox.  This is something that can be expressed
in CSP, but not in any existing CSP language or library.


Erlang and occam
================

Erlang processes are *a bit* like occam-pi processes that have only a
single input channel on which all messages must arrive.  The other end of
that channel is SHARED and MOBILE, and corresponds to the Erlang concept
of a unique process identifier (pid): to send a message to another
process, a process must be given the SHARED MOBILE end (i.e. the pid)
of that other process.

This seems a bit constraining!  If Erlang processes could be equipped with
multiple named mailboxes, they could do the priority receive much more
easily (for the programmer) and much faster (for the run-time system).

Sending messages to named processes was the original CSP idea (Hoare, 1978).
This was dropped in favour of communication via channels (independent of
processes) in the revised CSP (Hoare, 1985) and the occam language (May,
1983).  Erlang has not undergone such a revision.

Having single mailboxes and named target processes (with names distributed
either explicitly as 'pid's or magically as registered 'names') means there
is no code that represents process networks, which feels like a bad loss.
It means networks can grow without discipline (spaghetti).  There is no
concept of networks within networks - i.e. processes whose implementation
is a network (bound within the confines of the process being implemented)
- i.e. processes with the constraints of physical components (no spaghetti).

Thoughts?

Peter.