[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Something for the weekend ...
In the `something-for-the-weekend' posting (March 19th, 1999), I noted that
we had managed to deadlock the JavaPP ALT algorithm:
http://www.hensa.ac.uk/parallel/groups/wotug/java/discussion/4.html
that was published some 2.5 years earlier and had survived rather well until
then.
It was left as a challenge to the reader to find the flaw (a race hazard) and
correct it, saying I would publish my version at WoTUG-22. Unfortunately,
there being so many other exciting things happening, I never got around to
doing this - despite the exceedingly late nights ;-)
Somewhat belatedly then, here is the `corrected' algorithm currently used
in the JCSP library ... `correct', that is, unless Jeremy's FDR check tells
us otherwise!!!
Note: the `corrections' to the original version are:
o removing the global synchronized from the Alternative.select method and
imposing it only around the ALT WAIT operation;
o making the Channel.disable synchronized.
That may *sound* simple but it took over 4 days full-time thinking ... :-(.
There were numerous ever-more-complex-and-desperate attempts to secure the
race hazard before this relatively simple `solution' was realised. In fact,
this new algorithm is closer to the transputer's micro-coded ALT instructions
than the original :-) ...
One point about this new algorithm: during the ENABLE sequence, there can be
parallel *writes* to the same variable. The ALTing process may write `ready'
to the `state' variable (currently holding `enabling') and so might a process
writing to a channel that had been enabled earlier in the sequence. The latter
write is synchronized on the `alt' object (see the `schedule' method) but the
former is not - so both could commit to doing this. That could be prevented
by synchronising the ALTing process' write on the `alt' object as well. But
I don't believe that is necessary - parallel writes of the same value ought
to be safe ...
I could explain the original race hazard that lead to deadlock and why this
new version avoids that ... but no time now. I have a fairly comfortable
feeling, based on informal reasoning, that it is now correct ... but I had
that before with the old one! Anyway, it passes the prolonged stress tests
that broke the original. Hopefully, Jeremy will now be able to verify it.
This logic is similar to the things Kevin Vella had to get to grips with for
the SMP version of KRoC occam. It's very hard. Anyone working directly with
Java threads at the level of synchronized/wait/notify *has* to get involved
with this stuff ... and the complexity seems to grow exponentially with the
number of synchronized/wait/notifies that are mutually dependent. I'm hoping
not to have to do that any more ;-) ...
Cheers,
Peter.
PS. Jeremy: note that the enclosed Channel is actually a many-1 channel.
It may simplify things, initially, to make it a 1-1 channel by removing
the write_monitor.
PPS. Jeremy: note also that the `read' method below dispenses with the
`local' variable used in the paranoia section on page 274 of WoTUG-21.
Because I now know that JVMs do *not* release the object lock after
invoking a notify (which I find a little peculiar), the concern
expressed in the second paragraph on that page disappears and we
can get rid of `local'. The CSP model of Java notify reflects this,
so you should be able to re-verify the 1-1/many-1 channels with that
simplification :-)
The first paragraph, though, still describes a real problem - so we
still need the extra notify/wait in the case when the reader is first
to the channel. It would be interesting to apply FDR to the code on
page 273 and see it fail ...
PPS. Jeremy: please hold off verifying this ALT until I've replied to your
posting from yesterday! I want to argue some more about the merits
of the LEFT-only model for a JCSPCHANNEL versus the LEFT-RIGHT one.
==============================================================================
public class Alternative {
private static final int enabling = 0;
private static final int waiting = 1;
private static final int ready = 2;
private static final int inactive = 3;
private int state = inactive;
public int select (Channel[] c) throws InterruptedException {
int selected = -999999; // this value should *never* be returned!
int i;
state = enabling; // ALT START
for (i = 0; i < c.length; i++) {
if (c[i].enable (this)) { // ENABLE CHANNEL
state = ready;
selected = i;
break;
}
}
synchronized (this) {
if (state == enabling) { // ALT WAIT
state = waiting;
wait ();
state = ready;
}
}
// assert : state == ready
for (i--; i >= 0; i--) {
if (c[i].disable ()) { // DISABLE CHANNEL
selected = i;
}
}
state = inactive;
return selected; // ALT END
}
synchronized void schedule () {
switch (state) {
case enabling:
state = ready;
break;
case waiting:
state = ready;
notify ();
break;
// case ready: case inactive:
// break
}
}
}
///////////////////////////////////////////////////////////////////////////
public class Channel {
private int channel_hold; // buffer (not detectable to users)
private boolean channel_empty = true; // synchronisation flag
private Object write_monitor = // all writers multiplex through this
new Object ();
private Alternative alt; // state of reader
public synchronized int read () throws InterruptedException {
if (channel_empty) {
channel_empty = false; // first to the rendezvous
wait (); // wait for the writer thread
notify (); // schedule the writer to finish
} else {
channel_empty = true; // second to the rendezvous
notify (); // schedule the waiting writer thread
}
return channel_hold;
}
public void write (int n) throws InterruptedException {
synchronized (write_monitor) {
synchronized (this) {
channel_hold = n;
if (channel_empty) {
channel_empty = false; // first to the rendezvous
if (alt != null) { // the reader is ALTing on this Channel
alt.schedule ();
}
wait (); // wait for the reader thread
} else {
channel_empty = true; // second to the rendezvous
notify (); // schedule the waiting reader thread
wait (); // let the reader regain this monitor
}
}
}
}
synchronized boolean enable (Alternative alt) {
if (channel_empty) {
this.alt = alt;
return false;
} else {
return true;
}
}
synchronized boolean disable () {
alt = null;
return !channel_empty;
}
}