A CSP model for Java threads
For all you late Friday night workers ...
A CSP MODEL FOR JAVA MULTITHREADING
===================================
Here is a draft CSP model for the sync/wait/notify primitives of Java.
It's not quite as bad as I feared! This is as it should be. The Java
primitives are not that hard to explain informally ... so the same should
be true about any formal explanation. I would be grateful for feedback
on anything wrong with this draft.
We start with the ending ...
CONCLUSIONS:
============
Assuming this model holds water, it gives us a formal semantics for Java
multithreading. Despite all the problems encountered and warnings circulated
on this topic, I've seen little concern in the Java community about the
absence of such a thing. But without it, we will always remain uncomfortable
about the security of any product that uses more than one thread of control.
The particular semantics given here is a *CSP* one. The importance of the
*CSP* is that it is an algebra for concurrent systems -- a formal piece of
mathematics with which we can specify requirements precisely (including
things like deadlock-freedom) and prove that our implementations satisfy
them. Further, some powerful and mature CSP tools (e.g. FDR from Formal
Systems Ltd. and Jeremy Martin's deadlock/sat checker) can be applied.
What this means is that *any* multithreaded Java code -- not just code using
the JCSP or CTJ libraries that give direct access to CSP primitives -- becomes
amenable to formal and (partly) automated analysis. If this is of no interest
to the Java community, then we will all be in for a lot of trouble.
Java designers, implementors, testers and maintainers are running scared of
its multithreading primitives, but applications force their use. In future,
if not already, finance-critical and safety-critical applications will be
multithreaded. A CSP model of multithreading puts this on a solid engineering
foundation. No one can ignore that. Anyone doing so would, at the very least,
be exposed to a risk of litigation after a messy accident gets traced back
to a system failure arising from some race condition:
"So, Bill, you sold your system without really knowing whether it was
deadlock-free ... and you never even tried these standard procedures that
might have found out?!! Disclaimer notices notwithstanding, would you
say that that shows a lack of due care or a lack of diligence towards
your customer?"
For example, it should now become possible to build a CSP model of the
erroneous ALT algorithm:
http://www.hensa.ac.uk/parallel/groups/wotug/java/discussion/4.html
together with the stress test (described in the "weekend" posting), apply
Jeremy's deadlock checker and watch it quickly pop up a trace that exposes
the deadlock. Wow -- I wish we had the chance of this a couple of years ago!
Of course, we had better repeat this for the `corrected' ALT algorithm
and see if anything nasty pops up! Assuming it doesn't ;-), we can move
on to *prove* that it is a correct implementation of the ALT.
Most Java multithreaded code using the sync/wait/notify primitives --
including my own code -- looks suspicious. As Tony Hoare said in his 1980
Turing Award speech, there are two kinds of computer systems that sell:
o those that are obviously correct ...
o and those that are not obviously wrong ...
and he noted that it's much easier, of course, to produce the latter.
Guess which kind we are peddling! I wonder how many surprises will pop up
when we start applying CSP tools to our Java codes?
SYNTAX:
=======
This model is built with the version of CSP in Hoare's book -- so it'll be
a little old-fashioned I'm afraid.
Also, I don't know the proper ASCII syntax for CSP so I'm making up my own.
Subscripts:
"A-subscript-i" will be written in array notation as "A[i]"
Replicated operators:
Performing an external choice (say) indexed over some set - e.g.
[] event[i] --> P
i in X
will be written in one line:
[] (event[i] --> P | i in X)
Alphabets:
The set of events in which a process, "P", is interested is "alpha (P)"
OBJECTS AND THREADS:
====================
Let "Objects" be an enumeration of all Java objects.
For any particular Java application being CSP-modelled, this can be
restricted to just those objects on which any threads in the application
are ever `synchronized'. Usually, this will be finite and small.
Let "Threads" be an enumeration of all Java threads.
For any particular Java application being CSP-modelled, this can be
restricted to just those threads that are created and started.
Sometimes, this may be unbounded or large.
SYNCHRONIZED:
=============
Let "claim" and "release" be the sets of events:
claim = {claim[o][t] | o in Objects, t in Threads}
release = {release[o][t] | o in Objects, t in Threads}
Let "LOCK" be the set of processes:
LOCK = {LOCK[o] | o in Objects}
Each "LOCK[o]" process is simply a binary semaphore -- i.e.
alpha (LOCK[o]) = {claim[o][t] | t in Threads} union
{release[o][t] | t in Threads}
LOCK[o] = [] (claim[o][t] --> release[o][t] --> LOCK[o] | t in Threads)
So, once a "LOCK[o]" has been claimed by a thread, only a "release[o]" from
that same thread will release it. All other events from the alphabet of
"LOCK[o]" will be refused.
A Java "synchronized" block:
synchronized (o) {
P
}
is modelled by:
(claim[o][me] --> P) ; (release[o][me] --> SKIP)
where "me" is the index of the thread executing this code. The body of
the block, "P", is (recursively modelled as) a process.
Notice that when there are competing processes, it is undefined which one
gets the lock. The "LOCK[o]" process makes no promises as to how competing
claimants are managed (e.g. in a FIFO). This is as defined (or, rather,
undefined) in Java.
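As a concrete Java counterpart of that "claim ... release" model, here is a
minimal sketch (the class name "SyncDemo" is mine, not part of the model):
monitor entry and exit of the "synchronized" block play the roles of the
"claim[o][me]" and "release[o][me]" events, and mutual exclusion is what
makes the final count deterministic even though the *order* in which the
competing threads get the lock is undefined.

```java
// A minimal sketch of the synchronized block being modelled.
// Monitor entry/exit correspond to the claim/release events of LOCK[o].
public class SyncDemo {
    private int counter = 0;          // shared state guarded by the lock on `this`

    void increment() {
        synchronized (this) {         // claim[this][me]
            counter++;                // P
        }                             // release[this][me]
    }

    static int run() throws InterruptedException {
        SyncDemo d = new SyncDemo();
        Thread[] ts = new Thread[4];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) d.increment();
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        return d.counter;             // 4000 iff mutual exclusion held
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());    // prints 4000
    }
}
```

Note that nothing here (or in Java) says which of the four threads wins each
claim -- only that exactly one holds the lock at a time.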
That was easy! Actually, that was too easy :-( ...
Java allows a lock-owning thread to re-claim/release the lock `atomically'
(e.g. when a "synchronized" method invokes another "synchronized" method on
the same object). Modelling such a stack of claims/releases in the "LOCK[o]"
process would be complicated by having to deal with a lock-owning thread
letting it go via a "wait" -- since it must re-claim the lock (if and when
it is `notified') at the same stack level, and any number of threads, at
different stack levels, could have locked it in the interim.
So let's leave "LOCK[o]" alone and model the "synchronized" block in the
way any sensible implementation would do.
Each thread (index "me") maintains a count, "count[o][me]", for any object,
"o", on which it may be "synchronized". Each "count[o][me]" is initialised
to zero. Then, excusing the occam-style syntax, the Java "synchronized" block:
synchronized (o) {
P
}
is modelled by:
IF
count[o][me] = 0
(claim[o][me] --> P') ; (release[o][me] --> SKIP)
count[o][me] > 0
P'
where, deeply apologising for the "++"/"--" (but it then fits on one line):
P' = count[o][me]++ ; P ; count[o][me]--
i.e. the thread doesn't bother to synchronise on any claim/release events
if it already has the lock!
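In Java itself, this re-entrant behaviour is what lets a "synchronized"
method invoke another "synchronized" method on the same object without
deadlocking. A minimal sketch (class and method names are mine):

```java
// A sketch of the re-entrant claim modelled by count[o][me]: the inner
// synchronized method generates no claim/release events, since the
// invoking thread already holds the lock.
public class ReentrantDemo {
    synchronized int outer() {        // count[this][me]: 0 -> 1 (real claim)
        return inner() + 1;
    }
    synchronized int inner() {        // count[this][me]: 1 -> 2 (no claim event)
        return 41;
    }
    public static void main(String[] args) {
        System.out.println(new ReentrantDemo().outer()); // prints 42
    }
}
```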
WAIT AND NOTIFY:
================
Let "wait-a", "wait-b" and "notify" be the sets of events:
wait-a = {wait-a[o][t] | o in Objects, t in Threads}
wait-b = {wait-b[o][t] | o in Objects, t in Threads}
notify = {notify[o][t] | o in Objects, t in Threads}
Let "WAIT" be the set of processes:
WAIT = {WAIT[o] | o in Objects}
where:
alpha (WAIT[o]) = {wait-a[o][t] | t in Threads} union
{wait-b[o][t] | t in Threads} union
{notify[o][t] | t in Threads}
WAIT[o] = WAIT[o][0]
where:
alpha (WAIT[o][n]) = alpha (WAIT[o]), for all n >= 0
WAIT[o][0] = ([] (wait-a[o][t] --> WAIT[o][1] | t in Threads))
[]
([] (notify[o][t] --> WAIT[o][0] | t in Threads))
and where, for n > 0:
WAIT[o][n] = ([] (wait-a[o][t] --> WAIT[o][n + 1] | t in Threads))
[]
([] (notify[o][t] -->
[] (wait-b[o][s] --> WAIT[o][n - 1] | s in Threads)
| t in Threads))
So, there is a "WAIT[o]" process for each object "o". Initially, this has
a count field (represented by a second index to "WAIT") set to zero. This
count holds the number of threads that are executing a Java "wait ()" on
the object "o".
The Java code:
o.wait ();
is modelled by:
release[o][me] --> wait-a[o][me] --> wait-b[o][me] --> claim[o][me] --> SKIP
where "me" is the index of the thread executing this code. Note that these
operations happen regardless of the value of "count[o][me]" and do not change it.
One of the constraints in Java is that an "o.wait" (or "o.notify") can only
be invoked by a thread currently holding the monitor lock for "o" -- i.e.
from inside a "synchronized (o)" block. If a thread violates this constraint,
the above model for "o.wait ()", running in parallel with "LOCK[o]", means
that that invoking thread will be permanently blocked (i.e. deadlocked).
This seems a suitable punishment (and is one that should be detectable by
CSP analysis tools).
The first "release[o][me]" in this sequence releases the "LOCK[o]" semaphore.
Assuming the application has not violated the above constraint, this will
definitely happen since "LOCK[o]" will be waiting for that "release[o][me]"
(which only this thread is going to send).
Next, the "wait-a[o][me]" in this sequence is always acceptable to the
waiting "WAIT[o][n]" process and just increments its internal count.
Note that no guarantee is made that the "WAIT[o][n]" process *will* ever
accept that "wait-a[o][me]". Just as in Java, no guarantee is made that
a waiting thread will ever be noticed and, subsequently, released.
Assuming the "wait-a[o][me]" has happened, the thread now blocks on the
"wait-b[o][me]" event. This cannot take place until the "WAIT[o][n]"
process receives a "notify[o][t]". See below.
Finally, this thread must compete, "claim[o][me]", against all its rivals
to reclaim the "LOCK[o]".
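The full "release ... wait-a ... wait-b ... claim" sequence is what sits
behind the idiomatic Java guarded-wait pattern. Here is a sketch, a one-slot
buffer (class name "OneSlot" is mine): the while-loop re-test is needed
precisely because the final "claim[o][me]" competes with every other thread
-- by the time the waiter reclaims the lock, the condition it was waiting for
may be false again.

```java
// A sketch of the full o.wait() sequence in its idiomatic Java setting.
public class OneSlot {
    private Object slot = null;

    public synchronized void put(Object x) throws InterruptedException {
        while (slot != null) wait();  // release; wait-a; wait-b; claim
        slot = x;
        notify();                     // wake a taker, if any
    }

    public synchronized Object take() throws InterruptedException {
        while (slot == null) wait();  // re-test after every wake-up
        Object x = slot;
        slot = null;
        notify();                     // wake a putter, if any
        return x;
    }

    public static void main(String[] args) throws InterruptedException {
        OneSlot b = new OneSlot();
        Thread producer = new Thread(() -> {
            try { b.put("hello"); } catch (InterruptedException e) { }
        });
        producer.start();
        System.out.println(b.take()); // prints hello
        producer.join();
    }
}
```

With just one producer and one consumer, a single "notify" suffices; with
more threads the re-test in the while-loop is doing real work.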
The Java code:
o.notify ();
is modelled by:
notify[o][me] --> SKIP
where "me" is the index of the thread executing this code. Note that this
does not involve releasing and, subsequently, re-claiming the "LOCK[o]",
which is as per the (till now unformalised) rules of Java. For the moment,
the model does allow an "o.notify ()" to be invoked without holding that
"LOCK[o]" -- but this is dealt with in the next section.
This "notify[o][me]" is always acceptable to a "WAIT[o][n]" process.
If "n" is zero, there are no threads waiting on "o" and the notification
is accepted but ignored. This is as required by Java.
If "n" is more than zero, there is at least one thread waiting on "o". Any
thread, "s", waiting on "o" will be trying to synchronise on "wait-b[o][s]".
So, when "WAIT[o][n]" accepts a "notify[o][t]", it will accept one of those
"wait-b[o][s]" events. If more than one is on offer, an arbitrary choice
is made. This is in line with Java's (informal) rules -- no promises are
made about `fair' releases of waiting threads upon a notify. After accepting
one of the "wait-b[o][s]" events, the "WAIT[o][n]" decrements its count.
The lucky thread "s", that has just been released from its "wait-b[o][s]",
now has to re-acquire the lock on "o" -- see the final "claim[o][me]" in
the "o.wait ()" model. No special privileges are made for getting this
lock back ... it has to compete with all other threads trying to claim it
... hence the "What, no chickens?" difficulty! This sad behaviour is
properly captured by this CSP.
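That "no special privileges" behaviour can be observed directly in Java.
In this sketch (class name and the crude sleeps are mine), the notifier
keeps the lock for a while after the "notify": the notified thread has been
released from its "wait-b" but cannot proceed until the notifier's
"release[o][me]" finally lets it win a "claim[o][me]":

```java
// A sketch showing that a notified thread must still reclaim the lock.
public class NotifyDemo {
    static String run() throws InterruptedException {
        final Object o = new Object();
        final StringBuffer log = new StringBuffer();   // synchronized, safe here
        Thread waiter = new Thread(() -> {
            synchronized (o) {                          // claim[o][waiter]
                try { o.wait(); } catch (InterruptedException e) { }
                log.append("notified;");                // only runs after reclaiming the lock
            }
        });
        waiter.start();
        Thread.sleep(100);      // crude: give the waiter time to block in o.wait()
        synchronized (o) {
            o.notify();         // releases the waiter from wait-b[o][waiter] ...
            Thread.sleep(100);  // ... but it cannot run while we still hold the lock
            log.append("notifier-done;");
        }                       // release[o][main]: only now can the waiter reclaim it
        waiter.join();
        return log.toString();
    }
    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());   // prints notifier-done;notified;
    }
}
```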
An important observation is that "WAIT[o][n]" can never get stuck waiting
for a "wait-b[o][s]" -- one, at least, will always be available. Hence,
we can guarantee that "WAIT[o][n]", for *some* value of "n", will always
be able to service any "wait-a[o][t]" or "notify[o][t]" events. This is
what is meant by the phrase "is always acceptable" used twice above (and
below).
The above model does not enforce that a "notify[o][me]" will always be
accepted -- just that it is always acceptable. A Java thread executing
an "o.notify ()" probably should not block indefinitely, but should always
terminate having released a waiting thread (if one exists). However,
some synchronisation *must* be involved because data structures shared
with the "o.wait ()" method must be inspected and updated. It may be that
that synchronisation is `fair' and guarantees eventual servicing of the
"o.notify ()" ... but I've not seen that promise made in the (informal)
Java specification. Given that no promises are made with respect to entry
to a "synchronized" block or eventual release from a "wait", such a promise
would be inconsistent.
If special versions of Java emerge (e.g. for real-time applications) that
*do* make promises of eventual entry to "synchronized" blocks and release
from "o.wait ()" methods, we will have to model them with an extended CSP
that captures the notion of `eventual' or `fair' servicing of events.
Adrian Lawrence's CSPP, which addresses the notion of priority, would
be able to do this.
MONITOR RULES ON SYNC AND WAIT/NOTIFY:
======================================
A Java rule, assumed by the above CSP model, is that "o.wait ()" and
"o.notify ()" can only be invoked by threads that currently hold the
lock on object "o". This rule is enforced by the Java run-time system.
Java applications breaking this rule may compile, but an exception will
be thrown at run-time.
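For the record, the run-time exception in question is
"IllegalMonitorStateException". A minimal sketch (class name is mine):

```java
// A sketch of the run-time rule: invoking o.notify() (or o.wait())
// without holding the monitor lock on o raises IllegalMonitorStateException.
public class MonitorRuleDemo {
    static String run() {
        Object o = new Object();
        try {
            o.notify();               // no enclosing synchronized (o) -- illegal
            return "no exception";
        } catch (IllegalMonitorStateException e) {
            return "IllegalMonitorStateException";
        }
    }
    public static void main(String[] args) {
        System.out.println(run());    // prints IllegalMonitorStateException
    }
}
```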
For checking out Java applications that do not break this rule, the
above model is sufficient. However, if we are concerned, the constraints
are trivial to model in CSP.
As already described, the rule for the "o.wait ()" is already imposed by
the current model. To get the rule for "o.notify ()", the simplest way
is to modify the "LOCK[o]" process to include all the "notify[o][t]"
events as well. Then, it can refuse any `notify' from a thread that has
not claimed it ... and accept a `notify' from a thread that has:
alpha (LOCK[o]) = {claim[o][t] | t in Threads} union
{release[o][t] | t in Threads} union
{notify[o][t] | t in Threads}
LOCK[o] = [] (claim[o][t] --> LOCKED[o][t] | t in Threads)
where:
alpha (LOCKED[o][t]) = alpha (LOCK[o])
LOCKED[o][t] = (release[o][t] --> LOCK[o]) [] (notify[o][t] --> LOCKED[o][t])
NOTIFY ALL:
===========
To complete the model of Java thread synchronisations, we need to capture
the "o.notifyAll ()". The meaning of this is to release *all* threads that
are waiting on the "o" -- i.e. have executed an "o.wait ()". If no threads
are waiting, that's fine and nothing happens as a result of the invocation.
This is easy: define a set of "notifyAll[o][t]" methods, add the relevant
subsets to the alphabets of "LOCK[o]" and "WAIT[o]" and make the obvious
changes. The "LOCK[o]" extension prevents a "notifyAll[o][t]" when thread
"t" does not have the claim. The "WAIT[o]" extension releases all currently
waiting threads -- no new threads will be able to execute an "o.wait ()"
until this has completed.
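A sketch of the "o.notifyAll ()" behaviour in Java itself (class name and
the crude sleep are mine): all currently-waiting threads are released by
the single "notifyAll", though each must still reclaim the lock in turn:

```java
// A sketch of o.notifyAll(): every thread currently waiting on the object
// is released (each must still reclaim the lock, one at a time).
import java.util.concurrent.atomic.AtomicInteger;

public class NotifyAllDemo {
    static int run() throws InterruptedException {
        final Object o = new Object();
        final AtomicInteger resumed = new AtomicInteger(0);
        Thread[] waiters = new Thread[3];
        for (int i = 0; i < waiters.length; i++) {
            waiters[i] = new Thread(() -> {
                synchronized (o) {
                    try { o.wait(); } catch (InterruptedException e) { }
                    resumed.incrementAndGet();
                }
            });
            waiters[i].start();
        }
        Thread.sleep(100);                    // crude: let all three block in o.wait()
        synchronized (o) { o.notifyAll(); }   // one event releases them all
        for (Thread t : waiters) t.join();
        return resumed.get();
    }
    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());            // prints 3
    }
}
```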
The complete model is presented in one piece in the next section.
SUMMARY OF THE MODEL:
=====================
Sets (Enumerable):
-------------------
Objects, Threads
Events:
-------
claim = { claim[o][t] | o in Objects, t in Threads}
release = { release[o][t] | o in Objects, t in Threads}
wait-a = { wait-a[o][t] | o in Objects, t in Threads}
wait-b = { wait-b[o][t] | o in Objects, t in Threads}
notify = { notify[o][t] | o in Objects, t in Threads}
notifyAll = {notifyAll[o][t] | o in Objects, t in Threads}
`JVM' Processes:
----------------
There are two processes for each Object:
LOCK = {LOCK[o] | o in Objects}
WAIT = {WAIT[o] | o in Objects}
where:
alpha (LOCK[o]) = { claim[o][t] | t in Threads} union
{ release[o][t] | t in Threads} union
{ notify[o][t] | t in Threads} union
{notifyAll[o][t] | t in Threads}
LOCK[o] = [] (claim[o][t] --> LOCKED[o][t] | t in Threads)
and where:
alpha (LOCKED[o][t]) = alpha (LOCK[o])
LOCKED[o][t] = ( release[o][t] --> LOCK[o]) []
( notify[o][t] --> LOCKED[o][t]) []
(notifyAll[o][t] --> LOCKED[o][t])
Also:
alpha (WAIT[o]) = { wait-a[o][t] | t in Threads} union
{ wait-b[o][t] | t in Threads} union
{ notify[o][t] | t in Threads} union
{notifyAll[o][t] | t in Threads}
WAIT[o] = WAIT[o][0]
where:
alpha (WAIT[o][n]) = alpha (WAIT[o]), for all n >= 0
WAIT[o][0] = ([] (wait-a[o][t] --> WAIT[o][1] | t in Threads))
[]
([] (notify[o][t] --> WAIT[o][0] | t in Threads))
[]
([] (notifyAll[o][t] --> WAIT[o][0] | t in Threads))
and where, for n > 0:
WAIT[o][n] = ([] (wait-a[o][t] --> WAIT[o][n + 1] | t in Threads))
[]
([] (notify[o][t] -->
[] (wait-b[o][s] --> WAIT[o][n - 1] | s in Threads)
| t in Threads))
[]
([] (notifyAll[o][t] --> RELEASE[o][n] | t in Threads))
and where:
alpha (RELEASE[o][n]) = alpha (WAIT[o]), for all n >= 0
RELEASE[o][0] = WAIT[o][0]
and where, for n > 0:
RELEASE[o][n] = [] (wait-b[o][s] --> RELEASE[o][n - 1] | s in Threads)
Counts (could be processes):
----------------------------
count = {count[o][t] | o in Objects, t in Threads}
User processes:
---------------
Each started thread in an application/applet/whateveret is a process.
These processes form the enumerated set:
USER = {USER[me] | me in Users}
where Users is a subset of (all Java) Threads.
The alphabet for USER[me] is:
alpha (USER[me]) = { claim[o][me] | o in Objects} union
{ release[o][me] | o in Objects} union
{ wait-a[o][me] | o in Objects} union
{ wait-b[o][me] | o in Objects} union
{ notify[o][me] | o in Objects} union
{notifyAll[o][me] | o in Objects}
Each "USER[me]" must initialise "count[o][me]" to zero, for each "o"
in Objects.
A Java "synchronized" block:
synchronized (o) {
P
}
is part of the life of "USER[me]", for some "me", and is modelled by:
IF
count[o][me] = 0
(claim[o][me] --> P') ; (release[o][me] --> SKIP)
count[o][me] > 0
P'
where:
P' = count[o][me]++ ; P ; count[o][me]--
The Java code:
o.wait ();
is modelled by:
release[o][me] --> wait-a[o][me] --> wait-b[o][me] --> claim[o][me] --> SKIP
The Java code:
o.notify ();
is modelled by:
notify[o][me] --> SKIP
The Java code:
o.notifyAll ();
is modelled by:
notifyAll[o][me] --> SKIP
A Complete System:
------------------
A complete Java multithreaded system is just the parallel composition of
all the `JVM' and user processes:
(|| (WAIT[o] | o in Objects))
||
(|| (LOCK[o] | o in Objects))
||
(|| (USER[t] | t in Users))
CONCLUSIONS:
============
See up-front!
Peter Welch.
(26th. March, 1999)