CSP, JMM and better processor architectures

There has also been some discussion about:

> The new Java Memory Model (JMM -- JSR133) is not an earth-shattering
> revision -- it mainly just clarifies atomicity, visibility, and
> ordering rules (replacing the JLS spec chapter 17). But in doing so,
> it helps reveal the underlying continuum between shared-memory and
> messaging.  For example, it points out cases where updates to a
> variable by one thread MAY be completely ignored by another thread. In
> general, any race condition due to lack of use of "synchronized",
> "volatile" and/or "final" may (but need not) be optimized away by
> compilers, CPUs, cache controllers, DSM subsystems etc. The main
> parallel to messaging is that both parties must cooperate in order to
> reliably communicate via a shared variable, but here the cooperation
> is mainly just via declarations (synch/volatile/final).
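
To make the "MAY be completely ignored" point concrete, here is a minimal
Java sketch (the class and method names are mine and purely illustrative).
A reader thread spins on a flag that another thread sets.  With the flag
declared volatile, the reader is guaranteed to see the update; drop the
volatile and the JMM permits the reader to spin forever.

```java
// Hypothetical demo class; names are illustrative, not from the thread.
class VisibilityDemo {
    // 'volatile' is the declaration-level cooperation: a write to this
    // field happens-before every later read of it, so the spinning
    // reader must eventually observe 'true'.  Without 'volatile' the
    // JMM would permit the reader never to see the update.
    private static volatile boolean stop = false;

    // Returns true if the reader thread noticed the update in time.
    static boolean readerSeesUpdate(long timeoutMillis) {
        stop = false;
        Thread reader = new Thread(() -> {
            while (!stop) {
                // busy-wait until the writer's update becomes visible
            }
        });
        reader.setDaemon(true); // never keep the JVM alive for the spinner
        reader.start();
        stop = true;            // the writer's half of the cooperation
        try {
            reader.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !reader.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("reader saw update: " + readerSeesUpdate(5000));
    }
}
```

Both parties cooperate through the one declaration: the writer's store and
the reader's load are each treated specially because the field is volatile.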

I've been observing the discussions on the JMM mailing list and am pretty
scared!  Here's a typical mind bender:

> Example 1
> Initially, x = y = 0
> Thread 1:
> r1 = x
> if r1 >= 0
>    y = 1
> Thread 2:
> r2 = y
> x = r2
> Question: is it legal for this program to result in r1 = r2 = 1?
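
For those who want to poke at it, Example 1 can be transcribed into Java
roughly as below.  This is only a sketch: the harness, class name and
method name are my own, and the fields are deliberately left plain
(no synchronized, volatile or final) because the race is the point.
Under sequential consistency r1 = r2 = 1 cannot happen, since Thread 1
reads x before it writes y.  Note that failing to observe an outcome on
one machine says nothing about whether the model allows it.

```java
// Hypothetical harness for "Example 1"; names are illustrative.
class Example1 {
    // Plain, racy shared fields, exactly as in the example.
    static int x, y;

    // Runs both threads once and returns {r1, r2}.
    static int[] runOnce() {
        x = 0;
        y = 0;
        final int[] r = new int[2];
        Thread t1 = new Thread(() -> {
            int r1 = x;
            if (r1 >= 0) {
                y = 1;
            }
            r[0] = r1;
        });
        Thread t2 = new Thread(() -> {
            int r2 = y;
            x = r2;
            r[1] = r2;
        });
        t1.start();
        t2.start();
        try {
            // join() makes the threads' writes to r visible to the caller
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return r;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            int[] r = runOnce();
            System.out.println("r1 = " + r[0] + ", r2 = " + r[1]);
        }
    }
}
```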

Of course, the example looks strange to occam people since the equivalent
logic, written in occam, would fail to compile (with a parallel usage error).
Occamists should therefore imagine parallel usage checks suppressed (as
if!) and that the threads (processes) are really being executed in parallel
on a shared-memory architecture.  We must also bear in mind that:

>                           ...  Machines, especially SMPs, have gotten
> substantially more complicated over the past decade (caching,
> out-of-order execution, CPU-level parallelism, etc).

The answer to the above question exercised the JMM mailing list for some
time.  It is not trivial and makes my head hurt.  Such questions are what
the new Java Memory Model (JSR133) is trying to answer - and has to answer.

Doug added:

>                        ...  But in CSP-style applications, only those
> people implementing message-passing primitives etc will need to deeply
> understand these rules. Similar comments apply to other styles.

which is absolutely right.  If we were implementing an occam kernel for
such a shared-memory architecture, we would certainly have to understand
those rules.  Channels (and other event "objects" and scheduling queues) are
shared between concurrent processes, and getting the kernel algorithms to
manage this sharing correctly is the stuff of nightmares.
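
As a rough illustration of where those rules bite, here is a sketch of a
synchronising channel built on a Java monitor.  It is not the JCSP or CTJ
implementation and not an occam kernel algorithm; the class name and the
demo method are hypothetical, and it assumes exactly one writer and one
reader per channel.  All the shared state lives inside this one class,
guarded by its lock.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical one-writer/one-reader channel; a sketch only.
final class SimpleChannel<T> {
    private T slot;               // shared state, touched only under the lock
    private boolean full = false;

    // Blocks until the reader has taken the value (a rendezvous).
    synchronized void write(T value) throws InterruptedException {
        slot = value;
        full = true;
        notifyAll();              // wake a reader that is already waiting
        while (full) {            // loop guards against spurious wakeups
            wait();
        }
    }

    // Blocks until the writer has offered a value.
    synchronized T read() throws InterruptedException {
        while (!full) {
            wait();
        }
        T value = slot;
        slot = null;
        full = false;
        notifyAll();              // release the writer
        return value;
    }

    // Sends 0..n-1 from a producer thread and collects them in the caller.
    static List<Integer> demo(int n) {
        SimpleChannel<Integer> c = new SimpleChannel<>();
        List<Integer> received = new ArrayList<>();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    c.write(i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            for (int i = 0; i < n; i++) {
                received.add(c.read());
            }
            producer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(demo(5));
    }
}
```

Even this toy needs care (the wait loops, the wake-ups in both directions),
and it leans entirely on the monitor for visibility and ordering.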

Gerald observed:

> I agree and I hope this (RT) JMM stuff is not meant for the average (RT)
> Java programmer.

and Doug added:

> It would be great if this helps propel better unifying frameworks,
> possibly some new language constructs, and ultimately some better
> underlying theory.

Right.  The problem with defining this JMM is that Java programmers have
to be aware of it.  They can write code like "Example 1" above and need
to know what it means!  This is not right ... it's far too hard and
will always lead to tears for all but the superhero concurrent programmers
(and even they will be a little anxious).

Gordon wrote:

> The changes in hardware architecture absolutely must filter up the stack
> all the way to the programmer's mind.

This is what we must resist!  Improvements in machine architecture must not
force unnatural reasoning requirements on programmers.

>                                ...  A programmer level model is necessary
> to displace the natural and incorrect 'sequential consistency' model we
> all tend to start out with.

I want to keep a natural view of the world :( ...

>                   ...  Thus, the JMM stuff is a friend to Java programmers,
> it helps them deal with the real world problems that exist whether the JMM
> does or not.

Indeed, it is a friend to Java programmers.  But that's not because of having
to deal with real world problems - it's because of having to deal with Java
problems and modern machine architecture problems - see below.

> The CSP model avoids the problems here not because of its many other good 
> qualities but due to the simple fact that threads do not passively share
> state data (memory).

This is the occam/CSP model you are talking about.  Yes, that's right - but
that simple fact is one of its most important good properties.  We claim
that, once we have the shared sync primitives sorted, we need never burden
ourselves again with the problems of passively shared data that the JMM
addresses - but which, even with the rules made precise, are too difficult
to deal with on an everyday basis.  We get the superheroes to deal with
them once (for the sync and scheduling mechanics) and then relax!
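
A small sketch of what "relax" looks like from the application side.  I am
using java.util.concurrent.SynchronousQueue here purely as a stand-in for
a CSP channel (an assumption of this sketch, not the JCSP or CTJ API); the
class name, method name and the -1 end marker are likewise illustrative.
The two processes have no variables in common and talk only through the
channel.

```java
import java.util.concurrent.SynchronousQueue;

// Hypothetical example: two processes that share nothing but a channel.
class NoSharedState {
    // The producer sends 1..n, then an end marker; the caller sums them.
    static int sumViaChannel(int n) {
        // Stand-in for a CSP channel: put() blocks until a take() arrives.
        SynchronousQueue<Integer> channel = new SynchronousQueue<>();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) {
                    channel.put(i);
                }
                channel.put(-1); // end marker, an assumption of this sketch
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;             // local to the consumer, never shared
        try {
            for (int v = channel.take(); v != -1; v = channel.take()) {
                sum += v;
            }
            producer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumViaChannel(10)); // prints 55
    }
}
```

Nothing in this code needs synchronized, volatile or final reasoning; the
memory model questions are all buried inside the channel implementation.
Of course, Java does nothing to stop us sharing a variable by accident,
which is where occam's compile-time checks come in.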

Java/C++/C# etc all require coping with these memory problems.  Therefore,
they are not suitable for common use.  Maybe, just maybe, design patterns
can be defined for them whose semantics are worked out and proven safe.
Then, we might be able to train mortals to stick to these and get our
multithreading safe.

But CSP gives Doug's requested "better underlying theory" and CSP packages
for Java (JCSP/CTJ) are a basis around which safe patterns can be taught.
And occam gives the "new language constructs" from which safe languages can
be devised - where safe means that the right design patterns do not have
to be learnt (since they are unavoidable).

The problems with memory models and concurrency also tell us about the poor
state of modern processor architecture.  David May pointed out 20 years ago
(with the transputer) that it's daft using memory loads and stores to try
to synchronise concurrent processes.  You need special instructions - both
for efficiency reasons (they can be microcoded to perform complex operations,
like run-queue management) and to avoid the problems of caching, out-of-order
execution, CPU-level parallelism, etc.  Actually, the transputer didn't have
the latter aspects to deal with but they are relevant today!

So, what (I think?) David is now saying is that all those hard problems of
shared data structures (e.g. run-queues, channels, CREW-locks etc.) should
be handled by the architecture.  There are only a finite number of such
synchronising primitives needed.  Do all the serial computation stuff with
usual memory loads and stores.  Let them be rearranged by the processor as
it wills - the programmer (whose processes will be guaranteed not to be
passively sharing any memory locations) need not be concerned.  DO NONE OF
the process synchronisation and scheduling stuff with normal memory loads
and stores.  Do them with special instructions whose logic is executed (with
"sequential consistency") by separate synchronising hardware on the side of
the superscalar optimised (or whatever) serial computation processor(s).
The synchronisation processor does not need to be excessively pipelined or
indulge in out-of-order execution - synchronisation ops (which include
channel communication) will not be needed at the same frequency as those
for computation.  The synchronisation processor *will* be loading and storing
to ordinary memory, but we know that the locations it is using will not be
in use by any of the computation processors.  At least, we will know
that so long as the software compiled for it comes with the same safety
guarantees that occam pioneered.

Obvious really ... ;) ;) ;)

How long to wait for this to sink in to those with the resources to build it?