Occam channel based on polling techniques !
Dear all,
There is a small rectification to the Channel class I wrote for ComsTime.
The Channel type is NOT a memoryless channel as in Occam but a memory
channel. Peter noticed this... thanks.
In the search for a channel with the best performance I ended up with
this one-place memory channel, which I call a fast channel (APPROACH 4).
This channel is derived from several other Occam-like channels.
APPROACH 1: memoryless channel with wait() and notify() (failure)
APPROACH 2: memoryless channel with yield() context switching
APPROACH 3: modification of APPROACH 2 with in renamed to wread and
            out renamed to wwrite, plus new methods sread and swrite
            to construct the PAR.
APPROACH 4: modification of APPROACH 3, but as a memory channel for better
            performance (implemented in the channel shipped with ComsTime).
(In the following code exception handling is omitted)
/* APPROACH 1 */
The first approach was based on the monitor methods wait() and notify().
Waiting clients are placed on a waiting queue when communication is not
ready and returned to the processing queue when communication is ready.
Thread writer, reader;
int buffer;
boolean writeready = false;

public synchronized void out(int value)
{
  buffer = value;
  writeready = true;
  writer = Thread.currentThread(); // must be outside this method
  reader.notify();
  writer.wait();
}

public synchronized int in()
{
  if (!writeready)
  {
    reader = Thread.currentThread(); // must be outside this method
    reader.wait();
  }
  writeready = false;
  writer.notify();
  return buffer;
}
This approach has four problems:
1. It is slow and fragile.
2. A context switch before writer.wait() or reader.wait() can cause
   deadlock.
3. Constructing a PAR is only possible with input/output threads (Oyvind
   Teig named it a kicker). Making a kicker was also my idea, but I found
   this complex and the fear of overhead made me search for another
   solution.
4. The writer = Thread.currentThread() and reader = Thread.currentThread()
   statements must be moved outside in and out, else it won't work. This
   makes the channel difficult to use.
Because I ran into these problems I tried to find another solution based
on context switching. I have no thorough test results; I only tested this
in a simple program.
Peter's CHAN_OF_INT is the right solution here.
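For contrast, the conventional monitor pattern waits and notifies on the channel object itself rather than on the Thread objects, which avoids problems 2 and 4 above. A minimal, hypothetical sketch (all names are my own, not the ComsTime API):

```java
public class MonitorChannelDemo {
    // Sketch: a memoryless channel built on the standard monitor idiom.
    // wait()/notifyAll() are called on the channel object, so no Thread
    // references need to be registered and no notification can be lost.
    static class MonitorChannel {
        private int buffer;
        private boolean writeready = false;

        public synchronized void out(int value) throws InterruptedException {
            while (writeready) wait();  // wait until the previous value is taken
            buffer = value;
            writeready = true;
            notifyAll();                // wake a waiting reader
            while (writeready) wait();  // rendezvous: block until the value is read
        }

        public synchronized int in() throws InterruptedException {
            while (!writeready) wait(); // wait until a value has been written
            int value = buffer;
            writeready = false;
            notifyAll();                // release the blocked writer
            return value;
        }
    }

    // One writer, one reader; the rendezvous forces strict alternation.
    static int run() {
        MonitorChannel ch = new MonitorChannel();
        Thread writer = new Thread(() -> {
            try {
                for (int i = 1; i <= 3; i++) ch.out(i);
            } catch (InterruptedException e) { /* ignored in this sketch */ }
        });
        writer.start();
        int sum = 0;
        try {
            for (int i = 0; i < 3; i++) sum += ch.in();
            writer.join();
        } catch (InterruptedException e) { return -1; }
        return sum; // 1 + 2 + 3
    }

    public static void main(String[] args) {
        System.out.println(run()); // 6
    }
}
```

The while-loops around wait() also guard against spurious wakeups, which the thread-object scheme of APPROACH 1 cannot do.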
The second approach was based on the idea of scheduling by context
switching on channel communication. In Java, Thread.yield() causes a
context switch.
/* APPROACH 2 */
public void out(int value)
{
  while (writeready) Thread.yield();  // wait until the buffer is free
  buffer[0] = value;
  writeready = true;
  while (writeready) Thread.yield();  // wait until the reader has taken it
}

public void in(int value[])
{
  while (!writeready) Thread.yield(); // wait for a value
  value[0] = buffer[0];
  writeready = false;                 // release the writer
}
This channel performed much better (in ComsTime between 2.7 and 2.1 ms)
and it is simpler than the previous one. The yield() method is written in
native code, which could be the reason why it is faster than wait() and
notify(). If the current running thread yields the CPU, the scheduler
implements a simple non-preemptive round-robin scheduling order.
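To make this testable end to end, here is the yield-based channel as a self-contained class. The volatile modifier is my addition: on modern preemptive, multi-core JVMs a plain field may never become visible to the spinning thread, whereas the original relied on the green-threads scheduler.

```java
public class YieldChannelDemo {
    // APPROACH 2 as a class; volatile guarantees the spin loops observe
    // the other thread's writes on modern JVMs.
    static class YieldChannel {
        private volatile boolean writeready = false;
        private final int[] buffer = new int[1];

        public void out(int value) {
            while (writeready) Thread.yield();  // wait until the buffer is free
            buffer[0] = value;
            writeready = true;
            while (writeready) Thread.yield();  // rendezvous: wait for the reader
        }

        public void in(int[] value) {
            while (!writeready) Thread.yield(); // wait for a value
            value[0] = buffer[0];
            writeready = false;                 // release the writer
        }
    }

    static int run() {
        YieldChannel ch = new YieldChannel();
        Thread writer = new Thread(() -> {
            for (int i = 1; i <= 3; i++) ch.out(i);
        });
        writer.start();
        int[] v = new int[1];
        int sum = 0;
        for (int i = 0; i < 3; i++) {
            ch.in(v);
            sum += v[0];
        }
        try { writer.join(); } catch (InterruptedException e) { return -1; }
        return sum; // 1 + 2 + 3
    }

    public static void main(String[] args) {
        System.out.println(run()); // 6
    }
}
```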
Approach 3 is an update of the previous one in which I renamed in to
wread and out to wwrite, because I added sread and swrite to construct a
PAR. In wwrite the first yield spin is deleted.
/* APPROACH 3 */
public boolean wwrite(int value)
{
  buffer[0] = value;
  writeready = true;
  while (writeready) Thread.yield(); // wait until the reader has taken it
  return true;
}

public boolean wread(int value[])
{
  while (!writeready) Thread.yield(); // wait for a value
  value[0] = buffer[0];
  writeready = false;
  return true;
}
The sread and swrite methods are used for polling. Using sread and swrite
makes the channel memoryless.
public boolean swrite(int value)
{
  if (writeready) return false;      // not ready: poll again later
  buffer[0] = value;
  writeready = true;
  while (writeready) Thread.yield(); // delete this line for the fast channel
  return true;
}

public boolean sread(int value[])
{
  if (!writeready) return false;     // nothing written yet
  value[0] = buffer[0];
  writeready = false;
  return true;
}
The performance is between 2 and 1.1 ms. I got better performance with a
memory channel of one element (a one-place FIFO), which I used in the
channel class of ComsTime (between 1.5 and 0.6 ms); see below.
/* APPROACH 4 */
public boolean wwrite(int value)
{
  while (writeready) Thread.yield(); // wait until the buffer is free
  buffer[0] = value;
  writeready = true;                 // return without waiting for the reader
  return true;
}
Methods wread, sread and swrite stay the same.
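Putting the four methods together, a minimal sketch of the complete fast channel (the class name and the volatile flag are my additions; volatile ensures memory visibility on modern JVMs):

```java
public class FastChannelDemo {
    // APPROACH 4: one-place buffered ("fast") channel. wwrite/wread block
    // by yielding; swrite/sread poll and return immediately.
    static class FastChannel {
        private volatile boolean writeready = false; // true: buffer holds a value
        private final int[] buffer = new int[1];

        public boolean wwrite(int value) {
            while (writeready) Thread.yield();  // wait until the buffer is free
            buffer[0] = value;
            writeready = true;                  // do not wait for the reader
            return true;
        }
        public boolean wread(int[] value) {
            while (!writeready) Thread.yield(); // wait for a value
            value[0] = buffer[0];
            writeready = false;
            return true;
        }
        public boolean swrite(int value) {
            if (writeready) return false;       // buffer full: poll again later
            buffer[0] = value;
            writeready = true;
            return true;
        }
        public boolean sread(int[] value) {
            if (!writeready) return false;      // buffer empty
            value[0] = buffer[0];
            writeready = false;
            return true;
        }
    }

    static int run() {
        FastChannel ch = new FastChannel();
        int[] v = new int[1];
        boolean empty = !ch.sread(v); // true: nothing written yet
        ch.wwrite(42);                // returns at once, the buffer was free
        boolean got = ch.sread(v);    // true: v[0] now holds 42
        return (empty && got) ? v[0] : -1;
    }

    public static void main(String[] args) {
        System.out.println(run()); // 42
    }
}
```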
The returned flag lets the client know whether the read/write succeeded.
Method swrite returns false if the channel has not been read (memoryless
channel) or if the buffer is full (memory channel); otherwise it returns
true. Method sread returns false if no value has been written to the
channel (memoryless channel) or if the buffer is empty (memory channel).
The swrite and sread methods are used in a polling mechanism to construct
parallel input and/or output behavior.
The PAR-out construct is:
done1 = done2 = ... = donei = false;
do
{
  if (!done1) done1 = swrite(value1);
  if (!done2) done2 = swrite(value2);
  ..
  if (!donei) donei = swrite(valuei);
  if (!(done1 & done2 & .. & donei)) Thread.yield();
} while (!(done1 & done2 & .. & donei));
The PAR-in construct is:
done1 = done2 = ... = donei = false;
do
{
  if (!done1) done1 = sread(value1[]);
  if (!done2) done2 = sread(value2[]);
  ..
  if (!donei) donei = sread(valuei[]);
  if (!(done1 & done2 & .. & donei)) Thread.yield();
} while (!(done1 & done2 & .. & donei));
Both PAR-constructs can be mixed.
The PAR-in-out is:
done1 = done2 = ... = donei = false;
do
{
  if (!done1) done1 = sread(value1[]);
  if (!done2) done2 = swrite(value2);
  if (!done3) done3 = swrite(value3);
  if (!done4) done4 = sread(value4[]);
  ..
  if (!donei) donei = swrite(valuei);
  if (!(done1 & done2 & .. & donei)) Thread.yield();
} while (!(done1 & done2 & .. & donei));
The PAR-constructs are "busy loops" of only one extra cycle. The
"if (!(done1 ... donei)) Thread.yield();" forces a context switch when
one of the done flags is false. If the loop has run once and one or more
done flags are still false, another run does not change the done flags;
a context switch prevents this idle situation. If the loop is large,
every run can take time, which can decrease performance. As long as the
PAR is short this technique performs well.
The variables value1[] and value4[] carry the value returned by sread.
Because Java does not allow pointers and has no by-reference modifier,
a one-element array is used to return the value.
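The PAR-out construct above can be exercised concretely. A minimal sketch with two one-place channels (class and variable names are mine); since both buffers start empty, each swrite succeeds on the first polling pass:

```java
public class ParOutDemo {
    // One-place channel with the polling methods from APPROACH 4
    // (volatile is my addition for modern JVMs, not in the original).
    static class FastChannel {
        private volatile boolean writeready = false;
        private final int[] buffer = new int[1];

        boolean swrite(int value) {
            if (writeready) return false;  // buffer still full
            buffer[0] = value;
            writeready = true;
            return true;
        }
        boolean sread(int[] value) {
            if (!writeready) return false; // buffer empty
            value[0] = buffer[0];
            writeready = false;
            return true;
        }
    }

    // PAR-out over two channels: poll swrite on each until both succeed.
    static int run() {
        FastChannel c1 = new FastChannel(), c2 = new FastChannel();
        boolean done1 = false, done2 = false;
        do {
            if (!done1) done1 = c1.swrite(10);
            if (!done2) done2 = c2.swrite(32);
            if (!(done1 & done2)) Thread.yield(); // let readers make progress
        } while (!(done1 & done2));
        int[] a = new int[1], b = new int[1];
        c1.sread(a);
        c2.sread(b);
        return a[0] + b[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // 42
    }
}
```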
The PRI-PAR, FAIR-ALT and PRI-ALT can also be constructed with sread and
swrite. I wrote more about this in the paper Concurrent Programming in
Java (draft). This paper describes new ideas and will be updated from
time to time.
(ftp://ftp.rt.el.utwente.nl/pub/Java/ComsTime/comstime.ps)
A nice example of sread and swrite is the advanced delta building block
advDelta. advDelta can create parallel outputs when it is passed the
number of outputs together with an array of channels. Extension at
run-time could also be a feature.
class advDelta extends Thread
{
  protected boolean keepRunning = true;
  protected Channel chanin, chanout[];
  int n;

  public advDelta(Channel ci, Channel co[], int cn)
  {
    chanin = ci;
    chanout = co;
    n = cn;
  }

  public void run()
  {
    int x[] = new int[1];
    boolean done[] = new boolean[n];
    boolean notdone; // true while at least one output is still pending
    int i;

    while (keepRunning)
    {
      chanin.wread(x);                 // read one value
      for (i = 0; i < n; i++) done[i] = false;
      do                               // PAR-out over all output channels
      {
        notdone = false;
        for (i = 0; i < n; i++)
        {
          if (!done[i]) done[i] = chanout[i].swrite(x[0]);
          notdone = notdone | !done[i];
        }
        if (notdone) Thread.yield();
      } while (notdone);
    }
  }
}
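A self-contained version of the delta idea can be run end to end. Everything here is a sketch under my own names (FastChannel, AdvDelta): the process reads one value and broadcasts it to every output channel with the swrite polling construct; volatile flags are my addition for modern JVMs.

```java
import java.util.Arrays;

public class AdvDeltaDemo {
    // One-place channel from APPROACH 4; volatile so the spin loops see
    // updates on modern preemptive JVMs.
    static class FastChannel {
        private volatile boolean writeready = false;
        private final int[] buffer = new int[1];

        boolean wwrite(int value) {
            while (writeready) Thread.yield();  // wait until the buffer is free
            buffer[0] = value;
            writeready = true;
            return true;
        }
        boolean wread(int[] value) {
            while (!writeready) Thread.yield(); // wait for a value
            value[0] = buffer[0];
            writeready = false;
            return true;
        }
        boolean swrite(int value) {
            if (writeready) return false;       // buffer still full
            buffer[0] = value;
            writeready = true;
            return true;
        }
    }

    // Delta process: read one value, broadcast it to every output channel.
    static class AdvDelta extends Thread {
        private final FastChannel chanin;
        private final FastChannel[] chanout;

        AdvDelta(FastChannel ci, FastChannel[] co) {
            chanin = ci;
            chanout = co;
            setDaemon(true); // let the JVM exit while this thread still spins
        }

        public void run() {
            int[] x = new int[1];
            boolean[] done = new boolean[chanout.length];
            while (true) {
                chanin.wread(x);
                Arrays.fill(done, false);
                boolean notdone;
                do { // PAR-out over all output channels
                    notdone = false;
                    for (int i = 0; i < chanout.length; i++) {
                        if (!done[i]) done[i] = chanout[i].swrite(x[0]);
                        notdone |= !done[i];
                    }
                    if (notdone) Thread.yield();
                } while (notdone);
            }
        }
    }

    static int run() {
        FastChannel in = new FastChannel();
        FastChannel[] out = { new FastChannel(), new FastChannel() };
        new AdvDelta(in, out).start();
        in.wwrite(21);
        int[] a = new int[1], b = new int[1];
        out[0].wread(a); // each output receives a copy of 21
        out[1].wread(b);
        return a[0] + b[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // 42
    }
}
```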
In the same way you could make an advPlus, advMultiplexer,
advDemultiplexer, or a neuron with many input and output channels. This
construction only works on channel input and output. Using a kicker
would, I believe, perform more slowly. However, a kicker could run
formulas in parallel.
PAR
  b := 2*a
  c := k*d*3.14
Running formulas in parallel is, I believe, something to be careful with.
Processors with instruction pipelining perform better on sequential
formulas (statements) than on formulas running in random order. And
there is no scheduler overhead.
SEQ
  b := 2*a
  c := k*d*3.14
The compiler optimizes these statements for the processor and cache.
Nops in an instruction pipeline or extra cache flushes are time
consuming.
DYNAMIC CHANNEL BEHAVIOR
The channel that came with ComsTime has three different behaviors (fast,
fifo, lifo), but the code is hard to read and difficult to maintain,
because I experimented with this channel to get the best performance.
Now I am writing a channel with more object-oriented features, which
makes the channel extensible with new behaviors. My opinion is that a
channel can have more behaviors than just an Occam channel. The behavior
should be easy to configure at run-time. For certain applications a
particular channel could perform better, or it could make the program
behave more dynamically. You have to be careful with this, but I think
it is research worth doing.
Imagine a channel with a buffer where the buffer size can range between
zero and the available memory: buffersize = n where 0 <= n <= m, m =
available memory (not 1 <= n <= m). The buffer size determines the
communication behavior. When buffersize = 0 the channel is fully
synchronized: it is memoryless and behaves as an Occam channel. When
buffersize = n with 1 <= n <= k (k small, e.g. k = 2) the channel is
quasi-synchronized: if the channel is empty or full it is synchronized,
otherwise it is asynchronous. This depends on the reader or writer
process. When buffersize >> 1 (large enough) and the empty and full
states are never reached, the channel behavior is fully asynchronous.
The buffer can also be of FIFO or LIFO type; it is FIFO by default. The
LIFO type can be used as a stack mechanism. The channel can be
configured as a memoryless or as a memory channel.
Declaration of a channel is as follows. Here the Channel declaration is
for the integer data type. Channel configurations are given by
arguments. Examples are:
Channel ch1 = new Channel();       // memoryless channel (Occam channel)
Channel ch1 = new Channel(0);      // same as previous
Channel ch1 = new Channel(2);      // memory channel with two buffer
                                   // elements (FIFO)
Channel ch1 = new Channel(2,FIFO); // same as previous
Channel ch1 = new Channel(2,LIFO); // memory channel with two buffer
                                   // elements (LIFO) where data is
                                   // read in reverse order
The behavior of channels can be adjusted at run-time by statements such
as:
ch1.setBufferSize(1); // reconfigure buffer size to 1
                      // (solve a deadlock situation)
There are other applications, such as making a buffer larger when it is
too small. There are more features to build in; think of debugging and
monitoring channels at run-time.
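One way such a configurable channel might look, as a sketch only: the constructors and setBufferSize mimic the examples above but are not the actual ComsTime API, and the memoryless (buffersize = 0) case is left out because it needs the synchronizing spin loops of the earlier approaches.

```java
import java.util.ArrayDeque;

public class BufferedChannelDemo {
    static final int FIFO = 0, LIFO = 1;

    // Illustrative sketch of a run-time configurable buffered channel.
    static class Channel {
        private final ArrayDeque<Integer> buf = new ArrayDeque<Integer>();
        private int size;        // maximum number of buffered elements
        private final int type;  // FIFO (queue) or LIFO (stack)

        Channel(int size, int type) { this.size = size; this.type = type; }
        Channel(int size)           { this(size, FIFO); }

        void setBufferSize(int n) { size = n; } // run-time reconfiguration

        boolean swrite(int value) {
            if (buf.size() >= size) return false; // buffer full
            buf.addLast(value);
            return true;
        }
        boolean sread(int[] value) {
            if (buf.isEmpty()) return false;      // buffer empty
            value[0] = (type == LIFO) ? buf.removeLast() : buf.removeFirst();
            return true;
        }
    }

    static int run() {
        int[] v = new int[1];
        Channel fifo = new Channel(2, FIFO);
        fifo.swrite(1);
        fifo.swrite(2);
        fifo.sread(v);
        int first = v[0];           // FIFO: oldest value, 1
        Channel lifo = new Channel(2, LIFO);
        lifo.swrite(1);
        lifo.swrite(2);
        lifo.sread(v);
        int second = v[0];          // LIFO: newest value, 2
        return first * 10 + second; // 12
    }

    public static void main(String[] args) {
        System.out.println(run()); // 12
    }
}
```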
Now I have four behaviors:
1. rendezvous channel (Occam channel)
2. fast channel (one-element buffer)
3. fifo channel (n-element queue buffer)
4. lifo channel (n-element stack buffer)
The same wread/wwrite and sread/swrite methods can be used for all the
different channels. No radical changes in the program are necessary.
IMPORTANT
There are some limits to this technique. Method synchronization is
impossible because Thread.yield() does not release the monitor.
Therefore a shared buffer with multiple readers and writers is
impossible to implement. In addition, threads can only yield the CPU to
other threads of the same priority; attempts to yield to a lower
priority thread are ignored. This conclusion is interesting.
The performance and the flexibility of this polling technique drove me
in this direction.
Regards,
Gerald.