[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: lightweight coroutines in C (take 2)

Hi All,

[Summary: this approach to lightweight threads puts a heavy penalty on sub-routine calls, and hence is no basis for a CSP kernel]

From the page:

For example, we would like to be able to write a function that says

|int function(void) {
   int i;
   for (i = 0; i < 10; i++)
       return i;   /* won't work, but wouldn't it be nice */

and have ten successive calls to the function return the numbers 0 through 9.

Python now supports this style of programming through "generators"

def zeroToNine():
   for i in range(10):
       yield i

The is called a "generator function", calling it gets you a "generator object" (the function does not run). Each call to the "next" method causes the function to run until it hits a yield, at which point the value of the yield is returned to the caller of "next". Call "next" again, and the function carries on where it left off. The call to "next" is often implicit, as in:

nums = zeroToNine()
for x in nums:
   print x

C# now has something very similar, although I've only galnced at that.

A number of people have observed that you can use this as a basis to build a lightweight threads mechanism. The basic idea is that you code all your threads up as generator functions, and then write a scheduler that invokes all of these, and calls the next method when it wants a "thread" to run. When that thread hits a yield, control returns to the scheduler.

There's a BIG snag. The threads can only yield at the top level. If you put a yield in a sub-routine, that routine becomes a generator itself. e.g.

def a():
   while True:
       ...do some work
       yield None
       ...do work

def b():
   ...do work
   yield None
   ...do work

This doesn't work. "b" was not called by the scheduler, it's next method is never called so it never runs. To make it work, the call to 'b()' has to be coded as:

for x in b(): yield x

In other words, 'b' has to yield to 'a' which in turn yields to the scheduler. In general, if your thread needs to yield control n sub-routine calls down, that costs you n context switches + the overhead of n loop iterations.

I find this totally unsuitable as a basis for implementing a CSP kernel. As far as I can make out, this C coroutine idiom suffers from the same problem. If this is what people mean by coroutine, I guess all implementations do.

A communicating-process approach to programming is such a powerful idea, that it seems to be inevitable. In ignorance of the right way to do it, people are just stumbling blindfolded in the general direction.

[Aside: A real lightweight thread mechanism for Python is available in Stackless Python, although Stackless is a somewhat marginal project. The PyPy project is building a new VM for Python which will also be 'stackless'. PyPy is a rather more serious project, with European (academic) funding]