Re: commstime not scaling



Thanks Ruth, that is an interesting insight!

The Dell is indeed a P-III (Coppermine), and the Compaq seems to be a P-IV (Prescott).

I didn't vary the terminal for this round (just a standard off-the-shelf KDE terminal), since commstime only produces output between timing loops. For another program, though, I noticed that writing the same output to the text console was about an order of magnitude slower than writing it to a KDE terminal under X. Very surprising!

Probably there is no good reason, just a lack of maintenance in the console code (or the implementation of character mode in the graphics chipset).

Alan

On 11/27/05, Ruth Ivimey-Cook <Ruth.Ivimey-Cook@xxxxxxxxxx> wrote:

Alan,

My first thought on this problem is that your Dell could easily be a Pentium III, and the Compaq is probably a Pentium 4. The P4 is widely noted for having a very long pipeline (some 24 stages if memory serves), while the P3's is under half that. Commstime is a benchmark with an extraordinarily large number of jumps in it; almost none of the code is inline. Every mispredicted jump forces the pipeline to be flushed and refilled, and the deeper the pipeline, the more cycles each refill costs, so it makes sense that the P4 will have a harder time of it than the P3.

You're right to worry about memory bandwidth as well: each pipeline flush will cost much more if the code also misses in the L1 cache frequently.

I too have found that xterm is consistently the fastest terminal around, beating even a plain console in some cases.

Hope this helps,


Ruth

I was recently comparing commstime timings for my Python implementation of CSP-style primitives against KRoC, and came away with some surprising (to me anyway :o) results. I ran commstime on two different machines, a 2.4 GHz Compaq laptop and a 1.0 GHz Dell.

For the Python implementation (using threading.Thread):

Compaq: 935 milliseconds for the commstime loop
Dell: 895 milliseconds
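
The message only says the Python version is built on threading.Thread. Below is a minimal sketch of how commstime could be wired up on top of it, assuming one-to-one rendezvous channels. The Channel class, its semaphore-based handoff, and the names prefix, delta, succ and commstime are hypothetical illustrations, not the code that produced the figures above. The ring itself (prefix, delta, succ, with delta also feeding consume) follows the usual commstime layout.

```python
import threading
import time


class Channel:
    """Hypothetical one-to-one synchronous (rendezvous) channel:
    write() returns only after a reader has taken the value, as in CSP."""

    def __init__(self):
        self._ready = threading.Semaphore(0)  # released when a value is available
        self._taken = threading.Semaphore(0)  # released when the reader has it
        self._value = None

    def write(self, value):
        self._value = value
        self._ready.release()
        self._taken.acquire()  # block until the reader has taken the value

    def read(self):
        self._ready.acquire()
        value = self._value
        self._taken.release()
        return value


def prefix(initial, inp, out):
    # Output the initial value once, then copy input to output forever.
    out.write(initial)
    while True:
        out.write(inp.read())


def delta(inp, out_a, out_b):
    # Sequential delta: copy each value to both outputs in turn.
    while True:
        x = inp.read()
        out_a.write(x)
        out_b.write(x)


def succ(inp, out):
    while True:
        out.write(inp.read() + 1)


def commstime(loops=10_000):
    """Run the commstime ring; return (values seen by consume, seconds per loop).
    The helper threads block forever once consume stops reading, so they
    are started as daemon threads."""
    a, b, c, d = Channel(), Channel(), Channel(), Channel()
    # Ring: prefix -> a -> delta -> b -> succ -> c -> prefix; delta -> d -> consume.
    for target, args in ((prefix, (0, c, a)), (delta, (a, b, d)), (succ, (b, c))):
        threading.Thread(target=target, args=args, daemon=True).start()

    # consume runs in the calling thread and does the timing.
    start = time.perf_counter()
    values = [d.read() for _ in range(loops)]
    elapsed = time.perf_counter() - start
    return values, elapsed / loops


if __name__ == "__main__":
    _, per_loop = commstime()
    print(f"{per_loop * 1e6:.1f} microseconds per commstime loop")
```

Because every communication in this sketch is a handoff between OS-scheduled threads, its per-loop time is likely dominated by thread switching rather than by the CPU pipeline.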

For KRoC (1.4.0-pre2):

Compaq: 440 nanoseconds
Dell: 385 nanoseconds

It wasn't terribly surprising to me that the results for the Python implementation would be similar (no chance it would fit in cache, using OS-scheduled threads, etc.), but it was surprising to me that the results for KRoC were also similar on both machines, and that in both cases the slower machine did better.
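
Converting the KRoC figures from nanoseconds to clock cycles makes the pipeline explanation concrete. This is a small worked calculation, assuming the nominal clock rates quoted above (2.4 GHz and 1.0 GHz) and that both figures measure the same quantity; the helper name cycles is illustrative only.

```python
# Nominal clock rates (GHz) and KRoC commstime figures (ns) quoted in this thread.
RESULTS = {
    "Compaq (P4, 2.4 GHz)": (2.4, 440),
    "Dell (P3, 1.0 GHz)": (1.0, 385),
}


def cycles(ns, ghz):
    """Convert a time in nanoseconds to clock cycles (1 GHz = 1 cycle per ns)."""
    return ns * ghz


for machine, (ghz, ns) in RESULTS.items():
    print(f"{machine}: {ns} ns = {cycles(ns, ghz):.0f} cycles")

ratio = cycles(440, 2.4) / cycles(385, 1.0)
print(f"P4 / P3 cycle ratio: {ratio:.1f}")
```

That works out to about 1056 cycles on the Compaq against 385 on the Dell, so the P4 spends roughly 2.7 times as many cycles on the same work. This fits a much higher per-jump penalty rather than any difference in the benchmark itself.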