
Re: Scientific processors



Denis, 

Regarding bit-exactness.

On 25 Nov 2020, at 09:28, Denis A Nicole <dan@xxxxxxxxxxxxxxx> wrote:

3. In most systems, the real load is taken by the floating point units.  The IEEE standard is important here for several reasons.

  1. Floating point arithmetic is famously not associative. This heavily restricts the optimisations which can be performed while retaining bit-for-bit identical results. You either accept that the answers can change, write your code very carefully to pre-implement the optimisations, or go slow.
Non-associativity is a problem. It's why these two pieces of code can give different results for the value of sum:

sum = 0.0
for i = 0 for x.size
    sum = sum + x[i]

and

sumFirstPart = 0.0
for i = 0 for x.size/2
    sumFirstPart = sumFirstPart + x[i]

sumSecondPart = 0.0
for i = x.size/2 for (x.size - x.size/2)
    sumSecondPart = sumSecondPart + x[i]

sum = sumFirstPart + sumSecondPart
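To make this concrete, here is a small sketch that transcribes the two fragments into Python. The choice of Python and the four-element input are illustrative assumptions, not part of the original discussion. Python floats are IEEE 754 doubles, and at magnitude 1e16 the spacing between doubles is 2, so 1e16 + 1.0 rounds back to 1e16. The grouping therefore decides which of the small terms survive.

```python
import math


def sequential_sum(x):
    """Left-to-right sum, as in the first fragment."""
    total = 0.0
    for i in range(len(x)):
        total = total + x[i]
    return total


def two_part_sum(x):
    """Sum each half separately, then combine, as in the second fragment."""
    half = len(x) // 2
    first = 0.0
    for i in range(0, half):
        first = first + x[i]
    second = 0.0
    for i in range(half, len(x)):
        second = second + x[i]
    return first + second


x = [1.0, 1e16, -1e16, 1.0]
print(sequential_sum(x))  # 1.0: the first 1.0 is absorbed, the last survives
print(two_part_sum(x))    # 0.0: both 1.0 terms are absorbed
print(math.fsum(x))       # 2.0: correctly rounded sum, for comparison
```

Neither loop is "wrong"; each rounds correctly at every step, yet they disagree with each other and with the exact answer.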

The second code segment is exactly the transformation you want in order to compute the two partial sums in parallel. I believe that in some real-world systems the decisions about parallelisation are made at run time, depending on what the computer is doing at the time, and so different runs on identical data give different results.
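The run-time effect can be imitated by letting a worker count decide how the data are split. The function `chunked_sum` and the worker counts below are hypothetical, a sketch of the idea rather than any real scheduler's behaviour: each chunk is summed left to right, as one worker might, and the partial sums are then combined in order.

```python
def chunked_sum(x, n_chunks):
    """Split x into n_chunks contiguous pieces, sum each piece left to
    right, then add the partial sums together left to right."""
    n = len(x)
    bounds = [i * n // n_chunks for i in range(n_chunks + 1)]
    partials = []
    for c in range(n_chunks):
        s = 0.0
        for i in range(bounds[c], bounds[c + 1]):
            s = s + x[i]
        partials.append(s)
    total = 0.0
    for p in partials:
        total = total + p
    return total


x = [1.0, 1e16, -1e16, 1.0]
# Hypothetical: the runtime picks a different worker count on each run.
for workers in (1, 2, 3, 4):
    print(workers, chunked_sum(x, workers))
# 1 1.0
# 2 0.0
# 3 0.0
# 4 1.0
```

The data never change between iterations; only the split does. One way to recover repeatable answers, at some cost in flexibility, is to fix the chunk boundaries and the combination order independently of how many workers actually run.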


4. Getting bit-for-bit matching answers from consecutive runs is really difficult. Obviously, we need to seed our PRNGs consistently, but there are all sorts of internal code and optimisation issues that break things. This leads to real difficulty in verifying benchmark tests. Overlaid on this are sometimes genuine instabilities in important scientific codes. For example, global ocean models can be very hard to "spin up"; you need exactly the right initial conditions or the circulation never converges to that of the planet we know. This may not even be a problem in the models; perhaps the real equations of motion have multiple fixed points? There are similar difficulties in molecular dynamics around hydrogen bonding. Sadly, that is the case we care about most; it covers protein folding in hydrated systems.


The non-associativity of f.p. arithmetic is the cause of many problems. Is the repeatability problem you mention due to effects other than this?

Roger



--
Roger Shepherd



