
Scalable abstractions for general-purpose parallel computation

Dear all,

I recently completed a PhD at the University of Bristol, in which I investigated the practical aspects of a highly-parallel model of computation. I thought it would be worth posting about it here since the work is based on occam, CSP and Valiant's theory of universal communication networks, and might be of interest to some of you.

I have included the abstract and a link to the PDF below, but to summarise very briefly: the thesis presents designs for a practical, scalable parallel architecture that can be implemented with current technologies, together with a programming language that can be compiled to the architecture using simple techniques. The architecture uses the XMOS processor, and the language builds on occam with new notations that make it easy to build and structure parallel programs.

I hope you enjoy it and would like to hear your thoughts.

Abstract:
Parallelism has become the principal means of sustaining growth in computational performance but there has been relatively little development in general-purpose computer architectures or programming models that can deal effectively with large amounts of it. A new general-purpose model of parallel computing would enable standardisation between architectures, high-volume production and software that is portable between different machines, now and as they develop with future technology. There is substantial opportunity to support this in emerging areas of embedded computing, where the problems of sensing, interaction and decision making can exploit large amounts of parallelism.

This thesis demonstrates the essential aspects of a scalable general-purpose model of parallel computation by proposing a Universal Parallel Architecture (UPA), based on a highly-connected communication network, and a high-level parallel programming language for it, called sire, that can be compiled using simple techniques. The design of sire combines the essential capabilities of shared-memory programming with the benefits of message passing, supporting a range of programming paradigms and providing powerful capabilities for abstraction to build and compose subroutines and data structures in a distributed context. The design also enables program code to be distributed at run time to reuse memory, and allows processor allocation to be dealt with during compilation, so that the overheads of using distributed parallelism are minimal.

To evaluate whether the UPA is practical to build, a high-level implementation model using current technologies is described. It demonstrates that the cost of generality is relatively small: for a system with 4,096 processors, an overall investment of around 25% of the system is required for the communication network. When executed on specific UPA implementations, sire's primitives for parallelism, communication and abstraction incur minimal overheads, demonstrating its close correspondence to the UPA and its scalability. Furthermore, as well as executing highly-parallel programs, the UPA can support sequential programming techniques by emulating large memories, allowing general sequential programs to be executed with a factor of 2 to 3 overhead compared with contemporary sequential machines.