Language Runtimes 2005

Validating the Productivity of Next Generation Multicore Programming Models

Presentations

Fundamentals of the Cell Broadband Engine

H. Peter Hofstee
IBM Systems and Technology Group

Abstract
This talk will explain the motivation for the architecture and implementation of the Cell Broadband Engine processor ("Cell"), and discuss performance aspects and some programming models we have explored. The intent of the talk is to explain the fundamentals and provide the necessary background to those interested in exploring programming paradigms for Cell.

Speaker Bio
H. Peter Hofstee is Cell BE Chief Scientist and Cell BE Synergistic Processor Chief Architect. He received his PhD from Caltech in 1995 and, after teaching at Caltech for 2 years, joined the IBM Austin research laboratory (ARL) in 1996. At the ARL Peter led the logic design for two GHz PowerPC processor prototypes. In 2000 Peter started work on the concept for the Cell processor, and he has been working on Cell development since then.

An Applications Perspective of Multicore Programming Models

Yahya H. Mirza
Aurora Borealis Software LLC

Abstract
The objective of this presentation is to give a high-level overview of the issues related to leveraging multicore programming models. Work in progress on a 3D computer animation “effect” will be used as a motivating example. I will illustrate a portion of the “animation effect”, or simulation, that uses multigrid for the efficient solution of a linear elliptic partial differential equation, namely the Poisson equation.
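To make the multigrid idea concrete, here is a minimal sketch of a V-cycle for the 1D Poisson problem -u'' = f on (0, 1) with zero boundary values. It is not taken from the talk or from the animation effect itself; the function names, the weighted-Jacobi smoother, the grid size, and the test right-hand side are all assumptions chosen for illustration.

```python
import math

def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted-Jacobi sweeps for -u'' = f; boundary values stay fixed."""
    n = len(u) - 1
    for _ in range(sweeps):
        old = u[:]
        for i in range(1, n):
            jacobi = 0.5 * (old[i - 1] + old[i + 1] + h * h * f[i])
            u[i] = (1 - omega) * old[i] + omega * jacobi
    return u

def residual(u, f, h):
    """r = f - A u for the standard 3-point Laplacian."""
    n = len(u) - 1
    r = [0.0] * (n + 1)
    for i in range(1, n):
        r[i] = f[i] - (2 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r

def restrict(r):
    """Full-weighting restriction to a grid with half as many intervals."""
    nc = (len(r) - 1) // 2
    rc = [0.0] * (nc + 1)
    for i in range(1, nc):
        rc[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
    return rc

def prolong(ec):
    """Linear interpolation back to the finer grid."""
    nc = len(ec) - 1
    e = [0.0] * (2 * nc + 1)
    for i in range(nc + 1):
        e[2 * i] = ec[i]
    for i in range(nc):
        e[2 * i + 1] = 0.5 * (ec[i] + ec[i + 1])
    return e

def v_cycle(u, f, h):
    """One V-cycle: smooth, correct from the coarse grid, smooth again."""
    n = len(u) - 1
    if n == 2:
        # Coarsest grid has one interior unknown: solve it exactly.
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    smooth(u, f, h, 2)
    rc = restrict(residual(u, f, h))
    ec = v_cycle([0.0] * len(rc), rc, 2 * h)
    e = prolong(ec)
    for i in range(1, n):
        u[i] += e[i]
    smooth(u, f, h, 2)
    return u

def solve_poisson(n=64, cycles=10):
    """Solve -u'' = pi^2 sin(pi x); the exact solution is sin(pi x)."""
    h = 1.0 / n
    x = [i * h for i in range(n + 1)]
    f = [math.pi ** 2 * math.sin(math.pi * xi) for xi in x]
    u = [0.0] * (n + 1)
    for _ in range(cycles):
        v_cycle(u, f, h)
    err = max(abs(u[i] - math.sin(math.pi * x[i])) for i in range(n + 1))
    return u, err
```

Calling `solve_poisson()` returns the approximate solution and its maximum error against sin(pi x). The grid size must be a power of two in this sketch so that each level can be halved down to a single interior point.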

Speaker Bio
Yahya H. Mirza is the founder of Aurora Borealis Software LLC. For the last four years, Yahya has been the organizer of the Language Runtimes (LaR) workshop series at the OOPSLA and Supercomputing conferences. Through these workshops, Yahya has had the pleasure of interacting with several really creative individuals who have made a deep impact in the computing industry, and have inspired him greatly. Yahya created the LaR workshops to provide a relaxed, informal environment to foster technically deep yet innovative discussions. Prior to entering the software industry, Yahya's background was in aeronautical engineering. Yahya's interest in scalable high productivity programming solutions is driven by his passion to create and deliver a real-time interactive feature film online.

An Overview of the NAS Multigrid Implementation

Bradford Chamberlain
Cray Inc.

Abstract
This talk will give an overview of the implementation of the NAS Multigrid Benchmark.

Speaker Bio
Bradford Chamberlain is an employee of Cray Inc. and is working on the design and implementation of the Chapel language as part of Cray's Cascade project in the DARPA HPCS program. He received his PhD from the University of Washington where he contributed to the ZPL parallel array language.

NAS MG: Productivity Challenges and a Chapel-based Solution

Bradford Chamberlain
Cray Inc.

Abstract
In this talk, I will describe aspects of the NPB MG (multigrid) benchmark -- and its implementations in conventional parallel languages -- that I believe to be most at odds with productivity. I'll walk through some excerpts of the Chapel implementation of the benchmark, highlighting ways in which Chapel's features address these productivity challenges.

Speaker Bio
Bradford Chamberlain is an employee of Cray Inc. and is working on the design and implementation of the Chapel language as part of Cray's Cascade project in the DARPA HPCS program. He received his PhD from the University of Washington where he contributed to the ZPL parallel array language.

The Accelerator API: Data Parallel Computing on the Desktop Using GPUs

Jose Oglesby
Microsoft Research

Abstract
Computing power on the desktop keeps growing at an amazing rate. GPUs, for example, keep reaching higher performance levels every few months. However, harnessing this computing power is difficult for developers. The Accelerator project makes it easy to create applications that use new parallel computing capabilities effectively. Accelerator provides a high level data parallel programming model based on a data parallel array ADT in a .NET library. The library can be used from different .NET languages and is retargetable to a variety of hardware. In this talk we will describe the programming model through sample code from the NPB MG application.

Speaker Bio
Jose Oglesby is a Software Development Engineer in the Advanced Compiler Technology group in Microsoft Research. Jose has been working on programming language implementations since 1979. At companies such as Intermetrics (1979-1987), Multiflow (1987-1990), and Microsoft (1990-), Jose has worked on a large number of commercial and custom compilers and interpreters for a wide variety of languages.

X10: Computing at Scale

Vijay Saraswat
IBM TJ Watson Research Labs

Abstract
We present the design of X10, a modern programming language intended to address the twin challenges of high performance and high productivity on high-end computers operating with hundreds of thousands of hardware threads. At such scales, latency and bandwidth can vary widely across the machine, making it impractical to support (in mainstream architectures) a programming model based on uniform shared memory. Similarly, it does not appear that at such scales the SPMD model can provide the needed flexibility and performance (e.g. because of load imbalance).

X10 develops a few simple and powerful ideas. First, like other Partitioned Global Address Space languages, X10 abandons the notion of a monolithic address space: a *place* contains data, together with one or more activities operating on the data. A computation may have millions of places. Aggregate data-structures, such as multi-dimensional arrays, may be scattered across multiple places. Second, like Cilk, X10 emphasizes asynchronous, lightweight activities (recursive parallelism) rather than SPMD computations. An activity may spawn other activities locally or in remote places. Data must be operated upon only by activities in that place. Multiple activities mutating shared data may use atomic blocks (atomic stm) to ensure that data invariants are preserved. An activity may wait for subactivities to terminate before progressing (finish stm). Clocks (dynamic barriers) can be created dynamically and permit activities registered on the clock to progress determinately in coordinated phases.

Third, X10 leverages the flexibility, versatility and modularity of the mainstream object-oriented model. X10 may be viewed as an extension of sequential Java to support concurrency and distribution. Fourth, the X10 design guarantees that the programmer cannot commit large classes of errors. X10 adds to the guarantees (such as type safety, memory safety) it inherits from Java: X10 computations that do not use conditional atomic blocks cannot deadlock.

Taken together, these ideas provide a very rich, scalable and yet disciplined framework for concurrent programming over distributed data structures. X10 covers those areas which are today handled by a combination of OpenMP, MPI and SPMD languages. We illustrate through several programming examples (e.g. from NAS parallel benchmarks) and discuss implementation issues.
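The place, activity, finish and atomic ideas described above can be loosely imitated with ordinary threads. The following sketch is Python, not X10, and is only an analogy: `Place`, `Finish`, `async_` and `distributed_sum` are hypothetical names invented here, and this `Finish` waits only for activities spawned directly on it, whereas the abstract describes waiting for subactivities more generally.

```python
import threading

class Place:
    """Hypothetical stand-in for a place: it owns one partition of the data."""
    def __init__(self, pid, data):
        self.pid = pid
        self.data = data

class Finish:
    """Imitates 'finish': leaving the block waits for spawned activities."""
    def __init__(self):
        self._threads = []

    def async_(self, fn, *args):
        # Imitates spawning a lightweight activity.
        t = threading.Thread(target=fn, args=args)
        self._threads.append(t)
        t.start()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        for t in self._threads:
            t.join()
        return False

# A single lock plays the role of an unconditional atomic block.
_atomic = threading.Lock()

def distributed_sum(values, num_places=4):
    # Scatter the aggregate data across places (cyclic distribution).
    places = [Place(p, values[p::num_places]) for p in range(num_places)]
    total = {"value": 0}

    def activity(place):
        local = sum(place.data)      # operates only on data at its own place
        with _atomic:                # protect the shared accumulator
            total["value"] += local

    with Finish() as f:
        for place in places:
            f.async_(activity, place)
    return total["value"]
```

For example, `distributed_sum(list(range(101)))` returns 5050 regardless of how the activities interleave, because each update to the shared total happens inside the lock and the result is read only after the finish block has joined every activity.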

Joint work with Vivek Sarkar, Kemal Ebcioglu, Christoph von Praun, Christian Grothoff, Radha Jagadeesan, Philippe Charles, Allan Kielstra, Christopher Donawa and Armando Solar-Lezama. Work is funded in part by DARPA under the PERCS HPCS project.

Speaker Bio
Vijay Saraswat joined IBM Research in 2003, after a year as a Professor at Penn State, a couple of years at startups and 13 years at PARC and AT&T Research. His main interests are in programming languages, constraints, logic and concurrency. At IBM, he leads the work on the design of X10, a modern object-oriented programming language intended for scalable concurrent computing.

Programming the Memory Hierarchy

Timothy Knight
Stanford University

Abstract
I will be presenting a new programming model that is the subject of an ongoing research effort within Stanford's Computer Systems Laboratory. We abstract both exposed-communication architectures (e.g. the CELL and Stanford's Merrimac processor) and conventional systems (e.g. P4 clusters) as hierarchies of software-managed memories and present the programmer with a language in which they can express the decomposition of their programs onto such hierarchies in a portable manner. The programming model accepts that some amount of hand-tuning by the programmer is needed for non-trivial applications to realize high performance levels, but separates this target-specific tuning from the expression of the algorithm to yield programs which are source-code-portable across differing target architectures.
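The separation of algorithm from target-specific tuning can be illustrated with a blocked matrix multiply in which the tile size is the only machine-dependent choice. This is a generic Python sketch, not the syntax of the programming model presented in the talk; `matmul_tiled` and the `MAPPINGS` table with its tile sizes are hypothetical.

```python
def matmul_tiled(a, b, tile):
    """Multiply a (n x k) by b (k x m), visiting the work one tile at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # Leaf task: a block small enough to sit in one memory level.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c

# Hypothetical per-target tuning, kept apart from the algorithm above.
MAPPINGS = {
    "small_local_store": 2,
    "large_cache": 64,
}

def matmul_for(target, a, b):
    """Same algorithm source, different decomposition per target."""
    return matmul_tiled(a, b, MAPPINGS[target])
```

Both targets compute the same product; only the decomposition changes. In a real system the mapping would also describe which memory level holds each block and how data moves between levels, which this sketch leaves out.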

Speaker Bio
Timothy Knight is a Ph.D. student in the Computer Science department of Stanford University and is a member of Professor Bill Dally's Concurrent VLSI Architecture (CVA) group. His past work has involved stream architectures for scientific computing, and his current research interests include languages and compilers for exposed-communication architectures.

Mapping a High Level Algorithm Description to a Streaming Runtime

Richard Lethin
Reservoir Labs, Inc.

Abstract
An ideal is to separate the high level expression of an algorithm from the concrete realization on a piece of hardware, with automated translation from the high level expression of the algorithm to the mapped expression for a particular machine. Moving to this ideal results in problems of algorithm expression, automated mapping, and the expression of mapping. Such problems are complicated by the evolving features of advanced hardware.

I will discuss Reservoir's work to solve some of these problems, particularly the R-Stream compiler and the Streaming Virtual Machine, illustrated by examples from the workshop, and with reference to other solutions.

Speaker Bio
Richard Lethin is President of Reservoir Labs, Inc., an independent systems research company with a focus area in advanced compilers, and PI on a research program to develop High Level Compilers within the DARPA Polymorphous Computer Architecture program.

Parallel Programming Technology: Can We Please Do It Right This Time?

Timothy G. Mattson
Intel Corporation

Abstract
The computer industry has a problem. In the near future, our products will change from having a single CPU core per chip to multiple cores. When placed in SMP systems, clusters, and large scale grids, parallel systems will be ubiquitous. And if something isn't done soon to convert the key application software into a form that can exploit parallelism, these great parallel systems will only be marginally useful.

Where will this parallel software come from? Currently, with few exceptions, only graduate slaves and other strange people are willing to write parallel software. Professional software engineers almost never write parallel software. It's uncomfortable to admit this, but after almost two decades of hard work, we just haven't figured out how to attract normal programmers to parallel computing. How can we solve this problem?

We believe the first step is to go back to basics and understand the parallel programming problem from the programmer's perspective. In other words, we must first understand how a programmer thinks about algorithms in general and then use this understanding to craft a new approach to writing parallel software. Our approach uses the formalism of design patterns to capture the process of reasoning about parallel algorithms. Our pattern language has now been released as a book, and over the next several months we will work with established parallel software engineers to validate our language and correct its inevitable shortcomings. Once an effective, consensus parallel pattern language is established, we can investigate the tools and APIs needed to help programmers turn a design in terms of these patterns into real software.

Speaker Bio
Tim Mattson earned a PhD for his work on quantum molecular scattering theory (UCSC, 1985). This was followed by a post-doc at Caltech where he worked on the Caltech/JPL hypercubes. Since then, he has held a number of commercial and academic positions with high performance computers as the common thread. Application areas have included mathematics libraries, exploration geophysics, computational chemistry, molecular biology, and bioinformatics.

Dr. Mattson joined Intel in 1993. Among his many roles at Intel, he was applications manager for the ASCI teraFLOPS project, helped create OpenMP, founded the Open Cluster Group (OSCAR), and launched Intel's programs in computing for the Life Sciences.

Currently, Dr. Mattson is conducting research on abstractions that bridge across parallel system design, parallel programming environments, and application software. This work builds on his recent book on Design Patterns in Parallel Programming (written with Professors Beverly Sanders and Berna Massingill and published by Addison Wesley). The patterns provide the "human angle" and help keep his research focused on technologies that help general programmers solve real problems.

