Workshop on Enabling Data-Intensive Computing: from Systems to Applications


July 30-31, 2009

Pittsburgh, PA




Friday, July 31, 2009

(At 531 Alumni Hall)



8:30 AM – 9:00 AM

Continental Breakfast

9:00 AM – 10:20 AM

Snapshot Presentations by Workshop Participants

Enabling Knowledge Discovery in a Virtual Universe (abstract, slides)

Jeffrey Gardner (U. of Washington)

Green Bank Telescope Focal Plane Array Data Processing Challenges (abstract, slides)

John Ford (National Radio Astronomy Observatory)

Petascale Visualization and Analysis - A Data Intensive Application on Numerically Intensive Supercomputers (abstract, slides)

David Daniel (Los Alamos National Labs)

Collaborative Data Life-cycle Management - until we invent the "Storage Time Machine" (abstract, slides)

Arun Jagatheesan (San Diego Supercomputer Center)

RIVER: A Dynamic Data Caching System (abstract, slides)

Tanu Malik (Purdue U.)

Automated Finger Pointing in Data-Intensive Distributed Systems (abstract, slides)

Priya Narasimhan (Carnegie Mellon U.)

Frameworks to Support Multi-Platform Distributed Data Intensive Applications (abstract, slides)

Shantenu Jha (LSU)

Current and Future Data Intensive Computing at DOE BES User Facilities (abstract, slides)

Steve Miller (Oak Ridge National Labs)

10:20 AM – 10:45 AM

Break


10:45 AM – 1:15 PM

Breakout Sessions + Lunch

Science: Data Intensive Computing Models

Leader: Ian Foster


Technology: Resource management in Data Intensive systems

Leader: Carlos Maltzahn


Applications: How will applications drive future Data Intensive Systems

Leader: Phillip Gibbons

1:15 PM – 2:15 PM

Plenary Session – Presentations by Breakout Leaders

Thursday, July 30, 2009

(At 531 Alumni Hall)



8:30 AM – 9:00 AM

Continental Breakfast

9:00 AM – 9:10 AM

Welcome and Introduction

Rami Melhem (University of Pittsburgh)

9:10 AM – 10:00 AM

Opening Session

Session Chair: Rami Melhem (University of Pittsburgh)

Data-Intensive Computing: The Prospects and the Challenges (slides)

Randy Bryant (Carnegie Mellon U.)

NSF Program on Data Intensive Computing

Taieb Znati (The National Science Foundation)

10:00 AM – 10:30 AM

Break


10:30 AM – 12:00 PM

Applications and Algorithms

Session Chair: Nick Nystrom (Pittsburgh Supercomputing Center)

Scientific Discovery through Advanced Computing (slides)

Lucy Nowell (U.S. Department of Energy, Office of Science)

Large Scale DNA Sequence Analysis using MapReduce, MPI and Threading (abstract, slides)

Judy Qiu (Indiana U.)

Social Science TeraGrid Gateway At Virtual RDC (abstract, slides)

John Abowd (Cornell U.)

Mining the Transient Sky

Michael Wood-Vasey (U. Pittsburgh)

12:00 PM – 1:30 PM

Lunch


1:30 PM – 3:00 PM

Hardware Architectures

Session Chair: Narayana Tummala (Google Pittsburgh)

Reliability Issues in Data Intensive Computing (abstract)

Mootaz Elnozahy (IBM)

The Ceph Distributed Object-Based Storage System

Scott Brandt (U. California at Santa Cruz)

Building Energy-Efficient Clusters for I/O-Intensive Workloads: A Fast Array of Wimpy Nodes (FAWN) (abstract, slides)

David Andersen (Carnegie Mellon U.)

The Role of New Memory Structures in Data Intensive Computing (abstract)

Mazin Yousif (U. Arizona)

3:00 PM – 3:30 PM

Break


3:30 PM – 5:00 PM

Software Architectures and Frameworks

Session Chair: Phillip Gibbons (Intel Research Pittsburgh)

Runtime Query Management (abstract, slides)

Magda Balazinska (U. Washington)

Ultraparallel Computing for Data Analysis (slides)

Roger Barga (Microsoft Research)

Big Data Meets Streaming Data (abstract, slides)

Dave O’Hallaron (Intel Research)

Technologies for Data-Intensive Science (abstract, slides)

Ian Foster (Argonne National Labs)