Clustered workflow execution of retargeted data analysis scripts

Abstract

Supercomputing advances have enabled computational science data volumes to grow at ever-increasing rates, commonly producing more data than can be practically analyzed. Whole-dataset download costs have become impractical, even with multi-Gbps networks, forcing scientists to rely on server-side subsetting and limiting the scope of data they can analyze on a workstation. Our system supplements existing scientific data services with lightweight computational capability, providing a means of safely relocating analysis from the desktop to the server, where clustered execution can be coordinated to exploit data locality, reduce unnecessary data transfer, and provide end users with results several times faster. We show how dataflow and other compiler-inspired analyses of shell scripts that invoke scientists' most common analysis tools enable parallelization and optimizations in disk and network I/O bandwidth. We benchmark using an actual geoscience analysis script, illustrating the crucial performance gains of extracting workflows defined in scripts and optimizing their execution. Current results quantify significant improvements in performance, showing the promise of bringing transparent high-performance analysis to the scientist's desktop. © 2008 IEEE.

Similar books and articles

Scientists' Responses to Anomalous Data: Evidence from Psychology, History, and Philosophy of Science. William F. Brewer & Clark A. Chinn - 1994 - PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association 1994:304-313.
Pitfalls and promises: The use of secondary data analysis in educational research. Emma Smith - 2008 - British Journal of Educational Studies 56 (3):323-339.

Analytics

Added to PP
2017-03-08
