Scaling Spark on HPC systems

Abstract

Copyright © 2016 by the Association for Computing Machinery, Inc.

We report our experiences porting Spark to large production HPC systems. While Spark performance in a data center installation is dominated by the network, our results show that file system metadata access latency can dominate in an HPC installation using Lustre: it makes single-node performance up to 4× slower than on a typical workstation. We evaluate a combination of software techniques and hardware configurations designed to address this problem. For example, on the software side we develop a file pooling layer that improves per-node performance by up to 2.8×. On the hardware side we evaluate a system with a large NVRAM buffer between the compute nodes and the backend Lustre file system: this improves scaling at the expense of per-node performance. Overall, our results indicate that scalability is currently limited to O cores in an HPC installation with Lustre and default Spark. After careful configuration combined with our pooling we can scale up to O. As our analysis indicates, it is feasible to observe much higher scalability in the near future.
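The abstract does not describe how the file pooling layer is built. As a rough illustration of the general idea only, the sketch below assumes that pooling means keeping recently used file handles open so that repeated accesses to the same file do not each trigger a new open(), which on Lustre would involve a round trip to the metadata server. The class name `FileHandlePool`, the LRU eviction policy, the capacity, and the `shuffle_*.data` file names are all hypothetical and are not taken from the paper or from Spark.

```python
import os
import tempfile
from collections import OrderedDict


class FileHandlePool:
    """Hypothetical LRU pool of open file handles (illustrative only)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self._handles = OrderedDict()  # path -> open file object
        self.opens = 0                 # number of real open() calls issued

    def get(self, path):
        handle = self._handles.get(path)
        if handle is not None:
            # Pool hit: reuse the handle, no new open() is needed.
            self._handles.move_to_end(path)
            return handle
        # Pool miss: this is the call that would cost a metadata operation.
        handle = open(path, "ab+")
        self.opens += 1
        self._handles[path] = handle
        if len(self._handles) > self.capacity:
            # Evict and close the least recently used handle.
            _, evicted = self._handles.popitem(last=False)
            evicted.close()
        return handle

    def append(self, path, data):
        self.get(path).write(data)

    def close_all(self):
        for handle in self._handles.values():
            handle.close()
        self._handles.clear()


def demo():
    """Append 100 bytes to each of two files through the pool."""
    with tempfile.TemporaryDirectory() as tmp:
        pool = FileHandlePool(capacity=2)
        paths = [os.path.join(tmp, f"shuffle_{i}.data") for i in range(2)]
        for _ in range(100):
            for path in paths:
                pool.append(path, b"x")
        pool.close_all()
        sizes = [os.path.getsize(path) for path in paths]
        return pool.opens, sizes
```

In this toy run, 200 appends issue only two real open() calls, whereas opening and closing the file around every append would issue 200. Whether the paper's layer uses this eviction policy or granularity is not stated in the abstract.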

Analytics

Added to PP
2017-05-17
