Article

FlashGraph: processing billion-node graphs on an array of commodity SSDs

Authors:

Joshua Vogelstein,

Carey E. Priebe,

Alexander S. SzalayAuthors Info & Claims

FAST'15: Proceedings of the 13th USENIX Conference on File and Storage Technologies

Pages 45 - 58

Published: 16 February 2015 Publication History

Abstract

Graph analysis performs many random reads and writes, thus, these workloads are typically performed in memory. Traditionally, analyzing large graphs requires a cluster of machines so the aggregate memory exceeds the graph size. We demonstrate that a multicore server can process graphs with billions of vertices and hundreds of billions of edges, utilizing commodity SSDs with minimal performance loss. We do so by implementing a graph-processing engine on top of a user-space SSD file system designed for high IOPS and extreme parallelism. Our semi-external memory graph engine called FlashGraph stores vertex state in memory and edge lists on SSDs. It hides latency by overlapping computation with I/O. To save I/O bandwidth, FlashGraph only accesses edge lists requested by applications from SSDs; to increase I/O throughput and reduce CPU overhead for I/O, it conservatively merges I/O requests. These designs maximize performance for applications with different I/O characteristics. FlashGraph exposes a general and flexible vertex-centric programming interface that can express a wide variety of graph algorithms and their optimizations. We demonstrate that FlashGraph in semi-external memory performs many algorithms with performance up to 80% of its in-memory implementation and significantly outperforms PowerGraph, a popular distributed in-memory graph engine.

References

[1]

ABADI, D., BONCZ, P., HARIZOPOULOS, S., IDREOS, S., AND MADDEN, S. The design and implementation of modern column-oriented database systems. Foundations and Trends in Databases 5 (2013), 197-280.

Digital Library

[2]

ABELLO, J., BUCHSBAUM, A. L., AND WESTBROOK, J. R. A functional approach to external graph algorithms. In Algorithmica (1998), Springer-Verlag, pp. 332-343.

Digital Library

[3]

BEAMER, S., ASANOVIC, K., AND PATTERSON, D. Direction-optimizing breadth-first search. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (2012), SC '12.

Digital Library

[4]

BECCHETTI, L., BOLDI, P., CASTILLO, C., AND GIONIS, A. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2008).

Digital Library

[5]

BLONDEL, V. D., LOUP GUILLAUME, J., LAMBIOTTE, R., AND LEFEBVRE, E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment (2008).

[6]

BRANDES, U. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology 25 (2001), 163-177.

[7]

BRIN, S., AND PAGE, L. The anatomy of a large-scale hypertextual web search engine. In Proceedings of the Seventh International Conference on World Wide Web 7 (1998).

Digital Library

[8]

CHEN, R., WENG, X., HE, B., YANG, M., CHOI, B., AND LI, X. On the efficiency and programmability of large graph processing in the cloud. Tech. rep., Microsoft Research, 2010.

[9]

DEAN, J., AND GHEMAWAT, S. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Conference on Symposium on Opearting Systems Design & Implementation - Volume 6 (2004).

Digital Library

[10]

Apache giraph. https://giraph.apache.org/, Accessed 4/9/2014.

[11]

GONZALEZ, J. E., LOW, Y., GU, H., BICKSON, D., AND GUESTRIN, C. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (2012).

Digital Library

[12]

HAN, W.-S., LEE, S., PARK, K., LEE, J.-H., KIM, M.-S., KIM, J., AND YU, H. Turbograph: a fast parallel graph engine handling billion-scale graphs in a single pc. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (2013), ACM, pp. 77-85.

Digital Library

[13]

KANG, U., TSOURAKAKIS, C. E., AND FALOUTSOS, C. PEGASUS: A peta-scale graph mining system implementation and observations. In Proceedings of the 2009 Ninth IEEE International Conference on Data Mining (2009).

Digital Library

[14]

KEPNER, J., AND GILBERT, J. Graph Algorithms in the Language of Linear Algebra. Society for Industrial & Applied Mathematics, 2011.

[15]

KWAK, H., LEE, C., PARK, H., AND MOON, S. What is twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web (2010).

Digital Library

[16]

KYROLA, A., BLELLOCH, G., AND GUESTRIN, C. Graphchi: Large-scale graph computation on just a PC. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (2012).

Digital Library

[17]

LESKOVEC, J., LANG, K. J., DASGUPTA, A., AND MAHONEY, M. W. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics 6, 1 (2009), 29-123.

[18]

LOW, Y., BICKSON, D., GONZALEZ, J., GUESTRIN, C., KYROLA, A., AND HELLERSTEIN, J. M. Distributed GraphLab: A framework for machine learning and data mining in the cloud. Proc. VLDB Endow. (2012).

Digital Library

[19]

LUGOWSKI, A., ALBER, D., BULU, A., GILBERT, J., REINHARDT, S., TENG, Y., AND WARANIS, A. A flexible open-source toolbox for scalable complex graph analysis. In Proceedings of the 2012 SIAM International Conference on Data Mining (2012).

[20]

MALEWICZ, G., AUSTERN, M. H., BIK, A. J., DEHNERT, J. C., HORN, I., LEISER, N., AND CZAJKOWSKI, G. Pregel: A system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (2010).

Digital Library

[21]

NGUYEN, D., LENHARTH, A., AND PINGALI, K. A lightweight infrastructure for graph analytics. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (2013).

Digital Library

[22]

PEARCE, R., GOKHALE, M., AND AMATO, N. M. Multithreaded asynchronous graph traversal for in-memory and semiexternal memory. In Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (2010).

Digital Library

[23]

ROY, A., MIHAILOVIC, I., AND ZWAENEPOEL, W. X-stream: Edge-centric graph processing using streaming partitions. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (2013).

Digital Library

[24]

SHAO, B., WANG, H., AND LI, Y. Trinity: A distributed graph engine on a memory cloud. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (2013).

Digital Library

[25]

SHUN, J., AND BLELLOCH, G. E. Ligra: A lightweight graph processing framework for shared memory. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (2013).

Digital Library

[26]

WANG, H., TANG, M., PARK, Y., AND PRIEBE, C. E. Locality statistics for anomaly detection in time series of graphs. IEEE Transactions on Signal Processing 62, 3 (2014).

Digital Library

[27]

WANG, H., ZHENG, D., BURNS, R., AND PRIEBE, C. Active community detection in massive graphs. CoRR abs/1412.8576 (2015).

[28]

WATTS, D. J., AND STROGATZ, S. H. Collective dynamics of 'small-world' networks. Nature 393, 440-442 (1998).

[29]

Web graph. http://webdatacommons.org/hyperlinkgraph/, Accessed 4/18/2014.

[30]

ZHANG, Y., GAO, Q., GAO, L., AND WANG, C. Maiter: An asynchronous graph processing framework for delta-based accumulative iterative computation. IEEE Transactions on Parallel and Distributed Systems (2014).

[31]

ZHENG, D., BURNS, R., AND SZALAY, A. S. A parallel page cache: Iops and caching for multicore systems. In Proceedings of the 4th USENIX Conference on Hot Topics in Storage and File Systems (2012).

Digital Library

[32]

ZHENG, D., BURNS, R., AND SZALAY, A. S. Toward millions of file system IOPS on low-cost, commodity hardware. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (2013).

Digital Library

[33]

ZHU, X., AND GHAHRAMANI, Z. Learning from labeled and unlabeled data with label propagation. Tech. rep., Carnegie Mellon University, 2002.

Cited By

Yang TChen YLiang YYang MMa XWon Y(2024)SeraphProceedings of the 22nd USENIX Conference on File and Storage Technologies10.5555/3650697.3650719(373-387)Online publication date: 27-Feb-2024
https://dl.acm.org/doi/10.5555/3650697.3650719
Yang TEngland CLi YLi BYang MTsafrir DMusuvathi MGupta RAbu-Ghazaleh N(2024)Grafu: Unleashing the Full Potential of Future Value Computation for Out-of-core Synchronous Graph ProcessingProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3620665.3640409(467-481)Online publication date: 27-Apr-2024
https://dl.acm.org/doi/10.1145/3620665.3640409
Haddadi ABlack-Schaffer DPark C(2023)Large-scale Graph Processing on Commodity Systems: Understanding and Mitigating the Impact of SwappingProceedings of the International Symposium on Memory Systems10.1145/3631882.3631884(1-11)Online publication date: 2-Oct-2023
https://dl.acm.org/doi/10.1145/3631882.3631884
Show More Cited By

Recommendations

On the Multichromatic Number of s-Stable Kneser Graphs

For positive integers n and s, a subset Sï [n] is s-stable if sï |i-j|ï n-s for distinct i,j∈S . The s-stable r-uniform Kneser hypergraph KGrn,ks-stable is the r-uniform hypergraph that has the collection of all s-stable k-element subsets of [n] as ...
Adjacent vertex-distinguishing edge and total chromatic numbers of hypercubes

An adjacent vertex-distinguishing edge coloring of a simple graph G is a proper edge coloring of G such that incident edge sets of any two adjacent vertices are assigned different sets of colors. A total coloring of a graph G is a coloring of both the ...
Forbidden Subgraphs and Weak Locally Connected Graphs

A graph is called H-free if it has no induced subgraph isomorphic to H. A graph is called $$N^i$$Ni-locally connected if $$G[\{ x\in V(G): 1\le d_G(w, x)\le i\}]$$G[{x?V(G):1≤dG(w,x)≤i}] is connected and $$N_2$$N2-locally connected if $$G[\{uv: \{uw, vw\...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences

FAST'15: Proceedings of the 13th USENIX Conference on File and Storage Technologies

February 2015

386 pages

ISBN:9781931971201

Program Chairs:
Jiri Schindler
SimpliVity
,
Erez Zadok
Stony Brook University

Sponsors

USENIX Assoc: USENIX Assoc

In-Cooperation

SIGOPS: ACM Special Interest Group on Operating Systems

Publisher

USENIX Association

United States

Publication History

Published: 16 February 2015

Check for updates

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

73
Total Citations
View Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 06 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Yang TChen YLiang YYang MMa XWon Y(2024)SeraphProceedings of the 22nd USENIX Conference on File and Storage Technologies10.5555/3650697.3650719(373-387)Online publication date: 27-Feb-2024
https://dl.acm.org/doi/10.5555/3650697.3650719
Yang TEngland CLi YLi BYang MTsafrir DMusuvathi MGupta RAbu-Ghazaleh N(2024)Grafu: Unleashing the Full Potential of Future Value Computation for Out-of-core Synchronous Graph ProcessingProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3620665.3640409(467-481)Online publication date: 27-Apr-2024
https://dl.acm.org/doi/10.1145/3620665.3640409
Haddadi ABlack-Schaffer DPark C(2023)Large-scale Graph Processing on Commodity Systems: Understanding and Mitigating the Impact of SwappingProceedings of the International Symposium on Memory Systems10.1145/3631882.3631884(1-11)Online publication date: 2-Oct-2023
https://dl.acm.org/doi/10.1145/3631882.3631884
Sano SBando YHiwada KKajihara HSuzuki TNakanishi YTaki DKaneko AShiozawa T(2023)GPU Graph Processing on CXL-Based Microsecond-Latency External MemoryProceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis10.1145/3624062.3624173(962-972)Online publication date: 12-Nov-2023
https://dl.acm.org/doi/10.1145/3624062.3624173
Wang SZhang MYang KChen KMa SJiang JWu YAamodt TJerger NSwift M(2023)NosWalker: A Decoupled Architecture for Out-of-Core Random Walk ProcessingProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3582016.3582025(466-482)Online publication date: 25-Mar-2023
https://dl.acm.org/doi/10.1145/3582016.3582025
Liu CChoi WKhadirsharbiyani SKandemir MMohror KArnold DBadia R(2023)MBFGraph: An SSD-based External Graph System for Evolving GraphsProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1145/3581784.3607070(1-13)Online publication date: 12-Nov-2023
https://dl.acm.org/doi/10.1145/3581784.3607070
Kim JSwanson SWolf FShende SCulhane CAlam SJagode H(2022)BlazeProceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis10.5555/3571885.3571943(1-15)Online publication date: 13-Nov-2022
https://dl.acm.org/doi/10.5555/3571885.3571943
Tench DWest EZhang VBender MChowdhury ADellas JFarach-Colton MSeip TZhang KIves ZBonifati AEl Abbadi A(2022)GraphZeppelin: Storage-Friendly Sketching for Connected Components on Dynamic Graph StreamsProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3526146(325-339)Online publication date: 10-Jun-2022
https://dl.acm.org/doi/10.1145/3514221.3526146
Cheng WWu CChang YLin IMitra TYoung EXiong J(2022)GraphRCProceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design10.1145/3508352.3549408(1-9)Online publication date: 30-Oct-2022
https://dl.acm.org/doi/10.1145/3508352.3549408
Jang MJo YKim S(2021)RealGraphwebProceedings of the VLDB Endowment10.14778/3476311.347634214:12(2775-2778)Online publication date: 28-Oct-2021
https://dl.acm.org/doi/10.14778/3476311.3476342
Show More Cited By

View Options

View options

Media

Figures

Other

Tables

View Table of Contents