skip to main content
research-article

FlowWalker: A Memory-Efficient and High-Performance GPU-Based Dynamic Graph Random Walk Framework

Published: 31 May 2024 Publication History

Abstract

Dynamic graph random walk (DGRW) emerges as a practical tool for capturing structural relations within a graph. Effectively executing DGRW on GPU presents certain challenges. First, existing sampling methods demand a pre-processing buffer, causing substantial space complexity. Moreover, the power-law distribution of graph vertex degrees introduces workload imbalance issues, rendering DGRW embarrassed to parallelize. In this paper, we propose FlowWalker, a GPU-based dynamic graph random walk framework. FlowWalker implements an efficient parallel sampling method to fully exploit the GPU parallelism and reduce space complexity. Moreover, it employs a sampler-centric paradigm alongside a dynamic scheduling strategy to handle the huge amounts of walking queries. FlowWalker stands as a memory-efficient framework that requires no auxiliary data structures in GPU global memory. We examine the performance of FlowWalker extensively on ten datasets, and experiment results show that FlowWalker achieves up to 752.2×, 72.1×, and 16.4× speedup compared with existing CPU, GPU, and FPGA random walk frameworks, respectively. Case study shows that FlowWalker diminishes random walk time from 35% to 3% in a pipeline of ByteDance friend recommendation GNN training.

References

[1]
Paolo Boldi and Marco Rosa. 2012. Arc-community detection via triangular random walks. In 2012 Eighth Latin American Web Congress. IEEE, 48--56.
[2]
Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna. 2011. Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks. In Proceedings of the 20th international conference on World Wide Web, Sadagopan Srinivasan, Krithi Ramamritham, Arun Kumar, M. P. Ravindra, Elisa Bertino, and Ravi Kumar (Eds.). ACM Press, 587--596.
[3]
Paolo Boldi and Sebastiano Vigna. 2004. The WebGraph Framework I: Compression Techniques. In Proc. of the Thirteenth International World Wide Web Conference (WWW 2004). ACM Press, Manhattan, USA, 595--601.
[4]
Min-Te Chao. 1982. A general purpose unequal probability sampling plan. Biometrika 69, 3 (1982), 653--656.
[5]
Thomas H Cormen and Michael T Goodrich. 1996. A bridging model for parallel computation, communication, and I/O. ACM Computing Surveys (CSUR) 28, 4es (1996), 208--es.
[6]
Xiaoheng Deng, Genghao Li, Mianxiong Dong, and Kaoru Ota. 2017. Finding overlapping communities based on Markov chain and link clustering. Peer-to-peer Networking and Applications 10 (2017), 411--420.
[7]
Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 135--144.
[8]
Dániel Fogaras, Balázs Rácz, Károly Csalogány, and Tamás Sarlós. 2005. Towards scaling fully personalized pagerank: Algorithms, lower bounds, and experiments. Internet Mathematics 2, 3 (2005), 333--358.
[9]
Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. 855--864.
[10]
Yu He, Yangqiu Song, Jianxin Li, Cheng Ji, Jian Peng, and Hao Peng. 2019. HeteSpaceyWalk: A Heterogeneous Spacey Random Walk for Heterogeneous Information Network Embedding. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (Beijing, China) (CIKM '19). Association for Computing Machinery, New York, NY, USA, 639--648.
[11]
W. Daniel Hillis and Guy L. Steele. 1986. Data Parallel Algorithms. Commun. ACM 29, 12 (dec 1986), 1170--1183.
[12]
Lorenz Hübschle-Schneider and Peter Sanders. 2022. Parallel weighted random sampling. ACM Transactions on Mathematical Software (TOMS) 48, 3 (2022), 1--40.
[13]
Abhinav Jangda, Sandeep Polisetty, Arjun Guha, and Marco Serafini. 2021. Accelerating graph sampling for graph machine learning using GPUs. In Proceedings of the Sixteenth European Conference on Computer Systems. 311--326.
[14]
Yong-Yeon Jo, Myung-Hwan Jang, Hyungsoo Jung, and Sang-Wook Kim. 2018. A High-Performance Graph Engine for Efficient Social Network Analysis. In Companion of the The Web Conference 2018 on The Web Conference 2018, WWW 2018, Lyon, France, April 23--27, 2018, Pierre-Antoine Champin, Fabien Gandon, Mounia Lalmas, and Panagiotis G. Ipeirotis (Eds.). ACM, 61--62.
[15]
Peter M Kogge and Harold S Stone. 1973. A parallel algorithm for the efficient solution of a general class of recurrence equations. IEEE transactions on computers 100, 8 (1973), 786--793.
[16]
Aapo Kyrola. 2013. DrunkardMob: billions of random walks on just a PC. Proceedings of the 7th ACM conference on Recommender systems (2013).
[17]
Aapo Kyrola. 2013. Drunkardmob: billions of random walks on just a pc. In Proceedings of the 7th ACM conference on Recommender systems. 257--264.
[18]
Aapo Kyrola, Guy E. Blelloch, and Carlos Guestrin. 2012. GraphChi: Large-Scale Graph Computation on Just a PC. In 10th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2012, Hollywood, CA, USA, October 8--10, 2012. 31--46. https://www.usenix.org/conference/osdi12/technicalsessions/presentation/kyrola
[19]
Ni Lao and William W Cohen. 2010. Relational retrieval using a combination of path-constrained random walks. Machine learning 81 (2010), 53--67.
[20]
Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data.
[21]
Xueting Liao, Yubao Wu, and Xiaojun Cao. 2019. Second-Order CoSimRank for Similarity Measures in Social Networks. In ICC 2019 - 2019 IEEE International Conference on Communications (ICC). 1--6.
[22]
Xin Lv, Yuxian Gu, Xu Han, Lei Hou, Juanzi Li, and Zhiyuan Liu. 2019. Adapting Meta Knowledge Graph Information for Multi-Hop Reasoning over Few-Shot Relations. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3--7, 2019, Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan (Eds.). Association for Computational Linguistics, 3374--3379.
[23]
Pedro J. Martín, Luis F. Ayuso, Roberto Torres, and Antonio Gavilanes. 2012. Algorithmic strategies for optimizing the parallel reduction primitive in CUDA. In 2012 International Conference on High Performance Computing & Simulation (HPCS). 511--519.
[24]
Junyi Mei, Shixuan Sun, Chao Li, Cheng Xu, Cheng Chen, Yibo Liu, Jing Wang, Cheng Zhao, Xiaofeng Hou, Minyi Guo, Bingsheng He, and Xiaoliang Cong. 2024. FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework. arXiv:2404.08364 [cs.DC]
[25]
NVIDIA. 2022. CUB Documentation. https://nvlabs.github.io/cub/index.html, Last accessed on 2023-6-25.
[26]
NVIDIA. 2023. CUDA Toolkit Documentation, cuRAND. https://docs.nvidia.com/cuda/curand/index.html, Last accessed on 2023-6-25.
[27]
S. Olver and A. Townsend. 2013. Fast inverse transform sampling in one and two dimensions. arXiv: Numerical Analysis (2013).
[28]
Santosh Pandey, Lingda Li, Adolfy Hoisie, Xiaoye S Li, and Hang Liu. 2020. CSAW: A framework for graph sampling and random walk on GPUs. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 1--15.
[29]
Serafeim Papadias, Zoi Kaoudi, Jorge-Arnulfo Quiané-Ruiz, and Volker Markl. 2022. Space-efficient random walks on streaming graphs. Proceedings of the VLDB Endowment 16, 2 (2022), 356--368.
[30]
Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 701--710.
[31]
Christian Robert and George Casella. 2013. Monte Carlo statistical methods. In Springer Science & Business Media.
[32]
Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2009. The Graph Neural Network Model. IEEE Transactions on Neural Networks 20, 1 (2009), 61--80.
[33]
Shubhabrata Sengupta, Aaron Lefohn, and John D Owens. 2006. A work-efficient step-efficient prefix sum algorithm. (2006).
[34]
Yingxia Shao, Shiyue Huang, Xupeng Miao, Bin Cui, and Lei Chen. 2020. Memory-aware framework for efficient second-order random walk on large graphs. In Proceedings of the 2020 ACM SIGMOD international conference on management of data. 1797--1812.
[35]
Baoxu Shi and Tim Weninger. 2016. Discriminative predicate path mining for fact checking in knowledge graphs. Knowledge-based systems 104 (2016), 123--133.
[36]
Shixuan Sun, Yuhang Chen, Shengliang Lu, Bingsheng He, and Yuchen Li. 2021. ThunderRW: An In-Memory Graph Random Walk Engine. Proc. VLDB Endow. 14, 11 (2021), 1992--2005. http://www.vldb.org/pvldb/vol14/p1992-sun.pdf
[37]
Yizhou Sun and Jiawei Han. 2012. Mining heterogeneous information networks: a structural analysis approach. SIGKDD Explor. 14, 2 (2012), 20--28.
[38]
Hongshi Tan, Xinyu Chen, Yao Chen, Bingsheng He, and Weng-Fai Wong. 2023. LightRW: FPGA Accelerated Graph Dynamic Random Walks. Proc. ACM Manag. Data 1, 1, Article 90 (may 2023), 27 pages.
[39]
Alok Tripathy, Katherine Yelick, and Aydin Buluc. 2023. Distributed Matrix-Based Sampling for Graph Neural Network Training. arXiv preprint arXiv:2311.02909 (2023).
[40]
Jeffrey S Vitter. 1985. Random sampling with a reservoir. ACM Transactions on Mathematical Software (TOMS) 11, 1 (1985), 37--57.
[41]
Alastair J. Walker. 1977. An Efficient Method for Generating Discrete Random Variables with General Distributions. ACM Trans. Math. Softw. 3, 3 (1977), 253--256.
[42]
Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, B. Zhao, and D. Lee. 2018. Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018).
[43]
Minjie Wang, Lingfan Yu, Da Zheng, Quan Gan, Yu Gai, Zihao Ye, Mufei Li, Jinjing Zhou, Qi Huang, Chao Ma, et al. 2019. Deep graph library: Towards efficient and scalable deep learning on graphs. arXiv preprint arXiv:1909.01315 (2019).
[44]
Pengyu Wang, Chao Li, Jing Wang, Taolei Wang, Lu Zhang, Jingwen Leng, Quan Chen, and Minyi Guo. 2021. Skywalker: Efficient Alias-Method-Based Graph Sampling and Random Walk on GPUs. In 30th International Conference on Parallel Architectures and Compilation Techniques, PACT 2021, Atlanta, GA, USA, September 26--29, 2021. IEEE, 304--317.
[45]
Pengyu Wang, Cheng Xu, Chao Li, Jing Wang, Taolei Wang, Lu Zhang, Xiaofeng Hou, and Minyi Guo. 2023. Optimizing GPU-based Graph Sampling and Random Walk for Efficiency and Scalability. IEEE Trans. Comput. (2023).
[46]
Rui Wang, Yongkun Li, Hong Xie, Yinlong Xu, and John CS Lui. 2020. {GraphWalker}: An {I/O-Efficient} and {Resource-Friendly} Graph Analytic System for Fast and Scalable Random Walks. In 2020 USENIX Annual Technical Conference (USENIX ATC 20). 559--571.
[47]
Shuke Wang, Mingxing Zhang, Ke Yang, Kang Chen, Shaonan Ma, Jinlei Jiang, and Yongwei Wu. 2023. NosWalker: A Decoupled Architecture for Out-of-Core Random Walk Processing. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3. 466--482.
[48]
Yubao Wu, Yuchen Bian, and Xiang Zhang. 2016. Remember where you came from: on the second-order random walk based proximity measures. Proceedings of the VLDB Endowment 10, 1 (2016), 13--24.
[49]
Ke Yang, Xiaosong Ma, Saravanan Thirumuruganathan, Kang Chen, and Yongwei Wu. 2021. Random walks on huge graphs at cache efficiency. In Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles. 311--326.
[50]
Ke Yang, Mingxing Zhang, Kang Chen, Xiaosong Ma, Yang Bai, and Yong Jiang. 2019. KnightKing: a fast distributed graph random walk engine. In Proceedings of the 27th ACM Symposium on Operating Systems Principles, SOSP 2019, Huntsville, ON, Canada, October 27--30, 2019, Tim Brecht and Carey Williamson (Eds.). ACM, 524--537.
[51]
Hongbo Yin, Yingxia Shao, Xupeng Miao, Yawen Li, and Bin Cui. 2022. Scalable Graph Sampling on GPUs with Compressed Graph. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management (Atlanta, GA, USA) (CIKM '22). Association for Computing Machinery, New York, NY, USA, 2383--2392.
[52]
Dalong Zhang, Xin Huang, Ziqi Liu, Jun Zhou, Zhiyang Hu, Xianzheng Song, Zhibang Ge, Lin Wang, Zhiqiang Zhang, and Yuan Qi. 2020. AGL: A Scalable System for Industrial-Purpose Graph Machine Learning. Proc. VLDB Endow. 13, 12 (aug 2020), 3125--3137.
[53]
Rong Zhu, Kun Zhao, Hongxia Yang, Wei Lin, Chang Zhou, Baole Ai, Yong Li, and Jingren Zhou. 2019. AliGraph: a comprehensive graph neural network platform. Proceedings of the VLDB Endowment 12, 12 (2019), 2094--2105.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 17, Issue 8
April 2024
335 pages
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 31 May 2024
Published in PVLDB Volume 17, Issue 8

Check for updates

Badges

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 42
    Total Downloads
  • Downloads (Last 12 months)42
  • Downloads (Last 6 weeks)7
Reflects downloads up to 12 Sep 2024

Other Metrics

Citations

View Options

Get Access

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media