skip to main content
10.1145/2939672.2939862acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Compressing Graphs and Indexes with Recursive Graph Bisection

Published: 13 August 2016 Publication History

Abstract

Graph reordering is a powerful technique to increase the locality of the representations of graphs, which can be helpful in several applications. We study how the technique can be used to improve compression of graphs and inverted indexes.
We extend the recent theoretical model of Chierichetti et al. (KDD 2009) for graph compression, and show how it can be employed for compression-friendly reordering of social networks and web graphs and for assigning document identifiers in inverted indexes. We design and implement a novel theoretically sound reordering algorithm that is based on recursive graph bisection.
Our experiments show a significant improvement of the compression rate of graph and indexes over existing heuristics. The new method is relatively simple and allows efficient parallel and distributed implementations, which is demonstrated on graphs with billions of vertices and hundreds of billions of edges.

References

[1]
A, B. Shalita, I. K. Karrer, A. Sharma, A. Presta, A. Adcock, H. Kllapi, and M. Stumm. Social hash: An assignment framework for optimizing distributed systems operations on social networks. In Networked Systems Design and Implementation, 2016.
[2]
A. Apostolico and G. Drovandi. Graph compression by BFS. Algorithms, 2(3):1031--1044, 2009.
[3]
S. Arora, S. Rao, and U. Vazirani. Expander flows, geometric embeddings and graph partitioning. Journal of the ACM, 56(2):5, 2009.
[4]
R. Bar-Yehuda, G. Even, J. Feldman, and J. Naor. Computing an optimal orientation of a balanced decomposition tree for linear arrangement problems. JGAA, 5(4):1--27, 2001.
[5]
R. Blanco and Á. Barreiro. Document identifier reassignment through dimensionality reduction. In Adv. Inf. Retr., pages 375--387. 2005.
[6]
D. Blandford and G. Blelloch. Index compression through document reordering. In Data Compression Conference, pages 342--351, 2002.
[7]
P. Boldi, M. Rosa, M. Santini, and S. Vigna. Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In World Wide Web, pages 587--596, 2011.
[8]
P. Boldi and S. Vigna. The WebGraph framework I: Compression techniques. In World Wide Web, pages 595--602, 2004.
[9]
A. Z. Broder. On the resemblance and containment of documents. In Compression and Complexity of Sequences, pages 21--29, 1997.
[10]
M. Charikar, M. T. Hajiaghayi, H. Karloff, and S. Rao. l2 over 2 spreading metrics for vertex ordering problems. Algorithmica, 56(4):577--604, 2010.
[11]
M. Charikar, K. Makarychev, and Y. Makarychev. A divide and conquer algorithm for d-dimensional arrangement. In Symposium on Discrete Algorithms, pages 541--546, 2007.
[12]
F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan. On compressing social networks. In Knowledge Discovery and Data Mining, pages 219--228, 2009.
[13]
N. R. Devanur, S. A. Khot, R. Saket, and N. K. Vishnoi. Integrality gaps for sparsest cut and minimum linear arrangement problems. In Symposium on Theory of Computing, pages 537--546, 2006.
[14]
S. Ding, J. Attenberg, and T. Suel. Scalable techniques for document identifier assignment in inverted indexes. In World Wide Web, pages 311--320, 2010.
[15]
G. Even, J. S. Naor, S. Rao, and B. Schieber. Divide-and-conquer approximation algorithms via spreading metrics. Journal of the ACM, 47(4):585--616, 2000.
[16]
C. M. Fiduccia and R. M. Mattheyses. A linear-time heuristic for improving network partitions. In Design Automation, pages 175--181, 1982.
[17]
M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., 1979.
[18]
M. D. Hansen. Approximation algorithms for geometric embeddings in the plane with applications to parallel processing problems. In Foundations of Computer Science, pages 604--609, 1989.
[19]
M. Juvan and B. Mohar. Optimal linear labelings and eigenvalues of graphs. Discrete Applied Mathematics, 36(2):153--168, 1992.
[20]
B. W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 49(2):291--307, 1970.
[21]
H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In World Wide Web, pages 591--600, 2010.
[22]
Y. Lim, U. Kang, and C. Faloutsos. SlashBurn: Graph compression and mining beyond caveman communities. IEEE Transactions on Knowledge and Data Engineering, 26(12):3077--3089, 2014.
[23]
S. Maneth and F. Peternek. A survey on methods and systems for graph compression. arXiv preprint arXiv:1504.00616, 2015.
[24]
A. Moffat and L. Stuiver. Binary interpolative coding for effective index compression. Information Retrieval, 3(1), 2000.
[25]
G. Ottaviano and R. Venturini. Partitioned Elias-Fano indexes. In SIGIR, pages 273--282, 2014.
[26]
J. Petit. Addenda to the survey of layout problems. Bulletin of EATCS, 3(105), 2013.
[27]
U. N. Raghavan, R. Albert, and S. Kumara. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76(3):036106, 2007.
[28]
K. H. Randall, R. Stata, R. G. Wickremesinghe, and J. L. Wiener. The link database: Fast access to graphs of the web. In Data Compression Conference, pages 122--131, 2002.
[29]
S. Rao and A. W. Richa. New approximation techniques for some ordering problems. In Symposium on Discrete Algorithms, pages 211--219, 1998.
[30]
I. Safro and B. Temkin. Multiscale approach for the network compression-friendly ordering. Journal of Discrete Algorithms, 9(2):190--202, 2011.
[31]
W.-Y. Shieh, T.-F. Chen, J. J.-J. Shann, and C.-P. Chung. Inverted file compression through document identifier reassignment. Information Processing & Management, 39(1):117--131, 2003.
[32]
J. Shun, L. Dhulipala, and G. E. Blelloch. Smaller and faster: Parallel processing of compressed graphs with Ligra. In Data Compression Conference, pages 403--412, 2015.
[33]
F. Silvestri. Sorting out the document identifier assignment problem. In European Conference on IR Research, pages 101--112. Springer, 2007.
[34]
H. D. Simon and S.-H. Teng. How good is recursive bisection? SIAM Journal on Scientific Computing, 18(5):1436--1445, 1997.
[35]
J. Ugander and L. Backstrom. Balanced label propagation for partitioning massive graphs. In Web Search and Data Mining, pages 507--516, 2013.
[36]
B. Viswanath, A. Mislove, M. Cha, and K. P. Gummadi. On the evolution of user interaction in Facebook. In Workshop on Social Networks, 2009.
[37]
I. H. Witten, A. Moffat, and T. C. Bell. Managing gigabytes: compressing and indexing documents and images. Morgan Kaufmann, 1999.

Cited By

View all
  • (2024)Efficient List Intersection Algorithm for Short Documents by Document ReorderingMathematics10.3390/math1209132812:9(1328)Online publication date: 26-Apr-2024
  • (2024)Reordering Functions in Mobiles Apps for Reduced Size and Faster Start-UpACM Transactions on Embedded Computing Systems10.1145/366063523:4(1-54)Online publication date: 20-Apr-2024
  • (2024)Graph Summarization: Compactness Meets EfficiencyProceedings of the ACM on Management of Data10.1145/36549432:3(1-26)Online publication date: 30-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
ISBN:9781450342322
DOI:10.1145/2939672
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 August 2016

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. algorithms
  2. approximation algorithms
  3. compression
  4. data mining
  5. distributed algorithms
  6. experimentation
  7. graph algorithms
  8. linear arrangement
  9. social networks

Qualifiers

  • Research-article

Conference

KDD '16
Sponsor:

Acceptance Rates

KDD '16 Paper Acceptance Rate 66 of 1,115 submissions, 6%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)102
  • Downloads (Last 6 weeks)2
Reflects downloads up to 12 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Efficient List Intersection Algorithm for Short Documents by Document ReorderingMathematics10.3390/math1209132812:9(1328)Online publication date: 26-Apr-2024
  • (2024)Reordering Functions in Mobiles Apps for Reduced Size and Faster Start-UpACM Transactions on Embedded Computing Systems10.1145/366063523:4(1-54)Online publication date: 20-Apr-2024
  • (2024)Graph Summarization: Compactness Meets EfficiencyProceedings of the ACM on Management of Data10.1145/36549432:3(1-26)Online publication date: 30-May-2024
  • (2024)HERO: A Hierarchical Set Partitioning and Join Framework for Speeding up the Set Intersection Over GraphsProceedings of the ACM on Management of Data10.1145/36392842:1(1-25)Online publication date: 26-Mar-2024
  • (2024)Faster Learned Sparse Retrieval with Block-Max PruningProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657906(2411-2415)Online publication date: 10-Jul-2024
  • (2024)Revisiting Document Expansion and Filtering for Effective First-Stage RetrievalProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657850(186-196)Online publication date: 10-Jul-2024
  • (2024)Improved Learned Sparse Retrieval with Corpus-Specific VocabulariesAdvances in Information Retrieval10.1007/978-3-031-56063-7_12(181-194)Online publication date: 23-Mar-2024
  • (2024)Meta-colored Compacted de Bruijn GraphsResearch in Computational Molecular Biology10.1007/978-1-0716-3989-4_9(131-146)Online publication date: 17-May-2024
  • (2023)Optimizing Function Layout for Mobile ApplicationsProceedings of the 24th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems10.1145/3589610.3596277(52-63)Online publication date: 13-Jun-2023
  • (2023)Khronos: A Real-Time Indexing Framework for Time Series Databases on Large-Scale Performance Monitoring SystemsProceedings of the 32nd ACM International Conference on Information and Knowledge Management10.1145/3583780.3614944(1607-1616)Online publication date: 21-Oct-2023
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media