skip to main content
article

Compressed representations for web and social graphs

Published: 01 August 2014 Publication History

Abstract

Compressed representations have become effective to store and access large Web and social graphs, in order to support various graph querying and mining tasks. The existing representations exploit various typical patterns in those networks and provide basic navigation support. In this paper, we obtain unprecedented results by finding "dense subgraph" patterns and combining them with techniques such as node orderings and compact data structures. On those representations, we support out-neighbor and out/in-neighbor queries, as well as mining queries based on the dense subgraphs. First, we propose a compression scheme for Web graphs that reduces edges by representing dense subgraphs with "virtual nodes"; over this scheme, we apply node orderings and other compression techniques. With this approach, we match the best current compression ratios that support out-neighbor queries (i.e., nodes pointed from a given node), using 1.0---1.8 bits per edge (bpe) on large Web graphs, and retrieving each neighbor of a node in 0.6---1.0 microseconds ( $$\upmu $$ s). When supporting both out- and in-neighbor queries, instead, our technique generally offers the best time when using little space. If the reduced graph, instead, is represented with a compact data structure that supports bidirectional navigation, we obtain the most compact Web graph representations (0.9---1.5 bpe) that support out/in-neighbor navigation; yet, the time per neighbor extracted raises to around 5---20 $$\upmu $$ s. We also propose a compact data structure that represents dense subgraphs without using virtual nodes. It allows us to recover out/in-neighbors and answer other more complex queries on the dense subgraphs identified. This structure is not competitive on Web graphs, but on social networks, it achieves 4---13 bpe and 8---12 $$\upmu $$ s per out/in-neighbor retrieved, which improves upon all existing representations.

References

[1]
Adler M, Mitzenmacher M (2001) Towards compressing web graphs. In: Proceedings of the data compression conference (DCC). Snowbird, UT, pp 203---212
[2]
Aggarwal C, Wang H (2010) Managing and mining graph data. Springer, Berlin
[3]
Anh V, Moffat A (2010) Local modeling for webgraph compression. In: Proceedings of the data compression conference (DCC). Snowbird UT, p 519
[4]
Apostolico A, Drovandi G (2009) Graph compression by BFS. Algorithms 2(3):1031---1044
[5]
Bader D, Madduri K (2005) Design and implementation of the HPCS graph analysis benchmark on symmetric multiprocessors. In: Proceedings of the 12th international high performance computing (HiPC). Goa, India, pp 465---476
[6]
Becchetti L, Castillo C, Donato D, Baeza-Yates R, Leonardi S (2008) Link analysis for web spam detection. ACM Trans Web 2(1):2
[7]
Boldi P, Vigna S (2004) The Webgraph framework I: compression techniques. In: Proceedings of the 13th international conference on the world wide web (WWW), New York, NY, pp 595---602
[8]
Boldi P, Santini M, Vigna S (2009) Permuting web graph. In: The 6th workshop on algorithms and models for the web graph (WAW), Barcelona, Spain, pp 116---126
[9]
Boldi P, Rosa M, Santini M, Vigna S (2011) Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks. In: Proceedings of the 20th international conference on world wide web (WWW), Hyderabad, India, pp 587---596
[10]
Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw 1(7):107---117
[11]
Brisaboa N, Ladra S, Navarro G (2009) K2-trees for compact web graph representation. In: Proceedings of the 16th international symposium on string processing and information retrieval (SPIRE), Saariselkä, Finland, pp 18---30
[12]
Brisaboa N, Ladra S, Navarro G (2012) Personal communication including code
[13]
Broder A (2000) Min-wise independent permutations: theory and practice. In: Proceedings of the 27th international colloquium on automata, languages and programming (ICALP), Geneva, Italy, p 808
[14]
Brohée S, Van Helden J (2006) Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics 7:488
[15]
Bron C, Kerbosch J (1973) Finding all cliques of an undirected graph (Algorithm 457). Commun ACM 16(9):575---576
[16]
Buehrer G, Chellapilla K (2008) A scalable pattern mining approach to web graph compression with communities. In: Proceedings of the international conference on web search and web data mining (WSDM), Palo Alto, CA, pp 95---106
[17]
Cha M, Mislove A, Gummadi P (2009) A measurement-driven analysis of information propagation in the Flickr social networking. In: Proceedings of the 20th international conference on world wide web (WWW), Madrid, Spain, pp 721---730
[18]
Chakrabarti D, Zhan Y, Faloutsos C (2004) R-MAT: a recursive model for graph mining. In: Proceedings of the 4th SIAM international conference on data mining (SDM), Lake Buena Vista, FL
[19]
Chierichetti F, Kumar R, Lattanzi S, Mitzenmacher M, Panconesi A, Raghavan P (2009) On compressing social networks. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD), Paris, France, pp 219---228
[20]
Claude F, Navarro F (2010) Extended compact web graph representations. In: Algorithms and applications. Lecture notes in computer science 6060. Springer, Berlin, pp 77---91
[21]
Claude F, Navarro G (2010) Fast and compact web graph representations. ACM Trans Web 4(4):16
[22]
Claude F, Navarro G (2008) Practical rank/select queries over arbitrary sequences. In: Proceedings of the 15th international symposium on string processing and information retrieval (SPIRE), Melbourne, Australia, pp 176---187
[23]
Claude F, Ladra S (2011) Practical representations for web and social graphs. In: Proceedings of the 20th ACM conference on information and knowledge management (CIKM), Glasgow, UK, pp 1185---1190
[24]
Clark D (1996) Compact Pat trees. Ph.D. Thesis, University of Waterloo, Canada
[25]
Demetrescu C, Finocchi I, Ribichini A (2006) Trading off space for passes in graph streaming problems. In: Proceedings of the 17th ACM-SIAM symposium on discrete algorithms (SODA), Miami, FL, pp 714---723
[26]
Donato D, Millozzi S, Leonardi S, Tsaparas P (2005) Mining the inner structure of the web graph. In: Proceedings of the 8th workshop on the web and databases (WebDB), Baltimore, MD, pp 145---150
[27]
Dourisboure Y, Geraci F, Pellegrini M (2007) Extraction and classification of dense communities in the web. In: Proceedings of the 16th international conference on world wide web (WWW) Banff, Alberta, Canada, pp 461---470
[28]
Gibson D, Kumar R, Tomkins A (2005) Discovering large dense subgraphs in massive graphs. In: Proceedings of the 31st international conference on very large data bases (VLDB), Trondheim, Norway, pp 721---732
[29]
González R, Grabowski S, Mäkinen V, Navarro G (2005) Practical implementation of rank and select queries. In: Poster Proceedings of the volume of 4th workshop on efficient and experimental algorithms (WEA), Santorini Island, Greece, pp 27---38
[30]
Golynski A, Munro J, Rao S (2006) Rank/select operations on large alphabets: a tool for text indexing. In: Proceedings of the seventeenth annual ACM-SIAM symposium on discrete algorithms (SODA), Miami, FL, pp 368---373
[31]
Grabowski S, Bieniecki W (2010) Tight and simple web graph compression. CoRR abs/006.0809
[32]
Grabowski S, Bieniecki W (2011) Merging adjacency lists for efficient web graph compression. Adv Intell Soft Comput 103(1):385---392
[33]
Grossi R, Gupta A, Vitter J (2003) High-order entropy-compressed text indexes. In: Proceedings of the 14th annual ACM-SIAM symposium on discrete algorithms (SODA), Baltimore, MD, pp 841---850
[34]
Hasan M, Salem S, Zaki M (2011) SimClus: an effective algorithm for clustering with a lower bound on similarity. Knowl Inf Syst 28(3):665---685
[35]
Hernández C, Navarro G (2011) Compression of web and social graphs supporting neighbor and community queries. In: Proceedings of the 6th ACM workshop on social network mining and analysis (SNAKDD), San Diego, CA
[36]
Hernández C, Navarro G (2012) Compressed representation of web and social networks via dense subgraphs. In: Proceedings of the 19th international symposium on string processing and information retrieval (SPIRE), Cartagena de Indias, Colombia, pp 264---276
[37]
Katarzyna M, Przemyslaw K, Piotr B (2009) User position measures in social networks. In: Proceedings of the 4th ACM workshop on social network mining and analysis (SNAKDD), Paris, France, pp 1---9
[38]
Kleinberg J (1999) Authoritative sources in a hyperlinked environment. JACM 46(5):604---632
[39]
Kumar R, Raghavan P, Rajagopalan S, Tomkins A (1999) Trawling the web for emerging cyber-communities. Comput Netw 31(11):1481---1493
[40]
Larsson N, Moffat A (1999) Offline dictionary-based compression. In: Proceedings of the data compression conference (DCC), Snowbird, Utah, pp 296---305
[41]
Lee V, Ruan N, Jin R, Aggarwal C (2010) A survey of algorithms for dense subgraph discovery. Manag Min Graph Data 2010:303---336
[42]
Macropol K, Singh A (2010) Scalable discovery of best clusters on large graphs. PVLDB J 3(1):693---702
[43]
Maserrat H, Pei J (2010) Neighbor query friendly compression of social networks. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD), Washington, DC, pp 533---542
[44]
Mcpherson J, Ma K, Ogawa M (2005) Discovering parametric clusters in social small-world graphs. In: Proceedings of the ACM symposium on applied computing, Santa Fe, New Mexico, USA
[45]
Mislove A, Marcon M, Gummadi P, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In: Proceedings of the internet measurement conference (IMC), San Diego, CA, pp 29---42
[46]
Mishra R, Shukla S, Arora D, Kumar M (2011) An effective comparison of graph clustering algorithms via random graphs. Int J Comput Appl 22(1):22---27
[47]
Morik K, Kaspari A, Wurst M (2012) Multi-objective frequent termset clustering. Knowl Inf Syst 30(3):715---738
[48]
Randall K, Stata R, Wiener J, Wickremesinghe R (2002) The link database: fast access to graphs of the web. In: Proceedings of the data compression conference (DCC), Snowbird, UT, pp 122---131
[49]
Raman R, Raman V, Rao S (2002) Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proceedings of the 13th annual ACM-SIAM symposium on discrete algorithms (SODA), San Francisco, CA, pp 233---242
[50]
Saito H, Toyoda M, Kitsuregawa M, Aihara K (2007) A large-scale study of link spam detection by graph algorithms. In: Proceedings of adversarial information retrieval on the web (AIRWeb), Banff, Alberta, Canada
[51]
Saito K, Kimura M, Ohara K, Motoda H (2012) Efficient discovery of influential nodes for SIS models in social networks. Knowl Inf Syst 30(3):613---635
[52]
Suel T, Yuan J (2001) Compressing the graph structure of the web. In: Proceedings of the data compression conference (DCC), Snowbird, UT, pp 213---222
[53]
Suri S, Vassilvitskii S (2011) Counting triangles and the curse of the last reducer. In: Proceedings of the 20th international conference on the world wide web (WWW), Hyderabad, India, pp 607---614
[54]
Van Dongen, S (2000) Graph clustering by flow simulation. Ph.D. Thesis, University of Utrecht, The Netherlands
[55]
Van Dongen S (2008) Graph clustering via a discrete uncoupling process. SIAM J Matrix Anal Appl 30(1):121---141
[56]
Vitter J (2001) External memory algorithms and data structures: dealing with massive data. ACM Comput Surv 33(2):209---271
[57]
Zhuge H (2009) Communities and emerging semantics in semantic link network: discovery and learning. IEEE Trans Knowl Data Eng 21(6):785---799

Cited By

View all

Index Terms

  1. Compressed representations for web and social graphs
      Index terms have been assigned to the content through auto-classification.

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image Knowledge and Information Systems
      Knowledge and Information Systems  Volume 40, Issue 2
      August 2014
      240 pages

      Publisher

      Springer-Verlag

      Berlin, Heidelberg

      Publication History

      Published: 01 August 2014

      Author Tags

      1. Compressed data structures
      2. Graph mining
      3. Social networks
      4. Web graphs

      Qualifiers

      • Article

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)0
      • Downloads (Last 6 weeks)0
      Reflects downloads up to 12 Sep 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2022)Graph compression based on transitivity for neighborhood queryInformation Sciences: an International Journal10.1016/j.ins.2021.06.050576:C(312-328)Online publication date: 22-Apr-2022
      • (2022)Graph Compression for Adjacency-Matrix MultiplicationSN Computer Science10.1007/s42979-022-01084-23:3Online publication date: 21-Mar-2022
      • (2022)Compact and efficient representation of general graph databasesKnowledge and Information Systems10.1007/s10115-018-1275-x60:3(1479-1510)Online publication date: 10-Mar-2022
      • (2019)Fast, Small, and Simple Document Listing on Repetitive Text CollectionsString Processing and Information Retrieval10.1007/978-3-030-32686-9_34(482-498)Online publication date: 7-Oct-2019
      • (2018)An effective graph summarization and compression technique for a large-scaled graphThe Journal of Supercomputing10.1007/s11227-018-2245-576:10(7906-7920)Online publication date: 15-Jan-2018
      • (2017)ZipGProceedings of the 2017 ACM International Conference on Management of Data10.1145/3035918.3064012(1149-1164)Online publication date: 9-May-2017
      • (2017)Faster compression methods for a weighted graph using locality sensitive hashingInformation Sciences: an International Journal10.1016/j.ins.2017.07.033421:C(237-253)Online publication date: 1-Dec-2017
      • (2017)Set-based unified approach for summarization of a multi-attributed graphWorld Wide Web10.1007/s11280-016-0388-y20:3(543-570)Online publication date: 1-May-2017
      • (2017)Document retrieval on repetitive string collectionsInformation Retrieval10.1007/s10791-017-9297-720:3(253-291)Online publication date: 1-Jun-2017
      • (2015)Visual Classification by $\ell _1$ -Hypergraph ModelingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2015.241549727:9(2564-2574)Online publication date: 1-Sep-2015

      View Options

      View options

      Get Access

      Login options

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media