skip to main content
10.1145/1135777.1135823acmconferencesArticle/Chapter ViewAbstractPublication PagesthewebconfConference Proceedingsconference-collections
Article

To randomize or not to randomize: space optimal summaries for hyperlink analysis

Published: 23 May 2006 Publication History

Abstract

Personalized PageRank expresses link-based page quality around user selected pages. The only previous personalized PageRank algorithm that can serve on-line queries for an unrestricted choice of pages on large graphs is our Monte Carlo algorithm [WAW 2004]. In this paper we achieve unrestricted personalization by combining rounding and randomized sketching techniques in the dynamic programming algorithm of Jeh and Widom [WWW 2003]. We evaluate the precision of approximation experimentally on large scale real-world data and find significant improvement over previous results. As a key theoretical contribution we show that our algorithms use an optimal amount of space by also improving earlier asymptotic worst-case lower bounds. Our lower bounds and algorithms apply to the SimRank as well; of independent interest is the reduction of the SimRank computation to personalized PageRank.

References

[1]
Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Sampling algorithms: Lower bounds and applications. Proc of 33rd STOC, 2001.
[2]
B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7):422--426, 1970.
[3]
P. Boldi and S. Vigna. The webgraph framework I: Compression techniques. Proc of 13th WWW, pp. 595--602, 2004.
[4]
A. Z. Broder and M. Mitzenmacher. Network applications of Bloom filters: A survey. Internet Mathematics, 1(4):485--509, 2005.
[5]
M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. Proc of 29th ICALP, pp. 693--703, 2002.
[6]
Y.-Y. Chen, Q. Gan, and T. Suel. Local methods for estimating PageRank values. Proc of 12th CIKM, pp. 381--389, 2004.
[7]
G. Cormode and S. Muthukrishnan. An improved data stream summary: The Count-Min sketch and its applications. Journal of Algorithms, 55(1):58--75, 2005.
[8]
G. Cormode and S. Muthukrishnan. Summarizing and mining skewed data streams. Proc of 5th SIAM Intl. Conf. on Data Mining, 2005.
[9]
R. Fagin, R. Kumar, M. Mahdian, D. Sivakumar, and E. Vee. Comparing and aggregating rankings with ties. Proc of 23rd PODS, 2004.
[10]
D. Fogaras. Where to start browsing the web? Proc of 3rd I2CS, Springer LNCS vol. 2877, pp. 65--79, 2003.
[11]
D. Fogaras and B. Rácz. Towards scaling fully personalized PageRank. Proc of 3rd WAW, pp. 105--117, 2004. Full version to appear in Internet Mathematics.
[12]
D. Fogaras and B. Rácz. Scaling link-based similarity search. Proc of 14th WWW, pp. 641--650, 2005. Full version available at www.ilab.sztaki.hu/websearch/Publications/.
[13]
T. H. Haveliwala. Topic-sensitive PageRank: A context-sensitive ranking algorithm for web search. IEEE Transactions on Knowledge and Data Engineering, 15(4):784--796, 2003.
[14]
M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. In External Memory Algorithms, DIMACS Book Series vol. 50., pp. 107--118. American Mathematical Society, 1999.
[15]
J. Hirai, S. Raghavan, H. Garcia-Molina, and A. Paepcke. WebBase: A repository of web pages. Proc of 9th WWW, pp. 277--293, 2000.
[16]
G. Jeh and J. Widom. SimRank: A measure of structural-context similarity. Pro of 8th SIGKDD, pp. 538--543, 2002.
[17]
G. Jeh and J. Widom. Scaling personalized web search. Proc of 12th WWW, pp. 271--279, 2003.
[18]
S. Kamvar, T. H. Haveliwala, C. Manning, and G. Golub. Exploiting the block structure of the web for computing PageRank. Technical Report 2003-17, Stanford University, 2003.
[19]
M. G. Kendall. Rank Correlation Methods. Hafner Publishing Co., New York, 1955.
[20]
J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604--632, 1999.
[21]
E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, 1997.
[22]
F. McSherry. A uniform approach to accelerated PageRank computation. Proc of 14th WWW, pp. 575--582, 2005.
[23]
S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Comp. Sci., 1(2), 2005.
[24]
L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford University, 1998.
[25]
C. R. Palmer, P. B. Gibbons, and C. Faloutsos. ANF: A fast and scalable tool for data mining in massive graphs. Proc of 8th SIGKDD, pp. 81--90, 2002.
[26]
P. K. C. Singitham, M. S. Mahabhashyam, and P. Raghavan. Efficiency-quality tradeoffs for vector score aggregation. Proc of 30th VLDB, pp. 624--635, 2004.
[27]
J. S. Vitter. External memory algorithms and data structures: Dealing with massive data. ACM Computing Surveys, 33(2):209--271, 2001.

Cited By

View all
  • (2024)Efficient Algorithms for Personalized PageRank Computation: A SurveyIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.337600036:9(4582-4602)Online publication date: Sep-2024
  • (2023)Efficient Resistance Distance Computation: The Power of Landmark-based ApproachesProceedings of the ACM on Management of Data10.1145/35889221:1(1-27)Online publication date: 30-May-2023
  • (2022)GraphGuess: Approximate Graph Processing System with Adaptive CorrectionEuro-Par 2022: Parallel Processing10.1007/978-3-031-12597-3_18(285-300)Online publication date: 22-Aug-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
WWW '06: Proceedings of the 15th international conference on World Wide Web
May 2006
1102 pages
ISBN:1595933239
DOI:10.1145/1135777
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 May 2006

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. data streams
  2. link-analysis
  3. scalability
  4. similarity search

Qualifiers

  • Article

Conference

WWW06
Sponsor:

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)9
  • Downloads (Last 6 weeks)2
Reflects downloads up to 13 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Efficient Algorithms for Personalized PageRank Computation: A SurveyIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.337600036:9(4582-4602)Online publication date: Sep-2024
  • (2023)Efficient Resistance Distance Computation: The Power of Landmark-based ApproachesProceedings of the ACM on Management of Data10.1145/35889221:1(1-27)Online publication date: 30-May-2023
  • (2022)GraphGuess: Approximate Graph Processing System with Adaptive CorrectionEuro-Par 2022: Parallel Processing10.1007/978-3-031-12597-3_18(285-300)Online publication date: 22-Aug-2022
  • (2021)Tensor SketchTensor Computation for Data Analysis10.1007/978-3-030-74386-4_13(299-321)Online publication date: 3-May-2021
  • (2020)V-CombinerProceedings of the 34th ACM International Conference on Supercomputing10.1145/3392717.3392739(1-13)Online publication date: 29-Jun-2020
  • (2020)A survey on network node ranking algorithms: Representative methods, extensions, and applicationsScience China Technological Sciences10.1007/s11431-020-1683-264:3(451-461)Online publication date: 14-Oct-2020
  • (2019)A Survey on Personalized PageRank Computation AlgorithmsIEEE Access10.1109/ACCESS.2019.29526537(163049-163062)Online publication date: 2019
  • (2019)Local Algorithms for Sparse Spanning GraphsAlgorithmica10.1007/s00453-019-00612-6Online publication date: 3-Aug-2019
  • (2018)Fast Exact CoSimRank Search on Evolving and Static GraphsProceedings of the 2018 World Wide Web Conference10.1145/3178876.3186126(599-608)Online publication date: 10-Apr-2018
  • (2018)Buffered Count-Min SketchAdvanced Technologies, Systems, and Applications II10.1007/978-3-319-71321-2_22(249-255)Online publication date: 31-Jan-2018
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media