DOI: 10.1145/3209280.3229099
short-paper

GOWDA: Goal-oriented Web Documents Querying tool

Published: 28 August 2018

Abstract

A vast amount of data is published on the web each day, and the publication rate keeps growing, which can overwhelm users, particularly those without technical skills. Users from various domains of expertise also face challenges when trying to retrieve the data they require. They may rely on IT experts, but these experts have limited knowledge of individual domains, which makes data extraction time-consuming and error-prone. It would be beneficial if domain experts could retrieve the data they need and build relatively complex queries over web documents themselves. Existing query solutions are either limited to a specific domain or require starting from a predefined knowledge base or sample ontologies. To address these limitations, we propose a goal-oriented platform that enables users to easily extract data from web documents. Users express their goals in natural language, and the platform elicits the corresponding result type using the proposed algorithm. The platform also uses ontologies to improve search results semantically: to retrieve the most relevant results from web documents, the segments of a user's query are mapped to the entities of the ontology. Two types of ontologies are used: goal ontologies and domain-specific ontologies, which comprise domain concepts and the relationships among them. In addition, the platform helps domain experts generate the domain ontologies that will be used to extract data from web documents. Placing ontologies at the center of the approach integrates a level of semantics into the platform, resulting in more precise output. The main contributions of this research are a goal-oriented platform for extracting data from web documents and the integration of ontology-based development into web-document search.
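
The abstract does not spell out how query segments are mapped to ontology entities, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a toy ontology in which every entity carries a few surface labels and is tagged as belonging to either the goal ontology or a domain ontology; the `Entity` class, the `ONTOLOGY` contents, and the `map_query` function are all hypothetical names introduced for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A concept in a toy ontology (hypothetical structure)."""
    name: str
    labels: tuple[str, ...]  # surface forms that may appear in a query
    kind: str                # "goal" or "domain"


# Hypothetical entities; the paper does not list its actual ontologies.
ONTOLOGY = [
    Entity("Compare", ("compare", "versus"), "goal"),
    Entity("Find", ("find", "list", "show"), "goal"),
    Entity("Price", ("price", "cost"), "domain"),
    Entity("Laptop", ("laptop", "notebook"), "domain"),
]


def map_query(query: str, ontology: list[Entity] = ONTOLOGY) -> dict[str, list[str]]:
    """Map query tokens to ontology entities by exact label match.

    Assumption: a "segment" is a single whitespace-separated token.
    A real system would likely need phrase detection and fuzzy matching.
    """
    tokens = [t.strip(".,?!").lower() for t in query.split()]
    mapping: dict[str, list[str]] = {"goal": [], "domain": []}
    for token in tokens:
        for entity in ontology:
            already_seen = entity.name in mapping[entity.kind]
            if token in entity.labels and not already_seen:
                mapping[entity.kind].append(entity.name)
    return mapping
```

Calling `map_query("Compare the price of each laptop")` separates the goal entity (`Compare`) from the domain entities (`Price`, `Laptop`), which mirrors the split between goal ontologies and domain-specific ontologies described above. Exact label matching is chosen here only for brevity.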

Published In

DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018
August 2018
311 pages
ISBN: 9781450357692
DOI: 10.1145/3209280

In-Cooperation

  • SIGDOC: ACM Special Interest Group on Systems Documentation

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Ontology-based development
  2. goal-oriented solution
  3. web document's query

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

DocEng '18: ACM Symposium on Document Engineering 2018
August 28 - 31, 2018
Halifax, NS, Canada

Acceptance Rates

Overall Acceptance Rate 194 of 564 submissions, 34%
