Information systems

31 Results for: Book/Issue: DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018

Searched The ACM Guide to Computing Literature (3,790,104 records)

Showing 1 - 20 of 31 Results

invited-talk
August 2018
Can Deep Learning Compensate for a Shallow Evaluation?
- Gerald Penn
DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018, Article No.: 5, Page 1, https://doi.org/10.1145/3209280.3236023

The last ten years have witnessed an enormous increase in the application of "deep learning" methods to both spoken and textual natural language processing. Have they helped? With respect to some well-defined tasks such as language modelling and ...
Metrics: Total Citations 0; Total Downloads 61; Last 12 Months 0; Last 6 weeks 0

abstract
August 2018
Document Changes: Modeling, Detection, Storage and Visualization (DChanges 2018)
DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018, Article No.: 3, Pages 1–2, https://doi.org/10.1145/3209280.3232792

The DChanges series of workshops focuses on changes in all their aspects and applications: algorithms to detect changes, models to describe them and techniques to present them to the users are only some of the topics that are investigated. This year, we ...
Metrics: Total Citations 0; Total Downloads 71; Last 12 Months 3; Last 6 weeks 0

tutorial
August 2018
Automatic Text Summarization and Classification
- Steven J. Simske
- Rafael Lins
DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018, Article No.: 1, Pages 1–2, https://doi.org/10.1145/3209280.3232791

In this tutorial, we consider important aspects (algorithms, approaches, considerations) for tagging both unstructured and structured text for downstream use. This includes summarization, in which text information is compressed for more efficient ...
Metrics: Total Citations 2; Total Downloads 185; Last 12 Months 4; Last 6 weeks 0

invited-talk
August 2018
The Quest for Total Recall
- Gordon V. Cormack
- Maura R. Grossman
DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018, Article No.: 6, Pages 1–2, https://doi.org/10.1145/3209280.3232788

The objective of high-recall information retrieval (HRIR) is to identify substantially all information relevant to an information need, where the consequences of missing or untimely results may have serious legal, policy, health, social, safety, defence,...
Metrics: Total Citations 1; Total Downloads 125; Last 12 Months 1; Last 6 weeks 0

short-paper
August 2018
ARCHANGEL: Trusted Archives of Digital Public Documents
- J. Collomosse
- T. Bui
- A. Brown
- J. Sheridan
- A. Green
- M. Bell
- J. Fawcett
- J. Higgins
- O. Thereaux
DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018, Article No.: 31, Pages 1–4, https://doi.org/10.1145/3209280.3229120

We present ARCHANGEL, a decentralised platform for ensuring the long-term integrity of digital documents stored within public archives. Document integrity is fundamental to public trust in archives. Yet currently that trust is built upon institutional ...
Metrics: Total Citations 28; Total Downloads 297; Last 12 Months 38; Last 6 weeks 5

short-paper
August 2018
Main Content Detection in HTML Journal Articles
DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018, Article No.: 36, Pages 1–4, https://doi.org/10.1145/3209280.3229115

Web content extraction algorithms have been shown to improve the performance of web content analysis tasks. This is because noisy web page content, such as advertisements and navigation links, can significantly degrade performance. This paper presents a ...
Metrics: Total Citations 0; Total Downloads 77; Last 12 Months 4; Last 6 weeks 0

short-paper
August 2018
Text Mining and Recommender Systems for Predictive Policing
DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018, Article No.: 15, Pages 1–4, https://doi.org/10.1145/3209280.3229112

We present some results from a joint project between HP Labs, Cardiff University and Dyfed Powys Police on predictive policing. Applications of the various techniques from recommender systems and text mining to the problem of crime patterns recognition ...
Metrics: Total Citations 2; Total Downloads 205; Last 12 Months 10; Last 6 weeks 1

short-paper
August 2018
Query Expansion in Enterprise Search
DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018, Article No.: 33, Pages 1–4, https://doi.org/10.1145/3209280.3229111

Although web search remains an active research area, interest in enterprise search has not kept up with the information requirements of the contemporary workforce. To address these issues, this research aims to develop, implement, and study the query ...
Metrics: Total Citations 0; Total Downloads 129; Last 12 Months 5; Last 6 weeks 1

short-paper
August 2018
The Causal Graph CRDT for Complex Document Structure
DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018, Article No.: 34, Pages 1–4, https://doi.org/10.1145/3209280.3229110

Commutative Replicated Data Types (CRDTs) are an emerging tool for real-time collaborative editing. Existing work on CRDTs mostly focuses on documents as a list of text content, but large documents (having over 7,000 pages) with complex sectional ...
Metrics: Total Citations 1; Total Downloads 235; Last 12 Months 14; Last 6 weeks 1

short-paper
August 2018
Document clustering as a record linkage problem
DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018, Article No.: 39, Pages 1–4, https://doi.org/10.1145/3209280.3229109

This work examines document clustering as a record linkage problem, focusing on named-entities and frequent terms, using several vector and graph-based document representation methods and k-means clustering with different similarity measures. The JedAI ...
Metrics: Total Citations 1; Total Downloads 94; Last 12 Months 3; Last 6 weeks 0

short-paper
August 2018
SlideDiff: Animating Textual and Media Changes in Slides
DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018, Article No.: 37, Pages 1–4, https://doi.org/10.1145/3209280.3229107

SlideDiff is a system that automatically creates an animated rendering of textual and media differences between two versions of a slide presentation. While previous work focused on either textual or image data, SlideDiff integrates both text and media ...
Metrics: Total Citations 3; Total Downloads 64; Last 12 Months 6; Last 6 weeks 1

short-paper
August 2018
Measuring the Centrality of the References in Scientific Papers
DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018, Article No.: 44, Pages 1–4, https://doi.org/10.1145/3209280.3229104

Citation analysis is considered a major branch of bibliometrics and one of the most popular. It is based on the assumption that all citations have similar value, and it weights each equally. Specific research fields like content-based ...
Metrics: Total Citations 1; Total Downloads 72; Last 12 Months 8; Last 6 weeks 2

short-paper
August 2018
Helmholtz Principle on word embeddings for automatic document segmentation
DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018, Article No.: 40, Pages 1–4, https://doi.org/10.1145/3209280.3229103

Automatic document segmentation is getting more and more attention in the natural language processing field. The problem is defined as dividing text into lexically coherent fragments. In fact, most realistic documents are not homogeneous, so extracting ...
Metrics: Total Citations 0; Total Downloads 113; Last 12 Months 3; Last 6 weeks 0

short-paper
August 2018
Annotation Data Management with JeDIS
- Erik Faessler
- Udo Hahn
DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018, Article No.: 42, Pages 1–4, https://doi.org/10.1145/3209280.3229102

This paper introduces the Jena Document Information System (JeDIS). The focus lies on its capability to partition annotation graphs into modules. Annotation modules are defined in terms of types from the annotation schema. Modules allow easy ...
Metrics: Total Citations 0; Total Downloads 68; Last 12 Months 2; Last 6 weeks 1

short-paper
August 2018
Automatic Term Extraction in Technical Domain using Part-of-Speech and Common-Word Features
- Nisha Ingrid Simon
- Vlado Kešelj
DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018, Article No.: 51, Pages 1–4, https://doi.org/10.1145/3209280.3229100

Extracting key terms from technical documents allows us to write effective documentation that is specific and clear, with minimum ambiguity and confusion caused by nearly synonymous but different terms. For instance, in order to avoid confusion, the ...
Metrics: Total Citations 6; Total Downloads 200; Last 12 Months 12; Last 6 weeks 1

short-paper
August 2018
GOWDA: Goal-oriented Web Documents Querying tool
- Bahareh Zarei
- Martin Gaedke
DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018, Article No.: 47, Pages 1–4, https://doi.org/10.1145/3209280.3229099

Each day, a vast amount of data is published on the web. In addition, the rate at which content is being published is growing, which has the potential to overwhelm users, particularly those who are technically unskilled. Furthermore, users from various ...
Metrics: Total Citations 0; Total Downloads 53; Last 12 Months 0; Last 6 weeks 0

short-paper
August 2018
Semantically Weighted Similarity Analysis for XML-based Content Components
- Jan Oevermann
- Christoph Lüth
DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018, Article No.: 20, Pages 1–4, https://doi.org/10.1145/3209280.3229098

Uncontrolled variants and duplicate content are ongoing problems in component content management; they decrease the overall reuse of content components. Similarity analyses can help to clean up existing databases and identify problematic texts, however, ...
Metrics: Total Citations 0; Total Downloads 66; Last 12 Months 0; Last 6 weeks 0

short-paper
August 2018
diffi: diff improved; a preview
- Gioele Barabucci
DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018, Article No.: 38, Pages 1–4, https://doi.org/10.1145/3209280.3229084

diffi (diff improved) is a comparison tool whose primary goal is to describe the differences between the content of two documents regardless of their formats.

diffi examines the stacks of abstraction levels of the two documents to be compared, finds ...
Metrics: Total Citations 3; Total Downloads 116; Last 12 Months 12; Last 6 weeks 2

research-article
August 2018
iDocChip: A Configurable Hardware Architecture for Historical Document Image Processing: Percentile Based Binarization
DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018, Article No.: 24, Pages 1–8, https://doi.org/10.1145/3209280.3209538

End-to-end Optical Character Recognition (OCR) systems are heavily used to convert document images into machine-readable text. Commercial and open-source OCR systems (like Abbyy, OCRopus, Tesseract etc.) have traditionally been optimized for contemporary ...
Metrics: Total Citations 3; Total Downloads 85; Last 12 Months 4; Last 6 weeks 1

research-article
August 2018
Exploiting patterns and templates for technical documentation
DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018, Article No.: 30, Pages 1–9, https://doi.org/10.1145/3209280.3209537

There are several domains in which the documents are made of reusable pieces. Template languages have been widely studied by the document engineering community to deal with common structures and textual fragments. Though, templating mechanisms are often ...
Metrics: Total Citations 2; Total Downloads 285; Last 12 Months 19; Last 6 weeks 3