Software to construct and visualize Word Sense Disambiguation (WSD) models based on JoBimText models. This project implements the method described in the following paper; please cite it if you use this software in a research project:
- Panchenko A., Marten F., Ruppert E., Faralli S., Ustalov D., Ponzetto S.P., Biemann C. Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017). Copenhagen, Denmark. Association for Computational Linguistics.
@inproceedings{Panchenko:17:emnlp,
author = {Panchenko, Alexander and Marten, Fide and Ruppert, Eugen and Faralli, Stefano and Ustalov, Dmitry and Ponzetto, Simone Paolo and Biemann, Chris},
title = {{Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation}},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)},
year = {2017},
address = {Copenhagen, Denmark},
publisher = {Association for Computational Linguistics},
language = {english}
}
- Java 1.8
- Docker Engine (1.13.0+), see Docker installation guide
- Docker Compose (1.10.0+), see Compose installation guide
- Apache Spark 2.0+ (only required to build your own model)
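To verify the prerequisites are in place, you can check the installed versions (the exact output format varies between releases):
java -version            # should report 1.8.x
docker --version         # should report 1.13.0 or newer
docker-compose --version # should report 1.10.0 or newer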
We provide a ready-to-use database and a dump of pictures for all senses in the database. Note that downloading and unpacking these artifacts requires 300 GB of free disk space! To download and prepare the project with them, use the following command:
./wsd model:download
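Tip: you can check the free space on the filesystem of your working directory beforehand:
df -h .   # the "Avail" column should show at least 300 GB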
Note: for instructions on how to rebuild the database with the model, see Build your own DB below.
To start the application:
./wsd web-app:start
The web application runs with Docker Compose. To customize your installation, adjust docker-compose.override.yml; see the official Docker Compose documentation for general information on this file. To get further information on the running containers, you can use any of the Docker Compose commands, such as docker-compose ps and docker-compose logs.
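As an illustration, the following sketch writes an override file that remaps the published port of the web interface. The service name web, the ports, and the Compose file version are assumptions made for this example; check docker-compose.yml for the actual values before using it.
cat > docker-compose.override.yml <<'EOF'
version: '3'        # assumption: match the version used in docker-compose.yml
services:
  web:              # assumption: check docker-compose.yml for real service names
    ports:
      - "8080:80"   # publish the container's port 80 on host port 8080
EOF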
First set the $SPARK_HOME environment variable or make spark-submit available on your PATH. By modifying the script scripts/spark_submit_jar.sh you can adjust the amount of memory used by Spark (consider changing --conf 'spark.driver.memory=4g' and --conf 'spark.executor.memory=1g').
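For example, assuming Spark was unpacked to /opt/spark (an assumption; adjust the path to your actual installation directory):
export SPARK_HOME=/opt/spark
export PATH="$SPARK_HOME/bin:$PATH"   # makes spark-submit available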
We recommend first building a toy model from the toy training data set, which completes within a few minutes.
./wsd model:build-toy
This model provides senses only for the word "Python", but it is fully functional.
Building the full model takes nearly 11 hours on an eight-core machine with 30 GB of memory and needs around 300 GB of free disk space. It will also download 4 GB of training data.
./wsd model:build-full
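Since the full build runs for many hours, you may want to start it detached from your terminal, for example:
nohup ./wsd model:build-full > build-full.log 2>&1 &
tail -f build-full.log   # follow the build progress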
For an overview of all available commands, run:
./wsd --help