Multi-Word Units Miner, based on different approaches
This repository holds code, reference data and evaluation tools for detecting and processing multi-word units (MWUs) with several approaches. The approaches under investigation include:
- distance-based metrics: spread and flexibility (El Maarouf & Oakes, 2015)
- ecology-inspired diversity indices: Shannon-Wiener entropy and equitability; Simpson's D1, D2 and evenness
- graph-based approaches: shortest-path and PageRank algorithms adapted to quantitative language processing
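The distance-based idea can be illustrated with a minimal sketch. It uses the classic mean-and-variance collocation profile: the signed offset between two words across their co-occurrences. This is an assumption and not necessarily how El Maarouf & Oakes (2015) define spread and flexibility. The function names `signed_offsets` and `offset_profile` and the default window of 5 tokens are hypothetical.

```python
import statistics


def signed_offsets(tokens, w1, w2, window=5):
    """Signed distances from each occurrence of w1 to each w2 within +/- window tokens.

    Hypothetical helper; the window size is an illustrative assumption.
    """
    offsets = []
    for i, tok in enumerate(tokens):
        if tok != w1:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] == w2:
                offsets.append(j - i)  # positive: w2 follows w1
    return offsets


def offset_profile(offsets):
    """Mean and population standard deviation of the offsets.

    A low deviation means w2 keeps a fixed position relative to w1 (a rigid
    pairing). A high deviation means its position varies (a flexible pairing).
    """
    if not offsets:
        raise ValueError("the two words never co-occur within the window")
    return statistics.mean(offsets), statistics.pstdev(offsets)


tokens = "she made a decision and he made a quick decision".split()
offsets = signed_offsets(tokens, "made", "decision")
mean, deviation = offset_profile(offsets)
```

Because the window is symmetric, the second "made" also picks up the earlier "decision" at offset -3. A forward-only window would be an equally reasonable design choice for ordered MWUs.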
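The diversity indices can be computed over any frequency distribution, for instance the words that fill one slot of a candidate MWU. The following minimal sketch makes several assumptions. Proportions are estimated as count / total. Simpson's dominance is λ = Σ p_i². D1 is taken as 1 − λ and D2 as 1/λ, since naming conventions for Simpson's indices vary. Equitability is Pielou's J = H' / ln S. The function name `diversity_indices` and the "take ___" example are hypothetical.

```python
import math
from collections import Counter


def diversity_indices(counts):
    """Ecological diversity indices over a frequency distribution (hypothetical helper)."""
    counts = [c for c in counts if c > 0]
    if not counts:
        raise ValueError("empty distribution")
    total = sum(counts)
    richness = len(counts)  # S: number of distinct types
    props = [c / total for c in counts]
    shannon = -sum(p * math.log(p) for p in props)  # Shannon-Wiener H', natural log
    # Pielou's equitability J = H' / ln S; undefined when only one type occurs.
    equitability = shannon / math.log(richness) if richness > 1 else None
    dominance = sum(p * p for p in props)  # Simpson's lambda = sum of p_i^2
    return {
        "shannon_H": shannon,
        "equitability_J": equitability,
        "simpson_D1": 1.0 - dominance,  # assumed convention: D1 = 1 - lambda
        "simpson_D2": 1.0 / dominance,  # assumed convention: D2 = 1 / lambda
        "simpson_evenness": (1.0 / dominance) / richness,  # D2 / S
    }


# Hypothetical fillers of the open slot in "take ___"
fillers = Counter(["place", "part", "place", "care"])
indices = diversity_indices(fillers.values())
```

Under this reading, a slot dominated by a single filler (low H', high λ) hints at a fixed expression, while an even spread of fillers suggests an open, productive pattern.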
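For the graph-based approaches, the following minimal sketch builds an undirected word co-occurrence graph. It then applies PageRank by power iteration and a breadth-first shortest path. How the repository actually adapts these algorithms is not described here. The graph construction (adjacent tokens only), the damping factor 0.85 and all function names are assumptions.

```python
from collections import deque


def cooccurrence_graph(tokens, window=2):
    """Undirected graph joining distinct tokens whose positions differ by less than `window`."""
    graph = {}
    for i, w in enumerate(tokens):
        graph.setdefault(w, set())
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != w:
                graph[w].add(tokens[j])
                graph.setdefault(tokens[j], set()).add(w)
    return graph


def pagerank(graph, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank; each node splits its score evenly among its neighbours."""
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Score held by isolated nodes is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new = {}
        for v in nodes:
            incoming = sum(rank[u] / len(graph[u]) for u in graph[v])
            new[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def shortest_path_length(graph, source, target):
    """Number of edges on a shortest path (breadth-first search), or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


graph = cooccurrence_graph("new york is a big city in new york state".split())
ranks = pagerank(graph)
hops = shortest_path_length(graph, "big", "state")
```

Because the graph is undirected, every edge is counted once in each direction. The scores therefore sum to 1 and favour well-connected words such as "york" here.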