- Python 3.6 installed
- Pip (`curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && python get-pip.py`)
- Jupyter notebook
- Anaconda: https://www.anaconda.com/download/#windows
- cmder: https://github.com/cmderdev/cmder/releases/download/v1.3.6/cmder.zip
- Clone this repository
- Download the training data from: http://goren.ml/pdnlp
- Extract it to `data/`
- Make sure all the requirements are installed: `pip3 install -r requirements.txt`, or `conda install --yes --file requirements.txt` if you're using Anaconda
- Launch Jupyter by running `cd notebooks; jupyter notebook` in your terminal
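
If you would rather script the extraction step than unzip by hand, a minimal sketch along these lines should work. It assumes the download is a zip archive saved as `data.zip` in the repository root; that file location and the `extract_archive` helper name are illustrative assumptions, not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir="data"):
    """Extract a zip archive into target_dir and return the archived names.

    Hypothetical helper for illustration; not part of this repository.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create data/ if it is missing
    with zipfile.ZipFile(str(archive_path)) as archive:
        archive.extractall(str(target))
        return archive.namelist()


if __name__ == "__main__":
    # Assumed download location: data.zip next to this script.
    archive = Path("data.zip")
    if archive.exists():
        names = extract_archive(archive)
        print(f"Extracted {len(names)} entries into data/")
    else:
        print("data.zip not found; download it first")
```

Only the standard library is used, so this runs before the requirements are installed.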
- `data.zip` - The raw contracts, classified by their filename
- `stemmed.zip` - The contracts after preprocessing and stemming (here to save you time)
- `w2v.pickle` - Word2Vec model trained on the data (gensim model)
- `test_data.zip` - Unlabeled contracts, for those who would like to participate in the competition (http://goren4u.com/nlp_classification/)
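
To use the pretrained Word2Vec model from a notebook, something like the following sketch may be enough. It assumes `w2v.pickle` was written with Python's `pickle` module (rather than gensim's own `save`) and that it sits under `data/`; both are assumptions, as is the `load_pickled_model` helper name. Unpickling a gensim model also requires gensim to be installed, and pickle files should only be loaded from sources you trust.

```python
import pickle
from pathlib import Path


def load_pickled_model(path):
    """Load a pickled object, such as a gensim Word2Vec model, from disk.

    Hypothetical helper for illustration; not part of this repository.
    """
    with open(str(path), "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    # Assumed location of the model file.
    model_path = Path("data") / "w2v.pickle"
    if model_path.exists():
        model = load_pickled_model(model_path)
        print(type(model))
    else:
        print("w2v.pickle not found")
```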
For more details, contact me at goren.ml.