textutils

process corpus

This can be used for SST (the Stanford Sentiment Treebank). That is the only dataset I have experience with, so other corpora are untested.

parameters

corpus_path: path to the corpus.
label_num: the number of labels (SST has 5).
pkl_path: glove.840B.300d.txt is about 5 GB, too large to load on every training run, so I use pickle to save the embeddings to a much smaller file and save time. This is the path where that file is saved.
pkl_name: the file name of that pickle file.
embedding_file: path to the pretrained embedding file to use (e.g. d:/glove.840B.300d.txt).
embedding_name: the name of that embedding file.
embedding_dim: the dimension of the chosen embedding (300 for glove.840B.300d).
batch_size: batch size.
train=None: leave as None when generating the training set; when generating the dev and test sets, pass the training set here.
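The pickle caching described under pkl_path and pkl_name can be sketched as below. This is a minimal sketch, not the repository's code: the function name `load_embeddings` and the optional `vocab` argument are hypothetical, and only the parameter names embedding_file, embedding_dim, pkl_path and pkl_name come from the list above. It assumes the embedding file is in the GloVe text format (one token per line followed by embedding_dim space-separated floats).

```python
import os
import pickle


def load_embeddings(embedding_file, embedding_dim, pkl_path, pkl_name, vocab=None):
    """Hypothetical helper: parse a GloVe-format file once, then reuse a pickle cache.

    If `vocab` is given, only those words are kept, which is what keeps the
    cached file small compared with the multi-gigabyte text file.
    """
    cache = os.path.join(pkl_path, pkl_name)
    if os.path.exists(cache):
        # Fast path: load the small pickle written on a previous run.
        with open(cache, "rb") as f:
            return pickle.load(f)

    vectors = {}
    with open(embedding_file, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) <= embedding_dim:
                continue  # skip blank or malformed lines
            # Take the last embedding_dim fields as the vector and join the rest
            # as the token, so tokens that contain spaces are still parsed.
            word = " ".join(parts[:-embedding_dim])
            if vocab is not None and word not in vocab:
                continue
            vectors[word] = [float(x) for x in parts[-embedding_dim:]]

    # Write the cache so later runs skip the slow text parse.
    os.makedirs(pkl_path, exist_ok=True)
    with open(cache, "wb") as f:
        pickle.dump(vectors, f)
    return vectors
```

The first call pays the cost of scanning the full embedding file; later runs with the same pkl_path and pkl_name only read the pickle. The sketch does not invalidate the cache when `vocab` changes, so the pickle should be deleted after switching corpora.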
