BILN Language Model for describing modified and non-modified peptides
You will need to install the following packages:
pip install transformers[torch] datasets tokenizers mapchiral molfeat rdkit scipy scikit-learn tqdm optuna typer tensorboard lightgbm xgboost IPython hestia-ood
pip install git+https://github.com/Boehringer-Ingelheim/pyPept.git
pip install git+https://github.com/novonordisk-research/pepfunn.git
pip install SmilesPE omegaconf mlflow
conda install dgl -c conda-forge
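After installation, a quick check can confirm that the main dependencies are visible to the interpreter. The following is a minimal sketch, not part of the repository: the helper `missing_modules` and the list `REQUIRED_MODULES` are hypothetical names, and the import names are assumed to match the usual conventions for these packages (for example, `sklearn` for scikit-learn). Packages whose import name is less certain are left out.

```python
from importlib.util import find_spec

# Import names are assumptions based on common conventions for these
# packages; extend or adjust the list to match your environment.
REQUIRED_MODULES = [
    "transformers", "torch", "datasets", "tokenizers", "rdkit",
    "scipy", "sklearn", "tqdm", "optuna", "typer",
    "lightgbm", "xgboost", "IPython", "omegaconf", "mlflow", "dgl",
]


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    # find_spec only looks the module up; it does not import it.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All checked modules were found.")
```

Any module reported as missing can be reinstalled with the corresponding pip or conda command above.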
Execute the download_data.py script to download both the pretraining and benchmarking datasets. data_dir_path refers to the directory where you want to save the files.
Both collections:
python code/download_data.py data_dir_path
Only the pretraining data:
python code/download_data.py data_dir_path --collection pretraining
Only the downstream data:
python code/download_data.py data_dir_path --collection pretraining
Execute the train.py script. log_dir refers to the directory where the training logs will be saved. Use the --overwrite flag if you want to overwrite an existing log_dir.
python code/run_hpo.py log_dir data_dir_path
To evaluate a pretrained model, execute the fingerprint_evaluation.py script:
python code/fingerprint_evaluation.py data_dir_path BILN-LM:log_dir