Code for reproducing the paper describing methods for representing peptides.

IBM/BILN-LM

BILN_LM

BILN Language Model for describing modified and unmodified peptides

0. Installation

You will need to install the following packages:

pip install transformers[torch] datasets tokenizers mapchiral molfeat rdkit scipy scikit-learn tqdm optuna typer tensorboard lightgbm xgboost IPython hestia-ood
pip install git+https://github.com/Boehringer-Ingelheim/pyPept.git
pip install git+https://github.com/novonordisk-research/pepfunn.git

pip install SmilesPE omegaconf mlflow
conda install dgl -c conda-forge
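Before moving on, it can help to confirm that the packages above are importable. The sketch below is not part of the repository. The import names (for example `sklearn` for scikit-learn, `hestia` for hestia-ood, `pyPept`, `pepfunn`, `SmilesPE`, `dgl`) are assumptions about each package's top-level module and may need adjusting.

```python
import importlib.util

# Assumed top-level import names for the packages installed above.
# These are not taken from the repository and may need adjusting.
REQUIRED_MODULES = [
    "transformers", "torch", "datasets", "tokenizers", "mapchiral", "molfeat",
    "rdkit", "scipy", "sklearn", "tqdm", "optuna", "typer", "tensorboard",
    "lightgbm", "xgboost", "IPython", "hestia", "pyPept", "pepfunn",
    "SmilesPE", "omegaconf", "mlflow", "dgl",
]


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

`find_spec` only locates a module without importing it, so the check is fast and does not trigger heavy imports such as torch.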

1. Download data

Run the download_data.py script to download the pretraining and the benchmarking (downstream) datasets. data_dir_path is the directory where the files will be saved.

Both collections:

python code/download_data.py data_dir_path 

Only the pretraining data:

python code/download_data.py data_dir_path --collection pretraining

Only the downstream data, passing the collection name that code/download_data.py defines for it:

python code/download_data.py data_dir_path --collection <downstream_collection>
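The download step can also be driven from Python, for example inside a notebook. This is a minimal sketch that assumes it runs from the repository root. `download_command` and `run_download` are hypothetical helpers, not part of the repository, and `pretraining` is the only collection name confirmed by this README. With `dry_run=True` (the default) the command is only printed.

```python
import subprocess
import sys


def download_command(data_dir_path, collection=None):
    """Build the argument list for code/download_data.py (hypothetical helper).

    With collection=None no --collection flag is passed, which this README
    describes as downloading both collections.
    """
    cmd = [sys.executable, "code/download_data.py", str(data_dir_path)]
    if collection is not None:
        cmd += ["--collection", collection]
    return cmd


def run_download(data_dir_path, collection=None, dry_run=True):
    """Print the command and, unless dry_run is True, execute it."""
    cmd = download_command(data_dir_path, collection)
    print(" ".join(cmd))
    if not dry_run:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


run_download("data", collection="pretraining")
```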

2. Pretrain the model

Run the run_hpo.py script. log_dir is the directory where the training logs will be saved, and data_dir_path is the data directory from step 1. Use the --overwrite flag to overwrite an existing log_dir.

python code/run_hpo.py log_dir data_dir_path
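Because --overwrite replaces an existing log_dir, it can be worth checking the directory before launching pretraining. In this sketch, `pretrain_command` and `needs_overwrite` are hypothetical helpers. Two points are assumptions: that run_hpo.py accepts --overwrite, and that the flag may be appended after the positional arguments.

```python
import sys
from pathlib import Path


def pretrain_command(log_dir, data_dir_path, overwrite=False):
    """Build the argument list for code/run_hpo.py (hypothetical helper).

    Appending --overwrite after the positional arguments is an assumption.
    """
    cmd = [sys.executable, "code/run_hpo.py", str(log_dir), str(data_dir_path)]
    if overwrite:
        cmd.append("--overwrite")
    return cmd


def needs_overwrite(log_dir):
    """Return True if log_dir already exists and contains at least one entry."""
    path = Path(log_dir)
    return path.is_dir() and any(path.iterdir())


log_dir = "logs/biln_lm"
if needs_overwrite(log_dir):
    print(f"{log_dir} is not empty; pass overwrite=True only if it can be replaced.")
print(" ".join(pretrain_command(log_dir, "data")))
```

The sketch deliberately never sets `overwrite=True` on its own, so earlier training logs are not discarded by accident.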

3. Evaluate model or reproduce baselines

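To evaluate a pretrained model, run the fingerprint_evaluation.py script. The second argument names the model and the log_dir produced in step 2, in the form BILN-LM:log_dir.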

python code/fingerprint_evaluation.py data_dir_path BILN-LM:log_dir
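The BILN-LM:log_dir argument pairs a model name with a directory in a NAME:PATH form. The hypothetical helper below shows one way to split such a spec. It illustrates the format only and is not the evaluation script's own parser.

```python
def parse_model_spec(spec):
    """Split a spec such as "BILN-LM:log_dir" into (model_name, path).

    Only the first colon is used as the separator, so paths that contain
    colons (for example Windows drive letters) stay intact.
    """
    name, sep, path = spec.partition(":")
    if not sep or not name or not path:
        raise ValueError(f"expected NAME:PATH, got {spec!r}")
    return name, path


print(parse_model_spec("BILN-LM:logs/biln_lm"))
```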
