TextBrewer 0.1.9

@airaria released this 21 Apr 01:06

New Features

  • Added an option is_caching_logits to DistillationConfig. If is_caching_logits is True, the distiller caches the batches and the teacher model's output logits, so the logits are computed only once. This speeds up the distillation process. The feature is only available for BasicDistiller and MultiTeacherDistiller. Be cautious about setting it to True on large datasets, since the batches and logits are kept in memory. See the sketch below.
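
A minimal sketch of enabling the option (the other fields shown are illustrative and not part of this change):

```python
from textbrewer import DistillationConfig

# Cache batches and teacher logits in memory so the teacher forward pass
# runs only once per batch. Only BasicDistiller and MultiTeacherDistiller
# support this; watch memory usage on large datasets.
distill_config = DistillationConfig(
    temperature=4,          # illustrative; unrelated to this release
    is_caching_logits=True,
)
```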

Improvements

  • Added a new argument max_grad_norm to the distillers' train method. It sets the strength of gradient clipping. The default is -1, i.e., no gradient clipping.
  • Added new arguments scheduler_class and scheduler_args to the distillers' train method. The old scheduler argument may cause convergence problems and is deprecated in favor of scheduler_class and scheduler_args. See the documentation for details, and the sketch after this list.
  • Removed the print in display_parameters. It no longer prints the statistics directly to the screen.
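
A minimal sketch of the new train arguments, assuming a distiller, optimizer, and dataloader have already been built; the scheduler class and its arguments here are illustrative, and the exact call signature should be checked against the documentation:

```python
import torch

# `distiller`, `optimizer`, and `dataloader` are assumed to exist already.
# The distiller is expected to construct the scheduler as
# scheduler_class(optimizer, **scheduler_args).
scheduler_class = torch.optim.lr_scheduler.StepLR   # illustrative choice
scheduler_args = {'step_size': 1000, 'gamma': 0.9}

distiller.train(
    optimizer,
    dataloader=dataloader,
    num_epochs=3,
    scheduler_class=scheduler_class,   # replaces the deprecated `scheduler` argument
    scheduler_args=scheduler_args,
    max_grad_norm=1.0,                 # clip gradients; the default -1 disables clipping
)
```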

Bug Fixes

  • Fixed an incorrect call to zero_grad().