Add docs and good defaults for DistillationTrainer #5500

cmpatino wants to merge 53 commits into huggingface:main from …o kd-distillation-trainer
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
sergiopaniego left a comment:

Thanks!! On the documentation side, do you feel that, from a user's point of view, the difference between this and GKDTrainer would be clear?
```diff
@@ -136,7 +136,9 @@ class DistillationConfig(_BaseConfig):
 > Parameters that control logging

 log_completions (`bool`, *optional*, defaults to `False`):
```
The help in `log_completions: bool = field(...)` should contain the same message, no?
You're right. Thanks for pointing it out
The main differences between this trainer and the GKDTrainer are a buffer for better on-policy generation and support for an external teacher server.
In terms of design, it doesn't inherit from SFTTrainer as GKDTrainer does, and the long-term plan is to upgrade DistillationTrainer from experimental status.
I mentioned GKDTrainer explicitly in the docs to highlight these differences.
Suggested change:

```python
metadata={
    "help": "Whether to log a sample of (prompt, completion) pairs every `log_completions_steps` steps. If `rich` is "
    "installed, it prints the sample. If `wandb` and/or `trackio` logging is enabled, it logs it to `wandb` "
    "and/or `trackio`."
},
```
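For context, the suggestion above amounts to giving the dataclass field the same help string as the docstring. A minimal, self-contained sketch of what the resulting field definition could look like (the wrapper class here is a stand-in, not the real `DistillationConfig` from TRL):

```python
from dataclasses import dataclass, field


@dataclass
class DistillationConfig:
    """Sketch only: mirrors the help text quoted in the suggestion above."""

    log_completions: bool = field(
        default=False,
        metadata={
            "help": "Whether to log a sample of (prompt, completion) pairs every "
            "`log_completions_steps` steps. If `rich` is installed, it prints the sample. "
            "If `wandb` and/or `trackio` logging is enabled, it logs it to `wandb` and/or `trackio`."
        },
    )
```

With this pattern, argument-parsing utilities that read `field(..., metadata={"help": ...})` surface the same message as the class docstring, so the two cannot drift apart silently.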
What does this PR do?

Adds docs and better defaults for the DistillationTrainer.

Before submitting
AI writing disclosure
We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
Note
Medium Risk
Adds new documentation but also changes multiple `DistillationConfig` defaults (e.g., on-policy fraction, KL mode, top-k, completion length, vLLM memory, learning rate), which can materially alter training behavior for users relying on defaults.

Overview

Adds a new `DistillationTrainer` documentation page (with quickstart, dataset expectations, and external teacher-server constraints) and links it from the experimental docs toctree.

Updates `DistillationConfig` to use more opinionated defaults geared toward on-policy distillation and teacher-server usage (e.g., `lmbda=1.0`, `beta=1.0`, `loss_top_k=1`, longer `max_completion_length`, lower `vllm_gpu_memory_utilization`, higher `learning_rate`) and clarifies the `log_completions` help text.

Reviewed by Cursor Bugbot for commit 611379f. Bugbot is set up for automated code reviews on this repo. Configure here.
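To make the default change concrete, here is a hedged sketch of the three values that are actually quoted in the summary above (`lmbda=1.0`, `beta=1.0`, `loss_top_k=1`); the class name, field types, and everything else about the real `DistillationConfig` are assumptions, and the other changed defaults (completion length, vLLM memory, learning rate) are omitted because their new values are not stated here:

```python
from dataclasses import dataclass


@dataclass
class DistillationDefaultsSketch:
    """Illustrative stand-in for the changed DistillationConfig defaults."""

    # Fully on-policy generation: all completions come from the student.
    lmbda: float = 1.0
    # KL interpolation coefficient for the distillation loss.
    beta: float = 1.0
    # Restrict the loss to the top-1 teacher logits per position.
    loss_top_k: int = 1
```

Users who previously relied on the old defaults would need to pass the earlier values explicitly to reproduce prior behavior, which is why the change is flagged as medium risk.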