Add docs and good defaults for DistillationTrainer #5500

Open
cmpatino wants to merge 53 commits into huggingface:main from cmpatino:kd-distillation-docs

Conversation

Collaborator

@cmpatino cmpatino commented Apr 10, 2026

What does this PR do?

Adds docs and better defaults for the DistillationTrainer.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).

AI writing disclosure

We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.

  • AI-assisted: some parts were suggested or improved by AI, but the PR was written and reviewed by a human.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.


Note

Medium Risk
Adds new documentation but also changes multiple DistillationConfig defaults (e.g., on-policy fraction, KL mode, top-k, completion length, vLLM memory, learning rate), which can materially alter training behavior for users relying on defaults.

Overview
Adds a new DistillationTrainer documentation page (with quickstart, dataset expectations, and external teacher-server constraints) and links it from the experimental docs toctree.

Updates DistillationConfig to use more opinionated defaults geared toward on-policy distillation and teacher-server usage (e.g., lmbda=1.0, beta=1.0, loss_top_k=1, longer max_completion_length, lower vllm_gpu_memory_utilization, higher learning_rate) and clarifies the log_completions help text.
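The defaults named in the summary above can be sketched as a minimal config. This is an illustrative sketch, not TRL's actual class: only `lmbda=1.0`, `beta=1.0`, and `loss_top_k=1` are quoted in the PR summary; the field docstrings are assumptions, and fields whose new values the PR does not state are omitted.

```python
# Hypothetical sketch of the updated DistillationConfig defaults described
# in this PR. Field comments are assumptions, not TRL documentation.
from dataclasses import dataclass, field


@dataclass
class DistillationConfig:
    # Fraction of batches generated on-policy by the student
    # (1.0 = fully on-policy distillation).
    lmbda: float = 1.0
    # Coefficient controlling the KL divergence mode.
    beta: float = 1.0
    # Restrict the distillation loss to the teacher's top-k logits.
    loss_top_k: int = 1
    # Whether to log a sample of (prompt, completion) pairs during training.
    log_completions: bool = field(
        default=False,
        metadata={
            "help": "Whether to log a sample of (prompt, completion) pairs "
            "every `log_completions_steps` steps."
        },
    )
```

Users who relied on the previous defaults would need to pin the old values explicitly, since `lmbda=1.0` makes training fully on-policy by default.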

Reviewed by Cursor Bugbot for commit 611379f.

@cmpatino cmpatino changed the title Add docs page for DistillationTrainer Add docs and good defaults for DistillationTrainer Apr 10, 2026
@cmpatino cmpatino marked this pull request as ready for review April 10, 2026 09:08
@cmpatino cmpatino requested review from albertvillanova and kashif and removed request for albertvillanova and kashif April 10, 2026 10:21
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@sergiopaniego sergiopaniego left a comment


thanks!! On the documentation side, do you think the difference between this and GKDTrainer would be clear from a user's point of view?

@@ -136,7 +136,9 @@ class DistillationConfig(_BaseConfig):
> Parameters that control logging

log_completions (`bool`, *optional*, defaults to `False`):
Member


The help text in `log_completions: bool = field(...)` should contain the same message, no?

Collaborator Author


You're right. Thanks for pointing it out.

Collaborator Author


The main differences between this trainer and the GKDTrainer are a buffer for better on-policy generation and support for an external teacher server.

In terms of design, it doesn't inherit from SFTTrainer as GKDTrainer does, and the long-term plan is to upgrade DistillationTrainer from experimental status.

I mention the GKDTrainer in the docs to make these differences clear.
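The on-policy fraction mentioned above can be illustrated with a tiny sketch. This is not TRL code: the function name and sampling logic are assumptions made to show what `lmbda` controls, i.e. with `lmbda=1.0` every batch is generated on-policy by the student rather than drawn from the dataset.

```python
# Illustrative only: how an on-policy fraction `lmbda` might gate whether a
# training batch is generated by the student or taken from the dataset.
# With lmbda=1.0 (the new default), every batch is on-policy.
import random


def sample_batch_source(lmbda: float, rng: random.Random) -> str:
    """Return "on_policy" with probability `lmbda`, else "dataset"."""
    return "on_policy" if rng.random() < lmbda else "dataset"
```

Since `random.random()` returns a value in `[0, 1)`, `lmbda=1.0` always yields on-policy batches and `lmbda=0.0` never does.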

Collaborator

@kashif kashif Apr 10, 2026


Suggested change
    metadata={
        "help": "Whether to log a sample of (prompt, completion) pairs every "
        "`log_completions_steps` steps. If `rich` is installed, it prints the "
        "sample. If `wandb` and/or `trackio` logging is enabled, it logs it to "
        "`wandb` and/or `trackio`."
    },

4 participants