Add docs and good defaults for DistillationTrainer #5500

cmpatino wants to merge 53 commits into huggingface:main from …o kd-distillation-trainer
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
sergiopaniego left a comment:

Thanks!! On the documentation side, do you feel that, from a user's point of view, the difference between this and GKDTrainer would be clear?
```diff
@@ -136,7 +136,9 @@ class DistillationConfig(_BaseConfig):
 > Parameters that control logging

 log_completions (`bool`, *optional*, defaults to `False`):
```
The help in `log_completions: bool = field(...)` should contain the same message, no?
You're right. Thanks for pointing it out
The main differences between this trainer and the GKDTrainer are a buffer for better on-policy generation and support for an external teacher server.
In terms of design, it doesn't inherit from SFTTrainer as GKDTrainer does, and the long-term plan is to upgrade DistillationTrainer from experimental status.
I mentioned GKDTrainer explicitly in the docs to highlight these differences.
Suggested change:

```python
metadata={
    "help": "Whether to log a sample of (prompt, completion) pairs every `log_completions_steps` steps. If `rich` is "
    "installed, it prints the sample. If `wandb` and/or `trackio` logging is enabled, it logs it to `wandb` "
    "and/or `trackio`."
},
```
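For context, the suggestion above amounts to giving the dataclass field the same help string as the docstring. A minimal, self-contained sketch of what the resulting field definition could look like (the wrapper class here is a stand-in, not the real `DistillationConfig` from TRL):

```python
from dataclasses import dataclass, field


@dataclass
class DistillationConfig:
    """Sketch only: mirrors the help text quoted in the suggestion above."""

    log_completions: bool = field(
        default=False,
        metadata={
            "help": "Whether to log a sample of (prompt, completion) pairs every "
            "`log_completions_steps` steps. If `rich` is installed, it prints the sample. "
            "If `wandb` and/or `trackio` logging is enabled, it logs it to `wandb` and/or `trackio`."
        },
    )
```

With this pattern, argument-parsing utilities that read `field(..., metadata={"help": ...})` surface the same message as the class docstring, so the two cannot drift apart silently.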
What does this PR do?

Adds docs and better defaults for the DistillationTrainer.

Before submitting
AI writing disclosure
We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
Note
Medium Risk
Adds new documentation but also changes multiple `DistillationConfig` defaults (e.g., on-policy fraction, KL mode, top-k, completion length, vLLM memory, learning rate), which can materially alter training behavior for users relying on defaults.

Overview

Adds a new `DistillationTrainer` documentation page (with quickstart, dataset expectations, and external teacher-server constraints) and links it from the experimental docs toctree.

Updates `DistillationConfig` to use more opinionated defaults geared toward on-policy distillation and teacher-server usage (e.g., `lmbda=1.0`, `beta=1.0`, `loss_top_k=1`, longer `max_completion_length`, lower `vllm_gpu_memory_utilization`, higher `learning_rate`) and clarifies the `log_completions` help text.

Reviewed by Cursor Bugbot for commit 611379f. Bugbot is set up for automated code reviews on this repo. Configure here.
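To make the default change concrete, here is a hedged sketch of the three values that are actually quoted in the summary above (`lmbda=1.0`, `beta=1.0`, `loss_top_k=1`); the class name, field types, and everything else about the real `DistillationConfig` are assumptions, and the other changed defaults (completion length, vLLM memory, learning rate) are omitted because their new values are not stated here:

```python
from dataclasses import dataclass


@dataclass
class DistillationDefaultsSketch:
    """Illustrative stand-in for the changed DistillationConfig defaults."""

    # Fully on-policy generation: all completions come from the student.
    lmbda: float = 1.0
    # KL interpolation coefficient for the distillation loss.
    beta: float = 1.0
    # Restrict the loss to the top-1 teacher logits per position.
    loss_top_k: int = 1
```

Users who previously relied on the old defaults would need to pass the earlier values explicitly to reproduce prior behavior, which is why the change is flagged as medium risk.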