Fix YaRN RoPE bugs in model builder and add parity tests #2076
Open
titaiwangms wants to merge 2 commits into main from
Conversation
Fix four bugs in the YaRN RoPE configuration that caused completely
wrong cos/sin caches, producing garbage output for YaRN-based models
(e.g. Ministral-3-3B, OpenAI OS-minier).
Bug 1 - hasattr on dict (line 60):
hasattr(config.rope_scaling, 'original_max_position_embeddings')
always returns False for dict objects. Fixed to use 'in' operator:
'original_max_position_embeddings' in config.rope_scaling
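The difference is easy to demonstrate in isolation (a minimal repro, not the builder's actual code):

```python
# Minimal repro of Bug 1: hasattr() checks object attributes, not dict keys.
rope_scaling = {"original_max_position_embeddings": 4096, "factor": 8.0}

# Buggy check: a plain dict exposes keys via __getitem__, not attributes,
# so hasattr() is always False here and the YaRN branch is silently skipped.
buggy = hasattr(rope_scaling, "original_max_position_embeddings")

# Fixed check: the membership test looks at the dict's keys.
fixed = "original_max_position_embeddings" in rope_scaling

print(buggy, fixed)  # False True
```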
Bug 2 - rope_theta fallback (line 231):
Models that store rope_theta only in rope_scaling dict (not as a
top-level config attribute) fell through to default theta=10000.
Added fallback to check config.rope_scaling['rope_theta'].
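A sketch of the fallback logic described above (the function and class names here are illustrative, not the exact builder code):

```python
DEFAULT_THETA = 10000.0

def resolve_rope_theta(config) -> float:
    # Prefer a top-level rope_theta, then fall back to the
    # rope_scaling dict, then the conventional default.
    if getattr(config, "rope_theta", None) is not None:
        return config.rope_theta
    rope_scaling = getattr(config, "rope_scaling", None) or {}
    if "rope_theta" in rope_scaling:
        return rope_scaling["rope_theta"]
    return DEFAULT_THETA

class Cfg:  # minimal stand-in for an HF-style config object
    rope_theta = None
    rope_scaling = {"rope_theta": 1000000.0, "factor": 8.0}

print(resolve_rope_theta(Cfg()))  # 1000000.0, not the 10000.0 default
```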
Bug 3 - mscale override (line 464):
Always computed mscale from factor via make_mscale_yarn(), ignoring
the explicit mscale value from config. Models like Ministral-3-3B
set mscale=1.0 to disable scaling, but got mscale=1.277 instead.
Now respects the config value when explicitly provided.
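The intended behavior can be sketched as follows (helper names are illustrative; the YaRN default uses the standard 0.1 * ln(factor) + 1.0 attention-scaling formula):

```python
import math

def make_mscale_yarn(factor: float) -> float:
    # YaRN's default attention scaling grows with the context-extension factor.
    return 0.1 * math.log(factor) + 1.0 if factor > 1 else 1.0

def resolve_mscale(rope_scaling: dict) -> float:
    # Fixed behavior: an explicit positive mscale from config wins;
    # only compute the YaRN default when none is provided.
    explicit = rope_scaling.get("mscale", 0)
    if explicit > 0:
        return explicit
    return make_mscale_yarn(rope_scaling["factor"])

# Ministral-3-3B-style config: mscale=1.0 explicitly disables scaling.
print(resolve_mscale({"factor": 16.0, "mscale": 1.0}))  # 1.0
# Without an explicit mscale, the formula applies (about 1.277 at factor=16).
print(resolve_mscale({"factor": 16.0}))
```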
Bug 4 - inv_freq double-inversion (line 1750):
make_inv_freq_rescaled_with_ntk computed:
interpolation = 1.0 / (factor * inv_freq)
extrapolation = 1.0 / inv_freq
Since inv_freq is already 1/pos_freqs, this produced pos_freqs/factor
and pos_freqs respectively (the raw frequencies, not their inverses).
Fixed to match HF transformers _compute_yarn_parameters:
interpolation = inv_freq / factor
extrapolation = inv_freq
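A small numeric illustration of the double-inversion (toy values, not the builder's actual tensors):

```python
# inv_freq already holds 1/pos_freqs, so wrapping it in another
# 1/(...) re-inverts it back into raw frequencies.
theta, dim, factor = 10000.0, 8, 4.0
pos_freqs = [theta ** (2 * i / dim) for i in range(dim // 2)]
inv_freq = [1.0 / f for f in pos_freqs]

buggy_interp = [1.0 / (factor * x) for x in inv_freq]  # equals pos_freqs/factor (wrong)
fixed_interp = [x / factor for x in inv_freq]          # equals inv_freq/factor (matches HF)

# e.g. at index 1: pos_freqs[1] = 10.0, inv_freq[1] = 0.1
print(buggy_interp[1], fixed_interp[1])  # 2.5 0.025
```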
Impact: Affects ALL models using YaRN RoPE (beta_fast/beta_slow config).
Verified: cos/sin caches now match HuggingFace reference implementation.
Tested with Ministral-3-3B: 'The capital of France is **Paris**.'
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Fixes YaRN RoPE configuration resolution in the Python ModelBuilder to prevent incorrect rotary cos/sin cache generation (which can produce unusable outputs for YaRN-based models).
Changes:
- Fixes original_max_position_embeddings detection when rope_scaling is stored as a dict-like object.
- Adds rope_theta fallback to read from rope_scaling when not present as a top-level config field.
- Respects explicit YaRN mscale from config and corrects NTK rescaling math to avoid double-inversion.
Force-pushed from 2bb8e6d to dbb6147
- Change isinstance(rope_scaling, dict) to isinstance(rope_scaling, Mapping) in 2 locations in base.py. This handles dict subclasses and MappingProxyType (e.g., FrozenDict) that HuggingFace configs may use.
- Add test/python/test_yarn_rope_parity.py with 7 tests verifying cos/sin cache parity against the HuggingFace reference for YaRN RoPE. Covers all 4 bugs fixed in this PR:
  (a) hasattr on dict — original_context_length from rope_scaling dict
  (b) rope_theta fallback — theta from rope_scaling when not top-level
  (c) mscale=1.0 override — explicit mscale respected, not recomputed
  (d) inv_freq double-inversion — uses inv_freq/factor, not 1/(factor*inv_freq)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
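The Mapping change can be demonstrated with a read-only mapping view, the case a plain dict check misses:

```python
from collections.abc import Mapping
from types import MappingProxyType

# Why Mapping instead of dict: some configs hand back read-only
# mapping views, which fail a plain isinstance(..., dict) check
# even though they behave like dicts for key lookups.
rope_scaling = MappingProxyType({"factor": 8.0, "rope_theta": 1000000.0})

print(isinstance(rope_scaling, dict))     # False
print(isinstance(rope_scaling, Mapping))  # True
print("rope_theta" in rope_scaling)       # True, key access still works
```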
Force-pushed from dbb6147 to d786f35
Summary
Fixes four bugs in the ModelBuilder's YaRN RoPE configuration resolution that caused completely wrong cos/sin caches, producing garbage output for all YaRN-based models (Ministral-3-3B, OpenAI OS-minier, and any model using beta_fast/beta_slow in rope_scaling). Also adds comprehensive parity tests and addresses review feedback.
Bugs Fixed
Bug 1: hasattr on dict for original_max_position_embeddings
hasattr(config.rope_scaling, 'original_max_position_embeddings') always returns False for dicts. Fixed to use an isinstance(..., Mapping) check plus the in operator.

Bug 2: rope_theta fallback
Models that store rope_theta only in the rope_scaling dict (not as a top-level config attribute) fell through to the default theta=10000. Added fallback chain: config.rope_theta → config.rope_embedding_base → config.rope_scaling["rope_theta"] → 10000.

Bug 3: YaRN mscale override
Always computed mscale from the factor via make_mscale(), ignoring the explicit mscale=1.0 in config. Ministral-3-3B sets mscale=1.0 but was getting mscale≈1.277. Now respects the config value when > 0.

Bug 4: inv_freq double-inversion
make_inv_freq_rescaled_with_ntk computed 1/(factor * inv_freq), which double-inverts since inv_freq is already 1/pos_freqs, giving pos_freqs/factor (wrong). Fixed to inv_freq / factor.

Review Feedback Addressed
- isinstance(rope_scaling, dict) → isinstance(rope_scaling, Mapping) in 2 locations (handles FrozenDict, MappingProxyType, dict subclasses)
- Mapping import from collections.abc
- mscale_all_dim handling
Added test/python/test_yarn_rope_parity.py with 8 tests verifying cos/sin cache parity against the HuggingFace transformers reference:
- test_ministral_3b_cos_sin_match — end-to-end parity (128 positions)
- test_bug_a_hasattr_on_dict — regression test for dict key access
- test_bug_b_rope_theta_fallback — regression test for theta resolution
- test_bug_c_mscale_override — regression test for explicit mscale
- test_mscale_fallback_when_absent — tests the .get("mscale", 0) fallback path
- test_bug_d_inv_freq_no_double_inversion — regression test for NTK scaling
- test_full_cache_length — parity at 2048 positions
- test_mapping_isinstance_with_frozen_dict — validates the Mapping check with MappingProxyType

Impact
- Affects all models using YaRN RoPE (beta_fast/beta_slow in rope_scaling)
- cos/sin caches now match the HuggingFace reference implementation (atol=1e-5)

Files Changed
- src/python/py/models/builders/base.py — 4 bug fixes + Mapping import + mscale docs
- test/python/test_yarn_rope_parity.py — 8 new parity tests (new file)