Draft
Conversation
- C++ implementation: NemotronModel + NemotronState with a 3-session RNNT pipeline (encoder, decoder, joint) and greedy decode
- Model type registration: nemotron_asr added to ALM types
- Config parsing: encoder/decoder I/O names, audio parameters
- Export tooling: ONNX export, tokenizer conversion, graph fusion + INT4 quantization (3.6x encoder size reduction)
- E2E tests: dummy audio + real speech validation
- Documentation: architecture overview and usage guide
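The greedy RNNT decode mentioned above can be sketched in Python. This is a minimal sketch of the standard greedy RNNT loop, not the actual C++ sessions: `decoder_step` and `joint` are stand-in callables for the decoder and joint ONNX sessions, and `max_symbols` is an assumed per-frame emission cap.

```python
import numpy as np

def greedy_rnnt_decode(enc_frames, decoder_step, joint, blank_id, max_symbols=10):
    """Greedy RNNT decoding: for each encoder frame, emit symbols until the
    joint network predicts blank (or a per-frame symbol cap is reached)."""
    tokens = []
    dec_state = None
    last_token = blank_id                  # start from blank / dummy BOS
    for enc in enc_frames:                 # enc: one encoder output frame
        for _ in range(max_symbols):       # cap emissions per frame
            dec_out, new_state = decoder_step(last_token, dec_state)
            logits = joint(enc, dec_out)   # (vocab_size,)
            k = int(np.argmax(logits))
            if k == blank_id:
                break                      # blank: advance to next frame
            tokens.append(k)
            # commit decoder state only when a real symbol is emitted
            last_token, dec_state = k, new_state
    return tokens
```

Note the asymmetry that makes RNNT decoding work: the decoder state advances only on non-blank emissions, while blank advances the encoder time axis.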
- Add RunStreamingEncoder with cache carry-forward (MHA + causal conv).
- Add GreedyDecodeIncremental for per-chunk RNNT decode.
- Auto-detect streaming mode via encoder ONNX input probing.
- Batch-mode fallback supports per-chunk re-inference.
- Export script: --streaming flag wraps the encoder with cache I/O.
- Streaming encoder: 5 inputs (audio + length + 3 caches), 5 outputs. Cache shapes: channel [B, 24, 70, 1024], time [B, 24, 1024, 8], len [B].
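The streaming-mode detection and cache carry-forward above can be sketched as follows. Shapes are taken from the commit message; the function names and the `run_encoder` callable are illustrative, not the real C++ API.

```python
import numpy as np

CACHE_INPUTS = ("cache_last_channel", "cache_last_time", "cache_last_channel_len")

def is_streaming_encoder(input_names):
    """Streaming mode is assumed when the encoder graph exposes cache inputs."""
    return all(name in input_names for name in CACHE_INPUTS)

def init_caches(batch=1, n_layers=24, d_model=1024, att_ctx=70, conv_ctx=8):
    # Zero-initialized caches matching the stated shapes:
    # channel [B, 24, 70, 1024], time [B, 24, 1024, 8], len [B].
    return {
        "cache_last_channel": np.zeros((batch, n_layers, att_ctx, d_model), np.float32),
        "cache_last_time": np.zeros((batch, n_layers, d_model, conv_ctx), np.float32),
        "cache_last_channel_len": np.zeros((batch,), np.int64),
    }

def run_chunk(run_encoder, audio, length, caches):
    """One streaming step: feed current caches, carry *_next outputs forward."""
    enc_out, ch_next, t_next, len_next = run_encoder(audio, length, caches)
    caches["cache_last_channel"] = ch_next
    caches["cache_last_time"] = t_next
    caches["cache_last_channel_len"] = len_next
    return enc_out, caches
```

A non-streaming export simply lacks the cache inputs, which is what makes input probing a reliable mode switch.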
forward_for_export() already handles the [B, n_layers, ...] <-> [n_layers, B, ...] transposition internally. The wrapper was incorrectly adding another transpose on top of it, causing a RuntimeError in multi_head_attention during export. Fix: remove the transpose calls from StreamingEncoderWrapper.forward(), so the ONNX I/O consistently uses the [B, n_layers, ...] layout for caches. Also: clarify the cache-format comments in nemotron.h/cpp.
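The double-transpose bug is easy to see with a quick NumPy sketch (shapes from the cache spec above; variable names are illustrative). Two stacked swaps cancel out, so the attention code ends up receiving the batch-first layout it was never written for:

```python
import numpy as np

B, n_layers = 2, 24
cache = np.zeros((B, n_layers, 70, 1024), np.float32)  # ONNX layout: [B, n_layers, ...]

# What forward_for_export() does internally: -> [n_layers, B, ...]
internal = np.swapaxes(cache, 0, 1)
assert internal.shape[:2] == (n_layers, B)             # correct layout for MHA

# Buggy wrapper: transposing *before* calling forward_for_export() means the
# internal swap lands back on [B, n_layers, ...], the wrong layout for MHA.
double = np.swapaxes(np.swapaxes(cache, 0, 1), 0, 1)
assert double.shape[:2] == (B, n_layers)               # MHA expected (n_layers, B)
```

Since B == n_layers never holds in practice, the shape mismatch surfaces as a RuntimeError inside attention rather than a silent accuracy bug.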
- export_nemotron_to_onnx.py: add generate_genai_config() and generate_audio_processor_config(), which extract model dimensions, I/O names, and audio parameters from the loaded NeMo model
- optimize_encoder.py: annotate genai_config.json with optimization metadata (fusion type, quantization method) when INT4 is applied
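A minimal sketch of what such a config generator might produce. The field names below are illustrative, not the exact genai_config.json schema, and `model_info` stands in for values pulled from the loaded NeMo model:

```python
def generate_genai_config(model_info):
    """Sketch: collect model dimensions, I/O names, and audio parameters
    into a single config dict ready to be serialized as JSON."""
    return {
        "model": {
            "type": "nemotron_asr",
            "encoder": {
                "inputs": model_info["encoder_inputs"],
                "outputs": model_info["encoder_outputs"],
                "hidden_size": model_info["d_model"],
            },
            "decoder": {
                "inputs": model_info["decoder_inputs"],
                "outputs": model_info["decoder_outputs"],
            },
        },
        "audio": {
            "sample_rate": model_info["sample_rate"],
            "n_mels": model_info["n_mels"],
        },
    }
```

Generating the config at export time keeps the C++ runtime free of model-specific constants: it reads dimensions and I/O names instead of hard-coding them.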
…d quantization
- Mel replay buffer: save mel chunks during blank periods, then replay them after the encoder+decoder reset to recover the lost audio instead of hallucinating
- Encoder cache reset: zero all 3 encoder cache tensors when the stuck detector triggers (2+ consecutive blank chunks), not just the decoder LSTM state
- Hybrid decoder reset: reset decoder state on the first blank chunk, encoder caches on the second+ blank, then replay the buffered mel data
- k_quant_mixed quantization: mixed-precision INT4 that preserves FP32 for sensitive layers (attention Q/K/V/Out, first/last encoder layers, pre_encode)
- HQQ quantization option: Half-Quadratic Quantization support
- optimize_encoder.py: --quant_method flag (rtn|k_quant_mixed|hqq), sensitive-node detection, external-data filename preservation
- generators.cpp: graceful stop via a shouldStop flag, drain-on-stop, CommitAudio + polling for clean shutdown
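The hybrid blank-recovery policy described above can be sketched as a small state machine. This is a sketch of the policy logic only, with an assumed class name; the real implementation lives in the C++ runtime and acts on actual decoder/encoder state:

```python
class StuckRecovery:
    """Sketch of the hybrid blank-recovery policy:
    1st consecutive blank chunk -> reset decoder state;
    2nd+  -> also zero the encoder caches, then replay buffered mel chunks."""

    def __init__(self):
        self.blank_streak = 0
        self.mel_buffer = []          # mel chunks saved during blank periods

    def on_chunk(self, mel_chunk, emitted_tokens):
        if emitted_tokens:            # real output: not stuck, clear state
            self.blank_streak = 0
            self.mel_buffer.clear()
            return []
        self.blank_streak += 1
        self.mel_buffer.append(mel_chunk)   # keep audio for later replay
        if self.blank_streak == 1:
            return ["reset_decoder"]
        # 2+ consecutive blanks: stuck detector fires a full reset + replay
        return ["reset_encoder_caches", "replay_mel"]
```

The buffer is the key design choice: without it, resetting the caches would simply drop the audio that arrived while the model was stuck.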
…ig parser: support streaming cache-aware encoder inputs (cache_last_channel, cache_last_time, cache_last_channel_len) and outputs (*_next variants) in genai_config.json parsing. Also add an optimization-section sink that silently consumes the encoder.optimization metadata.
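In Python terms, that parsing change might look like the sketch below (the real parser is C++; the key names follow the commit messages, the function name is made up):

```python
STREAM_CACHE_INPUTS = {"cache_last_channel", "cache_last_time",
                       "cache_last_channel_len"}

def parse_encoder_io(encoder_cfg):
    """Sketch: split encoder I/O into regular vs. streaming-cache names, and
    silently consume an 'optimization' metadata section if present."""
    cache_inputs = [n for n in encoder_cfg.get("inputs", [])
                    if n in STREAM_CACHE_INPUTS]
    cache_outputs = [n for n in encoder_cfg.get("outputs", [])
                     if n.endswith("_next")]
    encoder_cfg.pop("optimization", None)   # sink: ignore, don't reject
    return {
        "streaming": len(cache_inputs) == len(STREAM_CACHE_INPUTS),
        "cache_inputs": cache_inputs,
        "cache_outputs": cache_outputs,
    }
```

The silent sink matters for compatibility: older runtimes can load a config annotated by optimize_encoder.py without choking on the unfamiliar section.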
| """ | ||
|
|
||
| import argparse | ||
| import os |
Check notice
Code scanning / CodeQL
Unused import Note
| print(f" ✓ Full sequence ({len(token_list)} tokens): {token_list[:20]}{'...' if len(token_list) > 20 else ''}") | ||
|
|
||
| # Decode tokens to text (skip the first token which is the dummy BOS) | ||
| decoded_ids = np.array(token_list[1:], dtype=np.int32) # Skip dummy BOS |
Check notice
Code scanning / CodeQL
Unused local variable Note test
| resampler = torchaudio.transforms.Resample(sr, 16000) | ||
| waveform_t = resampler(waveform_t) | ||
| waveform_np = waveform_t.squeeze(0).numpy() | ||
| sr = 16000 |
Check notice
Code scanning / CodeQL
Unused local variable Note test
| resampler = torchaudio.transforms.Resample(sr, 16000) | ||
| waveform_t = resampler(waveform_t) | ||
| waveform_np = waveform_t.squeeze(0).numpy() | ||
| sr = 16000 |
Check notice
Code scanning / CodeQL
Unused local variable Note test
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| success = main() |
Check notice
Code scanning / CodeQL
Unused global variable Note test
Force-pushed: 4fb3784 → f456a3d → 79c5025 → f9160cd