Summary
Please add first-class support for .vtt / WebVTT subtitle files.
Why this would help
MarkItDown is already useful as a bridge from many document/media formats into Markdown for LLM workflows. WebVTT is a common interchange format for:
- meeting transcripts
- video subtitles/captions
- downloaded YouTube or conference transcripts
- course/video note pipelines
Right now, .vtt does not appear to be explicitly supported. It may sometimes fall through as generic text depending on MIME/charset detection, but that is fragile and does not produce a clean Markdown result.
Expected behavior
Given a .vtt file, MarkItDown should parse it intentionally and produce readable Markdown instead of raw subtitle syntax.
For example, it should handle:
- WEBVTT header
- cue timestamps (00:00:01.000 --> 00:00:03.000)
- cue identifiers
- multiline subtitle blocks
- optional speaker prefixes / metadata when present
Possible output shapes
Any of these would be better than raw passthrough:
- Clean transcript mode
  - strips timestamps/cue metadata
  - preserves text paragraphs
- Timestamp-preserving markdown mode
  - keeps timestamps in a readable markdown form, e.g. `[00:01] Speaker: text...`
- Metadata-aware transcript
  - preserves speaker labels when present
  - drops formatting noise
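These output shapes could all be rendered from one list of parsed cues. A minimal sketch, assuming each cue has already been reduced to start/end seconds, plain text and an optional speaker. `Cue`, `clean_transcript`, `timestamped_markdown`, the `[mm:ss]` format and the 2-second pause threshold are illustrative choices, not anything MarkItDown defines; the metadata-aware variant here is simply `clean_transcript(..., with_speakers=True)`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Cue:
    start: float  # seconds from the start of the media
    end: float  # seconds
    text: str  # cue text with markup already removed
    speaker: Optional[str] = None


def format_timestamp(seconds: float) -> str:
    # [mm:ss] for short media, [h:mm:ss] once the hour mark is reached.
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"[{hours}:{minutes:02d}:{secs:02d}]"
    return f"[{minutes:02d}:{secs:02d}]"


def clean_transcript(cues: list[Cue], with_speakers: bool = True, pause: float = 2.0) -> str:
    # Drop all timing; start a new paragraph when the speaker changes or
    # when the silence between cues exceeds `pause` seconds.
    paragraphs: list[tuple[Optional[str], list[str]]] = []
    prev_end = 0.0
    for cue in cues:
        if not cue.text:
            continue
        if (
            not paragraphs
            or cue.speaker != paragraphs[-1][0]
            or cue.start - prev_end > pause
        ):
            paragraphs.append((cue.speaker, []))
        paragraphs[-1][1].append(cue.text)
        prev_end = cue.end
    out = []
    for speaker, texts in paragraphs:
        body = " ".join(texts)
        out.append(f"{speaker}: {body}" if with_speakers and speaker else body)
    return "\n\n".join(out)


def timestamped_markdown(cues: list[Cue]) -> str:
    # One "[mm:ss] Speaker: text" line per cue, separated by blank lines so
    # Markdown renderers do not merge them into a single paragraph.
    lines = []
    for cue in cues:
        label = f"{cue.speaker}: " if cue.speaker else ""
        lines.append(f"{format_timestamp(cue.start)} {label}{cue.text}")
    return "\n\n".join(lines)


EXAMPLE_CUES = [
    Cue(1.0, 3.0, "Hello there,", "Alice"),
    Cue(3.0, 4.0, "everyone.", "Alice"),
    Cue(4.5, 6.0, "Thanks, Alice.", "Bob"),
    Cue(65.0, 67.0, "Next topic.", "Bob"),
]
```

With `EXAMPLE_CUES`, `clean_transcript` merges Alice's two cues into one paragraph and splits Bob's cues because of the long pause before "Next topic.", while `timestamped_markdown` starts with `[00:01] Alice: Hello there,`.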
Why this seems aligned with MarkItDown
MarkItDown already supports text-oriented and transcription-oriented inputs (including audio transcription and YouTube transcript workflows). WebVTT feels like a natural input format for the same family of use cases.
Minimal ask
Even a first step would be great:
- explicitly recognize .vtt / text/vtt
- convert it through a dedicated parser or lightweight cleaner
- output readable Markdown/text content rather than raw subtitle markup
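Recognition could combine the file extension, the declared MIME type, and the signature every WebVTT file starts with (an optional UTF-8 BOM, then `WEBVTT` followed by a space, tab, line break, or end of file). A sketch with a hypothetical `looks_like_vtt` helper; how it would hook into MarkItDown's converter selection is left out, since that depends on the project's converter interface.

```python
from __future__ import annotations

import os
from typing import Optional

VTT_EXTENSIONS = {".vtt"}
VTT_MIMETYPES = {"text/vtt"}


def looks_like_vtt(
    filename: Optional[str] = None,
    mimetype: Optional[str] = None,
    head: Optional[bytes] = None,
) -> bool:
    # 1. Explicit hints: file extension (case-insensitive) or MIME type,
    #    ignoring parameters such as "; charset=utf-8".
    if filename and os.path.splitext(filename)[1].lower() in VTT_EXTENSIONS:
        return True
    if mimetype and mimetype.split(";")[0].strip().lower() in VTT_MIMETYPES:
        return True
    # 2. Content sniffing on the first bytes of the stream.
    if head:
        if head.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte-order mark
            head = head[3:]
        if head.startswith(b"WEBVTT"):
            return head[6:7] in (b"", b" ", b"\t", b"\n", b"\r")
    return False
```

Checking the content signature as well as the extension covers files that arrive with a generic name or a `text/plain` type, which is the fragile fall-through case described above.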
Thanks — this would make MarkItDown much more useful in note-taking / transcript-to-markdown pipelines.