Support .vtt / WebVTT subtitle files #1682

## Summary

Please add first-class support for `.vtt` / WebVTT subtitle files.

## Why this would help

MarkItDown is already useful as a bridge from many document/media formats into Markdown for LLM workflows. WebVTT is a common interchange format for:

- meeting transcripts
- video subtitles/captions
- downloaded YouTube or conference transcripts
- course/video note pipelines

Right now, `.vtt` does not appear to be explicitly supported. Depending on MIME/charset detection, a `.vtt` file may sometimes fall through to generic text handling, but that is fragile and the output still contains raw cue syntax rather than clean Markdown.

## Expected behavior

Given a `.vtt` file, MarkItDown should parse it intentionally and produce readable Markdown instead of raw subtitle syntax.

For example, it should handle:

- `WEBVTT` header
- cue timestamps (`00:00:01.000 --> 00:00:03.000`)
- cue identifiers
- multiline subtitle blocks
- optional speaker prefixes / metadata when present
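The parsing steps listed above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not MarkItDown code: the names `Cue`, `TIMING_RE`, and `parse_vtt` are hypothetical, and the sketch assumes well-formed input where blocks are separated by blank lines. It skips `NOTE`, `STYLE`, and `REGION` blocks and ignores cue settings after the timestamps.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches "00:00:01.000 --> 00:00:03.000"; the hours field is optional in WebVTT.
TIMING_RE = re.compile(
    r"^((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})"
)


@dataclass
class Cue:
    identifier: Optional[str]  # optional line before the timing line
    start: str
    end: str
    lines: List[str]           # raw payload lines (may contain tags like <v Name>)


def parse_vtt(text: str) -> List[Cue]:
    """Parse WebVTT text into a list of cues (hypothetical helper)."""
    # Drop a leading BOM and normalize line endings.
    text = text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    blocks = re.split(r"\n\s*\n", text.strip())
    if not blocks[0].startswith("WEBVTT"):
        raise ValueError("not a WebVTT file: missing WEBVTT header")

    cues: List[Cue] = []
    for block in blocks[1:]:
        lines = block.split("\n")
        # Comment, stylesheet and region blocks carry no transcript text.
        if lines[0].startswith(("NOTE", "STYLE", "REGION")):
            continue
        identifier = None
        if "-->" not in lines[0]:
            # First line is a cue identifier; the timing line follows.
            identifier = lines[0].strip()
            lines = lines[1:]
        if not lines:
            continue
        match = TIMING_RE.match(lines[0].strip())
        if not match:
            continue  # not a cue block; ignore rather than fail
        cues.append(Cue(identifier, match.group(1), match.group(2), lines[1:]))
    return cues
```

Keeping the payload lines raw at this stage leaves the choice of output shape (clean transcript, timestamped, speaker-aware) to a later rendering step.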

## Possible output shapes

Any of these would be better than raw passthrough:

1. **Clean transcript mode**
   - strips timestamps/cue metadata
   - preserves text paragraphs

2. **Timestamp-preserving Markdown mode**
   - keeps timestamps in a readable Markdown form, e.g.
     - `[00:01] Speaker: text...`

3. **Metadata-aware transcript**
   - preserves speaker labels when present
   - drops formatting noise
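To make the three shapes concrete, here is a small rendering sketch. It is only an illustration under assumptions: cues are passed in as hypothetical `(start_timestamp, cue_text)` pairs, speakers are taken from WebVTT voice tags (`<v Name>`), and the function names `split_speaker`, `short_timestamp`, and `render` are invented for this example.

```python
import html
import re
from typing import Iterable, Optional, Tuple

# WebVTT voice tag, e.g. "<v Alice>" or "<v.loud Alice>".
VOICE_RE = re.compile(r"<v(?:\.[^\s>]+)*\s+([^>]+)>")
# Any inline tag: <b>, </v>, <c.classname>, <00:00:01.500>, ...
TAG_RE = re.compile(r"</?[^>]+>")


def split_speaker(text: str) -> Tuple[Optional[str], str]:
    """Return (speaker, plain_text) for one cue's text."""
    match = VOICE_RE.search(text)
    speaker = match.group(1).strip() if match else None
    plain = html.unescape(TAG_RE.sub("", text))
    # Join multiline cue text into a single line of prose.
    return speaker, " ".join(plain.split())


def short_timestamp(ts: str) -> str:
    """Drop milliseconds, and drop the hours field when it is zero."""
    parts = ts.split(".")[0].split(":")
    if len(parts) == 3 and int(parts[0]) == 0:
        parts = parts[1:]
    return ":".join(parts)


def render(cues: Iterable[Tuple[str, str]], keep_timestamps: bool = False) -> str:
    """Render cues as paragraphs, optionally prefixed with [MM:SS]."""
    paragraphs = []
    for start, text in cues:
        speaker, plain = split_speaker(text)
        if not plain:
            continue
        prefix = f"[{short_timestamp(start)}] " if keep_timestamps else ""
        label = f"{speaker}: " if speaker else ""
        paragraphs.append(f"{prefix}{label}{plain}")
    return "\n\n".join(paragraphs)
```

With `keep_timestamps=False` this gives the clean, speaker-aware transcript; with `keep_timestamps=True` it gives the `[00:01] Speaker: text...` form. Merging consecutive cues from the same speaker into one paragraph would be a natural refinement but is left out here.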

## Why this seems aligned with MarkItDown

MarkItDown already supports text-oriented and transcription-oriented inputs (including audio transcription and YouTube transcript workflows). WebVTT feels like a natural input format for the same family of use cases.

## Minimal ask

Even a first step would be great:

- explicitly recognize `.vtt` / `text/vtt`
- convert it through a dedicated parser or lightweight cleaner
- output readable Markdown/text content rather than raw subtitle markup
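For the recognition step, a detection check could look something like the sketch below. It is deliberately standalone: `looks_like_vtt` and its parameters are hypothetical and do not reflect MarkItDown's converter interface, which is not shown here. It checks the extension, then the MIME type, then sniffs for the `WEBVTT` signature at the start of the content.

```python
import os
from typing import Optional

VTT_EXTENSIONS = {".vtt"}
VTT_MIMETYPES = {"text/vtt"}
UTF8_BOM = b"\xef\xbb\xbf"


def looks_like_vtt(
    filename: Optional[str] = None,
    mimetype: Optional[str] = None,
    head: bytes = b"",
) -> bool:
    """Best-effort check that an input is WebVTT (hypothetical helper)."""
    if filename and os.path.splitext(filename)[1].lower() in VTT_EXTENSIONS:
        return True
    # Ignore parameters such as "; charset=utf-8".
    if mimetype and mimetype.split(";")[0].strip().lower() in VTT_MIMETYPES:
        return True
    # Content sniffing: optional UTF-8 BOM, then "WEBVTT" followed by
    # whitespace or end of input.
    data = head[len(UTF8_BOM):] if head.startswith(UTF8_BOM) else head
    if data[:6] != b"WEBVTT":
        return False
    return len(data) == 6 or data[6:7] in (b" ", b"\t", b"\n", b"\r")
```

Sniffing the header as a fallback would cover files that arrive without a useful extension or MIME type, which is the fragile case described earlier.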

Thanks. This would make MarkItDown much more useful in note-taking and transcript-to-Markdown pipelines.

