Problem
When you call MarkItDown(), all built-in converters are registered. There is no way to exclude specific converters without subclassing MarkItDown and overriding enable_builtins.
This creates friction in two common cases:
- Security hardening. A deployment that serves untrusted files might want to disable ZipConverter (to prevent zip bombs even with the existing limits) or AudioConverter (to avoid network calls to Whisper APIs on untrusted audio files). Right now this requires a custom subclass.
- Dependency management. Some converters require optional extras (mammoth for DOCX, pptx for PowerPoint, speech_recognition for audio). If a user installs markitdown without those extras, the converter registration silently does nothing. A user who wants to explicitly opt in to only a specific set of converters has no clean way to do that.
Proposed API
# Option A: exclusion list
md = MarkItDown(disabled_converters=["ZipConverter", "AudioConverter"])
# Option B: explicit inclusion (no built-ins except what you list)
md = MarkItDown(enable_builtins=False)
md.register_converter(PdfConverter())
md.register_converter(HtmlConverter())
# Option B already partially works - enable_builtins=False would just skip the auto-registration
Option A is the lower-friction change. The enable_builtins method could accept an exclude parameter:
def enable_builtins(self, md: "MarkItDown", exclude: list[str] | None = None) -> None:
    ...
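A minimal sketch of how name-based exclusion might behave, using stand-in classes rather than the real markitdown converters. The `Registry` class, the `BUILTIN_CONVERTERS` list, and the stub converter classes are all hypothetical; only the `exclude` parameter name comes from the proposal above.

```python
from __future__ import annotations


# Stand-in converter classes: hypothetical stubs, not markitdown's own.
class PdfConverter: ...
class HtmlConverter: ...
class ZipConverter: ...
class AudioConverter: ...


# Assumed set of built-ins, for illustration only.
BUILTIN_CONVERTERS = [PdfConverter, HtmlConverter, ZipConverter, AudioConverter]


class Registry:
    """Hypothetical stand-in for the part of MarkItDown that holds converters."""

    def __init__(self) -> None:
        self.converters: list[object] = []

    def enable_builtins(self, exclude: list[str] | None = None) -> None:
        excluded = set(exclude or [])
        for cls in BUILTIN_CONVERTERS:
            # Match on the class name string, as the proposal suggests.
            if cls.__name__ in excluded:
                continue
            self.converters.append(cls())


registry = Registry()
registry.enable_builtins(exclude=["ZipConverter", "AudioConverter"])
registered = [type(c).__name__ for c in registry.converters]
print(registered)
```

With this shape, omitting `exclude` keeps today's behavior, so existing callers would be unaffected.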
Notes
- This is purely an API surface change with no impact on converter logic
- The exclude list could match on class name strings to avoid importing converter classes just to exclude them
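One way string matching could avoid imports is to filter a name table before importing anything. The sketch below is an assumption about how that might look, not how markitdown registers converters: `LAZY_BUILTINS` and `load_builtins` are hypothetical, and standard-library classes stand in for real converters so the example runs without extras installed.

```python
import importlib

# Hypothetical table mapping a converter name to (module, attribute).
# Standard-library classes stand in for converters here.
LAZY_BUILTINS = {
    "JsonConverter": ("json", "JSONDecoder"),
    "CsvConverter": ("csv", "Sniffer"),
    "ZipConverter": ("zipfile", "ZipFile"),
}


def load_builtins(exclude=None):
    """Import and return only the classes whose names are not excluded."""
    excluded = set(exclude or [])
    loaded = {}
    for name, (module_name, attr) in LAZY_BUILTINS.items():
        if name in excluded:
            # Skipped before this function imports the entry's module.
            continue
        module = importlib.import_module(module_name)
        loaded[name] = getattr(module, attr)
    return loaded


loaded = load_builtins(exclude=["ZipConverter"])
print(sorted(loaded))
```

A trade-off of plain strings is that a misspelled name excludes nothing, so an implementation might choose to reject names that are not in the table.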