Problem
PR #1628 added zip bomb protection to ZipConverter with three hardcoded module-level constants:
- MAX_DECOMPRESSED_FILE_SIZE = 100 MB per file
- MAX_DECOMPRESSION_RATIO = 100:1
- MAX_TOTAL_DECOMPRESSED_SIZE = 500 MB total
These defaults are reasonable for general use, but they are not configurable. This creates a real problem for legitimate use cases:
- Scientific datasets: ZIP archives commonly contain files well over 100 MB (genomics data, satellite imagery, simulation outputs).
- Legal and financial document archives: SEC EDGAR bulk data packages and court document bundles regularly exceed 500 MB total.
- Internal tooling: An organization running markitdown on known-safe internal archives has no way to raise the limits without monkey-patching the module.
When an archive hits these limits, the affected files are skipped with only a logger warning, so callers have no programmatic way to know that their content was truncated.
Proposed solution
Move the limits to constructor parameters on ZipConverter with the current values as defaults:
class ZipConverter(DocumentConverter):
    def __init__(
        self,
        max_file_size: int = 100 * 1024 * 1024,
        max_ratio: int = 100,
        max_total_size: int = 500 * 1024 * 1024,
    ):
        self.max_file_size = max_file_size
        self.max_ratio = max_ratio
        self.max_total_size = max_total_size
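This is a non-breaking change: the defaults stay the same, and users who need larger limits can pass them explicitly. Users who want to disable a limit entirely can pass float("inf"); because the parameters are annotated as int in the snippet above, the annotations would need to widen to float (or int | float) for that to type-check.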
Additional context
This was flagged during review of PR #1628 as a blocker before merge. Opening as a tracked issue so it does not get lost if the PR merges first.