EmlConverter: add support for converting email attachments #1662

@VANDRANKI

Problem

PR #1633 adds EmlConverter, which correctly extracts email headers and the message body. However, email attachments are not processed: a .eml file that contains an attached PDF, Word document, or image is converted to just the email body text, with no indication that attachments exist or what they contain.

For use cases like:

  • Converting an inbox export to a searchable knowledge base
  • Extracting information from automated report emails with attached spreadsheets
  • Processing compliance email archives that include attached contracts

...the body text alone is not enough. The attachments often contain the most important content.

Proposed solution

Extend EmlConverter._get_body() (or add a new _get_attachments() method) to:

  1. Detect MIME parts that are attachments (i.e., Content-Disposition: attachment)
  2. For each attachment, pass the raw bytes back through the main MarkItDown converter using the attachment filename to determine the appropriate converter
  3. Append each converted attachment to the markdown output under a ## Attachment: filename.ext heading

Attachments that cannot be converted (unknown format, conversion error) should be listed with a note rather than silently skipped.
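
As a rough illustration of the three steps above, here is a minimal sketch that uses only the standard-library `email` package. The function name `render_attachments` and the injected `convert` callback are hypothetical. The callback stands in for the main MarkItDown converter, whose exact call is not shown in this issue. The wording of the fallback note is also just a placeholder.

```python
from email import message_from_bytes, policy
from typing import Callable, List

# Hypothetical callback: (raw_bytes, filename) -> markdown text.
# It stands in for "pass the bytes back through the main converter".
ConvertFn = Callable[[bytes, str], str]


def render_attachments(raw_eml: bytes, convert: ConvertFn) -> str:
    """Return one markdown section per attachment found in an .eml payload."""
    msg = message_from_bytes(raw_eml, policy=policy.default)
    sections: List[str] = []
    for part in msg.walk():
        # Step 1: keep only leaf parts marked Content-Disposition: attachment.
        if part.is_multipart():
            continue
        if part.get_content_disposition() != "attachment":
            continue
        filename = part.get_filename() or "unnamed"
        payload = part.get_payload(decode=True) or b""
        # Step 2: hand the decoded bytes and filename to the converter.
        try:
            body = convert(payload, filename)
        except Exception as exc:
            # Unconvertible attachments are listed with a note, not skipped.
            body = f"*Attachment could not be converted: {exc}*"
        # Step 3: emit the result under an "## Attachment: ..." heading.
        sections.append(f"## Attachment: {filename}\n\n{body}")
    return "\n\n".join(sections)
```

Injecting the converter keeps the sketch independent of MarkItDown internals. In the real EmlConverter the callback would presumably be replaced by a call into the parent converter instance.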

Example output

# Email Message

**From:** sender@example.com
**Subject:** Q1 Report

## Content

Please find the Q1 report attached.

## Attachment: Q1_Report.xlsx

| Quarter | Revenue | Costs |
|---------|---------|-------|
| Q1 2026 | 1.2M    | 0.8M  |

Additional context

The recursive converter call pattern (passing attachments back through MarkItDown) is already used in ZipConverter for processing zip contents. The same approach would work here with minimal new code.
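
For readers unfamiliar with that pattern, the sketch below shows its general shape using only the standard library. It is not MarkItDown's actual ZipConverter code: `MiniDispatcher`, `convert_bytes`, and the `## File:` heading are all hypothetical names chosen for illustration. The point is only that a container converter holds a reference to the dispatcher and calls back into it for each contained file.

```python
import io
import os
import zipfile
from typing import Callable, Dict


class MiniDispatcher:
    """Hypothetical stand-in for the main converter (not MarkItDown's API)."""

    def __init__(self) -> None:
        # Extension -> handler. The ".zip" handler recurses into this object.
        self._by_ext: Dict[str, Callable[[bytes], str]] = {
            ".txt": lambda data: data.decode("utf-8", errors="replace"),
            ".zip": self._convert_zip,
        }

    def convert_bytes(self, data: bytes, filename: str) -> str:
        ext = os.path.splitext(filename)[1].lower()
        handler = self._by_ext.get(ext)
        if handler is None:
            raise ValueError(f"no converter for {ext or 'extensionless file'}")
        return handler(data)

    def _convert_zip(self, data: bytes) -> str:
        parts = []
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for name in archive.namelist():
                if name.endswith("/"):  # skip directory entries
                    continue
                try:
                    # Recursive call: each member goes back through dispatch.
                    text = self.convert_bytes(archive.read(name), name)
                except ValueError as exc:
                    text = f"*Not converted: {exc}*"
                parts.append(f"## File: {name}\n\n{text}")
        return "\n\n".join(parts)
```

An attachment handler for .eml would follow the same shape, iterating over MIME parts instead of zip members.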
