Problem
When using Azure Document Intelligence via markitdown, DocumentAnalysisFeature.FORMULAS is always enabled, even when formula recognition is not required.
This behavior leads to degraded recognition accuracy, especially for documents that do not contain mathematical formulas.
The relevant code is here:
https://github.com/microsoft/markitdown/blob/main/packages/markitdown/src/markitdown/converters/_doc_intel_converter.py#L232
```python
# _doc_intel_converter.py (excerpt)
features=[
    DocumentAnalysisFeature.FORMULAS,
    DocumentAnalysisFeature.TABLES,
]
```
Currently, FORMULAS is unconditionally included in the features list, making it impossible to disable.
Steps to Reproduce
- Configure Azure Document Intelligence and enable it in markitdown
- Analyze a document that does not contain mathematical formulas
- Observe the extracted text / structure quality
- Compare results with and without the FORMULAS feature enabled
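Because markitdown offers no switch for this, the comparison step can be approximated by calling the Azure SDK (azure-ai-documentintelligence) directly, bypassing markitdown. This is a minimal sketch, assuming the GA SDK's begin_analyze_document call shape (model_id, body=AnalyzeDocumentRequest(bytes_source=...), features, output_content_format="markdown"). The DOCINTEL_ENDPOINT and DOCINTEL_KEY environment variable names and the helper names are placeholders chosen for this example. It runs prebuilt-layout twice on the same file, once with no add-on features and once with "formulas" (the string value of DocumentAnalysisFeature.FORMULAS), and prints a unified diff of the two Markdown outputs.

```python
"""Sketch: compare prebuilt-layout output with and without FORMULAS."""
import difflib
import os
import sys
from typing import List, Sequence


def diff_markdown(without: str, with_formulas: str) -> List[str]:
    """Unified diff of the two Markdown outputs (empty list if identical)."""
    return list(
        difflib.unified_diff(
            without.splitlines(),
            with_formulas.splitlines(),
            fromfile="without_formulas.md",
            tofile="with_formulas.md",
            lineterm="",
        )
    )


def analyze(client, data: bytes, features: Sequence[str]) -> str:
    """Run prebuilt-layout on `data` and return the Markdown content."""
    # Imported lazily so diff_markdown stays usable without the Azure SDK.
    from azure.ai.documentintelligence.models import AnalyzeDocumentRequest

    poller = client.begin_analyze_document(
        model_id="prebuilt-layout",
        body=AnalyzeDocumentRequest(bytes_source=data),
        # Send None rather than an empty list when no add-ons are wanted.
        features=list(features) or None,
        output_content_format="markdown",
    )
    return poller.result().content


def main(path: str) -> None:
    from azure.ai.documentintelligence import DocumentIntelligenceClient
    from azure.core.credentials import AzureKeyCredential

    # Placeholder environment variable names (assumption for this sketch).
    client = DocumentIntelligenceClient(
        endpoint=os.environ["DOCINTEL_ENDPOINT"],
        credential=AzureKeyCredential(os.environ["DOCINTEL_KEY"]),
    )
    with open(path, "rb") as f:
        data = f.read()

    without = analyze(client, data, [])
    with_formulas = analyze(client, data, ["formulas"])
    print("\n".join(diff_markdown(without, with_formulas)) or "No differences.")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} DOCUMENT", file=sys.stderr)
    else:
        main(sys.argv[1])
```

Run it as, for example, `python compare_formulas.py sample.pdf` (the script name is arbitrary). Since only the features argument differs between the two calls, any lines in the diff are attributable to enabling formula recognition for that document.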
Expected Behavior
DocumentAnalysisFeature.FORMULAS should either:
- be optional and enabled only when explicitly requested, or
- be disabled by default for general-purpose document parsing.
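One way the opt-in option could look is a small helper that builds the features list, adding formula recognition only on request. This is a hypothetical sketch: the function name analysis_features and its enable_formulas and extra parameters are not part of markitdown. It assumes DocumentAnalysisFeature is a string enum whose FORMULAS member has the value "formulas", and uses plain strings so the example does not depend on the Azure SDK.

```python
from typing import Iterable, List

# Assumed wire value of DocumentAnalysisFeature.FORMULAS (a str enum member).
FORMULAS = "formulas"


def analysis_features(
    enable_formulas: bool = False,
    extra: Iterable[str] = (),
) -> List[str]:
    """Hypothetical helper: build the `features` list for
    begin_analyze_document, with formula recognition off by default."""
    features: List[str] = []
    if enable_formulas:
        features.append(FORMULAS)
    for feature in extra:
        if feature not in features:  # skip duplicates of already-added features
            features.append(feature)
    return features
```

With this shape, `analysis_features()` returns an empty list, while `analysis_features(enable_formulas=True)` returns `["formulas"]`. A converter could then expose the flag through its constructor (again hypothetical) and pass the result as the features argument, so general-purpose parsing no longer pays for formula recognition unless the caller asks for it.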