Remove the unnecessary .vscode entry from .gitignore and add
pytest configuration to .vscode/settings.json so that the
editor discovers and runs the project's tests with pytest by
default.
Remove redundant paths for markitdown and tests from the
coverage configuration. The coverage report now covers only
the primary source and test directories, which makes it
easier to read and maintain.
Add new test cases for MarkItDown that cover LLM, remote, and
local file conversions. Add tests for deprecation warnings,
external URL queries, and EXIF data processing. Skip tests
when the environment conditions they require are not met, so
the suite does not fail for reasons unrelated to the code.
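The conditional skipping described above can be sketched as follows. This is a minimal illustration, not the project's test code: the `SKIP_REMOTE_TESTS` variable, the constant names, and the test bodies are assumptions. It uses the standard library's `unittest.skipIf` to stay self-contained; `pytest.mark.skipif(condition, reason=...)` takes the same condition-plus-reason shape.

```python
import os
import shutil
import unittest

# Hypothetical skip conditions; the env var name is an assumption.
SKIP_REMOTE = bool(os.environ.get("SKIP_REMOTE_TESTS"))
SKIP_EXIFTOOL = shutil.which("exiftool") is None


class ConversionTests(unittest.TestCase):
    @unittest.skipIf(SKIP_REMOTE, "remote tests disabled in this environment")
    def test_remote(self):
        # Placeholder body: a real test would fetch and convert a URL.
        self.assertTrue(True)

    @unittest.skipIf(SKIP_EXIFTOOL, "exiftool binary not found on PATH")
    def test_exiftool(self):
        # Only runs when the external tool is actually installed.
        self.assertIsNotNone(shutil.which("exiftool"))
```

Evaluating the conditions once at import time keeps the decorators cheap and makes the reason for a skip visible in the test report.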
Set up Dependabot to check for updates to GitHub Actions
weekly, so that the project's workflows stay up to date with
the latest releases and security fixes.
Co-authored-by: gagb <gagb@users.noreply.github.com>
* Add support for Path objects in MarkItDown conversion methods
* Remove unnecessary blank line in test_markitdown_exiftool function
* remove pathlib path in test file
---------
Co-authored-by: afourney <adamfo@microsoft.com>
Co-authored-by: gagb <gagb@users.noreply.github.com>
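One way the Path support mentioned above could work is to normalize the argument at the entry point, so the rest of the conversion code only ever sees a string. The sketch below is an assumption about the approach; `normalize_source` is a hypothetical helper, not a MarkItDown function.

```python
from pathlib import Path
from typing import Union


def normalize_source(source: Union[str, Path]) -> str:
    """Return `source` as a plain string so downstream code sees one type."""
    if isinstance(source, Path):
        return str(source)
    return source
```

Strings pass through unchanged, so URLs and existing string paths behave exactly as before.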
* refactor(tests): simplify string validation in tests
Introduce a helper function `validate_strings` to streamline the
validation of expected and excluded strings in test cases. Replace
repetitive string assertions in the `test_markitdown_local` function
with calls to this new helper, improving code readability and
maintainability.
* run pre-commit
---------
Co-authored-by: lumin <71011125+l-melon@users.noreply.github.com>
Co-authored-by: gagb <gagb@users.noreply.github.com>
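A helper along the lines of `validate_strings` might look like the sketch below. The signature shown here (plain text plus two iterables) is an assumption made for illustration; the helper in the test suite may take a different argument, such as a conversion result object.

```python
from typing import Iterable, Optional


def validate_strings(
    text: str,
    expected: Iterable[str],
    excluded: Optional[Iterable[str]] = None,
) -> None:
    """Assert that every expected string is in `text` and no excluded one is."""
    for s in expected:
        assert s in text, f"missing expected string: {s!r}"
    for s in excluded or ():
        assert s not in text, f"found excluded string: {s!r}"
```

A test then replaces a run of individual `assert ... in ...` lines with a single call such as `validate_strings(output, ["Title"], ["<script>"])`.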
Use `textwrap.dedent()` so that the CLI help text in `__main__.py` can be indented to match the surrounding code. The indentation improves readability, and `textwrap.dedent()` strips it at runtime, so the displayed help text is unchanged.
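The pattern can be sketched as below. The program name `example` and the help text are placeholders rather than the contents of `__main__.py`; `RawDescriptionHelpFormatter` is used so that argparse keeps the line breaks of the dedented text.

```python
import argparse
import textwrap


def build_parser() -> argparse.ArgumentParser:
    return argparse.ArgumentParser(
        prog="example",  # placeholder program name
        formatter_class=argparse.RawDescriptionHelpFormatter,
        # The text is indented to match the code; dedent() removes the
        # common leading whitespace before argparse ever sees it.
        description=textwrap.dedent(
            """\
            Convert a file to Markdown.

            examples:
                example input.pdf
                example input.pdf > output.md
            """
        ),
    )
```

Only the whitespace shared by every line is removed, so the relative indentation under `examples:` survives.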
fix: prevent path traversal vulnerabilities in ZipConverter
Added a check for path traversal in the ZipConverter class. Extracted file
paths are now validated with `os.path.commonprefix` to ensure that all files
remain within the intended extraction directory. A `ValueError` is raised if a
path traversal attempt is detected.
- Normalized file paths using `os.path.normpath`.
- Added specific exception handling for `zipfile.BadZipFile` and traversal errors.
- Ensured cleanup of extracted files after processing when `cleanup_extracted` is enabled.
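The containment check can be sketched as follows. This is a variant, not the code from this commit: it swaps in `os.path.commonpath`, which compares whole path components, whereas `os.path.commonprefix` compares character by character and would treat a sibling directory such as `out_evil` as sharing the prefix `out`. The name `safe_join` is hypothetical.

```python
import os


def safe_join(base_dir: str, member_name: str) -> str:
    """Resolve `member_name` under `base_dir`, rejecting paths that escape it."""
    base = os.path.abspath(base_dir)
    # normpath collapses ".." segments before the containment check.
    target = os.path.abspath(os.path.normpath(os.path.join(base, member_name)))
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"Path traversal detected: {member_name!r}")
    return target
```

Each archive member name would be passed through such a check before extraction, so a malicious entry is rejected before anything is written to disk.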