Commit graph

204 commits

Author SHA1 Message Date
ZeyuTeng96
38d6fcc001
Merge 0f948ade40 into 73ba69d8cd 2025-02-09 17:46:35 -08:00
wunde005
73ba69d8cd
For csv files mimetypes.guess_type is returning "application/vnd.ms-excel" on windows causing an invalid mime type in plaintextconverter. In reference to issue: https://github.com/microsoft/markitdown/issues/150 (#273) 2025-02-08 20:58:13 -08:00
Werner Robitza
2a4f7bb6a8
fix: argparse CLI option ordering, fixes #268 (#290)
* fix: argparse CLI option ordering, fixes #268
* Fixed formatting.
2025-02-08 20:50:38 -08:00
masquare
7cf5e0bb23
feat(pptx): support image description with LLM for pptx files (#306) 2025-02-08 20:37:34 -08:00
James Hickey
3090917a49
Typo fixed (#270) 2025-02-08 20:30:13 -08:00
ZeyuTeng96
7bea2672a0
remove leading and trailing \n for HtmlConverter (#262) 2025-02-08 20:28:35 -08:00
KennyZhang1
bf6a15e9b5
Kennyzhang/docintel docs (#312)
* updated docs to include doc intelligence

* include reference to doc intel setup docs
2025-01-31 22:23:26 -08:00
KennyZhang1
bfde857420
Add support for conversion via Document Intelligence (#303)
* added cli params for doc intel

* added DocumentIntelligenceConverter class implementation

* initialized doc intel client instance field

* added isolated doc_intel main conversion function

* temp fix for ContentFormat import bug

* ran tests for docintel and offline for many filetypes

* push doc intel converter to the top of the stack

* formatting changes

* modified project toml file
2025-01-24 14:09:32 -08:00
afourney
f58a864951
Set exiftool path explicitly. (#267) 2025-01-06 12:43:47 -08:00
afourney
265aea2edf
Removed the holiday away message from README.md (#266) 2025-01-06 09:06:21 -08:00
afourney
05b78e7ce1
Recognize json as plain text (if no other handlers are present). (#261)
* Recognize json as plain text (if no other handlers are present).
2025-01-03 16:40:43 -08:00
afourney
436407288f
If puremagic has no guesses, try again after ltrim. (#260) 2025-01-03 16:03:11 -08:00
afourney
731b39e7f5
Added a test for leading spaces. (#258) 2025-01-03 14:34:33 -08:00
yeungadrian
08ed32869e
Feature/ Add xls support (#169)
* add xlrd
* add xls converter with tests
2025-01-03 13:58:17 -08:00
Murat Can Kurtuluş
d248621ba4
feat: outlook ".msg" file converter (#196)
* feat: outlook .msg converter
* add test, adjust docstring
2025-01-03 13:34:39 -08:00
AbSadiki
4678c8a2a4
fix(transcription): IS_AUDIO_TRANSCRIPTION_CAPABLE should be iniztialized (#194) 2025-01-03 13:29:26 -08:00
ZeyuTeng96
0f948ade40
JsonConverter for Converting JSON Files into Structured Markdown Files
Converts Jsons to Markdown. The output preserves the structure of the JSON file as closely as possible, while using Markdown syntax for readability.
2025-01-03 14:10:40 +08:00
Ikko Eltociear Ashimine
125e206047
docs: update README.md (#182)
faciliate -> facilitate
2024-12-21 01:51:30 -08:00
numekudi
f94d09990e
feat: enable Git support in devcontainer (#136)
Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-20 18:09:17 -08:00
lumin
cfd2319c14
feat: add version option to markitdown CLI (#172)
Add a `--version` option to the markitdown command-line interface 
that displays the current version number.
2024-12-20 16:24:45 -08:00
dependabot[bot]
73161982ff
Bump actions/setup-python from 2 to 5 (#179)
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 2 to 5.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](https://github.com/actions/setup-python/compare/v2...v5)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: afourney <adamfo@microsoft.com>
2024-12-20 16:20:22 -08:00
dependabot[bot]
9b69467772
Bump actions/cache from 3 to 4 (#178)
Bumps [actions/cache](https://github.com/actions/cache) from 3 to 4.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](https://github.com/actions/cache/compare/v3...v4)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: gagb <gagb@users.noreply.github.com>
Co-authored-by: afourney <adamfo@microsoft.com>
2024-12-20 16:17:43 -08:00
gagb
857a2d160d
Update README.md (#180) 2024-12-20 14:49:20 -08:00
Soulter
1123392306
fix: support -o param to avoid encoding issues (#116)
* perf: cli supports -o param

* doc: update README

---------

Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-20 14:43:00 -08:00
dependabot[bot]
377a7eaa7d
Bump actions/checkout from 2 to 4 (#177)
Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 4.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v2...v4)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-20 14:36:48 -08:00
lumin
c1a0d3deaf
chore: configure Dependabot for GitHub Actions updates (#112)
Sets up Dependabot to automatically check for updates to 
GitHub Actions on a weekly basis, ensuring that the project 
remains up-to-date with the latest dependencies and security 
fixes.

Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-20 14:28:55 -08:00
SigireddyBalasai
5276616ba1
Added support to use Pathlib (#93)
* Add support for Path objects in MarkItDown conversion methods

* Remove unnecessary blank line in test_markitdown_exiftool function

* Remove unnecessary blank line in test_markitdown_exiftool function

* remove pathlib path in test file

---------

Co-authored-by: afourney <adamfo@microsoft.com>
Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-20 14:12:48 -08:00
gagb
7e6c36c5d4
docs: add contribution guidelines to README (#176) 2024-12-20 14:08:58 -08:00
lumin
52d73080c7
refactor(tests): add helper function for tests (#87)
* refactor(tests): simplify string validation in tests

Introduce a helper function `validate_strings` to streamline the 
validation of expected and excluded strings in test cases. Replace 
repetitive string assertions in the `test_markitdown_local` function 
with calls to this new helper, improving code readability and 
maintainability.

* run pre-commit

---------

Co-authored-by: lumin <71011125+l-melon@users.noreply.github.com>
Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-20 11:42:32 -08:00
afourney
e6421000e3
Merge pull request #160 from sugatoray/support_type_hinting
Add support for type-hinting (PEP-561)
2024-12-20 10:54:43 -08:00
Sugato Ray
08a25345e3 [feat]: add support for type-hinting for PEP-561 2024-12-20 02:37:10 +00:00
Sugato Ray
8921fe7304 ignore .vscode folder
- avoid local developer vscode editor settings
2024-12-20 02:18:14 +00:00
Sugato Ray
613825d5b3 [feat]: add support for type-hinting for PEP-561 2024-12-20 02:12:24 +00:00
gagb
18e3f1d428
Merge pull request #91 from PetrAPConsulting/patch-1
Update README.md
2024-12-19 14:02:47 -08:00
gagb
c295dee5e4
Merge branch 'main' into patch-1 2024-12-19 13:22:51 -08:00
gagb
dd87dd5e36
Merge pull request #156 from microsoft/afourney-patch-1
Added holiday notice.
2024-12-19 11:18:24 -08:00
afourney
535147b2e8
Added holiday notice.
Added holiday notice.
2024-12-19 11:11:54 -08:00
gagb
5c776bda70
Update README.md 2024-12-19 10:30:53 -08:00
gagb
423a01844a
Merge branch 'main' into patch-1 2024-12-19 10:30:10 -08:00
gagb
7147bef7d5
Merge pull request #130 from sugatoray/update_commandline_help
Update CLI helpdoc formatting to allow indentation in code
2024-12-19 10:20:23 -08:00
Sugato Ray
a5f39d6922
Merge branch 'main' into update_commandline_help 2024-12-19 07:58:48 -05:00
gagb
925c4499f7
Merge pull request #121 from l-lumin/add-project-description 2024-12-19 00:53:54 -08:00
Petr@AP Consulting
b28f380a47
Update README.md
Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-19 09:23:15 +01:00
lumin
c86287b7e3 feat: add project description in pyproject.toml 2024-12-19 13:02:47 +09:00
Sugato Ray
6f3c762526
Merge branch 'main' into update_commandline_help 2024-12-18 17:50:07 -05:00
gagb
cb66b35f11
Merge pull request #132 from microsoft/gagb-patch-1
Add downloads badge
2024-12-18 14:30:09 -08:00
gagb
a2743a5314
Add downloads badge 2024-12-18 14:26:36 -08:00
Sugato Ray
277480066a Merge branch 'update_commandline_help' of https://github.com/sugatoray/markitdown into update_commandline_help 2024-12-18 21:53:54 +00:00
gagb
6e1b9a7402 Run precommit 2024-12-18 13:46:10 -08:00
Sugato Ray
1384e80725 update .gitignore to exclude .vscode folder 2024-12-18 21:46:06 +00:00