Commit graph

25 commits

Author SHA1 Message Date
Adam Fourney
a2cf8ee889 Added Ole files. 2025-02-28 21:55:26 -08:00
Adam Fourney
11ffd2e550 Added pdfs 2025-02-28 21:35:14 -08:00
Adam Fourney
8362df8e60 Added xlsx and xls 2025-02-28 21:21:17 -08:00
Adam Fourney
e5dc512948 Reuse error messages for missing dependencies. 2025-02-28 20:28:35 -08:00
Adam Fourney
98698a64ce Added .docx to optional dependencies 2025-02-28 17:06:59 -08:00
Adam Fourney
b9487b6b6d Fix CLI tests: have them install [all] 2025-02-28 16:57:19 -08:00
Adam Fourney
df80df0d1f Merge main. 2025-02-28 16:25:04 -08:00
afourney
43bd79adc9
Print and log better exceptions when file conversions fail. (#1080)
* Print and log better exceptions when file conversions fail.
* Added unit tests for exceptions.
2025-02-28 16:07:47 -08:00
Adam Fourney
7d2e0bd9d4 Exploring ways to enable optional dependencies. Starting with pptx. 2025-02-28 11:57:51 -08:00
Adam Fourney
0f63a7e28f Merge branch 'main' into optional_dependencies 2025-02-28 11:08:43 -08:00
afourney
9182923375
Don't have ZipConverter accept OOXML files. This will never yield a good result. (#1078) 2025-02-28 09:54:19 -08:00
Adam Fourney
2af4ba861c Renamed exception. 2025-02-28 08:55:34 -08:00
afourney
9a19fdd134
Make sure extensions are unique in MarkItDown's convert methods. (#1076) 2025-02-28 07:43:03 -08:00
Matthew Powers
e82e0c1372
Add Support For PPTX Shape Groups (Fix in code design to not miss out on slide content) (#331)
* Adds support for Shape Groups
* Update to Test PPtx for nested shape
* This line was accidentally removed and is added back here
2025-02-27 23:21:51 -08:00
Nima Akbarzadeh
a394cc7c27
fix: Implement retry logic for YouTube transcript fetching and fix URL decoding issue (#1035)
* fix: add error handling, refactor _findKey to use json.items()
* fix: improve metadata and description extraction logic
* fix: improve YouTube transcript extraction reliability
* fix: implement retry logic for YouTube transcript fetching and fix URL decoding issue
* fix(readme): add youtube URLs as markitdown supports
2025-02-27 23:17:54 -08:00
tanreinama
a87fbf01ee
add necessary imports (#861)
* add necessary imports
2025-02-27 23:16:09 -08:00
André Menezes
d0ed74fdf4
Fix UnboundLocalError in MarkItDown._convert (#1038)
Initialize `res` at the beginning of `_convert`. If the first converter raised an exception, `res` was never assigned, and the later check `if res is not None` failed with an UnboundLocalError.
2025-02-27 23:11:27 -08:00
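A minimal sketch of the failure mode described in #1038, for illustration only. This is not the MarkItDown source: `convert_buggy`, `convert_fixed`, `failing`, and `upper` are hypothetical names, and the loop shape is an assumption based on the commit message.

```python
def failing(data):
    # Hypothetical converter that cannot handle the input.
    raise ValueError("unsupported input")


def upper(data):
    # Hypothetical converter that succeeds.
    return data.upper()


def convert_buggy(converters, data):
    for convert in converters:
        try:
            res = convert(data)
        except Exception:
            pass  # assumed: the error is recorded and the next converter is tried
        # If the very first converter raised, `res` was never bound,
        # so this check raises UnboundLocalError.
        if res is not None:
            return res
    return None


def convert_fixed(converters, data):
    res = None  # bind `res` before the loop, as the commit describes
    for convert in converters:
        try:
            res = convert(data)
        except Exception:
            pass
        if res is not None:
            return res
    return None
```

With the converters ordered as `[failing, upper]`, `convert_buggy` raises UnboundLocalError on the first iteration, while `convert_fixed` skips the failing converter and returns the second converter's result.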
afourney
e4b419ba40
Pin Markdownify version. (#1069)
* Pin markdownify version. TODO: update code for compatibility with Markdownify 1.0.0
2025-02-27 23:09:33 -08:00
afourney
dbdf2c0c10
Added CLI tests. (#327) 2025-02-11 20:42:50 -08:00
KennyZhang1
97eeed5f32
Doc Intelligence fixes for refactored code (#325)
* added priority flag to doc intel converter constructor
* fixed analysis features bug for docx
2025-02-11 16:01:46 -08:00
afourney
935da9976c
Added priority argument to all converter constructors. (#324)
* Added priority argument to all converter constructors.
2025-02-11 12:36:32 -08:00
Ruijun Gao
5ce85c236c
Fix a typo in sample RTF plugin (#320) 2025-02-11 10:33:52 -08:00
Tomasz Kalinowski
3a5ca22a8d
Don't generate md links in 'pre' blocks (#322) 2025-02-11 07:13:17 -08:00
Adam Fourney
4b62506451 Small typo in README. 2025-02-10 15:24:28 -08:00
afourney
c73afcffea
Cleanup and refactor, in preparation for plugin support. (#318)
* Work started moving converters to individual files.
* Significant cleanup and refactor.
* Moved everything to a packages subfolder.
* Added sample plugin.
* Added instructions to the README.md
* Bumped version, and added a note about compatibility.
2025-02-10 15:21:44 -08:00