Hew Li Yang
11ab04bfb4
Merge remote-tracking branch 'upstream/main' into hly/chore/xlsx
2025-04-03 08:23:43 +08:00
afourney
c73afcffea
Cleanup and refactor, in preparation for plugin support. (#318)
* Work started moving converters to individual files.
* Significant cleanup and refactor.
* Moved everything to a packages subfolder.
* Added sample plugin.
* Added instructions to the README.md
* Bumped version, and added a note about compatibility.
2025-02-10 15:21:44 -08:00
afourney
f58a864951
Set exiftool path explicitly. (#267)
2025-01-06 12:43:47 -08:00
afourney
05b78e7ce1
Recognize json as plain text (if no other handlers are present). (#261)
* Recognize json as plain text (if no other handlers are present).
2025-01-03 16:40:43 -08:00
afourney
436407288f
If puremagic has no guesses, try again after ltrim. (#260)
2025-01-03 16:03:11 -08:00
afourney
731b39e7f5
Added a test for leading spaces. (#258)
2025-01-03 14:34:33 -08:00
yeungadrian
08ed32869e
Feature/ Add xls support (#169)
* add xlrd
* add xls converter with tests
2025-01-03 13:58:17 -08:00
Murat Can Kurtuluş
d248621ba4
feat: outlook ".msg" file converter (#196)
* feat: outlook .msg converter
* add test, adjust docstring
2025-01-03 13:34:39 -08:00
Hew Li Yang
ba3011721c
chore: update tests
2024-12-22 21:39:12 +08:00
Hew Li Yang
b10b295fb4
Merge branch 'main' into hly/chore/xlsx
2024-12-22 21:29:29 +08:00
Hew Li Yang
7b64e6ebfd
chore: consider header for column-wise drop
2024-12-22 21:22:41 +08:00
lumin
52d73080c7
refactor(tests): add helper function for tests (#87)
* refactor(tests): simplify string validation in tests
Introduce a helper function `validate_strings` to streamline the
validation of expected and excluded strings in test cases. Replace
repetitive string assertions in the `test_markitdown_local` function
with calls to this new helper, improving code readability and
maintainability.
* run pre-commit
---------
Co-authored-by: lumin <71011125+l-melon@users.noreply.github.com>
Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-20 11:42:32 -08:00
afourney
9e546a8588
Merge branch 'main' into main
2024-12-17 15:37:28 -08:00
Adam Fourney
8d5f16ecd2
Fixed formatting.
2024-12-17 15:27:06 -08:00
afourney
a571021199
Merge branch 'main' into main
2024-12-17 15:12:59 -08:00
Adam Fourney
95188a4a27
Merge main.
2024-12-17 13:46:26 -08:00
Adam Fourney
03a7843a0a
Added deprecation warnings for mlm_* arguments.
2024-12-17 13:22:48 -08:00
Adam Fourney
248d64edd0
Added llm tests to the local test set.
2024-12-17 12:13:19 -08:00
Soulter
752fbd333c
feat: add tests of rss converter
2024-12-17 22:45:27 +08:00
Hew Li Yang
113f7748b7
chore: simplify xlsx tests
2024-12-17 21:38:40 +08:00
Hew Li Yang
5c60d8ca12
chore: finer flags, forward na_rep
2024-12-17 21:17:40 +08:00
Hew Li Yang
c2aae4ddda
chore: make cleaning optional
2024-12-17 14:03:39 +08:00
gagb
134d35a859
Merge branch 'main' into hly/chore/xlsx
2024-12-16 16:26:14 -08:00
afourney
afaff11ef0
Merge branch 'main' into main
2024-12-16 14:40:58 -08:00
afourney
e7636656d8
Merge branch 'main' into support-comments-in-docx
2024-12-16 14:23:14 -08:00
afourney
12ce5e95b2
Merge branch 'main' into feature/add-pptx-chart-support
2024-12-16 14:06:14 -08:00
gagb
9e6a19987b
Merge branch 'main' into main
2024-12-16 13:51:39 -08:00
Om Gupta
a3208f2bd0
feat: Add IpynbConverter
- Implemented IpynbConverter class for converting Jupyter Notebook (.ipynb) files into Markdown format.
- Supports markdown cells, code cells and raw cells.
- First markdown heading is used as the title if no title is found in notebook metadata.
- Created a test notebook (`test_notebook.ipynb`) to verify the functionality of the converter.
2024-12-17 01:00:41 +05:30
Hew Li Yang
19dc6a3641
chore: update test excel with a nan
2024-12-16 15:44:30 +08:00
Hew Li Yang
5de769f1bc
chore: excel improvements
2024-12-16 15:27:03 +08:00
Ville Puuska
0a7203b876
add style_map prop to MarkItDown class
2024-12-15 17:23:57 +02:00
Ville Puuska
0704b0b6ff
pass 'style_map' kwarg to mammoth when converting docx
2024-12-15 16:59:21 +02:00
sakasegawa
0dd4e95584
Remove _is_chart
2024-12-15 21:14:58 +09:00
sakasegawa
93130b5ba5
Add PPTX chart support
2024-12-15 20:42:55 +09:00
Divyansh Singh
52b723724c
Fix character decoding issues with text-like files
2024-12-15 10:37:59 +05:30
Josh XT
4987201ef6
test
2024-12-14 08:49:03 -05:00
Josh XT
571c5bbc0e
add test
2024-12-14 08:45:51 -05:00
Adam Fourney
1787b83d7d
Fix remote tests.
2024-11-13 14:37:47 -08:00
Adam Fourney
f20c964f99
Initial commit.
2024-11-13 13:00:01 -08:00