afourney
790af26310
Merge branch 'main' into main
2025-04-13 09:33:36 -07:00
createcentury
041be54471
Update README.md ( #1187 )
...
updated subtle misspelling.
2025-04-13 09:31:40 -07:00
lentil32
ebe2684b3d
chore: fix typo in README.md ( #1175 )
...
* chore: fix typo in README.md
2025-04-13 09:29:16 -07:00
Turdıbek
8576f1d915
Add CSV to Markdown table conversion - fixes #1144 ( #1176 )
...
* feat: Add CSV to Markdown table converter
- Add new CsvConverter class to convert CSV files to Markdown tables
- Support text/csv and application/csv MIME types
- Preserve table structure with headers and data rows
- Handle edge cases like empty cells and mismatched columns
- Fix Azure Document Intelligence dependency handling
- Register CsvConverter in MarkItDown class
----
Thanks also to @benny123tw who submitted a very similar PR in #1171
2025-04-13 09:19:00 -07:00
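The conversion described in the commit above (header row, data rows, empty cells, mismatched column counts) can be illustrated with a minimal sketch. This is not the actual CsvConverter code; the function name `csv_to_markdown` is hypothetical, and the choice to pad short rows and truncate long ones to the header width is an assumption about how "mismatched columns" might be handled.

```python
import csv
import io


def csv_to_markdown(text: str) -> str:
    """Hypothetical helper: render CSV text as a Markdown table."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ""

    header = rows[0]
    width = len(header)

    # Header row followed by the Markdown separator row.
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join(["---"] * width) + " |",
    ]

    for row in rows[1:]:
        # Assumption: pad short rows with empty cells and drop extra cells
        # so every row matches the header width.
        row = (row + [""] * width)[:width]
        lines.append("| " + " | ".join(row) + " |")

    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_to_markdown("name,qty\napple,3\npear\nplum,1,extra\n"))
```

Padding and truncating keeps every row the same width as the header, which is what most Markdown renderers expect; a real converter might instead choose to report malformed rows.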
goulvenb
25b7a49d4e
Adding GHCR container
2025-04-12 15:17:57 +02:00
goulvenb
b933ac8374
Changing Docker instructions on README
2025-04-12 15:17:32 +02:00
Goulven Bourveau
ad315b8962
Cleaning up
2025-04-12 14:33:08 +02:00
Goulven Bourveau
d00becd6a9
We're gonna create a container only on tag creation
2025-04-12 14:25:54 +02:00
Goulven Bourveau
5caecac3cf
The registry is not in the environment context, but in the variable context here
2025-04-12 14:00:02 +02:00
Goulven Bourveau
3f6a906412
This worked out pretty well; going to try putting the repo in an environment variable
2025-04-12 13:51:52 +02:00
Goulven Bourveau
4f3d4e406c
I pointed a few things out to an LLM and this was proposed; I don't really like hardcoding it, but for testing purposes I'm going to try
2025-04-12 13:48:05 +02:00
Goulven Bourveau
dd64b4f8e7
Looking at the logs, the PAT has insufficient rights; going to try the default PAT
2025-04-12 13:43:46 +02:00
Goulven Bourveau
6e693eb0de
Looking at the logs, it's due to the image, which was hard-coded
2025-04-12 13:27:27 +02:00
Goulven Bourveau
feba8a0a4a
The CI did not try to push to ghcr.io but to docker.io; trying to hardcode the variable
2025-04-12 01:47:01 +02:00
Goulven Bourveau
493197caeb
For testing purposes, we are making a container on each push
2025-04-12 01:44:58 +02:00
Goulven Bourveau
e63cf80653
Using container repository defined on the Github settings
2025-04-12 01:44:17 +02:00
Goulven Bourveau
ff827ff9fc
MarkItDown uses 'main' as the master branch
2025-04-12 01:38:42 +02:00
Goulven Bourveau
09556385ff
Changing source of tutorial to https://github.com/docker/metadata-action
2025-04-12 01:37:16 +02:00
Goulven Bourveau
e651ceaa54
Adding PAT
2025-04-12 01:31:03 +02:00
Goulven Bourveau
370583ad83
Making a dummy commit to see how GitHub reacts
2025-04-12 01:27:16 +02:00
Goulven Bourveau
2cba2bf7a8
Pasting default content of https://github.com/googleapis/release-please-action
2025-04-12 01:26:33 +02:00
Sathindu
3fcd48cdfc
feat: render math equations in .docx documents ( #1160 )
...
* feat: math equation rendering in .docx files
* fix: import fix on .docx pre processing
* test: add test cases for docx equation rendering
* docs: add ThirdPartyNotices.md
* refactor: reformatted with black
2025-03-28 15:36:38 -07:00
afourney
9e067c42b6
Make it easier to use AzureKeyCredentials with Azure Doc Intelligence ( #1151 )
...
* Make it easier to use AzureKeyCredentials with Azure Doc Intelligence
* Fixed mypy type error.
* Added more fine-grained options over types.
* Pass doc intel options further up the stack.
2025-03-26 10:44:11 -07:00
afourney
9a951055f0
Update readme to point to the mcp package. ( #1158 )
...
* Updated readme with link to the MCP package.
2025-03-25 15:00:04 -07:00
afourney
73b9d57312
Update badges ( #1157 )
...
* Update badges in subpackages.
2025-03-25 14:52:24 -07:00
afourney
3ca57986ef
Basic SSE MCP Server for MarkItDown ( #1155 )
...
* Added an initial minimal MCP server for MarkItDown
* Added STDIO default option.
* Added a Dockerfile, and updated the README accordingly. Also added instructions for Claude Desktop
* Pin mcp version.
2025-03-25 14:38:22 -07:00
afourney
c1f9a323ee
Bump version. ( #1154 )
2025-03-24 23:26:30 -07:00
afourney
e928b43afb
convert_url renamed to convert_uri, and now handles data and file URIs ( #1153 )
2025-03-24 21:43:04 -07:00
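To illustrate what handling data and file URIs alongside ordinary URLs can involve, here is a standard-library sketch. It is not the `convert_uri` implementation; the function `describe_uri` and its return shapes are hypothetical, and the file branch assumes POSIX-style paths.

```python
import base64
from urllib.parse import unquote, unquote_to_bytes, urlparse


def describe_uri(uri: str):
    """Hypothetical helper: classify a URI and extract what a converter needs."""
    scheme = urlparse(uri).scheme  # urlparse lowercases the scheme

    if scheme == "data":
        # data:[<mimetype>][;base64],<payload>
        header, _, payload = uri[len("data:"):].partition(",")
        parts = header.split(";")
        mimetype = parts[0] or "text/plain"
        if "base64" in parts[1:]:
            data = base64.b64decode(payload)
        else:
            data = unquote_to_bytes(payload)
        return ("data", mimetype, data)

    if scheme == "file":
        # Assumption: POSIX-style path; percent-escapes are decoded.
        return ("file", unquote(urlparse(uri).path))

    if scheme in ("http", "https"):
        return ("http", uri)

    raise ValueError(f"Unsupported URI scheme: {scheme!r}")


if __name__ == "__main__":
    print(describe_uri("data:text/plain;base64,SGVsbG8="))
    print(describe_uri("file:///tmp/a%20b.txt"))
```

Dispatching on the scheme keeps a single entry point for all three kinds of input, which is presumably the motivation for the rename from a URL-specific name to a URI-general one.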
afourney
2ffe6ea591
Bump version. ( #1150 )
2025-03-22 11:21:32 -07:00
afourney
efc55b260d
Bump version and resolve a console encoding error. ( #1149 )
2025-03-21 09:27:25 -07:00
Yuzhong Zhang
52432bd228
Add support for preserving base64 encoded images ( #1140 )
...
* Optionally preserve base64 strings in Markdown (_CustomMarkdownify and pptx)
* Add parameter support to the other converters
* fix linter
* Use *kwarg to pass the keep_data_uri parameter.
* Add module cli vector tests
* Fixed formatting, and adjusted tests.
2025-03-20 18:50:23 -07:00
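The `keep_data_uri` option named in the commit above can be pictured with a small sketch. Only the parameter name comes from the commit message; the function `image_markdown` is hypothetical, and the truncation format (keeping the part before the comma and appending `...`) is an assumption made for illustration.

```python
def image_markdown(alt: str, src: str, keep_data_uri: bool = False) -> str:
    """Hypothetical helper: emit a Markdown image, optionally keeping base64 data."""
    if src.startswith("data:") and not keep_data_uri:
        # Assumption: drop the (potentially very large) payload and keep
        # only the media-type prefix as a placeholder.
        src = src.split(",", 1)[0] + "..."
    return f"![{alt}]({src})"


if __name__ == "__main__":
    uri = "data:image/png;base64,AAAA"
    print(image_markdown("logo", uri))                      # payload dropped
    print(image_markdown("logo", uri, keep_data_uri=True))  # payload preserved
```

Defaulting to the truncated form keeps converted Markdown compact, while callers that need the embedded image bytes can opt in.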
afourney
c0a511ecff
Updated docx file to include an image. ( #1146 )
2025-03-20 12:25:56 -07:00
afourney
cd6aa41361
Adjust warning filters and update dependencies ( #1143 )
...
Adjusts warning filters to be more contextual
Updates dependencies for magika and youtube-transcript-api
Updates the version to 0.1.0a5 in __about__.py
2025-03-19 22:09:14 -07:00
afourney
716f74dcb9
Consider anything with a charset as plain text-convertible. ( #1142 )
2025-03-19 20:46:35 -07:00
afourney
a93e0567e6
EPub Support. Adapted #123 to not use epublib. ( #1131 )
...
* Adapted #123 to not use epublib.
* Updated README.md
2025-03-17 07:48:15 -07:00
afourney
c5f70b904f
Have magika read from the stream. ( #1136 )
2025-03-17 07:39:19 -07:00
afourney
53834fdd24
Investigate and silence warnings. ( #1133 )
2025-03-15 23:41:35 -07:00
afourney
5c565b7d79
Fix remaining mypy errors. ( #1132 )
2025-03-15 23:12:48 -07:00
afourney
a78857bd43
Added epub test file. ( #1130 )
2025-03-15 18:34:51 -07:00
afourney
09df7fe8df
Small fixes for autogen integration. ( #1124 )
2025-03-12 19:18:11 -07:00
Adam Fourney
6a9f09b153
Updated Magika dependency.
2025-03-12 16:15:33 -07:00
afourney
0b815fb916
Bumping version to 0.1.0a2 ( #1123 )
2025-03-12 11:44:19 -07:00
Emanuele Meazzo
12620f1545
Handle not supported plot type in pptx ( #1122 )
...
* Handle not supported plot type in pptx
* Fixed formatting.
2025-03-12 11:26:23 -07:00
afourney
5f75e16d20
Refactored tests. ( #1120 )
...
* Refactored tests.
* Fixed CI errors, and included misc tests.
* Omit mskanji from streaminfo test.
* Omit mskanji from no hints test.
* Log results of debugging in comments (linked to Magika issue)
* Added docs as to when to use misc tests.
2025-03-12 11:08:06 -07:00
yushihang
75140a90e2
fix: correct f-string formatting in FileConversionException ( #1121 )
2025-03-12 10:15:09 -07:00
afourney
af1be36e0c
Added CLI options for extension, mimetypes, and charset. ( #1115 )
2025-03-11 13:16:33 -07:00
Adam Fourney
2a2ccc86aa
Added mimetypes to _rss_converter
2025-03-10 16:17:41 -07:00
Adam Fourney
2e51ba22e7
Enhance type guessing.
2025-03-10 16:05:41 -07:00
afourney
8f8e58c9bb
Minimize guesses when guesses are compatible. ( #1114 )
...
* Minimize guesses when guesses are compatible.
2025-03-10 15:30:44 -07:00
afourney
8e73a325c6
Switch from puremagic to magika. ( #1108 )
2025-03-10 12:49:52 -07:00