Commit graph

  • 52432bd228
    Add support for preserving base64 encoded images (#1140) Yuzhong Zhang 2025-03-21 09:50:23 +0800
  • 9cee9c26bd
    Merge branch 'main' into feat-optional_b64 afourney 2025-03-20 12:41:47 -0700
  • 959d43c637 Fixed formatting, and adjusted tests. Adam Fourney 2025-03-20 12:41:19 -0700
  • c0a511ecff
    Updated docx file to include an image. (#1146) afourney 2025-03-20 12:25:56 -0700
  • 40ab9b6db2 Updated docx file to include an image. Adam Fourney 2025-03-20 12:22:50 -0700
  • 887dbbcf5c
    Merge branch 'main' into feat-optional_b64 Yuzhong Zhang 2025-03-21 00:50:17 +0800
  • e952ab1189 Merge remote-tracking branch 'origin/feat-optional_b64' into feat-optional_b64 Yuzhong Zhang 2025-03-21 00:49:55 +0800
  • 1eaa879b25 Use *kwarg to pass keep_data_uri para. Add module cli vector tests Yuzhong Zhang 2025-03-21 00:49:36 +0800
  • cd6aa41361
    Adjust warning filters and update dependencies (#1143) v0.1.0a5 afourney 2025-03-19 22:09:14 -0700
  • a8c76a01cf Updated versions of magika and youtube-transcript-api Adam Fourney 2025-03-19 22:06:02 -0700
  • e2eb82be7c pyproject.toml Adam Fourney 2025-03-19 22:04:34 -0700
  • 48680366b2
    Merge branch 'main' into feat-optional_b64 afourney 2025-03-19 20:48:02 -0700
  • 716f74dcb9
    Consider anything with a charset as plain text-convertible. (#1142) afourney 2025-03-19 20:46:35 -0700
  • 6feccc82eb Consider anything with a charset as plain text-convertable. Adam Fourney 2025-03-19 07:50:23 -0700
  • 4899148310 fix linter Yuzhong Zhang 2025-03-18 20:30:44 +0800
  • 41cd9b5e2a add other converter para support Yuzhong Zhang 2025-03-18 20:14:46 +0800
  • 9f1bcf3b83 optional reserve base64 string in markdown Yuzhong Zhang 2025-03-18 20:01:35 +0800
  • a93e0567e6
    EPub Support. Adapted #123 to not use epublib. (#1131) v0.1.0a4 afourney 2025-03-17 07:48:15 -0700
  • a3a0836c1d Updated with main. Adam Fourney 2025-03-17 07:45:17 -0700
  • a8220bb7cc
    Merge branch 'main' into epub afourney 2025-03-17 07:39:37 -0700
  • c5f70b904f
    Have magika read from the stream. (#1136) afourney 2025-03-17 07:39:19 -0700
  • bd9924836c
    Merge branch 'main' into magika_read_stream afourney 2025-03-17 07:37:44 -0700
  • 77f3f66176 Have magika read from the stream. Adam Fourney 2025-03-17 07:35:13 -0700
  • 9a2b110535
    Merge branch 'main' into epub afourney 2025-03-15 23:41:48 -0700
  • 53834fdd24
    Investigate and silence warnings. (#1133) afourney 2025-03-15 23:41:35 -0700
  • 1d3bfbef17 Investigate and silence warnings. Adam Fourney 2025-03-15 23:35:36 -0700
  • b3c3dc4868
    Merge branch 'main' into epub afourney 2025-03-15 23:13:08 -0700
  • 5c565b7d79
    Fix remaining mypy errors. (#1132) afourney 2025-03-15 23:12:48 -0700
  • baff681990 Fix remaining mypy errors. Adam Fourney 2025-03-15 23:10:28 -0700
  • 969520c5c9 Updated README.md Adam Fourney 2025-03-15 19:02:53 -0700
  • 5791b39b0d Adapted #123 to not use epublib. Adam Fourney 2025-03-15 19:00:42 -0700
  • a78857bd43
    Added epub test file. (#1130) afourney 2025-03-15 18:34:51 -0700
  • b93388a42a Added epub test file. Adam Fourney 2025-03-15 18:32:58 -0700
  • 09df7fe8df
    Small fixes for autogen integration. (#1124) afourney 2025-03-12 19:18:11 -0700
  • a62d8edb13 Small fixes for autogen integration. v0.1.0a3 Adam Fourney 2025-03-12 19:14:35 -0700
  • 6a9f09b153 Updated Magika dependency. Adam Fourney 2025-03-12 16:15:33 -0700
  • 0b815fb916
    Bumping version to 0.1.0a2 (#1123) afourney 2025-03-12 11:44:19 -0700
  • de2c56ffbc Bumping version to 0.1.0a2 v0.1.0a2 Adam Fourney 2025-03-12 11:42:00 -0700
  • 12620f1545
    Handle not supported plot type in pptx (#1122) Emanuele Meazzo 2025-03-12 19:26:23 +0100
  • 92169da2fd Fixed formatting. Adam Fourney 2025-03-12 11:22:21 -0700
  • 2ad7207423
    Merge branch 'main' into main afourney 2025-03-12 11:15:34 -0700
  • 5f75e16d20
    Refactored tests. (#1120) afourney 2025-03-12 11:08:06 -0700
  • edda821e38 Merge branch 'refactor_tests' of github.com:microsoft/markitdown into refactor_tests Adam Fourney 2025-03-12 11:06:00 -0700
  • 85262c38a1 Added docs as to when to use misc tests. Adam Fourney 2025-03-12 11:05:46 -0700
  • b851a01861
    Merge branch 'main' into refactor_tests afourney 2025-03-12 10:28:01 -0700
  • 7819841d38
    Merge branch 'main' into main Emanuele Meazzo 2025-03-12 18:19:19 +0100
  • 75140a90e2
    fix: correct f-string formatting in FileConversionException (#1121) yushihang 2025-03-13 01:15:09 +0800
  • aa365e6b3d
    Handle not supported plot type in pptx Emanuele Meazzo 2025-03-12 18:14:58 +0100
  • 8189b041f3
    fix: correct f-string formatting in FileConversionException yushihang 2025-03-12 19:04:24 +0800
  • 8938eb84dc Linked to Magika issue. Adam Fourney 2025-03-12 03:51:23 -0700
  • 677ce9132b Log results of debugging in comments. Adam Fourney 2025-03-12 03:18:27 -0700
  • 4f73855606 Omit mskanji from no hints test. Adam Fourney 2025-03-11 22:13:03 -0700
  • 9075dec377 Omit mskanji from streaminfo test. Adam Fourney 2025-03-11 22:10:15 -0700
  • 30924f7bb9 Fixed CI errors, and inluded misc tests. Adam Fourney 2025-03-11 22:02:47 -0700
  • 03fbec6c4f Refactored tests. Adam Fourney 2025-03-11 21:22:52 -0700
  • af1be36e0c
    Added CLI options for extension, mimetypes, and charset. (#1115) afourney 2025-03-11 13:16:33 -0700
  • f4c6c5133a Added CLI options for extension, mimetypes, and charset. Adam Fourney 2025-03-10 23:56:26 -0700
  • dba32ef21d
    Merge a77c4f0415 into 2a2ccc86aa Casper da Costa-Luis 2025-03-11 11:31:34 +0900
  • fdd2ae2cef
    Merge 011328920b into 2a2ccc86aa Casper da Costa-Luis 2025-03-11 11:31:34 +0900
  • 2a2ccc86aa Added mimetypes to _rss_converter Adam Fourney 2025-03-10 16:17:41 -0700
  • 2e51ba22e7 Enhance type guessing. Adam Fourney 2025-03-10 16:05:41 -0700
  • 8f8e58c9bb
    Minimize guesses when guesses are compatible. (#1114) afourney 2025-03-10 15:30:44 -0700
  • 825192b872 Removed debug print. Adam Fourney 2025-03-10 14:55:20 -0700
  • 65e7fbeb7a Minimize guesses when guesses are compatible. Adam Fourney 2025-03-10 14:50:12 -0700
  • 8e73a325c6
    Switch from puremagic to magika. (#1108) afourney 2025-03-10 12:49:52 -0700
  • ab9a681a6e
    Merge branch 'main' into magika afourney 2025-03-10 10:33:41 -0700
  • 13ac9c214e Added FastAPI server to handle file conversion dev-myk 2025-03-10 08:56:49 +0300
  • c3d241ec12
    Merge 9fe8507906 into 2405f201af Hieu Lam 2025-03-10 10:36:05 +0700
  • 9fe8507906 fix: reformatted to unify syntax Hieu Lam 2025-03-10 10:31:21 +0700
  • 65b3f4a152 chore: using magika instead of guesslang Hieu Lam 2025-03-10 10:28:24 +0700
  • 2405f201af
    fix typo in well-known path list (#1109) Mohit Agarwal 2025-03-09 09:02:44 +0530
  • 3dfcbbefe1
    fix typo in well-known path list Mohit Agarwal 2025-03-09 08:25:41 +0530
  • 58a687c08c Switch from puremagic to magika. Adam Fourney 2025-03-08 10:45:22 -0800
  • a77c4f0415
    CLI: add --llm-client-header Casper da Costa-Luis 2024-12-18 13:29:53 +0000
  • 68724917c7
    drop --llm-client for now Casper da Costa-Luis 2024-12-17 08:43:26 +0000
  • 88961c3280
    CLI: support LLM Casper da Costa-Luis 2024-12-17 06:29:51 +0000
  • 011328920b
    update tests Casper da Costa-Luis 2025-03-08 16:12:11 +0000
  • f17bc21c9d If files use zip packaging, be smarter about inspecting their types. zip_formats Adam Fourney 2025-03-07 23:06:56 -0800
  • 99d8e562db
    Fix exiftool in well-known paths. (#1106) afourney 2025-03-07 21:47:20 -0800
  • 76f7c6e259 Fix exiftool in well-known paths. Adam Fourney 2025-03-07 21:45:54 -0800
  • 515fa854bf
    feat(docker): improve dockerfile build (#220) Sebastian Yaghoubi 2025-03-07 20:07:40 -0800
  • 4fe8b381c2 Update Dockerfile to new package structure, and fix streaming bugs. Adam Fourney 2025-03-07 20:06:04 -0800
  • 3ed384fcbe
    Merge branch 'main' into fix-docker afourney 2025-03-07 18:55:15 -0800
  • 79b78c694d
    Merge branch 'main' into completion gagb 2025-03-07 16:25:42 -0800
  • e58bc486ee Added missing comma. v0.0.2 v0.0.X Adam Fourney 2025-03-07 16:18:47 -0800
  • 81ef601c09
    Removed deprecation and other warnings. (#1105) afourney 2025-03-07 16:17:03 -0800
  • 461a4440dc Removed deprecation and other warnings. Adam Fourney 2025-03-07 16:15:36 -0800
  • 518b12c1fb
    Addresses #1068 (#1101) afourney 2025-03-07 15:46:30 -0800
  • 0229ff6cb7
    feat: sort pptx shapes to be parsed in top-to-bottom, left-to-right order (#1104) Richard Ye 2025-03-07 18:45:14 -0500
  • 50231ddd36 Added missing import Adam Fourney 2025-03-07 15:43:48 -0800
  • a42032e4c8 Fixed formatting. Adam Fourney 2025-03-07 15:23:19 -0800
  • 3ac0acbb5d
    Update README.md Richard Ye 2025-03-07 14:52:24 -0500
  • 288a44ecf7
    Sort PPTX shapes to be read in top-to-bottom, left-to-right order Richard Ye 2025-03-07 14:02:19 -0500
  • 461e44cdd4 Addresses #1068 Adam Fourney 2025-03-06 20:36:06 -0800
  • da73d64bfa Initial work to port #55 to MarkItDown 0.1.X onenote Adam Fourney 2025-03-06 13:17:58 -0800
  • 3ebe8dfacb
    slight if-else tidy Casper da Costa-Luis 2025-03-06 14:22:29 +0000
  • e270e63bbc
    standardise metavars Casper da Costa-Luis 2025-03-06 14:22:16 +0000
  • 38feb5e7a1
    tidy docstrings Casper da Costa-Luis 2025-03-06 14:07:01 +0000
  • b0406ca2c7
    global parser Casper da Costa-Luis 2024-12-22 08:36:55 +0000
  • e4238eb1ac
    Merge 4050de78b6 into 82d84e3edd lumin 2025-03-06 05:35:37 -0800