Python tool for converting files and office documents to Markdown.
Find a file
2025-04-24 14:57:24 +08:00
packages/markitup make pydub an optional import 2025-04-24 06:56:00 +00:00
.gitignore add uv 2025-04-21 08:21:20 +00:00
LICENSE LICENSE committed 2024-11-13 11:56:45 -08:00
README.md make pydub an optional import 2025-04-24 06:56:00 +00:00

MarkItUp

This is a fork of MarkItDown.

While markitdown is a useful tool, its returned content is too text-focused, which is not updated to the current rise of multi-modal LLMs.

Features

  • Converts various file formats to markdown-oriented OpenAI compatible responses
  • Supports multiple file types including:
    • Documents: DOCX (not DOC)
    • Presentations: PPTX (not PPT)
    • Spreadsheets: XLSX, XLS, CSV
    • Media: Audio files (MP3, M4A)
    • Web content: HTML
    • PDF files
    • Plain text files
  • Returns OpenAI compatible response, which can be used by most LLM clients
  • Supports command line usage

Installation

Install directly from GitHub:

pip install git+https://github.com/pathintegral-institute/markitup.git@main#subdirectory=packages/markitup
uv add git+https://github.com/pathintegral-institute/markitup.git@main#subdirectory=packages/markitup

To use audio transciption using pydub, install markitup[audio]:

uv add "git+https://github.com/pathintegral-institute/markitup.git@main#subdirectory=packages/markitup[audio]"

Usage

from markitup.converter_utils.utils import read_files_to_bytestreams
from markitup import MarkItUp, Config

fs = read_files_to_bytestreams('packages/markitup/tests/test_files')

miu = MarkItUp(
    config=Config(
        modalities=['image', 'audio'],
        image_use_webp=True
        )
    )

result, stream_info = miu.convert(stream=fs[file_name], file_name=file_name)