markitdown/README.md

# MarkItUp

This is a fork of [MarkItDown](https://github.com/microsoft/markitdown).

While markitdown is a useful tool, its returned content is too text-focused, which is not updated to the current rise of multi-modal LLMs.

## Features

- Converts various file formats to markdown-oriented OpenAI compatible responses
- Supports multiple file types including:
  - Documents: DOCX (not DOC)
  - Presentations: PPTX (not PPT)
  - Spreadsheets: XLSX, XLS, CSV
  - Media: Audio files (MP3, M4A)
  - Web content: HTML
  - PDF files
  - Plain text files
- Returns OpenAI compatible response, which can be used by most LLM clients
- Supports command line usage

## Installation

Install directly from GitHub:

```bash
pip install git+https://github.com/pathintegral-institute/markitup.git@main#subdirectory=packages/markitup
```

```bash
uv add git+https://github.com/pathintegral-institute/markitup.git@main#subdirectory=packages/markitup
```

To use audio transciption using `pydub`, install `markitup[audio]`:
```bash
uv add "git+https://github.com/pathintegral-institute/markitup.git@main#subdirectory=packages/markitup[audio]"
```


## Usage
```python
from markitup.converter_utils.utils import read_files_to_bytestreams
from markitup import MarkItUp, Config

fs = read_files_to_bytestreams('packages/markitup/tests/test_files')

miu = MarkItUp(
    config=Config(
        modalities=['image', 'audio'],
        image_use_webp=True
        )
    )

result, stream_info = miu.convert(stream=fs[file_name], file_name=file_name)

```
rename to markitup 2025-04-21 07:13:19 +00:00			`# MarkItUp`
README.md committed 2024-11-13 19:56:46 +00:00
rename to markitup 2025-04-21 07:13:19 +00:00			`This is a fork of [MarkItDown](https://github.com/microsoft/markitdown).`
Add downloads badge 2024-12-18 22:26:36 +00:00
add readme and test 2025-04-23 07:18:23 +00:00			`While markitdown is a useful tool, its returned content is too text-focused, which is not updated to the current rise of multi-modal LLMs.`
Update readme to point to the mcp package. (#1158) * Updated readme with link to the MCP package. 2025-03-25 22:00:04 +00:00
rename to markitup 2025-04-21 07:13:19 +00:00			`## Features`
Add downloads badge 2024-12-18 22:26:36 +00:00
add readme and test 2025-04-23 07:18:23 +00:00			`- Converts various file formats to markdown-oriented OpenAI compatible responses`
			`- Supports multiple file types including:`
			`- Documents: DOCX (not DOC)`
			`- Presentations: PPTX (not PPT)`
			`- Spreadsheets: XLSX, XLS, CSV`
			`- Media: Audio files (MP3, M4A)`
			`- Web content: HTML`
			`- PDF files`
			`- Plain text files`
			`- Returns OpenAI compatible response, which can be used by most LLM clients`
add readme and test 2025-04-23 07:18:35 +00:00			`- Supports command line usage`

			`## Installation`

			`Install directly from GitHub:`

			```bash
allow plugin 2025-04-23 09:23:59 +00:00			`pip install git+https://github.com/pathintegral-institute/markitup.git@main#subdirectory=packages/markitup`
add readme 2025-04-23 08:56:14 +00:00			```

			```bash
allow plugin 2025-04-23 09:23:59 +00:00			`uv add git+https://github.com/pathintegral-institute/markitup.git@main#subdirectory=packages/markitup`
Update README.md 2025-04-23 09:32:02 +00:00			```

make pydub an optional import 2025-04-24 06:56:00 +00:00			To use audio transciption using `pydub`, install `markitup[audio]`:
			```bash
			`uv add "git+https://github.com/pathintegral-institute/markitup.git@main#subdirectory=packages/markitup[audio]"`
			```


Update README.md 2025-04-23 09:32:02 +00:00			`## Usage`
			```python
			`from markitup.converter_utils.utils import read_files_to_bytestreams`
			`from markitup import MarkItUp, Config`

			`fs = read_files_to_bytestreams('packages/markitup/tests/test_files')`

			`miu = MarkItUp(`
			`config=Config(`
			`modalities=['image', 'audio'],`
			`image_use_webp=True`
			`)`
			`)`

			`result, stream_info = miu.convert(stream=fs[file_name], file_name=file_name)`

			```