# MarkItDown

MarkItDown is a utility for converting various files to Markdown (e.g., for indexing or text analysis).
It supports:
- PDF
- PowerPoint
- Word
- Excel
- Images (EXIF metadata and OCR)
- Audio (EXIF metadata and speech transcription)
- HTML
- Text-based formats (CSV, JSON, XML)
- ZIP files (iterates over contents)

To install MarkItDown, use pip: `pip install markitdown`. Alternatively, install it from source: `pip install -e .`
## Usage
### Command-Line
```bash
markitdown path-to-file.pdf > document.md
```
Or use `-o` to specify the output file:
```bash
markitdown path-to-file.pdf -o document.md
```
You can also pipe content:
```bash
cat path-to-file.pdf | markitdown
```
### TypeScript SDK
Basic usage in TypeScript:
```typescript
import { MarkItDown } from 'markitdown';
const md = new MarkItDown();
const result = md.convert('test.xlsx');
console.log(result.text_content);
```
To use Large Language Models for image descriptions, provide `llm_client` and `llm_model`:
```typescript
import { MarkItDown } from 'markitdown';
import { OpenAI } from 'openai';
const client = new OpenAI();
const md = new MarkItDown({ llm_client: client, llm_model: 'gpt-4o' });
const result = md.convert('example.jpg');
console.log(result.text_content);
```
### Docker
```sh
docker build -t markitdown:latest .
docker run --rm -i markitdown:latest < ~/your-file.pdf > output.md
```
### Batch Processing Multiple Files
This example shows how to convert multiple files to Markdown in a single run. The script processes all supported files in the current directory and creates a corresponding Markdown file for each.
```typescript
import { MarkItDown } from 'markitdown';
import { OpenAI } from 'openai';
import * as fs from 'fs';
import * as path from 'path';

const client = new OpenAI({ apiKey: 'your-api-key-here' });
const md = new MarkItDown({ llm_client: client, llm_model: 'gpt-4o-2024-11-20' });

// Collect every file in the current directory with a supported extension.
const supportedExtensions = ['.pptx', '.docx', '.pdf', '.jpg', '.jpeg', '.png'];
const filesToConvert = fs
  .readdirSync('.')
  .filter(file => supportedExtensions.includes(path.extname(file).toLowerCase()));

filesToConvert.forEach(file => {
  console.log(`\nConverting ${file}...`);
  try {
    // Write the result next to the input, using the same base name.
    const mdFile = path.basename(file, path.extname(file)) + '.md';
    const result = md.convert(file);
    fs.writeFileSync(mdFile, result.text_content);
    console.log(`Successfully converted ${file} to ${mdFile}`);
  } catch (e) {
    // Caught values are typed `unknown` under strict settings, so narrow before reading `message`.
    const message = e instanceof Error ? e.message : String(e);
    console.error(`Error converting ${file}: ${message}`);
  }
});

console.log('\nAll conversions completed!');
```
1. Save the script above as `convert.ts`.
2. Place the script in the same directory as your files.
3. Install the required packages, such as `openai`.
4. Run the script: `ts-node convert.ts`

The original files remain unchanged; each new Markdown file is created with the same base name as its source file.
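
To make the naming rule concrete, here is a minimal, dependency-free sketch of how an input file name maps to its output name and how the extension filter behaves. The helpers `extensionOf`, `isSupported`, and `markdownNameFor` are hypothetical names introduced for illustration; they are not part of MarkItDown. The sketch assumes bare file names (no directory components), as returned by `fs.readdirSync`.

```typescript
// Hypothetical helpers (not part of MarkItDown) that mirror the batch
// script's file selection and output naming, without any Node.js imports.
const SUPPORTED_EXTENSIONS: string[] = ['.pptx', '.docx', '.pdf', '.jpg', '.jpeg', '.png'];

// Return the lowercase extension including the dot, or '' when there is none.
function extensionOf(file: string): string {
  const dot = file.lastIndexOf('.');
  // A dot at index 0 marks a hidden file such as ".env", not an extension.
  return dot > 0 ? file.slice(dot).toLowerCase() : '';
}

// True when the extension is in the supported list (case-insensitive).
function isSupported(file: string): boolean {
  return SUPPORTED_EXTENSIONS.indexOf(extensionOf(file)) !== -1;
}

// Map an input file name to its output name: same base name, ".md" extension.
function markdownNameFor(file: string): string {
  const dot = file.lastIndexOf('.');
  const base = dot > 0 ? file.slice(0, dot) : file;
  return `${base}.md`;
}

console.log(markdownNameFor('slides.PPTX')); // slides.md
console.log(isSupported('notes.txt'));       // false
```

Because only the last extension is replaced, two inputs that share a base name (for example `report.pdf` and `report.docx`) would map to the same `report.md`, so the later conversion would overwrite the earlier one.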