diff --git a/docs/user-guide/readme.md b/docs/user-guide/readme.md
deleted file mode 100644
index 1002576..0000000
--- a/docs/user-guide/readme.md
+++ /dev/null
@@ -1,110 +0,0 @@
-> [!IMPORTANT]
-> (12/19/24) Hello! MarkItDown team members will be resting and recharging with family and friends over the holiday period. Activity/responses on the project may be delayed during the period of Dec 21-Jan 06. We will be excited to engage with you in the new year!
-
-# MarkItDown
-
-[![PyPI](https://img.shields.io/pypi/v/markitdown.svg)](https://pypi.org/project/markitdown/)
-![PyPI - Downloads](https://img.shields.io/pypi/dd/markitdown)
-[![Built by AutoGen Team](https://img.shields.io/badge/Built%20by-AutoGen%20Team-blue)](https://github.com/microsoft/autogen)
-
-
-MarkItDown is a utility for converting various files to Markdown (e.g., for indexing, text analysis, etc).
-It supports:
-- PDF
-- PowerPoint
-- Word
-- Excel
-- Images (EXIF metadata and OCR)
-- Audio (EXIF metadata and speech transcription)
-- HTML
-- Text-based formats (CSV, JSON, XML)
-- ZIP files (iterates over contents)
-
-To install MarkItDown, use pip: `pip install markitdown`.
Alternatively, you can install it from the source: `pip install -e .`
-
-## Usage
-
-### Command-Line
-
-```bash
-markitdown path-to-file.pdf > document.md
-```
-
-Or use `-o` to specify the output file:
-
-```bash
-markitdown path-to-file.pdf -o document.md
-```
-
-You can also pipe content:
-
-```bash
-cat path-to-file.pdf | markitdown
-```
-
-### Python API
-
-Basic usage in Python:
-
-```python
-from markitdown import MarkItDown
-
-md = MarkItDown()
-result = md.convert("test.xlsx")
-print(result.text_content)
-```
-
-To use Large Language Models for image descriptions, provide `llm_client` and `llm_model`:
-
-```python
-from markitdown import MarkItDown
-from openai import OpenAI
-
-client = OpenAI()
-md = MarkItDown(llm_client=client, llm_model="gpt-4o")
-result = md.convert("example.jpg")
-print(result.text_content)
-```
-
-### Docker
-
-```sh
-docker build -t markitdown:latest .
-docker run --rm -i markitdown:latest < ~/your-file.pdf > output.md
-```
-
-
-### Batch Processing Multiple Files
-
-This example shows how to convert multiple files to Markdown in a single run. The script processes every supported file in the current directory and writes a corresponding Markdown file for each.
-
-
-```python convert.py
-from markitdown import MarkItDown
-from openai import OpenAI
-import os
-client = OpenAI(api_key="your-api-key-here")  # placeholder: replace with your own key
-md = MarkItDown(llm_client=client, llm_model="gpt-4o-2024-11-20")
-supported_extensions = ('.pptx', '.docx', '.pdf', '.jpg', '.jpeg', '.png')
-files_to_convert = [f for f in os.listdir('.') if f.lower().endswith(supported_extensions)]
-for file in files_to_convert:
-    print(f"\nConverting {file}...")
-    try:
-        md_file = os.path.splitext(file)[0] + '.md'  # same base name, .md extension
-        result = md.convert(file)
-        with open(md_file, 'w', encoding='utf-8') as f:
-            f.write(result.text_content)
-
-        print(f"Successfully converted {file} to {md_file}")
-    except Exception as e:  # report the failure and continue with the next file
-        print(f"Error converting {file}: {e}")
-
-print("\nAll conversions completed!")
-```
-1. Save the script as `convert.py` in the same directory as your files.
-2. Install the required packages, such as `openai`.
-3. Run the script: `python convert.py`
-
-The original files remain unchanged; each new Markdown file is created with the same base name as its source.
-
-
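Review note: the batch loop in the removed section mixes file discovery, conversion, and writing, which makes it hard to exercise without the real library. A minimal sketch of the same loop with the converter passed in as a callable might look like the following. The names `find_convertible`, `convert_all`, and the `convert` parameter are hypothetical and are not part of MarkItDown; only the extension list and the "same base name, `.md` suffix" rule come from the script above.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Iterable

# Extension list copied from the script in the diff above.
SUPPORTED_EXTENSIONS = (".pptx", ".docx", ".pdf", ".jpg", ".jpeg", ".png")


def find_convertible(
    directory: Path, extensions: Iterable[str] = SUPPORTED_EXTENSIONS
) -> list[Path]:
    """Return the files in `directory` whose suffix matches, ignoring case."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        p for p in directory.iterdir() if p.is_file() and p.suffix.lower() in wanted
    )


def convert_all(directory: Path, convert: Callable[[Path], str]) -> dict[Path, Path]:
    """Convert each supported file and return a source -> written .md mapping.

    `convert` is a hypothetical callable that returns Markdown text for a path.
    """
    written: dict[Path, Path] = {}
    for source in find_convertible(directory):
        target = source.with_suffix(".md")  # same base name, .md suffix
        try:
            text = convert(source)
        except Exception as exc:  # one failure should not stop the batch
            print(f"Error converting {source.name}: {exc}")
            continue
        target.write_text(text, encoding="utf-8")
        written[source] = target
    return written
```

With the API shown in the diff, the callable could plausibly be `lambda p: md.convert(str(p)).text_content`. As in the original script, two sources that share a base name (for example `report.pdf` and `report.docx`) would write to the same `.md` file, so the later one overwrites the earlier.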