Merge branch 'main' into main

2025-02-08 20:33:15 -08:00 · 2025-02-08 20:33:15 -08:00 · 959ea53f96
commit 959ea53f96
parent 7a3e9223ca 3090917a49
2 changed files with 22 additions and 1 deletions
--- a/README.md
+++ b/README.md
@ -33,12 +33,20 @@ Or use `-o` to specify the output file:
 markitdown path-to-file.pdf -o document.md
 ```
 To use Document Intelligence conversion:
 ```bash
 markitdown path-to-file.pdf -o document.md -d -e "<document_intelligence_endpoint>"
 ```
 You can also pipe content:
 ```bash
 cat path-to-file.pdf | markitdown
 ```
 More information about how to set up an Azure Document Intelligence Resource can be found [here](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/how-to-guides/create-document-intelligence-resource?view=doc-intel-4.0.0)
 ### Python API
 Basic usage in Python:
@ -51,6 +59,16 @@ result = md.convert("test.xlsx")
 print(result.text_content)
 ```
 Document Intelligence conversion in Python:
 ```python
 from markitdown import MarkItDown
 md = MarkItDown(docintel_endpoint="<document_intelligence_endpoint>")
 result = md.convert("test.pdf")
 print(result.text_content)
 ```
 To use Large Language Models for image descriptions, provide `llm_client` and `llm_model`:
 ```python
--- a/src/markitdown/_markitdown.py
+++ b/src/markitdown/_markitdown.py
@ -217,7 +217,7 @@ class HtmlConverter(DocumentConverter):
        return result
    def _convert(self, html_content: str) -> Union[None, DocumentConverterResult]:
-        """Helper function that converts and HTML string."""
+        """Helper function that converts an HTML string."""
        # Parse the string
        soup = BeautifulSoup(html_content, "html.parser")
@ -236,6 +236,9 @@ class HtmlConverter(DocumentConverter):
        assert isinstance(webpage_text, str)
        # remove leading and trailing \n
        webpage_text = webpage_text.strip()
        return DocumentConverterResult(
            title=None if soup.title is None else soup.title.string,
            text_content=webpage_text,