Updated DocumentConverter documentation.

Adam Fourney 2025-03-05 15:12:13 -08:00
parent 1eb8b927c2
commit fe1d57a06f

@@ -86,7 +86,7 @@ class DocumentConverter:
 """
 Return a quick determination on if the converter should attempt converting the document.
 This is primarily based `stream_info` (typically, `stream_info.mimetype`, `stream_info.extension`).
-In cases where the data is retreived via HTTP, the `steam_info.url` might also be referenced to
+In cases where the data is retrieved via HTTP, the `steam_info.url` might also be referenced to
 make a determination (e.g., special converters for Wikipedia, YouTube etc).
 Finally, it is conceivable that the `stream_info.filename` might be used to in cases
 where the filename is well-known (e.g., `Dockerfile`, `Makefile`, etc)
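The docstring in this hunk names three kinds of cheap checks: MIME type or extension, URL, and well-known filename. A minimal sketch of each is shown here, assuming a hypothetical `StreamInfo` dataclass that exposes only the four fields the docstring mentions. The helper function names and the specific values they match are illustrative and are not taken from the library.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class StreamInfo:
    """Hypothetical stand-in with only the fields named in the docstring."""

    mimetype: Optional[str] = None
    extension: Optional[str] = None
    filename: Optional[str] = None
    url: Optional[str] = None


def accepts_plain_text(stream_info: StreamInfo) -> bool:
    """Typical case: decide from the MIME type or the file extension."""
    mimetype = (stream_info.mimetype or "").lower()
    extension = (stream_info.extension or "").lower()
    return mimetype.startswith("text/plain") or extension in {".txt", ".text"}


def accepts_wikipedia(stream_info: StreamInfo) -> bool:
    """URL-based case: HTML content served from a wikipedia.org host."""
    host = urlparse(stream_info.url or "").hostname or ""
    is_html = (stream_info.mimetype or "").lower().startswith("text/html")
    on_wikipedia = host == "wikipedia.org" or host.endswith(".wikipedia.org")
    return is_html and on_wikipedia


def accepts_well_known_filename(stream_info: StreamInfo) -> bool:
    """Filename-based case: names that carry meaning without an extension."""
    return (stream_info.filename or "") in {"Dockerfile", "Makefile"}
```

None of these checks touch the file stream, so they leave its position unchanged.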
@@ -94,8 +94,15 @@ class DocumentConverter:
 NOTE: The method signature is designed to match that of the convert() method. This provides some
 assurance that, if accepts() returns True, the convert() method will also be able to handle the document.

-IMPORTANT: If this method advances the position in file_stream, it must also reset the position before
-returning. This is because the convert() method may be called immediately after accepts().
+IMPORTANT: In rare cases, (e.g., OutlookMsgConverter) we need to read more from the stream to make a final
+determination. Read operations inevitably advances the position in file_stream. In these case, the position
+MUST be reset it MUST be reset before returning. This is because the convert() method may be called immediately
+after accepts(), and will expect the file_stream to be at the original position.
+
+E.g.,
+cur_pos = file_stream.tell() # Save the current position
+data = file_stream.read(100) # ... peek at the first 100 bytes, etc.
+file_stream.seek(cur_pos) # Reset the position to the original position

 Prameters:
 - file_stream: The file-like object to convert. Must support seek(), tell(), and read() methods.