Updated DocumentConverter documentation.
parent 1eb8b927c2
commit fe1d57a06f
1 changed file with 10 additions and 3 deletions
@@ -86,7 +86,7 @@ class DocumentConverter:
 """
 Return a quick determination on whether the converter should attempt converting the document.
 This is primarily based on `stream_info` (typically, `stream_info.mimetype`, `stream_info.extension`).
-In cases where the data is retreived via HTTP, the `steam_info.url` might also be referenced to
+In cases where the data is retrieved via HTTP, the `stream_info.url` might also be referenced to
 make a determination (e.g., special converters for Wikipedia, YouTube, etc.).
 Finally, it is conceivable that the `stream_info.filename` might be used in cases
 where the filename is well-known (e.g., `Dockerfile`, `Makefile`, etc.).
@@ -94,8 +94,15 @@ class DocumentConverter:
 NOTE: The method signature is designed to match that of the convert() method. This provides some
 assurance that, if accepts() returns True, the convert() method will also be able to handle the document.

-IMPORTANT: If this method advances the position in file_stream, it must also reset the position before
-returning. This is because the convert() method may be called immediately after accepts().
+IMPORTANT: In rare cases (e.g., OutlookMsgConverter), we need to read more from the stream to make a final
+determination. Read operations inevitably advance the position in file_stream. In these cases, the position
+MUST be reset before returning. This is because the convert() method may be called immediately
+after accepts(), and will expect the file_stream to be at the original position.
+
+E.g.,
+cur_pos = file_stream.tell() # Save the current position
+data = file_stream.read(100) # ... peek at the first 100 bytes, etc.
+file_stream.seek(cur_pos) # Reset the position to the original position

 Parameters:
 - file_stream: The file-like object to convert. Must support seek(), tell(), and read() methods.
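The save/peek/reset pattern described in the docstring can be sketched as a complete accepts() method. This is only an illustration under stated assumptions: `StreamInfo` below is a simplified stand-in defined here, `MagicBytesConverter` and its `MAGIC` signature are hypothetical names that do not come from the commit, and the class does not inherit from the real DocumentConverter.

```python
import io
from dataclasses import dataclass
from typing import Any, BinaryIO, Optional


@dataclass
class StreamInfo:
    """Hypothetical, simplified stand-in for the stream metadata object."""
    mimetype: Optional[str] = None
    extension: Optional[str] = None


class MagicBytesConverter:
    """Hypothetical converter that accepts streams starting with a known signature."""

    MAGIC = b"%PDF-"  # assumed signature, chosen only for illustration

    def accepts(self, file_stream: BinaryIO, stream_info: StreamInfo, **kwargs: Any) -> bool:
        # Cheap check first: the metadata alone is often enough.
        if (stream_info.extension or "").lower() == ".pdf":
            return True

        # Otherwise peek at the content and restore the position afterwards.
        cur_pos = file_stream.tell()  # Save the current position
        try:
            header = file_stream.read(len(self.MAGIC))
            return header == self.MAGIC
        finally:
            # Runs even if read() raises, so the caller always sees the original position.
            file_stream.seek(cur_pos)


stream = io.BytesIO(b"xx%PDF-1.7 rest of the data")
stream.seek(2)  # pretend a caller already consumed two bytes
converter = MagicBytesConverter()
accepted = converter.accepts(stream, StreamInfo())
position_after = stream.tell()  # same offset as before the call
```

Wrapping the read in try/finally is a design choice that goes slightly beyond the docstring's three-line example: it restores the position even when the peek fails, so a later convert() call still starts from the original offset.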