Open Document Format (ODT)
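The Open Document Format for Office Applications (ODF), also known as OpenDocument, is an open file format for word processing documents, spreadsheets, presentations and graphics that uses ZIP-compressed XML files. It was developed with the aim of providing an open, XML-based file format specification for office applications.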
The standard is developed and maintained by a technical committee in the Organization for the Advancement of Structured Information Standards (OASIS) consortium. It was based on the Sun Microsystems specification for OpenOffice.org XML, the default format for OpenOffice.org and LibreOffice. It was originally developed for StarOffice "to provide an open standard for office documents."
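Because an ODF file is a ZIP archive of XML files, the container can be inspected with nothing but the Python standard library. The following is a minimal sketch under stated assumptions: `build_minimal_odt` and `read_paragraphs` are hypothetical helper names, and the archive built here is a stripped-down illustration (a `mimetype` entry plus a `content.xml`), not a complete document that every office suite would accept.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# XML namespaces used by OpenDocument content files.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def build_minimal_odt(paragraph):
    """Build a tiny ODT-like archive in memory (illustrative only)."""
    content = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
        "<office:body><office:text>"
        f"<text:p>{escape(paragraph)}</text:p>"
        "</office:text></office:body>"
        "</office:document-content>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry is written first and stored uncompressed.
        zf.writestr(
            "mimetype",
            "application/vnd.oasis.opendocument.text",
            compress_type=zipfile.ZIP_STORED,
        )
        zf.writestr("content.xml", content, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def read_paragraphs(odt_bytes):
    """Return the text of every <text:p> element found in content.xml."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


odt_bytes = build_minimal_odt("Lorem ipsum dolor sit amet.")
paragraphs = read_paragraphs(odt_bytes)
print(paragraphs)
```

This only reads raw paragraph text. It does not classify elements or attach metadata, which is what a dedicated loader adds on top.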
The UnstructuredODTLoader is used to load Open Office ODT files.
from langchain_community.document_loaders import UnstructuredODTLoader
loader = UnstructuredODTLoader("example_data/fake.odt", mode="elements")
docs = loader.load()
docs[0]
API Reference: UnstructuredODTLoader
Document(page_content='Lorem ipsum dolor sit amet.', metadata={'source': 'example_data/fake.odt', 'category_depth': 0, 'file_directory': 'example_data', 'filename': 'fake.odt', 'last_modified': '2023-12-19T13:42:18', 'languages': ['por', 'cat'], 'filetype': 'application/vnd.oasis.opendocument.text', 'category': 'Title'})
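Each Document in the output above carries a `category` entry in its metadata (here `'Title'`), so one possible follow-up step is to group the loaded elements by that field. The sketch below is an assumption-laden illustration, not part of the loader's API: `group_by_category` is a hypothetical helper, and the sample objects are `SimpleNamespace` stand-ins that only mimic the `page_content` and `metadata` attributes, so the snippet runs without any ODT file. The second sample entry is invented for the demonstration.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_category(docs):
    """Group page_content strings by metadata['category'] (hypothetical helper)."""
    grouped = defaultdict(list)
    for doc in docs:
        # Fall back to a placeholder key when an element has no category.
        category = doc.metadata.get("category", "Uncategorized")
        grouped[category].append(doc.page_content)
    return dict(grouped)


# Stand-ins for loaded documents; the second entry is made-up sample data.
sample_docs = [
    SimpleNamespace(
        page_content="Lorem ipsum dolor sit amet.",
        metadata={"category": "Title"},
    ),
    SimpleNamespace(
        page_content="Hypothetical body text.",
        metadata={"category": "NarrativeText"},
    ),
]

grouped = group_by_category(sample_docs)
print(grouped["Title"])
```

With real data, the same function could be called on the `docs` list returned by `loader.load()`, since it relies only on the `page_content` and `metadata` attributes.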