
Open Document Format (ODT)

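The Open Document Format for Office Applications (ODF), also known as OpenDocument, is an open file format for word processing documents, spreadsheets, presentations, and graphics that uses ZIP-compressed XML files. It was developed with the aim of providing an open, XML-based file format specification for office applications.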
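Because an ODF file is a ZIP archive of XML files, the container can be inspected with Python's standard `zipfile` module. The following is a minimal sketch, not a complete or valid ODT document: it builds a simplified in-memory package containing only a `mimetype` entry and a placeholder `content.xml`, then reads the entries back. The helper names `build_minimal_package` and `read_mimetype` are hypothetical, and a real ODT file contains further entries (such as a manifest) that are omitted here.

```python
import io
import zipfile

# MIME type for OpenDocument text files, as also reported in the loader's metadata.
ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def build_minimal_package() -> bytes:
    """Build a simplified, in-memory ODT-like ZIP package (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry is written first and stored without compression.
        archive.writestr("mimetype", ODT_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        # Placeholder XML; a real content.xml follows the ODF schema.
        archive.writestr(
            "content.xml",
            "<?xml version='1.0' encoding='UTF-8'?><placeholder/>",
            compress_type=zipfile.ZIP_DEFLATED,
        )
    return buffer.getvalue()


def read_mimetype(data: bytes) -> str:
    """Return the contents of the package's mimetype entry."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.read("mimetype").decode("ascii")


package = build_minimal_package()
with zipfile.ZipFile(io.BytesIO(package)) as archive:
    entry_names = archive.namelist()

print(entry_names)
print(read_mimetype(package))
```

The same `zipfile.ZipFile(...).namelist()` call can be pointed at an actual `.odt` file on disk to see which XML parts it contains.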

The standard is developed and maintained by a technical committee in the Organization for the Advancement of Structured Information Standards (OASIS) consortium. It was based on the Sun Microsystems specification for OpenOffice.org XML, the default format for OpenOffice.org and LibreOffice. It was originally developed for StarOffice "to provide an open standard for office documents."

The UnstructuredODTLoader is used to load Open Office ODT files.

from langchain_community.document_loaders import UnstructuredODTLoader

loader = UnstructuredODTLoader("example_data/fake.odt", mode="elements")
docs = loader.load()
docs[0]
Document(page_content='Lorem ipsum dolor sit amet.', metadata={'source': 'example_data/fake.odt', 'category_depth': 0, 'file_directory': 'example_data', 'filename': 'fake.odt', 'last_modified': '2023-12-19T13:42:18', 'languages': ['por', 'cat'], 'filetype': 'application/vnd.oasis.opendocument.text', 'category': 'Title'})
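With `mode="elements"`, each returned document carries a `category` entry in its metadata (`'Title'` in the output above). One possible way to work with that is to group element text by category. The sketch below is an assumption about how one might post-process the results, not part of the loader's API: `group_by_category` is a hypothetical helper, and the sample objects are stand-ins with invented content that only mimic the `page_content` and `metadata` attributes, so the snippet runs without the ODT file.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_category(docs):
    """Group page_content strings by the 'category' metadata key."""
    grouped = defaultdict(list)
    for doc in docs:
        # Fall back to a placeholder label if an element has no category.
        category = doc.metadata.get("category", "Uncategorized")
        grouped[category].append(doc.page_content)
    return dict(grouped)


# Stand-in objects (illustrative data, not real loader output).
sample_docs = [
    SimpleNamespace(
        page_content="Lorem ipsum dolor sit amet.",
        metadata={"category": "Title"},
    ),
    SimpleNamespace(
        page_content="An invented paragraph for illustration.",
        metadata={"category": "NarrativeText"},
    ),
    SimpleNamespace(page_content="No category here.", metadata={}),
]

grouped = group_by_category(sample_docs)
print(grouped)
```

In practice, the list returned by `loader.load()` could be passed to the helper in place of `sample_docs`, assuming each element exposes `page_content` and `metadata` as shown in the output above.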