Google Speech-to-Text Audio Transcripts

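The SpeechToTextLoader transcribes audio files with the Google Cloud Speech-to-Text API and loads the transcribed text into documents.

To use it, you need the google-cloud-speech Python package installed and a Google Cloud project with the Speech-to-Text API enabled.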

Bringing the power of large models to Google Cloud's Speech API

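Installation & setup

First, install the google-cloud-speech Python package.

You can find more information about it on the Speech-to-Text client libraries page.

Follow the quickstart guide in the Google Cloud documentation to create a project and enable the API.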

%pip install --upgrade --quiet langchain-google-community[speech]

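Example

The SpeechToTextLoader must be given the project_id and file_path arguments. Audio files can be specified as a Google Cloud Storage URI (gs://...) or as a local file path.

The loader supports only synchronous requests, which are limited to 60 seconds or 10MB per audio file.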

from langchain_google_community import SpeechToTextLoader

project_id = "<PROJECT_ID>"
file_path = "gs://cloud-samples-data/speech/audio.flac"
# or a local file path: file_path = "./audio.wav"

loader = SpeechToTextLoader(project_id=project_id, file_path=file_path)

docs = loader.load()

API Reference: SpeechToTextLoader
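Because synchronous requests are capped at 60 seconds or 10MB per file, it can help to check a local file before sending it. The following is a minimal sketch using only the standard library. The helper name check_sync_limits is hypothetical (it is not part of langchain_google_community), the 10MB limit is assumed to mean 10 * 1024 * 1024 bytes, and the duration is only measured for WAV files; gs:// URIs are not inspected at all.

```python
import os
import wave
from typing import List

MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB
MAX_SECONDS = 60              # per-file duration limit for synchronous requests


def check_sync_limits(file_path: str) -> List[str]:
    """Return a list of detected problems; an empty list means none were found.

    Hypothetical helper for illustration only, not part of the loader's API.
    """
    problems: List[str] = []

    # Remote objects are not downloaded or inspected in this sketch.
    if file_path.startswith("gs://"):
        return problems

    size = os.path.getsize(file_path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_BYTES}-byte limit")

    # Duration is only measured for WAV files, which the stdlib can parse.
    if file_path.lower().endswith(".wav"):
        with wave.open(file_path, "rb") as wav:
            seconds = wav.getnframes() / wav.getframerate()
        if seconds > MAX_SECONDS:
            problems.append(f"audio is {seconds:.1f}s, over the {MAX_SECONDS}s limit")

    return problems
```

A caller could run check_sync_limits("./audio.wav") first and only construct the loader when the returned list is empty. Other formats such as FLAC would need a different way to measure duration.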

Note: Calling loader.load() blocks until the transcription is finished.
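If the blocking call is a problem inside an asyncio application, one option is to run it in a worker thread. This is a sketch under stated assumptions: load_without_blocking is a hypothetical helper, and SlowStubLoader is a stand-in that only mimics the blocking load() method so the example runs without Google Cloud credentials. In real use, the SpeechToTextLoader instance would be passed instead.

```python
import asyncio
import time


async def load_without_blocking(loader):
    """Run the blocking loader.load() in a worker thread and await its result."""
    return await asyncio.to_thread(loader.load)


class SlowStubLoader:
    """Hypothetical stand-in for SpeechToTextLoader, used only for this sketch."""

    def load(self):
        time.sleep(0.1)  # simulates waiting for the transcription
        return ["stub transcript"]


async def main():
    # Start the transcription in the background.
    task = asyncio.create_task(load_without_blocking(SlowStubLoader()))
    # The event loop stays free here for other coroutines.
    await asyncio.sleep(0)
    return await task


if __name__ == "__main__":
    print(asyncio.run(main()))
```

asyncio.to_thread requires Python 3.9 or later. The transcription itself is not made faster; only the event loop is kept responsive while it runs.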

The transcribed text is available in page_content:

docs[0].page_content

"How old is the Brooklyn Bridge?"

The metadata contains the full JSON response with more meta information:

docs[0].metadata

{
  'language_code': 'en-US',
  'result_end_offset': datetime.timedelta(seconds=1)
}
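As a small illustration of working with these fields, the sketch below flattens a document into a plain dictionary. The describe function is hypothetical, and a SimpleNamespace stands in for a loaded document so the example runs without calling the API. It assumes only the two metadata keys shown above and tolerates either being absent.

```python
from datetime import timedelta
from types import SimpleNamespace


def describe(doc):
    """Summarize one document: its text, language, and end offset in seconds."""
    meta = doc.metadata
    offset = meta.get("result_end_offset")
    return {
        "text": doc.page_content,
        "language": meta.get("language_code"),
        # timedelta -> float seconds; None when the key is missing
        "end_seconds": offset.total_seconds() if offset is not None else None,
    }


# Stand-in for docs[0], mirroring the values shown above.
sample = SimpleNamespace(
    page_content="How old is the Brooklyn Bridge?",
    metadata={
        "language_code": "en-US",
        "result_end_offset": timedelta(seconds=1),
    },
)

print(describe(sample))
```

With a real loader result, describe(docs[0]) would be called in place of the stand-in.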

Recognition Config

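You can specify the config argument to use different speech recognition models and enable specific features.

Refer to the Speech-to-Text recognizers documentation and the RecognizeRequest API reference for information on how to set a custom configuration.

If you don't specify a config, the following options are selected automatically: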

Model: Chirp Universal Speech Model
Language: en-US
Audio Encoding: Automatically Detected
Automatic Punctuation: Enabled

from google.cloud.speech_v2 import (
    AutoDetectDecodingConfig,
    RecognitionConfig,
    RecognitionFeatures,
)
from langchain_google_community import SpeechToTextLoader

project_id = "<PROJECT_ID>"
location = "global"
recognizer_id = "<RECOGNIZER_ID>"
file_path = "./audio.wav"

config = RecognitionConfig(
    auto_decoding_config=AutoDetectDecodingConfig(),
    language_codes=["en-US"],
    model="long",
    features=RecognitionFeatures(
        enable_automatic_punctuation=False,
        profanity_filter=True,
        enable_spoken_punctuation=True,
        enable_spoken_emojis=True,
    ),
)

loader = SpeechToTextLoader(
    project_id=project_id,
    location=location,
    recognizer_id=recognizer_id,
    file_path=file_path,
    config=config,
)

API Reference: SpeechToTextLoader

Related

Document loader conceptual guide
Document loader how-to guides
