Speech-to-Text with the OpenAI Whisper API, for Subtitle Post-Production or Content Organization

Wei J Liu

Updated 2025/09/20 · Published 2024/02/13 · 7 min read

Updated version

A more stable version:
https://vocus.cc/article/68ced4a2fd89780001fc9266

Prerequisites

Register for the OpenAI API and obtain a secret key, then assign it to openai.api_key in the program.
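As a small sketch of the key setup step, the snippet below reads the key from an environment variable rather than hard-coding it in the script. The helper name load_api_key and the choice of OPENAI_API_KEY as the variable name are assumptions made for illustration; they are not part of the original program.

```python
import os


def load_api_key(env=os.environ, var_name="OPENAI_API_KEY"):
    """Return the API key stored under var_name, or raise if it is missing.

    load_api_key is a hypothetical helper; `env` is any mapping, which
    makes the function easy to try out with a plain dict.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Possible use in the script from this article:
# openai.api_key = load_api_key()
```

Keeping the key out of the source file means the script can be shared or committed without leaking the secret.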

Basic Python knowledge and debugging skills.

Code: Python implementation

import openai
from pydub import AudioSegment
import os
import codecs
import tempfile

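# This script targets the legacy openai SDK (versions before 1.0);
# openai.Audio was removed in 1.0.
# Set your OpenAI API key here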
openai.api_key = 'your_openai_api_key'

def transcribe_audio_with_whisper(audio_file_path):
    """
    Transcribe an audio file using OpenAI's Whisper API.

    Args:
    - audio_file_path: Path to the audio file to transcribe.

    Returns:
    - The transcribed text as a string.
    """
    with open(audio_file_path, "rb") as audio_file:
        response = openai.Audio.transcribe('whisper-1', audio_file)
        # The transcription is stored under the top-level 'text' key
        return response['text']

def split_and_transcribe_audio(file_path, segment_length_seconds=30):
    try:
        song = AudioSegment.from_file(file_path)
    except Exception as e:
        # Chain the original exception so the root cause stays in the traceback
        raise RuntimeError(f"Error loading audio file: {e}") from e

    segment_length_ms = segment_length_seconds * 1000  # pydub slices audio in milliseconds
    transcripts = []

    with tempfile.TemporaryDirectory() as temp_dir:
        # Walk the audio in fixed-size windows; the last segment may be shorter
        for i, start_ms in enumerate(range(0, len(song), segment_length_ms)):
            segment = song[start_ms:start_ms + segment_length_ms]
            segment_file_path = os.path.join(temp_dir, f"segment_{i}.mp3")
            segment.export(segment_file_path, format="mp3")
            
            transcript = transcribe_audio_with_whisper(segment_file_path)
            time_in_seconds = i * segment_length_seconds
            timestamp = f"[{time_in_seconds // 60:02d}:{time_in_seconds % 60:02d}]"
            transcripts.append(timestamp + " " + transcript)

    output_file_name = os.path.splitext(os.path.basename(file_path))[0] + '.txt'
    with codecs.open(output_file_name, 'w', encoding='utf-8') as f:  # Using UTF-8 encoding
        f.write("\n".join(transcripts))

# Example usage
split_and_transcribe_audio("test.mp3")

Explanation

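Setting the OpenAI API secret key: set your OpenAI API key in the program so that it can call the Whisper API.
transcribe_audio_with_whisper function:
- Purpose: transcribes the given audio file with OpenAI's Whisper API.
- Parameter: takes one argument, audio_file_path, the path of the audio file to transcribe.
- Return value: the transcribed text.
- Implementation: opens the audio file and passes it to openai.Audio.transcribe to obtain the transcription.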
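The open-then-transcribe pattern described above can be tried without network access by passing the API call in as a parameter. This is only a sketch: transcribe_file and fake_transcribe are hypothetical names, and the fake assumes the response exposes the transcription under a top-level "text" key, as the legacy SDK's default JSON response does.

```python
def transcribe_file(audio_file_path, transcribe_fn):
    """Open the file in binary mode and return the text from transcribe_fn.

    transcribe_fn is called as transcribe_fn(model_name, file_object) and
    must return a mapping with a "text" key.
    """
    with open(audio_file_path, "rb") as audio_file:
        response = transcribe_fn("whisper-1", audio_file)
    return response["text"]


def fake_transcribe(model, audio_file):
    """Hypothetical stand-in for the API call; reports the bytes it received."""
    return {"text": f"{model} received {len(audio_file.read())} bytes"}
```

With the legacy SDK used in this article, passing openai.Audio.transcribe as transcribe_fn should behave like the original function, while the fake makes the file handling testable offline.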
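split_and_transcribe_audio function:
- Purpose: splits a long audio file into shorter segments (30 seconds by default), then transcribes each segment with the Whisper API.
- Parameters: file_path is the path to the long audio file; segment_length_seconds is the length of each segment in seconds (default 30).
- Process: loads the audio with AudioSegment.from_file; splits it into segments of the specified length (in milliseconds); exports each segment as an MP3 file inside a temporary directory; transcribes each segment with transcribe_audio_with_whisper; appends each transcription, with its timestamp, to the transcript list.
- Output: writes all transcriptions with their timestamps to a plain-text file named after the original audio file, with the extension replaced by .txt.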
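The splitting and timestamp arithmetic in the steps above can be checked in isolation, without pydub or the API. The sketch below works on plain integers; segment_bounds and format_timestamp are hypothetical helper names introduced for illustration only.

```python
def segment_bounds(total_ms, segment_length_seconds=30):
    """Return (start_ms, end_ms) pairs covering an audio of total_ms milliseconds."""
    segment_ms = segment_length_seconds * 1000
    return [
        (start, min(start + segment_ms, total_ms))  # last segment may be shorter
        for start in range(0, total_ms, segment_ms)
    ]


def format_timestamp(index, segment_length_seconds=30):
    """Return the [MM:SS] label for the start of the segment at position index."""
    seconds = index * segment_length_seconds
    return f"[{seconds // 60:02d}:{seconds % 60:02d}]"


# A 65-second file yields two full segments and one 5-second remainder.
bounds = segment_bounds(65_000)
labels = [format_timestamp(i) for i in range(len(bounds))]
```

Each timestamp marks where a segment starts, so the resolution of the transcript is one label per segment_length_seconds.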

Example usage: the last line of the program shows how to call split_and_transcribe_audio to transcribe an audio file named "test.mp3".
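To see which file a given call produces, the naming rule from the program can be isolated as follows. This is a sketch; output_name_for is a hypothetical helper that mirrors the expression used in the script.

```python
import os


def output_name_for(file_path):
    """Return the transcript file name: the audio's base name with a .txt extension."""
    return os.path.splitext(os.path.basename(file_path))[0] + ".txt"


# "test.mp3" maps to "test.txt"; any directory part of the input is dropped.
name = output_name_for("test.mp3")
```

Because only the base name is kept, the transcript is written to the current working directory rather than next to the source audio.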

