Hi,

I am making a tool that needs to analyze a conversation (not in English) between two people. The conversation is provided to me as audio. I currently use OpenAI Whisper to transcribe it and feed the transcription to the ChatGPT-4o model through the API for analysis.

So far, it does a fair job. Sometimes, though, when reading the transcription, I find it hard to tell which speaker said what, and I have to listen to the audio to figure it out. I wonder whether ChatGPT-4o sometimes finds the conversation just as hard to follow from the transcription. I think adding a speaker diarization step might make the transcription easier to understand and analyze.

I am looking for speaker diarization tools that I can use. I have tried pyannote speaker-diarization-3.1, but I find it does not work very well. What other options should I look at?
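Whatever diarization tool ends up being used, the step itself comes down to aligning two timelines: the timestamped segments from the transcriber and the speaker turns from the diarizer. The following is a minimal, library-agnostic sketch of one common heuristic: each transcript segment is assigned to the speaker whose turns overlap it the most, and the result is rendered as a `SPEAKER: text` transcript that could be passed to the LLM. The `Segment` and `Turn` classes and all function names are hypothetical. It assumes segments carry start/end times in seconds (as Whisper's segment-level timestamps do) and that turns carry a speaker label such as `SPEAKER_00` (as pyannote's output does).

```python
from dataclasses import dataclass


# Hypothetical containers; in practice these would be filled from the
# transcriber's timestamped segments and the diarizer's speaker turns.
@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


@dataclass
class Turn:
    start: float  # seconds
    end: float    # seconds
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: list[Segment], turns: list[Turn]) -> list[tuple[str, str]]:
    """Label each segment with the speaker whose turns overlap it the most."""
    labeled: list[tuple[str, str]] = []
    for seg in segments:
        # Sum overlap per speaker, since one speaker may have several short turns
        totals: dict[str, float] = {}
        for turn in turns:
            o = overlap(seg.start, seg.end, turn.start, turn.end)
            if o > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + o
        speaker = max(totals, key=totals.get) if totals else "UNKNOWN"
        labeled.append((speaker, seg.text.strip()))
    return labeled


def to_prompt_text(labeled: list[tuple[str, str]]) -> str:
    """Merge consecutive lines from the same speaker into one 'SPEAKER: text' line."""
    merged: list[tuple[str, str]] = []
    for speaker, text in labeled:
        if merged and merged[-1][0] == speaker:
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            merged.append((speaker, text))
    return "\n".join(f"{s}: {t}" for s, t in merged)


if __name__ == "__main__":
    # Toy data standing in for real transcriber and diarizer output
    segments = [
        Segment(0.0, 2.5, " Hello, how are you?"),
        Segment(2.6, 4.0, " Fine, thanks."),
        Segment(4.1, 6.0, " And you?"),
        Segment(6.2, 8.0, " Good."),
    ]
    turns = [
        Turn(0.0, 2.5, "SPEAKER_00"),
        Turn(2.5, 6.1, "SPEAKER_01"),
        Turn(6.1, 8.0, "SPEAKER_00"),
    ]
    print(to_prompt_text(assign_speakers(segments, turns)))
```

Because a whole segment goes to whichever speaker dominates it, a segment that straddles a speaker change gets a single label. With fast turn-taking, word-level timestamps would allow finer attribution than segment-level ones.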

Ask HN: What speaker diarization tools should I consider?