C++ API for speaker diarization #1396

Merged: 17 commits merged on Oct 9, 2024

csukuangfj

No description provided.

@csukuangfj csukuangfj changed the title WIP: C++ API for speaker diarization C++ API for speaker diarization Oct 9, 2024
@csukuangfj

Usage of speaker diarization

curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/speaker-segmentation-models/sherpa-onnx-pyannote-segmentation-3-0.tar.bz2
tar xvf sherpa-onnx-pyannote-segmentation-3-0.tar.bz2
rm sherpa-onnx-pyannote-segmentation-3-0.tar.bz2

curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/speaker-recongition-models/3dspeaker_speech_eres2net_base_sv_zh-cn_3dspeaker_16k.onnx

curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/speaker-segmentation-models/0-four-speakers-zh.wav

echo "specify number of clusters"

./build/bin/sherpa-onnx-offline-speaker-diarization \
  --clustering.num-clusters=4 \
  --segmentation.pyannote-model=./sherpa-onnx-pyannote-segmentation-3-0/model.onnx \
  --embedding.model=./3dspeaker_speech_eres2net_base_sv_zh-cn_3dspeaker_16k.onnx \
  ./0-four-speakers-zh.wav

echo "specify threshold for clustering"

./build/bin/sherpa-onnx-offline-speaker-diarization \
  --clustering.cluster-threshold=0.90 \
  --segmentation.pyannote-model=./sherpa-onnx-pyannote-segmentation-3-0/model.onnx \
  --embedding.model=./3dspeaker_speech_eres2net_base_sv_zh-cn_3dspeaker_16k.onnx \
  ./0-four-speakers-zh.wav

Sample output (from the second command; the printed config shows num_clusters=-1 and threshold=0.9)

OfflineSpeakerDiarizationConfig(segmentation=OfflineSpeakerSegmentationModelConfig(pyannote=OfflineSpeakerSegmentationPyannoteModelConfig(model="./sherpa-onnx-pyannote-segmentation-3-0/model.onnx"), num_threads=1, debug=False, provider="cpu"), embedding=SpeakerEmbeddingExtractorConfig(model="./3dspeaker_speech_eres2net_base_sv_zh-cn_3dspeaker_16k.onnx", num_threads=1, debug=False, provider="cpu"), clustering=FastClusteringConfig(num_clusters=-1, threshold=0.9), min_duration_on=0.3, min_duration_off=0.5)
Started
progress 1.30%
progress 2.60%
progress 3.90%
progress 5.19%
progress 6.49%
progress 7.79%
progress 9.09%
progress 10.39%
progress 11.69%
progress 12.99%
progress 14.29%
progress 15.58%
progress 16.88%
progress 18.18%
progress 19.48%
progress 20.78%
progress 22.08%
progress 23.38%
progress 24.68%
progress 25.97%
progress 27.27%
progress 28.57%
progress 29.87%
progress 31.17%
progress 32.47%
progress 33.77%
progress 35.06%
progress 36.36%
progress 37.66%
progress 38.96%
progress 40.26%
progress 41.56%
progress 42.86%
progress 44.16%
progress 45.45%
progress 46.75%
progress 48.05%
progress 49.35%
progress 50.65%
progress 51.95%
progress 53.25%
progress 54.55%
progress 55.84%
progress 57.14%
progress 58.44%
progress 59.74%
progress 61.04%
progress 62.34%
progress 63.64%
progress 64.94%
progress 66.23%
progress 67.53%
progress 68.83%
progress 70.13%
progress 71.43%
progress 72.73%
progress 74.03%
progress 75.32%
progress 76.62%
progress 77.92%
progress 79.22%
progress 80.52%
progress 81.82%
progress 83.12%
progress 84.42%
progress 85.71%
progress 87.01%
progress 88.31%
progress 89.61%
progress 90.91%
progress 92.21%
progress 93.51%
progress 94.81%
progress 96.10%
progress 97.40%
progress 98.70%
progress 100.00%
0.318 -- 6.865 speaker_00
7.017 -- 10.747 speaker_01
11.455 -- 13.632 speaker_01
13.750 -- 17.041 speaker_02
22.137 -- 24.837 speaker_00
27.638 -- 29.478 speaker_03
30.001 -- 31.553 speaker_03
33.680 -- 37.932 speaker_03
48.040 -- 50.470 speaker_02
52.529 -- 54.605 speaker_00
Duration : 56.861 s
Elapsed seconds: 15.048 s
Real time factor (RTF): 15.048 / 56.861 = 0.265
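
For post-processing, the segment lines above can be summarized per speaker. The following is a minimal Python sketch, not part of this PR or of sherpa-onnx. It assumes the text format shown in the sample output (`start -- end speaker_NN`), and the helper names `parse_segments`, `speaking_time`, and `real_time_factor` are hypothetical. It also recomputes the real time factor as elapsed seconds divided by audio duration, matching the last line of the output.

```python
import re
from collections import defaultdict

# Segment lines copied verbatim from the sample output above.
SAMPLE = """\
0.318 -- 6.865 speaker_00
7.017 -- 10.747 speaker_01
11.455 -- 13.632 speaker_01
13.750 -- 17.041 speaker_02
22.137 -- 24.837 speaker_00
27.638 -- 29.478 speaker_03
30.001 -- 31.553 speaker_03
33.680 -- 37.932 speaker_03
48.040 -- 50.470 speaker_02
52.529 -- 54.605 speaker_00
"""

# Assumed line format: "<start> -- <end> speaker_<id>" (taken from the sample, not a spec).
SEGMENT_RE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s+--\s+(\d+(?:\.\d+)?)\s+(speaker_\d+)\s*$"
)


def parse_segments(text):
    """Return a list of (start, end, speaker) tuples; non-matching lines are skipped."""
    segments = []
    for line in text.splitlines():
        match = SEGMENT_RE.match(line)
        if match:
            start, end, speaker = match.groups()
            segments.append((float(start), float(end), speaker))
    return segments


def speaking_time(segments):
    """Sum the segment durations (in seconds) for each speaker label."""
    totals = defaultdict(float)
    for start, end, speaker in segments:
        totals[speaker] += end - start
    return dict(totals)


def real_time_factor(elapsed_seconds, audio_seconds):
    """RTF = processing time / audio duration; values below 1 are faster than real time."""
    return elapsed_seconds / audio_seconds


if __name__ == "__main__":
    segments = parse_segments(SAMPLE)
    for speaker, seconds in sorted(speaking_time(segments).items()):
        print(f"{speaker}: {seconds:.3f} s")
    print(f"RTF: {real_time_factor(15.048, 56.861):.3f}")
```

Lines that do not match the segment pattern, such as the config dump and the `progress` lines, are ignored, so the full program output can be passed to `parse_segments` unchanged.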

@csukuangfj csukuangfj merged commit 59407ed into k2-fsa:master Oct 9, 2024
176 of 199 checks passed
@csukuangfj csukuangfj deleted the speaker-diarization-cpp branch October 9, 2024 04:01