Releases: desh2608/spyder
UEM and collar
New Features
UEM files
import spyder
# reference (ground truth)
ref = [("A", 0.0, 2.0), # (speaker, start, end)
("B", 1.5, 3.5),
("A", 4.0, 5.1)]
# hypothesis (diarization result from your algorithm)
hyp = [("1", 0.0, 0.8),
("2", 0.6, 2.3),
("3", 2.1, 3.9),
("1", 3.8, 5.2)]
uem = [(0.5, 5.0)]
# compute DER on full recording
print(spyder.DER(ref, hyp))
# DERMetrics(duration=5.10,miss=9.80%,falarm=21.57%,conf=25.49%,der=56.86%)
# compute DER using UEM segments
print(spyder.DER(ref, hyp, uem=uem))
# DERMetrics(duration=4.50,miss=11.11%,falarm=22.22%,conf=26.67%,der=60.00%)
From the CLI, UEM files can be passed using the -u or --uem option.
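To make the effect of a UEM concrete, here is a minimal pure-Python sketch that clips the reference turns from the example above to the UEM regions. It is not Spyder's implementation: `clip_to_uem` is a hypothetical helper, and reading the reported `duration` as total reference speaker time is an inference from the numbers in the example (5.10 for the full recording, 4.50 inside the UEM), not a documented guarantee.

```python
def clip_to_uem(segments, uem):
    """Keep only the parts of each (speaker, start, end) turn inside a UEM region.

    Hypothetical helper for illustration; not part of the spyder API.
    """
    clipped = []
    for speaker, start, end in segments:
        for uem_start, uem_end in uem:
            lo, hi = max(start, uem_start), min(end, uem_end)
            if lo < hi:  # non-empty intersection
                clipped.append((speaker, lo, hi))
    return clipped


ref = [("A", 0.0, 2.0), ("B", 1.5, 3.5), ("A", 4.0, 5.1)]
uem = [(0.5, 5.0)]

ref_in_uem = clip_to_uem(ref, uem)
full_duration = sum(end - start for _, start, end in ref)
uem_duration = sum(end - start for _, start, end in ref_in_uem)
print(f"{full_duration:.2f} {uem_duration:.2f}")  # 5.10 4.50
```

The two printed values agree with the `duration` fields in the outputs above, which suggests that time outside the UEM is simply dropped before scoring.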
Collar
# compute DER using collar
print(spyder.DER(ref, hyp, collar=0.2))
# DERMetrics(duration=3.10,miss=3.23%,falarm=12.90%,conf=19.35%,der=35.48%)
From the CLI, use -c or --collar to score with a collar.
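As a rough illustration of what a collar does, the sketch below removes a no-score zone of ±collar seconds around every reference turn boundary and sums the reference speaker time that remains. This follows the usual md-eval convention and is an assumption about Spyder's behaviour rather than a description of its code; `subtract_intervals` is a hypothetical helper.

```python
def subtract_intervals(start, end, holes):
    """Return the pieces of [start, end] left after removing every (lo, hi) in holes.

    Hypothetical helper for illustration; not part of the spyder API.
    """
    pieces = []
    cursor = start
    for lo, hi in sorted(holes):
        if hi <= cursor or lo >= end:
            continue  # hole does not touch the remaining span
        if lo > cursor:
            pieces.append((cursor, lo))
        cursor = max(cursor, hi)
    if cursor < end:
        pieces.append((cursor, end))
    return pieces


ref = [("A", 0.0, 2.0), ("B", 1.5, 3.5), ("A", 4.0, 5.1)]
collar = 0.2

# One no-score zone around each reference start and end time.
holes = [(t - collar, t + collar) for _, start, end in ref for t in (start, end)]

scored = [
    (speaker, lo, hi)
    for speaker, start, end in ref
    for lo, hi in subtract_intervals(start, end, holes)
]
scored_duration = sum(hi - lo for _, lo, hi in scored)
print(f"{scored_duration:.2f}")  # 3.10
```

The result matches the `duration=3.10` reported for `collar=0.2` in the example above.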
Speaker mapping
The metrics returned by spyder.DER now also include reference and hypothesis speaker maps.
# get speaker mapping between reference and hypothesis
metrics = spyder.DER(ref, hyp)
print(f"Reference speaker map: {metrics.ref_map}")
print(f"Hypothesis speaker map: {metrics.hyp_map}")
# Reference speaker map: {'A': '0', 'B': '1'}
# Hypothesis speaker map: {'1': '0', '2': '2', '3': '1'}
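The two maps can be combined to translate hypothesis labels into reference labels. The sketch below assumes that both maps share one index space, so that equal indices denote a matched pair; this reading is inferred from the example output above and is not documented behaviour. The dictionaries are copied from that output so the snippet runs without Spyder installed.

```python
# Maps copied from the example output above.
ref_map = {"A": "0", "B": "1"}
hyp_map = {"1": "0", "2": "2", "3": "1"}

# Invert the reference map: shared index -> reference label.
index_to_ref = {index: label for label, index in ref_map.items()}

# Hypothesis label -> reference label, or None when the hypothesis speaker
# was not paired with any reference speaker (assumed interpretation).
hyp_to_ref = {label: index_to_ref.get(index) for label, index in hyp_map.items()}
print(hyp_to_ref)  # {'1': 'A', '2': None, '3': 'B'}
```

Under this reading, hypothesis speaker "2" has no reference counterpart, so its speech can only count as confusion or false alarm.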
Unit Tests
We have added basic unit testing with pytest. Check the tests/ directory for examples. These are based on the dscore tool.
What's Changed
- Fix bug when compute region is empty by @desh2608 in #7
- Fixed evaluation of overlap regions by @desh2608 in #8
- feat: support compute DER for multiple pairs of ref and hyp by @LingweiMeng in #10
- Fix code style by @desh2608 in #12
- Support UEM and collar in DER computation by @desh2608 in #13
- Return speaker mapping in metrics dict by @desh2608 in #14
- Add unit tests for DER by @desh2608 in #15
- Fix Github actions for unit tests by @desh2608 in #16
- Fix unit tests by @desh2608 in #17
New Contributors
- @LingweiMeng made their first contribution in #10
Full Changelog: v0.2.0...v0.4.0
Small changes
This release adds tabulate for pretty-printing the output, and fixes a bug in the handling of overlapping speaker turns from the same speaker.
First release
This is the first release of Spyder.
Features
- Fast DER computation from Python code. Example usage:
import spyder
# reference (ground truth)
ref = [("A", 0.0, 2.0), # (speaker, start, end)
("B", 1.5, 3.5),
("A", 4.0, 5.1)]
# hypothesis (diarization result from your algorithm)
hyp = [("1", 0.0, 0.8),
("2", 0.6, 2.3),
("3", 2.1, 3.9),
("1", 3.8, 5.2)]
metrics = spyder.DER(ref, hyp)
print(metrics)
# DERMetrics(miss=0.098,falarm=0.216,conf=0.255,der=0.569)
print(f"{metrics.miss:.3f}, {metrics.falarm:.3f}, {metrics.conf:.3f}, {metrics.der:.3f}")
# 0.098, 0.216, 0.255, 0.569
- CLI interface to compute DER between RTTM files. Example:
> spyder ref_rttm hyp_rttm
Average error rates:
----------------------------------------------------
Missed speaker time = 11.48
False alarm speaker time = 2.27
Speaker error time = 9.81
Diarization error rate (DER) = 23.56
- Support for computing per-file DER from CLI using the --per-file flag.
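The CLI reads RTTM files, while the Python API takes lists of (speaker, start, end) tuples. One way to bridge the two is sketched below, assuming the standard NIST RTTM field layout (type, file ID, channel, onset, duration, two placeholders, speaker name, ...). `parse_rttm` is a hypothetical helper and the recording ID `rec1` is made up for the example; neither comes from Spyder.

```python
from collections import defaultdict


def parse_rttm(lines):
    """Group RTTM SPEAKER records into {file_id: [(speaker, start, end), ...]}.

    Hypothetical helper for illustration; not part of the spyder API.
    """
    turns = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # skip blank lines and non-speaker records
        file_id, speaker = fields[1], fields[7]
        onset, duration = float(fields[3]), float(fields[4])
        turns[file_id].append((speaker, onset, onset + duration))
    return dict(turns)


# Made-up RTTM content mirroring the reference turns from the Python example.
rttm_text = """\
SPEAKER rec1 1 0.00 2.00 <NA> <NA> A <NA> <NA>
SPEAKER rec1 1 1.50 2.00 <NA> <NA> B <NA> <NA>
SPEAKER rec1 1 4.00 1.10 <NA> <NA> A <NA> <NA>
"""
ref_by_file = parse_rttm(rttm_text.splitlines())
print(ref_by_file["rec1"][:2])  # [('A', 0.0, 2.0), ('B', 1.5, 3.5)]
```

Each per-recording list has the same shape as the `ref` and `hyp` arguments in the Python example, so scoring one recording at a time from Python is one way to approximate what the --per-file flag reports.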
Speed benchmark
We have done some basic speed comparisons in this blog post using the AMI development data as an example. Spyder is:
- 3-5x faster than md-eval.pl (when invoked from Python code);
- 10x faster than pyannote.metrics.