This repository contains the code to reproduce the core results from our Interspeech 2019 paper: ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual neTworks.
If you find the code useful, please cite:
@article{lai2019assert,
title={ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual neTworks},
author={Lai, Cheng-I and Chen, Nanxin and Villalba, Jes{\'u}s and Dehak, Najim},
journal={arXiv preprint arXiv:1904.01120},
year={2019}
}
We present JHU's system submission to the ASVspoof 2019 Challenge: Anti-Spoofing with Squeeze-Excitation and Residual neTworks (ASSERT). Anti-spoofing has attracted increasing attention since the inauguration of the ASVspoof Challenges, and ASVspoof 2019 is dedicated to addressing attacks of all three major types: text-to-speech, voice conversion, and replay. Built upon previous research on Deep Neural Networks (DNNs), ASSERT is a pipeline for DNN-based anti-spoofing with four components: feature engineering, DNN models, network optimization, and system combination, where the DNN models are variants of squeeze-excitation and residual networks. We conducted an ablation study of the effectiveness of each component on the ASVspoof 2019 corpus, and experimental results showed that ASSERT obtained more than 93% and 17% relative improvements over the baseline systems in the two sub-challenges of ASVspoof 2019, ranking ASSERT among the top performing systems.
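For intuition, the squeeze-excitation idea behind the SENet models is: globally pool each channel of a feature map, pass the pooled vector through a small bottleneck MLP, and use the resulting sigmoid gates to reweight the channels. The following is a minimal NumPy sketch of a single SE block, not the actual model code in this repository; the weight shapes and reduction ratio (r = 4) are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feature_map, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) feature map.

    w1: (C // r, C) bottleneck weights, w2: (C, C // r) expansion weights.
    """
    # Squeeze: global average pooling over the time-frequency axes
    z = feature_map.mean(axis=(1, 2))              # (C,)
    # Excitation: bottleneck MLP with ReLU, then sigmoid gating
    gates = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))  # (C,), each in (0, 1)
    # Scale: per-channel reweighting of the original feature map
    return feature_map * gates[:, None, None]

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8, 8))
w1 = rng.standard_normal((4, 16)) * 0.1  # reduction ratio r = 4
w2 = rng.standard_normal((16, 4)) * 0.1
y = se_block(x, w1, w2)
print(y.shape)
```

The block leaves the feature-map shape unchanged, so it can be dropped into a residual branch, which is how SE-ResNet variants such as SENet34/SENet50 combine the two ideas.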
Note: The evaluation keys have not been released yet, so we only present dev results below.
Model | PA dev min-tDCF | PA dev EER (%) | LA dev min-tDCF | LA dev EER (%) |
---|---|---|---|---|
SENet34 | 0.01514 | 0.5751 | 0.0 | 0.0 |
SENet50 | 0.01709 | 0.6317 | 0.0 | 0.0 |
Attentive Filtering Network | 0.02096 | 0.7407 | 0.0 | 0.0 |
Dilated ResNet | 0.02377 | 0.7798 | 0.0 | 0.0 |
Mean-Std ResNet | 0.022 | 0.832 | 0.0 | 0.0 |
CQCC-GMM | 0.195 | 9.87 | 0.012 | 0.43 |
LFCC-GMM | 0.255 | 11.96 | 0.066 | 2.71 |
100-i-vectors | 0.306 | 12.37 | 0.155 | 5.18 |
200-i-vectors | 0.322 | 12.52 | 0.121 | 4.12 |
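The EER columns above are equal error rates: the operating point where the false acceptance rate (spoof accepted) equals the false rejection rate (bona fide rejected). As a rough sketch of how EER can be estimated from countermeasure scores (this is not the official ASVspoof scoring script, and it assumes higher scores mean bona fide):

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the equal error rate by scanning candidate thresholds."""
    thresholds = np.sort(np.concatenate([bonafide_scores, spoof_scores]))
    # False acceptance rate: spoof trials scoring at or above the threshold
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])
    # False rejection rate: bona fide trials scoring below the threshold
    frr = np.array([(bonafide_scores < t).mean() for t in thresholds])
    # EER is where the two rates cross
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

rng = np.random.default_rng(0)
bona = rng.normal(2.0, 1.0, 1000)    # well-separated toy scores
spoof = rng.normal(-2.0, 1.0, 1000)
eer = compute_eer(bona, spoof)
print(round(eer, 4))
```

min-tDCF additionally weights the two error types by challenge-defined priors and costs, so the model rankings under the two metrics can differ.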
We include the pretrained model weights for the following models. We do not plan to release pretrained models for the Mean-Std ResNets.
- SENet34: ./pretrained/pa/senet34 and ./pretrained/la/senet34
- SENet50: ./pretrained/pa/senet50 and ./pretrained/la/senet50
- Attentive Filtering Network: ./pretrained/pa/attentive_filtering_network and ./pretrained/la/attentive_filtering_network
- Dilated ResNet: ./pretrained/pa/dilated_resnet and ./pretrained/la/dilated_resnet
This project uses Python 2.7. Before running the code, you have to install the required dependencies. The Python dependencies can be installed with pip by running
pip install -r requirements.txt
./assert/ contains the main source code, ./baseline/ contains the code for the CQCC-GMM, LFCC-GMM, and i-vectors baselines, and ./features/ contains the acoustic feature extraction code.
Make sure to read through the ASVspoof 2019 webpage and download the dataset.
./baseline/ contains code for the CQCC-GMM, LFCC-GMM, and i-vectors baselines. ./baseline/baseline_CM.m is the official MATLAB script for CQCC-GMM and LFCC-GMM; make sure to organize the dataset according to the comments within the script. To train the baselines, run
./baseline_CM.sh
./features/ contains code for acoustic feature extraction. Follow ./features/run_feature.sh stage by stage for acoustic feature extraction. Be sure to set up Kaldi and modify ./features/path.sh to point to your own Kaldi directory. To extract features, run
./run_feature.sh
./assert/
contains the code for ASSERT. ./assert/main.py
contains the model training script and model hyperparameters, and ./assert/src/
contains the model implementations. To train the models, run
python main.py
To train the model on a GPU (recommended), run
CUDA_VISIBLE_DEVICES=`free-gpu` python main.py
Experiments will be saved automatically by Sacred to ./assert/snapshots/
Cheng-I Lai, Nanxin Chen, Jesús Villalba, Najim Dehak
If you encounter any problems, feel free to contact me.