Skip to content

lihao0214/Speaker-Recognition-GMM-UBM

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

85 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Speaker-Recognition-GMM-UBM

GMM-UBM speaker model    HTK    MSR-Identity-Toolkit    Matlab operations    Matlab Functions Used    Programs.md

Contents

Transform WAV To Readable file

  • Download TIMIT corpus.

  • Create a file (copy TIMIT's two files in it:(1)TEST (2)TRAIN)which has the same directory form as TIMIT corpus: main.py

    import os, sys
    
    # path1 = 'D:/Project/Speaker-Recognition/timit_wav8000'
    # path1 = 'D:/Project/Speaker-Recognition/timit_mfcc'
    path1 = 'D:/Project/Speaker-Recognition/timit_wav16000'
    print(f'the path to be deleted is {path1}')
    files1 = os.listdir(path1)
    count = 0
    for file1 in files1:
        path2 = path1 + '/' + file1
        files2 = os.listdir(path2)
    
        for file2 in files2:
            path3 = path2 + '/' + file2
            files3 = os.listdir(path3)
            for file3 in files3:
                path4 = path3 + '/' + file3
                files4 = os.listdir(path4)
                for file4 in files4:
                    path5 = path4 + '/' + file4
                    # print(path5)
                    count += 1
                    os.unlink(path5)
    print(count)
    print('complete delete')
  • Transform WAV to readable files and save in the above file:

    conver_wav.m(入口函数)

    path1 = 'D:\Project\Speaker-Recognition\TIMIT';
    path2 = 'D:\Project\Speaker-Recognition\timit_wav16000';
    len1 = length(path1); % 36
    wav_filesName = find_wav(path1);
    % wav_filesName(1,:) is 'D:\Project\Speaker-Recognition\TIMIT\TEST\DR1\FAKS0\SA1.WAV'
    % p1 = strfind(wav_filesName(1,:),'.W') % 56
    % wav_filesName(1,37:56+3) % is '\TEST\DR1\FAKS0\SA1.WAV'
    [m,n] = size(wav_filesName);
    
    % 保存wav文件名,添加人名前缀
    tic
    hwt = waitbar(0,'please wait....');
    for i = 1:m
        [x,fs] = audioread(wav_filesName(i,:));
        p = strfind(wav_filesName(i,:),'.W'); 
        writefileName = [path2,wav_filesName(i,len1+1:p+3)];
    	audiowrite(writefileName,x,fs);
        % y = resample(x,1,2);
        % p = strfind(wav_filesName(i,:),'.W'); 
        % writefileName = [path2,wav_filesName(i,len1+1:p+3)];
        % audiowrite(writefileName,y,8000);
        waitbar(i/m);
    end
    close(hwt);
    toc
  • find_wav.m

    function [ wav_files ] = find_wav( path )
      %FIND_WAV, find all wav file recursively
      wav_files = [];
      if(isdir(path) == 0) % is path is not a folder, is a file, then return 
          return;
      end
      path_files = dir(path); % path is the folder content of path
      fileNum = length(path_files);
      for k= 3:fileNum
          file = [path,'\', path_files(k).name];
    %       disp(sprintf('file is %s',file));
        if (path_files(k).isdir == 1)
            ret = find_wav(file); % if path_files is a folder, do the find_wav function to the input(file)
    %         disp(sprintf('ret is %s',ret));
            if(isempty(ret) ~= 1)
    %             disp(sprintf('ret is %s',ret));
                if(isempty(wav_files))
    %                 disp(sprintf('ret is %s',ret));
                    wav_files = char(ret);
    %                 disp(sprintf('wav_files is %s',wav_files));
                else
                    wav_files = char(wav_files, ret);
                end
            end
        elseif strfind(path_files(k).name, '.WAV')
            if(isempty(wav_files))
                wav_files = char(file);
            else
                wav_files = char(wav_files, file);
            end
        end
      end
    end
  • check_wav.m

    %clear all;
    %files = find_wav('D:\D\Speaker-Recognition\timit_wav8000');
    path2 = 'D:\Project\Speaker-Recognition\timit_wav8000';
    files = find_wav(path2);
    tic
    hwt = waitbar(0,'please wait....');
    for fileIdx = 1:length(files)
        file = files(fileIdx,:);
        [y, fs] = audioread(file);
        if(fs~=8000)
            fprintf('%s: fs~=8000\n', file);
        end
        waitbar(fileIdx/length(files));
    end
    close(hwt);
    toc

Extract MFCC Features From Readable File

  • Download HTK ToolKit and set up its environment.

  • find_wav.m

  • wav_mfcc_path.m(入口函数)生成targetlist.txt文件

    path1 = 'D:\Project\Speaker-Recognition\timit_wav8000';
    path2 = 'D:\Project\Speaker-Recognition\timit_mfcc';
    filename = 'D:\Project\Speaker-Recognition\Program-Code\03-extract-mfcc-from-wav\targetlist.txt';
    wav_filesName = find_wav(path1);
    [m,n] = size(wav_filesName);
    len1 = length(path1);
    fid = fopen(filename, 'w+t');
    
    if fid < 0
        fprintf('error opening file\n');
        return;
    end
    
    tic
    hwt = waitbar(0,'please wait....');
    count = 0;
    for i = 1:m
        p = strfind(wav_filesName(i,:),'.W'); 
        wav_filename = wav_filesName(i,:);
        mfcc_filename = [path2, wav_filesName(i,len1+1:p), 'mfcc'];
        one_line_txt = [wav_filename,' ',mfcc_filename];
        fprintf(fid,'%s\n',one_line_txt);
        count = count + 1;
        waitbar(i/m);
    end
    fprintf('count is %g\n', count);
    close(hwt);
    toc
  • 制作analysis.conf文件

  • 将targetlist.txt和analysis.conf这两个文件都放在XXX\htk\bin.win32目录下

  • 执行如下cmd命令,会在指定目录生成mfcc文件:

    Hcopy -A -D -C  analysis.conf -S targetlist.txt

Generate 3 Important File

  • ubm.lst用于训练UBM

    里面包含5300个speech的mfcc的文件路径

    D:\Project\Speaker-Recognition\timit_mfcc\TEST\DR1\FAKS0\SA1.mfcc    
    D:\Project\Speaker-Recognition\timit_mfcc\TEST\DR1\FAKS0\SA2.mfcc    
    D:\Project\Speaker-Recognition\timit_mfcc\TEST\DR1\FAKS0\SI1573.mfcc 
    D:\Project\Speaker-Recognition\timit_mfcc\TEST\DR1\FAKS0\SI2203.mfcc 
    D:\Project\Speaker-Recognition\timit_mfcc\TEST\DR1\FAKS0\SI943.mfcc  
    ...
    共5300行
    
  • speaker_model_maps.lst用于适应训练人

    一共630人,已经挑走530人作UBM,还有100人,每个人都有10句话。这里就是这个100人的前9句话的mfcc文件的人名 文件路径

    MTXS0 D:\Project\Speaker-Recognition\timit_mfcc\TRAIN\DR6\MTXS0\SA1.mfcc   
    MTXS0 D:\Project\Speaker-Recognition\timit_mfcc\TRAIN\DR6\MTXS0\SA2.mfcc   
    MTXS0 D:\Project\Speaker-Recognition\timit_mfcc\TRAIN\DR6\MTXS0\SI1060.mfcc
    MTXS0 D:\Project\Speaker-Recognition\timit_mfcc\TRAIN\DR6\MTXS0\SI1690.mfcc
    ...
    共900行
    
  • trials.lst用于测试

    剩余的100人,每个人都跟其他99人形成target和impostor的关系,所以从1人到第100个人做100论。一共10000行,对应所有的测试语句。

    格式为人名 特征路径名 标签

    MTXS0  D:\Project\Speaker-Recognition\timit_mfcc\TRAIN\DR6\MTXS0\SX70.mfcc  target
    MTXS0  D:\Project\Speaker-Recognition\timit_mfcc\TRAIN\DR7\FBLV0\SX68.mfcc  impostor
    MTXS0  D:\Project\Speaker-Recognition\timit_mfcc\TRAIN\DR7\FCJS0\SX77.mfcc  impostor
    MTXS0  D:\Project\Speaker-Recognition\timit_mfcc\TRAIN\DR7\FCRZ0\SX73.mfcc  impostor
    ...
    第100 行 ... impostor
    第101 行 ... target
    第201 行 ... target
    ...
    第10000行 ... impostor
    

Use MSR ToolKit To Analyze MFCC Features

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages