<!DOCTYPE html>
<html lang="ja">
<head>
    <meta charset="utf-8">

    <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1">
    <link href="index_files/css.css" rel="stylesheet" type="text/css">
    <link rel="stylesheet" href="index_files/style.css" media="screen" type="text/css">
    <link rel="stylesheet" href="index_files/print.css" media="print" type="text/css">

<link rel="alternate" hreflang="ja" href="https://shun60s.github.io/">
<meta name="google-site-verification" content="a6JMYTgHEAQ3v6W5SwNo7scWQJf0vLDu-y140tbA9Ac" />


<title>Deep Learning and Audio/Speech Signal Processing</title>
<meta property="og:title" content="Deep Learning and Audio/Speech Signal Processing">
<meta property="og:locale" content="ja">
<link rel="canonical" href="https://shun60s.github.io/">
<meta property="og:url" content="https://shun60s.github.io/">
<meta property="og:site_name" content="Deep Learning and Audio/Speech Signal Processing">


  </head>

  <body>
    <header>
      <div class="inner">
        <a href="https://github.com/shun60s">
          <h1>deep learning, signal processing</h1>
        </a>        
      </div>
    </header>


<div id="content-wrapper">
<div class="inner clearfix">
<section id="main-content">


<p>This page collects repositories on deep learning and audio/speech signal processing.</p>


<h2 id="chainer-notch-filter"><a href="https://github.com/shun60s/chainer-notch-filter/">Deep learning of a notch filter</a></h2>

<p>An experiment to see whether a notch filter, which demands high numerical precision, can be learned by deep learning.<br>
A study of IIR notch filter design using the deep learning framework Chainer.</p>

<p></p>


<h2 id="fft-wav-upsampling"><a href="https://github.com/shun60s/FFT-Wav-UpSampling2/">Doubling the sampling rate, and converting 44.1 kHz to 48 kHz</a></h2>

<p>A program that doubles the sampling rate of WAV files such as music using the FFT method, and a program that converts a 44.1 kHz sampling rate to 48 kHz/96 kHz.</p>

<p></p>


<h2 id="spectrum"><a href="https://shun60s.github.io/Spectrogram_Autoencoder/">Mel-scale spectrogram and autoencoder</a></h2>

<p>A learning exercise covering mel-scale spectrogram creation, CNN autoencoder pre-training, and a deep learning classifier.</p>

<p></p>


<h2 id="hmm"><a href="https://shun60s.github.io/HMM/">HMM with mixture distributions</a></h2>

<p>Classifies spoken digits from their mel-scale spectrograms: principal component analysis reduces the feature dimensionality, and a hidden Markov model (HMM) with Gaussian mixture emissions performs the classification. For practice.</p>

<p></p>


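<h2 id="dnn"><a href="https://shun60s.github.io/Wave-DNN/">DNN for speech signals</a></h2>

<p>Pre-trained models such as VGG16 are available for image recognition, but few exist for speech recognition. So I wrote Python code that computes the FBANK_D_A_Z features needed to use the DNN included in the dictation kit of the speech recognition engine Julius.<br>
A Python class to get FBANK_D_A_Z from a wave file for use with the Julius dictation kit DNN model.</p>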

<p></p>


<h2 id="dnn-likelihood"><a href="https://shun60s.github.io/Wave-DNN-likelihood/">Log-likelihood calculation with a DNN-HMM for speech signals</a></h2>

<p>A Python class that calculates the log-likelihood using the DNN-HMM model included in the dictation kit of the speech recognition engine Julius.</p>

<p></p>


<h2 id="formant"><a href="https://shun60s.github.io/Formant/">3D display of speech formants</a></h2>

<p>A simplified program that estimates the formant frequencies and pitch frequency of speech by the LPC (linear prediction analysis) method.</p>

<p></p>


<h2 id="vocal-tube-model"><a href="https://shun60s.github.io/Vocal-Tube-Model/">Vocal tract Tube Model </a></h2>
<p>Frequency response and generated waveform of a two-tube vocal tract model for speech production.<br>
A very simple model of the vocal tract using two tubes: frequency response and cross-sectional view (area).</p>

<p></p>


<h2 id="vocal-tube-model2"><a href="https://github.com/shun60s/Vocal-Tube-Model2/">Vocal tract Tube Model, part 2</a></h2>
<p>Frequency response and generated waveform of a five-tube vocal tract model.<br>
A very simple model of the vocal tract using up to five tubes: frequency response and cross-sectional view (area).</p>

<p></p>

<h2 id="chainer-peak-detect"><a href="https://github.com/shun60s/chainer-peak-detect/">Peak detection with Chainer</a></h2>

<p>A study of peak position estimation in 1D data using the deep learning framework Chainer.</p>

<p></p>


<h2 id="boston-housing"><a href="https://github.com/shun60s/BostonHousing-GBR-NN/">House price prediction with gradient boosting regression</a></h2>

<p>Aims to understand how gradient boosting regression works by predicting house prices from the Boston Housing dataset (13 indicators plus house prices), and compares the result with prediction by a neural network.</p>

<p></p>


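<h2 id="blind-speech-separation"><a href="https://shun60s.github.io/Blind-Speech-Separation/">Separating speech from a monaural mix of music and speech with U-Net</a></h2>
<p>Assuming a scene where music plays in the background and only the speech is to be extracted, I experimented with extracting the speech part from a monaural signal that mixes music with uncorrelated speech.<br>
A study of blind speech separation using U-Net.</p>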
<p></p>


<h2 id="music-tagging-chainer"><a href="https://shun60s.github.io/music-tagging-chainer/">Music genre classification</a></h2>
<p>A Chainer port of a music genre classification (tagging) model originally written in Keras.<br>
A remake of Music Genre Classification with Deep Learning.</p>
<p></p>


<h2 id="Python-WORLD-Win10"><a href="https://shun60s.github.io/Python-WORLD-Win10/">Python version of the WORLD speech analysis, transformation, and synthesis system</a></h2>
<p>A modification of Python WORLD so that it also runs on Windows 10.</p>
<p></p>


<h2 id="Wavenet"><a href="https://shun60s.github.io/chainer-examples-wavenet-clone/">WaveNet experiment</a></h2>
<p>An experiment running the WaveNet published in chainer-colab-notebook on Google Colaboratory.</p>
<p></p>


<h2 id="softmax-indexi-weighed"><a href="https://shun60s.github.io/softmax-index-weighted/">A hand-made loss function</a></h2>
<p>A hand-made loss function, written with Chainer, that uses the index difference as a weight.</p>
<p></p>

<h2 id="spectral-subtraction"><a href="https://github.com/shun60s/spectral-subtraction/">Spectral subtraction</a></h2>
<p>A simple implementation of spectral subtraction, a noise suppression technique.</p>
<p></p>

<h2 id="vocal-tube-noise-s"><a href="https://shun60s.github.io/Vocal-Tube-Noise-S-Model/">Generating the fricative /sa/ sound</a></h2>
<p>Generation of the fricative /sa/ sound using a two-tube vocal tract model and a noise source in place of turbulent sound.</p>
<p></p>

<h2 id="vocal-tube-noise-k"><a href="https://shun60s.github.io/Vocal-Tube-Noise-K-Model/">Generating the plosive /ga/ and /ka/ sounds</a></h2>
<p>Generation of the plosive /ga/ and /ka/ sounds using a pseudo blast impulse, a noise source in place of turbulent sound, and a two-tube vocal tract model.</p>
<p></p>


<h2 id="vocal-tube-N"><a href="https://shun60s.github.io/Vocal-Tube-N-Model/">Generating the nasal /na/ and /ma/ sounds</a></h2>
<p>Generation of the nasal /na/ and /ma/ sounds using a two-tube vocal tract model and a source that includes the nasal effect.</p>
<p></p>


<h2 id="vocal-tube-I"><a href="https://shun60s.github.io/Vocal-Tube-I-Model/">Generating the /i/ sound with and without a noise source</a></h2>
<p>Generation of the vowel /i/ sound using a two-tube vocal tract model and a noise source.</p>
<p></p>


<h2 id="vocal-tube-T"><a href="https://github.com/shun60s/Vocal-Tube-T/">Tube model with three junctions</a></h2>
<p>Originally built as a model of nasal sounds, but on its own it cannot reproduce nasal sounds well.<br>
A three-junction tube model.</p>
<p></p>

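<h2 id="glottal-spectrum"><a href="https://shun60s.github.io/glottal-source-spectrum/">Estimating the glottal source spectrum</a></h2>
<p>Estimates the state of the glottal source spectrum using an inverse filter of the mouth radiation characteristic and filters that attenuate at the formant frequencies.<br>
A trial estimation of the glottal source spectrum by anti-formant filter and inverse radiation filter.</p>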
<p></p>


<h2 id="vocal-tube-estimation"><a href="https://github.com/shun60s/Vocal-Tube-Estimation2/">Estimating vocal tract tube models</a></h2>
<p>Estimates two-, three-, four-, and five-tube vocal tract models from the peak and drop-peak frequencies of the frequency response of a vowel.</p>
<p></p>

<h2 id="voice-BPF-bank"><a href="https://shun60s.github.io/Voice-BPF-bank/">Band-pass filter bank</a></h2>
<p>Analysis of speech with a band-pass filter bank, and its applications.</p>
<p></p>

<h2 id="Wave_Digital_Filter"><a href="https://shun60s.github.io/rt-wdf_renderer-sample-study/">Wave Digital Filter</a></h2>
<p>Simulation of a triode vacuum tube amplifier with a Wave Digital Filter.<br>
A study of the Wave Digital Filter and a simple triode amplifier simulation.</p>
<p></p>

<h2 id="hormonic"><a href="https://shun60s.github.io/Harmonic/">Harmonic analysis by spectrogram</a></h2>
<p>An analyzer built to resemble the mechanism of hearing.<br>
Spectrogram analysis of tube amplifier distortion (PC simulation).</p>
<p></p>

<h2 id="impulse"><a href="https://github.com/shun60s/impulse-response/">Evaluating a nonlinear system by impulse response</a></h2>
<p>Computes the difference between the output of a nonlinear system and the output obtained when the system is treated as linear (calculated from its impulse response).</p>
<p></p>

<h2 id="Chebyshev"><a href="https://shun60s.github.io/Chebyshev-expansion/">Expansion in Chebyshev polynomials</a></h2>
<p>A trial of Chebyshev polynomial expansion and ARMA-model power spectral density estimation, applied to a fragment of a music signal.</p>
<p></p>

<h2 id="FIR-min-phase"><a href="https://shun60s.github.io/Python-minimum-phase-FIR-design/">Design of a minimum-phase FIR filter</a></h2>
<p>Designs an approximately minimum-phase FIR filter with a specified frequency response, using the Hilbert transform.</p>
<p></p>

<h2 id="pyaudio"><a href="https://github.com/shun60s/PyAudio-full-duplex/">PyAudio experiment</a></h2>
<p>An experiment in full-duplex operation with PyAudio: record, filter, mix, and play simultaneously.</p>
<p></p>

<h2 id="spectrogram2"><a href="https://github.com/shun60s/spectrogram2/">Harmonic structure of multiple sound sources</a></h2>
<p>Compares a spectrogram made up of multiple speakers and sound effects with one whose clarity has been degraded along the sound path.</p>
<p></p>

<h2 id="yolo-spectrogram-detector"><a href="https://github.com/shun60s/YOLO-spectrogram-darknet-clone/">Spectrogram detector</a></h2>
<p>An image recognition attempt to pick out harmonic structures (only two kinds: curved and flat) from a spectrogram.<br>
A trial spectrogram detector using YOLO, cloned from AlexeyAB darknet.</p>
<p></p>


<h2 id="harmonic-structure-parts-detect"><a href="https://shun60s.github.io/harmonic-structure-parts-detect/">Detecting the parts that make up harmonic structure in a spectrogram</a></h2>
<p>An attempt to detect the parts that make up the harmonic structure of voice and instrument sounds in a spectrogram, using template matching with a mask.</p>
<p></p>

<h2 id="mnist-basis-function"><a href="https://shun60s.github.io/mnist-basis-function/">Digit recognition using basis functions</a></h2>
<p>MNIST digit classification using 3x3-pixel basis functions.</p>
<p></p>


<h2 id="BipedalWalker-v2"><a href="https://github.com/shun60s/BipedalWalker-a3c/">BipedalWalker-v2</a></h2>
<p>Uses the reinforcement learning algorithm A3C to find a solution that moves the two legs alternately in OpenAI Gym's BipedalWalker-v2.</p>
<p></p>


<h2 id="prednet-practice"><a href="https://github.com/shun60s/pytorch-prednet-practice/">prednet practice</a></h2>
<p>Verifying the behavior of prednet with PyTorch.<br>
A prednet practice using preprocessed KITTI data.</p>
<p></p>


<h2 id="state-choice"><a href="https://github.com/shun60s/BipedalWalkerHardcore-Weights-Choice/">Estimating obstacle conditions in BipedalWalkerHardcore-v2</a></h2>
<p>Uses a DNN to predict the obstacle state in OpenAI Gym's BipedalWalkerHardcore-v2 from the lidar information, and learns how to switch between two sets of weights.</p>
<p></p>

<h2 id="wavwGAN"><a href="https://github.com/shun60s/wavegan-clone/">WaveGAN training</a></h2>
<p>Training WaveGAN, and a conditional setup that uses the waveform itself as the label.<br>
A practice of WaveGAN synthesis.</p>
<p></p>


<h2 id="WebApp"><a href="https://github.com/shun60s/time-difference-WebApp/">WebApp</a></h2>
<p>A WebApp that estimates the time difference between the L and R channels of a stereo WAV recording.</p>
<p></p>


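<h2 id="Riemann"><a href="https://shun60s.github.io/Riemann-zeta-argument-principle/">Computing the argument principle for the Riemann zeta function</a></h2>
<p>Evaluates the argument principle for the Riemann zeta function around a rectangular region by piecewise quadrature.<br>
Calculates the number of zeros of the Riemann zeta function according to the argument principle.</p>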
<p></p>


<h2 id="ADT"><a href="https://github.com/shun60s/ADTLib-trial/"> Automatic Drum Transcription</a></h2>
<p>A trial of automatic drum score creation (Automatic Drum Transcription) using ADTLib.</p>
<p></p>


<h2 id="vocal-tube-model-list"><a href="https://github.com/shun60s/vocal-tract-tube-model-list">Vocal Tract Tube Model repositories </a></h2>
<p>A table summarizing the repositories related to the vocal tract tube model.</p>
<p></p>


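<h2 id="DNN_meaning">The meaning of DNN</h2>
<p>By solving a system of simultaneous equations with a very large number of variables, using a number of samples (experience) that matches the number of variables, it becomes possible to represent (classify) that (complex) experience. In principle, if every conceivable experience were learned with a number of variables that matches the complexity of the target, misses might no longer occur.<br>
 (Written November 8, 2018)<br>
<br>
DNNs are poor at calculations that demand precision. Preprocessing before the DNN will probably need to turn the input (by analyzing it in advance) into a representation in which the characteristic differences appear clearly.<br>
(Added January 26, 2020)<br>
<br>
On reinforcement learning<br>
There is no BC (Behavior Copy) that can serve as a basis. What exists is probably a "combination of input information and network structure" sufficient to realize the behavior.<br>
Under the conditions of the given training samples, it feels as if it only happens to work, and it does not carry over to other cases.<br>
If training were possible with a near-infinite variety of samples, it might become able to cope with almost anything, but that is hard to achieve.<br>
(Added February 23, 2021)<br>
<br>
"Application" may mean that, in relatively shallow network structures that have been subdivided (to make learning easier),<br>
something new can be learned in a relatively short time by reusing already-learned weights (with some of the weights fixed).<br>
(Added February 28, 2021)<br>
<br>
Producing reasonable results presupposes enormous amounts of data and computing power, which puts limits on what an individual can do.<br>
</p>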
</section>


</div>
</div>
<br>
<br>
<br>
<p class="naname1">Links</p>
<figure class="link-box">
<a href="https://qiita.com/terms"><img src="index_files/qiita.png" alt="qiita"></a>
<figcaption>Qiita</figcaption>
</figure>
<figure class="link-box">
<a href="https://github.com"><img src="index_files/github.png" alt="github"></a>
<figcaption>GitHub</figcaption>
</figure>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>


</body></html>