Playground for spiking neural networks
This notebook shows how to convert audio from the Google Speech Commands (GSC) dataset into spiking data using the Speech2Spikes package. It takes a single audio command and encodes it as spikes, as a basic demonstration of how to use the package.
- Google Speech Commands Dataset: TensorFlow Speech Commands Dataset v0.02
- Speech2Spikes Package: DOI Reference
The conversion from raw audio to a spiking representation involves several steps as outlined in the associated paper. Here's a brief overview of each step:
The raw audio, assumed to be a 1-second clip sampled at 16 kHz, starts as a 1-dimensional array of 16,000 samples. The audio is first centered and scaled to normalize it for further processing. In the implementation, this involves padding and transposing.
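The centering and scaling step can be sketched as follows. This is a minimal illustration and not the package's code: the function name `center_and_scale`, the peak-based scaling, and the synthetic 440 Hz test tone are all assumptions made for the example, and the padding and transposing mentioned above are omitted.

```python
import numpy as np

def center_and_scale(audio, eps=1e-8):
    """Remove the DC offset and scale to roughly [-1, 1] (assumed normalization)."""
    audio = np.asarray(audio, dtype=np.float64)
    centered = audio - audio.mean()          # centering: zero mean
    peak = np.abs(centered).max()            # scaling: divide by peak magnitude
    return centered / (peak + eps)           # eps guards against silent clips

# Synthetic stand-in for a 1-second clip at 16 kHz (hypothetical data).
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
clip = 0.3 * np.sin(2 * np.pi * 440 * t) + 0.1   # tone with a DC offset
normalized = center_and_scale(clip)
print(normalized.shape)
```

Peak scaling is only one possible choice; dividing by the standard deviation would be an equally plausible reading of "scaled".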
The Sliding Discrete Fourier Transform (SDFT) incrementally maps the raw audio to the frequency domain, capturing the spectral components over time.
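To make "incrementally" concrete, here is a small sketch of the textbook sliding DFT recurrence, in which each new sample updates every frequency bin in constant time instead of recomputing a full transform. The function name `sliding_dft`, the window length, and the random test signal are assumptions for illustration; the package's own implementation may differ.

```python
import numpy as np

def sliding_dft(x, window):
    """Return the DFT of the most recent `window` samples at every time step."""
    x = np.asarray(x, dtype=np.float64)
    k = np.arange(window)
    twiddle = np.exp(2j * np.pi * k / window)         # per-bin phase rotation
    # Zeros stand in for the samples "before" the start of the signal.
    padded = np.concatenate([np.zeros(window), x])
    spectrum = np.zeros(window, dtype=np.complex128)
    out = np.empty((x.size, window), dtype=np.complex128)
    for n in range(x.size):
        oldest = padded[n]                            # sample leaving the window
        spectrum = (spectrum - oldest + x[n]) * twiddle
        out[n] = spectrum
    return out

rng = np.random.default_rng(0)
signal = rng.standard_normal(64)
frames = sliding_dft(signal, window=16)
# The last frame should match a direct FFT of the last 16 samples.
print(np.allclose(frames[-1], np.fft.fft(signal[-16:])))
```

The recurrence accumulates rounding error over long signals, which is why practical implementations often add damping or periodically recompute the spectrum.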
The frequency domain data is then mapped onto Mel-frequency bands, focusing on the perceptually relevant aspects of sound.
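One common way to perform this mapping is a bank of triangular filters spaced evenly on the Mel scale. The sketch below assumes the HTK-style Mel formula and an FFT size of 512; both are illustrative choices, and the helper names are hypothetical rather than taken from the package.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2, n_fft // 2 + 1)
    # n_mels + 2 edge points, evenly spaced on the Mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

bank = mel_filterbank(n_mels=20, n_fft=512, sample_rate=16_000)
power = np.ones(512 // 2 + 1)        # flat power spectrum as dummy input
mel_energies = bank @ power          # one energy value per Mel band
print(bank.shape, mel_energies.shape)
```

Because the bands are narrow at low frequencies and wide at high frequencies, the 257 linear bins collapse into 20 values that roughly follow how pitch is perceived.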
A log transformation is applied to the Mel-frequency band data, followed by stacking to prepare it for spike encoding.
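The log step compresses the large dynamic range of band energies. A minimal sketch, assuming a small additive epsilon to avoid taking the log of zero (the epsilon value and the function name `log_compress` are illustrative, and the stacking step is not shown because its exact form is not described here):

```python
import numpy as np

def log_compress(mel_energies, eps=1e-6):
    """Natural log of the band energies; eps keeps silent bands finite."""
    return np.log(np.asarray(mel_energies, dtype=np.float64) + eps)

energies = np.array([[0.0, 1.0, 100.0]])   # hypothetical band energies
features = log_compress(energies)
print(np.isfinite(features).all())
```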
The continuous-valued features are encoded into spikes using the Step-Forward algorithm, resulting in a binary representation where each element indicates the presence or absence of a spike. This mimics the firing patterns of neurons. The implementation uses `tensor_to_events` for this step.
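The following sketch shows the Step-Forward idea in its commonly described form: a baseline tracks the signal, and a spike is emitted whenever the signal moves more than a threshold away from it. It is a generic single-channel version written for illustration, not the package's spike-encoding code; the function name, the threshold, and the separate "up" and "down" spike trains are assumptions.

```python
import numpy as np

def step_forward_encode(signal, threshold):
    """Encode a 1-D signal as two binary spike trains (increase / decrease)."""
    signal = np.asarray(signal, dtype=np.float64)
    up = np.zeros(signal.size, dtype=np.uint8)
    down = np.zeros(signal.size, dtype=np.uint8)
    baseline = signal[0]
    for t in range(1, signal.size):
        if signal[t] > baseline + threshold:
            up[t] = 1                # signal rose past the baseline
            baseline += threshold
        elif signal[t] < baseline - threshold:
            down[t] = 1              # signal fell below the baseline
            baseline -= threshold
    return up, down

up, down = step_forward_encode([0.0, 0.6, 1.2, 1.2, 0.4, 0.0], threshold=0.5)
print(up.tolist())    # [0, 1, 1, 0, 0, 0]
print(down.tolist())  # [0, 0, 0, 0, 1, 0]
```

Each output element is 0 or 1, matching the presence-or-absence description above; applying the encoder independently to every feature channel would give one pair of spike trains per channel.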
In the provided implementation code, `_default_spec_kwargs` specifies an `n_mels` value of 20, which likely contributes to the 20 units/neurons dimension in the final spike data representation. The conversion process involves decomposing the audio into these Mel frequency bands to represent different frequency components as "neurons."
The notebook is a demonstration for educational and illustrative purposes, showcasing a simple use case of the Speech2Spikes package for converting audio into a neuromorphic-friendly format. It serves as a starting point for more complex applications and experiments with spiking neural networks or related fields.