bombcell: evaluate unit quality and ephys properties

Bombcell works with units recorded with Neuropixel probes (3A, 1.0 and 2.0, recorded with SpikeGLX or OpenEphys) and spike-sorted with kilosort. It classifies the unit into three categories: single units, multi-units and noise units, with an option to keep or remove non-somatic spikes. param is the structure that defines how to compute the quality metrics and which thresholds to use to classify units.

Not yet published. Please cite xxx.

Preliminary versions of bombcell were used in:

Peters et al., Nature, 2021 to define real, well isolated striatal units and classify them into MSNs, FSIs and TANs. See the script bc_selectAndClassifyStriatum(work in progress) to classify striatal cells in the same way.
Peters et al., Cell Reports, 2022 to to define real, well isolated pre-frontal cortex units.

Table of contents:

Getting started
Quality metrics guide
Quality metrics GUI guide
Ephys properties guide
Recommended pre-processing
Dependancies
Other ephys utilities

Getting started

Clone the bombcell_1.0_stable repository and the dependancies. The script bc_qualityMetrics_pipeline provides an example workflow.

To start out, we suggest you compute quality metrics with default param values (contained in bc_qualityParamValues), and then adjust the thresholds for your particular neuronal region and needs by looking at (1) individual units, in the interactive GUI (2) distribution histograms of the units' quality metrics (see General plots), (3) numbers of units removed by each quality metric. It may also be useful to plot the quantity you which to measure as a function of each quality metric (see Fig. 2 Harris et al., Nat. Neuro, 2016).

If you want to compute ephys properties, change your working directory to bombcell\ephysProperties\helpers in matlab and run mex -O CCGHeart.c to able to compute fast ACGs.

If you have z-lib compressed ephys data, compressed with mtscomp, you will additionally need the zmat toolbox.

Quality metrics guide

General overview

Bombcell has been written to be as user-friendly as possible. If you run into any issues or have suggestions, please raise a github issue or email us.

In order to avoid possible conflicts, all bombcell functions are preceded by bc_.

Qmetric and param structures

Bombcell is organized around two main matlab structures: qMetrics and param.

qMetrics is a structure generated by the bc_runAllQualityMetrics function. Its fields are different quality metrics, and each field contains a quality metric value for each unit.
param is a structure generated by the function bc_qualityParamValues and it contains parameters both to compute quality metrics with bc_runAllQualityMetrics and to then classify cells with bc_getQualityUnitType.

Plotting quality metrics

We have added automatic plotting function at each step, in order to aid in trouble-shooting and evaluating bombcells' output:

Set param.plotGlobal = 1 to plot summary of the noise units' waveforms compared to single multi-units, distribution histograms of the units' quality metrics and numbers of units removed by each quality metric. See the detailed overview of global output plots section for examples of these plots.
Set param.plotThis = 1 to plot figures for each quality metric and for each cell (plot examples displayed below). This generates a considerable amout of plots. We suggest you only use when de-bugging or to generate example cell plots. See the detailed overview of the quality metrics section for examples of these plots.

In addition, bombcells' GUI allows you to flip through cells and their quality metrics. This is useful in evaluating where to set your quality metric classification thresholds and checking how bombcell behaves. See the quality metrics GUI guide section for more information.

Loading and saving data

All bombcell outputs are saved in the ONE format, in .npy or .parquet files. .npy files can be read in matlab using the npy-matlab toolbox, and .parquet files can be read in matlab using the built-in readparquet function.

The following files are outputed by bombcell:

_bc_parameters._bc_qMetrics.parquet
templates._bc_fractionRefractoryPeriodViolationsPerTauR.parquet
templates._bc_qMetrics.parquet
templates._bc_rawWaveforms.npy
templates._bc_rawWaveformPeakChannels.npy
templates._bc_baselineNoiseAmplitude.npy
templates._bc_baselineNoiseAmplitudeIndex.npy
(optional, if param.saveMultipleRaw is set to true) 'Unit$CLUSTER_ID$_RawSpikes.npy'

These saved files can be retrieved with the function [param, qMetric] = bc_loadSavedMetrics(savePath)

Additionally, optionally if param.saveMatFileForGUI is set to true, bombcell saves a templates.qualityMetricDetailsforGUI.mat file containing various data needed for bombcells GUI. This data can also be loaded using the function bc_loadMetricsForGUI.

A run-through of a typical workflow

Load ephys data, eg:

[spikeTimes_samples, spikeTemplates, templateWaveforms, templateAmplitudes, pcFeatures, pcFeatureIdx, channelPositions] = bc_loadEphysData(ephysKilosortPath)
Run all quality metrics and clasify cells with the function bc_runAllQualityMetrics, eg:

[qMetric, unitType] = bc_runAllQualityMetrics(param, spikeTimes_samples, spikeTemplates, templateWaveforms, templateAmplitudes, pcFeatures, pcFeatureIdx, channelPositions, savePath)
Look at global output plots and flip through cells with the GUI:

unitQualityGuiHandle = bc_unitQualityGUI(memMapData, ephysData, qMetric, forGUI, rawWaveforms, param, probeLocation, unitType, loadRawTraces)
Refine classification based on step 4: choose quality metric thresholds

Running speed and bottlenecks

Depending on your settings, bombcell takes between 2' and 35' for a typical 1 hour long neuropixels recording. The running time varies according to three main parameters:

if this is the first run, raw waveforms are extracted from your raw .bin or .dat ephys file. These are then saved in two .npy files. Subsequent runs load in the already extracted waveforms and thus take ~10-15' less. Initial running time increases or decreases as the number of spikes extracted per unit, defined by param.nRawSpikesToExtract, increases or decreases.
if param.computeDrift is true, drift is computed. This can add ~10' per run, and is disabled by default.
if param.computeDistanceMetrics is true, several distance metrics are computed. This can add ~10' per run. Distance metrics are susceptible to probe drift, and correlate well with a units raw waveform amplitude. As a result, we suggest using a units amplitude instead of distance metrics and this parameter is diabled by default.

Detailed overview of the quality metrics

Quality metrics to classify noise/non-somatic units

Somatic waveforms are defined as waveforms where the largest trough precedes the largest peak (Deligkaris, Bullmann & Frey, 2016).
Noise units are classified as noise if any of the following are true
- the number of peaks or troughs detected in the max channel template waveform for the unit exceeds the defined thresholds : param.maxNPeaks and param.maxNTroughs.
- the baseline of the max channel template waveform for the unit is not flat (exceeds the defined threshold param.maxWvBaselineFraction).
- the slope template waveform spatial decay is above a defined threshold param.minSpatialDecaySlope.

Quality metrics to classify multi-units

After classifying noise units, the remaining units are classifyed as good single units if the criteria below are met. They are classifying as multi-units otherwise.

the estimated percentage of spikes missing is below the param.maxPercSpikesMissing

The percent of spikes missing (false negatives) is estimated by fitting a gaussian the distribution of amplitudes, with a cutoff parameter. This assumes the spike amplitudes follow a gaussian distribution, which is not strictly true for bursty cells, like MSNs. Optionally, if param.computeTimeChunks is true, the recording is split in time chunks of size param.deltaTimeChunk, and the percentage of spikes missing is estimated seperately on these time chunks. Then, only the time chunks with a percent of spikes missing below the threshold are kept, and the rest of the quality metrics are computed on these epochs. This can be of use in cases where there is a lot of drift during the recording.

Below: example of unit with many spikes below the detection threshold in the first two time chunks of the recording.
number of spikes is above param.minSpikes

Below a certain amount of spikes, ephys properties like ACGs will not be reliable. A good minimum to use is 300 empirically, because Kilosort2 does not attempt to split any clusters that have less than 300 spikes in the post-processing phase.
the estimated percent of refractory period violations is below param.maxRPVviolations

The fraction of refractory period violations (false positives) is estimated using r = 2*(tauR - tauC) * N^2 * (1-Fp) * Fp / T , solving for Fp, with tauR the refractory period, tauC the censored period, T the total experiment duration, r the number of refractory period violations, Fp the fraction of contamination. method from Hill et al., J. Neuro, 2011.

Below: examples of a unit with a small fraction of refractory period violations (left) and one with a large fraction (right).
the raw mean waveform maximum amplitude is above param.minAmplitude

Units with low amplitude are noisy, further away units, that are more likely to be MUA.

Below: examples of a unit with high amplitude (blue) and one with low amplitude (red).
distance metrics are above param.minIsoD for the isolation distance, param.lratioMin for l-ratio, param.ssMin for the silhouette score

Disabled by default. This is if the most time-consuming part of the script. See Harris et al., Neuron, 2001 for more information on isolation distance, (see Schmitzer-Torbert and Redish, 2004) for more information on l-ratio and (see Rousseeuw, 1987 for more information on silhouette-score.

Below: examples of a unit with high isolation distance (left) and one with low isolation distance (right).

Detailed overview of global output plots

If param.plotGlobal is set to true, after computing quality metrics, the script will output 3 summary plots :

a Euler diagram of the number of units classifyed as noise/multi-unit with each quality metric. Numbers in the circles indicate the number of units classifying as noise/multi-unit by that quality metric/intersection of quality metrics.
for each a quality metric, a histogram of the distribution of values for all units. The red lines indicate classification thresholds.
template waveforms for each unit classiyed as good, multi-unit and noise, overlaid on top of each other. This is a quick way of checking there is no odd, noisy unit that has mistakenly been classifyed as either good or multi-unit.

Quality metrics GUI guide

Plot a GUI to flip through the quality metrics for each cell with the function bc_unitQualityGUI Eg:

bc_unitQualityGUI(memMapData, ephysData, qMetrics, param, probeLocation, unitType, plotRaw)

Toggle units by using the → and ← keys. Pressing g, m, n brings you to the next good, multi-unit or noise unit, respectively. Press u to select a particular unit. Navigate in time through the raw data to see this units' individual spikes with the ↑ and ↓ keys.

Unit location view

This view plots the depth of each unit on the probe in y, and it's log-normalized firing rate in x. Single units are plotted in green, multi-units in indigo and noise in red. The current unit is plotted larger and circled in black.
Template waveform view

This view plots the template waveforms for the current unit. The maximum waveform is in blue, and detected peaks are overlaid.
Raw waveform view

This view plots the mean raw waveforms for the current unit. The maximum waveform is in blue, and detected peaks are overlaid.
ACG view

This view plots the auto-correlogram (ACG) for the current unit. The horizontal red line indicates the ACG asymptote, which corresponds to the unit's firing rate. The vertical red line plot the refractory period location.
ISI view

This view plots the inter-spike-intervals (ISI) for the current unit. The vertical red line plot the refractory period location.
Isolation distance view
Raw waveform view

Plots the raw data in black, with detected spikes for this unit in blue.
Amplitude view

This view plots the scaling factor applied to each spike by kilosort in black. Spikes currently displayed in the raw data view are shown in blue, and spikes that have an ISI < refractory period threshold are shown in purple.

Ephys properties guide

work in progress

Classifying cell types

striatal cells
GPe cells
cortical cells

Recommended recording and pre-processing

To reduce the noise units you obtain, we recommend recording with either both a ground wire and reference wire going from the probe to the mouse, or with a ground wire and the internal reference (if using probes other than the 3A - the internal reference on these doesn't work well). To remove artefacts from your recorded data, either temporally align your channels with each other and common-average reference your data with Bill Karsh's function CatGT, before feeding this data into kilosort or use pyKilosort, where this is implemented.

To maximize your number of single good units, we recommend looking at your raw data and spikes detected by kilosort, to assess whether most spikes are being detected. If not, consider lowering kilosort's detection thresholds.

Dependancies

https://github.com/kwikteam/npy-matlab (to load .npy data in)

Other ephys utilities:

generate IMRO files/channel maps : see recordingUtilities
manual curation guide
quality metric plugins for phy : see phyPlugins
decompress data z-lib compressed with mtscomp : see decompressData. Requires the zmat toolbox

Name		Name	Last commit message	Last commit date
Latest commit History 366 Commits
classifyCells		classifyCells
classifyStriatum		classifyStriatum
decompressData		decompressData
ephysProperties		ephysProperties
histology		histology
images		images
loading		loading
personal_work_in_progress		personal_work_in_progress
phyPlugins		phyPlugins
qualityMetrics		qualityMetrics
recordingUtilities		recordingUtilities
visualizationTools		visualizationTools
.Rhistory		.Rhistory
README.md		README.md
bc_qualityMetrics_pipeline.asv		bc_qualityMetrics_pipeline.asv
bc_qualityMetrics_pipeline.m		bc_qualityMetrics_pipeline.m
manualCurationPhyWorkflow.md		manualCurationPhyWorkflow.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

bombcell: evaluate unit quality and ephys properties

Getting started

Quality metrics guide

General overview

Qmetric and param structures

Plotting quality metrics

Loading and saving data

A run-through of a typical workflow

Running speed and bottlenecks

Detailed overview of the quality metrics

Quality metrics to classify noise/non-somatic units

Quality metrics to classify multi-units

Detailed overview of global output plots

Quality metrics GUI guide

Ephys properties guide

Classifying cell types

Recommended recording and pre-processing

Dependancies

Other ephys utilities:

About

Releases

Packages

Languages

kdiosi/spike_sorting

Folders and files

Latest commit

History

Repository files navigation

bombcell: evaluate unit quality and ephys properties

Getting started

Quality metrics guide

General overview

Qmetric and param structures

Plotting quality metrics

Loading and saving data

A run-through of a typical workflow

Running speed and bottlenecks

Detailed overview of the quality metrics

Quality metrics to classify noise/non-somatic units

Quality metrics to classify multi-units

Detailed overview of global output plots

Quality metrics GUI guide

Ephys properties guide

Classifying cell types

Recommended recording and pre-processing

Dependancies

Other ephys utilities:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages