
Test out Arbor as simulation backend #67

Open · 5 of 6 tasks
espenhgn opened this issue Jun 18, 2021 · 6 comments · May be fixed by #72

espenhgn commented Jun 18, 2021

Is your feature request related to a problem? Please describe.
Replacing LFPy with Arbor+LFPykit for simulations of passive multicompartment neuron models with current-based synapses in Population.cellsim() ought to be relatively low-hanging fruit, and could speed up simulations substantially compared to the equivalent simulations that currently rely on LFPy+NEURON (in particular with GPU access).

Steps:

Notes/possible issues that may occur:

  • Arbor requires a two-step process: we first need to instantiate a cell recipe without synapses to find the segments belonging to each CV. The segment/CV areas and locations (loc_sets?) must then be communicated to all RANKs before the cell models for the actual simulations are built, as these quantities determine the probabilities of synaptic connections (a rough sketch of the broadcast step is given after this list).
  • We have usually used .hoc files for morphologies (in the examples). These need to be converted somehow, or some archaeology is required to find the original .swc or Neurolucida files.
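
Roughly, the communication step could look like the following untested sketch (assumptions: mpi4py for the communication, and a hypothetical helper cv_areas_and_midpoints() that extracts per-CV surface areas and midpoint coordinates from the synapse-free cable cell; that helper is not an Arbor API call):

    from mpi4py import MPI
    import numpy as np

    COMM = MPI.COMM_WORLD
    RANK = COMM.Get_rank()

    if RANK == 0:
        # step 1: build the synapse-free cable cell and read out its CV geometry
        areas, midpoints = cv_areas_and_midpoints(cable_cell)  # hypothetical helper
    else:
        areas, midpoints = None, None

    # step 2: make the CV geometry available on every RANK, so each RANK can draw
    # synapse locations with per-CV probabilities proportional to CV surface area
    areas = COMM.bcast(areas, root=0)
    midpoints = COMM.bcast(midpoints, root=0)
    p_syn = np.asarray(areas) / np.sum(areas)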

espenhgn commented:

Back-conversion of morphologies to .swc format should be doable using https://github.com/JustasB/hoc2swc
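
If I read the hoc2swc README correctly, usage would be roughly as follows (untested; the file paths are placeholders, and NEURON needs to be importable in the same environment):

    from hoc2swc import hoc2swc

    # convert a NEURON .hoc morphology to .swc for use with Arbor
    hoc2swc("morphologies/example_morphology.hoc", "morphologies/example_morphology.swc")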


espenhgn commented Jul 1, 2021

Added the file Example_Arbor_swc_hybridLFPy.ipynb (https://github.com/INM-6/hybridLFPy/blob/fix-67/examples/Example_Arbor_swc_hybridLFPy.ipynb), addressing point 2 above.


espenhgn commented Aug 3, 2021

Single-node scaling test on Jureca-DC comparing Arbor and the usual hybridLFPy implementation:
[figure: single-node benchmark, “run” and “pop” times vs. NTASKS for the Arbor and LFPy+NEURON implementations]
The “run” times correspond to the total time spent per task in repeated calls to arbor.simulation.run() and the equivalent LFPy.Cell.simulate(), respectively.
The “pop” times include additional overhead such as setting up the cells, determining synapse locations and activation times, computing LFPs, etc.
The corresponding files are at https://github.com/INM-6/hybridLFPy/tree/fix-67/examples/benchmarks, where example_brunel_arbor.py is the Arbor+LFPykit implementation of example_brunel.py, which uses the “old” hybridLFPy.
Output is comparable, and the numbers of compartments/CVs are similar between the NEURON/LFPy and Arbor simulations, as the maximum segment/CV length is set to the same value.
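
For reference, matching the discretization between the two backends can be done along these lines (a minimal sketch; the 20 µm value is a placeholder, and the keyword/method names are quoted from memory rather than from the benchmark scripts):

    import LFPy
    import arbor

    max_length = 20.  # maximum segment/CV length (µm), placeholder value

    # LFPy/NEURON: segments no longer than max_length
    cell = LFPy.Cell(morphology='morphologies/example_morphology.swc',
                     nsegs_method='fixed_length',
                     max_nsegs_length=max_length)

    # Arbor: CV policy with the same maximum extent
    decor = arbor.decor()
    decor.discretization(arbor.cv_policy_max_extent(max_length))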


espenhgn commented Aug 3, 2021

Multi-node performance testing with increased network size and longer simulation duration (2000 ms), two (virtual) CPUs per MPI task, and multithreaded execution with Arbor via

        context = arbor.context(2, None)
        domains = arbor.partition_load_balance(recipe, context)
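
For completeness, a minimal sketch of how this feeds into the actual simulation (assuming the Arbor 0.5.x-style constructor; recipe, tstop and dt are defined elsewhere in the script):

        sim = arbor.simulation(recipe, domains, context)
        sim.run(tstop, dt)  # total time here is what the "run" timings above measure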

Arbor (https://github.com/espenhgn/arbor/tree/isyn) config:

{'mpi': False,
 'mpi4py': False,
 'gpu': False,
 'vectorize': True,
 'version': '0.5.3-dev',
 'source': '2021-07-31T15:18:36+02:00 9503801c293b6933e2b186d8fbf7a1245366e876',
 'arch': 'znver1'}
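
The dict above is, I assume, the output of arbor.config(); checking one's own build is a one-liner:

    import arbor
    print(arbor.config())  # reports MPI/GPU/vectorization support and version of the local build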

[figure: multi-node benchmark, “run” and “pop” times vs. NTASKS]


bcumming commented Aug 4, 2021

Hello @espenhgn.

Am I correct that the following steps are performed for simulating a model with NCELLS cells:

Step 1
initialise MPI

Step 2
Create and run a NEST Brunel model with NCELLS cells to generate input spike information for the detailed cell models

Step 3

Given the number of cells per MPI rank is NLOCAL_CELLS = NCELLS / NRANKS,
for each cable cell assigned to this rank perform a single-cell-model simulation with Arbor.

Then I have the following observations:

  • the arbor.simulation object only contains a model with 1 cell, so multithreading isn't required. The CPU back end of Arbor parallelises cells over threads, so a one-cell model will only run on one thread
    • The most efficient way to run would be one MPI rank per physical core, with arbor.context(1, None).
  • You are running on Juwels-DC, which has 2 sockets of 64-core AMD EPYC CPUs
    • so you are running 128 MPI ranks per node, where each rank gets 2 "hyperthreaded" (virtual) cores.

And I have the following questions:

  • what are the srun/sbatch parameters you use?
  • does NTASKS in the plots correspond to the number of MPI ranks?
    • If my assumptions above are correct, NTASKS=2^9 would run on 4 nodes?


espenhgn commented Aug 5, 2021

Hi @bcumming

Hello @espenhgn.

Am I correct that the following steps are performed for simulating a model with NCELLS cells:

Step 1
initialise MPI

Yes, indeed.

Step 2
Create and run a NEST Brunel model with NCELLS cells to generate input spike information for the detailed cell models

Yes, that's the general idea behind hybridLFPy: the simulation of network activity (spikes) is performed before the detailed cell simulations needed for the extracellular signals. For the sake of simplicity, the arbor_reproducer.py script creates random synaptic activation times for the Arbor simulation part using scipy.stats, which are subsequently fed to the Recipe class instance around here:

recipe = Recipe(cable_cell, weights=weights, times=times)
(leaving NEST out of the equation altogether).
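
In spirit, the input generation amounts to something like this sketch (the actual distribution and parameter values live in arbor_reproducer.py; n_synapses, rate, tstop and weight below are placeholders, and exponentially distributed inter-event intervals are my assumption):

    import numpy as np
    from scipy import stats

    n_synapses = 100   # placeholder
    rate = 10.         # mean activation rate per synapse (s**-1), placeholder
    tstop = 2000.      # simulation duration (ms)
    weight = 0.001     # synaptic weight, placeholder

    # draw inter-event intervals per synapse and accumulate them into
    # activation times within [0, tstop)
    times = []
    for _ in range(n_synapses):
        isi = stats.expon.rvs(scale=1000. / rate, size=int(2 * rate * tstop / 1000.))
        t = np.cumsum(isi)
        times.append(t[t < tstop])
    weights = [weight] * n_synapses

    recipe = Recipe(cable_cell, weights=weights, times=times)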

Step 3

Given the number of cells per MPI rank is NLOCAL_CELLS = NCELLS / NRANKS,
for each cable cell assigned to this rank perform a single-cell-model simulation with Arbor.

Yes, each rank runs its own subset of cells, distributed in a round-robin manner. The cell simulations on a given rank are performed in succession in the case that NLOCAL_CELLS > NRANKS. The ArborPopulation.run() method simply iterates over the ArborPopulation.cellsim(cellindex) method; a sketch of this distribution is given below.
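
That is, roughly (mpi4py; `population` stands in for an ArborPopulation instance, and NCELLS is a placeholder):

    from mpi4py import MPI

    COMM = MPI.COMM_WORLD
    SIZE = COMM.Get_size()
    RANK = COMM.Get_rank()

    NCELLS = 1024  # placeholder population size

    # round-robin assignment of cells to ranks
    for cellindex in range(NCELLS):
        if cellindex % SIZE == RANK:
            population.cellsim(cellindex)  # one single-cell Arbor simulation at a time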

Then I have the following observations:

  • the arbor.simulation object only contains a model with 1 cell, so multithreading isn't required. The CPU back end of Arbor parallelises cells over threads, so a one-cell model will only run on one thread

OK, thanks for the confirmation. I was curious whether Arbor was somehow distributing single-cell simulations analogously to the multisplit method of NEURON.

  • The most efficient way to run would be one MPI rank per physical core, with arbor.context(1, None).
  • You are running on Juwels-DC, which has 2 sockets of 64-core AMD EPYC CPUs

I'm running on Jureca-DC as well as JUSUF, both in Jülich. Both use two 64-core AMD EPYC CPUs per node.

  • so you are running 128 MPI ranks per node, where each rank gets 2 "hyperthreaded" (virtual) cores.

Yes, for the benchmark curves above at most 128 MPI ranks are created per node.

And I have the following questions:

  • what are the srun/sbatch parameters you use?

These are typically as follows (see run_arbor_reproducer.py: https://github.com/INM-6/hybridLFPy/blob/fix-67/examples/benchmarks/run_arbor_reproducer.py):

#!/bin/bash
##################################################################
#SBATCH --account $ACCOUNT
#SBATCH --job-name <MD5>
#SBATCH --time 0:04:01
#SBATCH -o logs/<MD5>.txt
#SBATCH -e logs/<MD5>_error.txt
#SBATCH --ntasks NTASKS
#SBATCH --cpus-per-task NTHREADS
##################################################################
# from here on we can run whatever command we want
# unset DISPLAY # DISPLAY somehow problematic with Slurm
srun --mpi=pmi2 python -u arbor_reproducer.py <MD5>

The NTHREADS value was also passed to arbor.context() in the simulation script (but, as you confirmed, this should then remain at its default value of 1).

The <MD5> value is just a hash value I use to keep track of unique parameter combinations for each individual simulation in this case.
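
For completeness, a parameter-set hash like <MD5> can be computed along these lines (a sketch; the real script may do it differently, and the keys/values below are placeholders):

    import hashlib
    import json

    params = dict(NTASKS=512, NTHREADS=2, tstop=2000.)
    md5 = hashlib.md5(json.dumps(params, sort_keys=True).encode()).hexdigest()
    print(md5)  # used for the job name and log-file names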

  • does NTASKS in the plots correspond to the number of MPI ranks?

Yes.

  • If my assumptions above are correct, NTASKS=2^9 would run on 4 nodes?

Yes, for the curves shown above in this issue.
