The problem: not enough resource utilization.
Resources here are things like CPU cycles, disk I/O, and network traffic.
Running the unit tests, I use one core of my ten-core machine, usually at around 30% utilization, so only about 3% of the machine's CPU (forgetting about the GPUs, which barely stir!).
Running a product like "question the docs", I see more cores active, but the machine still seems mostly idle.
It doesn't look like disk or network I/O is the bottleneck either, but I don't have clear measurements.
If we made complete use of our machines' resources, we could speed up our unit tests by a factor of 2 to 20 (depending on how many cores the machine has), and our production code by a smaller but quite possibly significant amount.
We should at least have some idea of where all our resources are going.
How to approach the problem
1. Benchmarks
We need at least one very simple "benchmark": in other words, a simple-as-possible program that "does superduperdb" in a loop for something barely non-trivial, with a measurable throughput.
As we go on we can create a mix of benchmarks, though we should be careful not to fall into a hole full of metrics.
For this large-scale, order-of-magnitude work, even one benchmark is very useful, and we get a second benchmark for free: the tests.
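As a rough sketch of what that one simple benchmark could look like (here `run_query` is a hypothetical stand-in for one "question the docs" round trip, not an existing function):

```python
import time


def run_query() -> None:
    """Hypothetical stand-in for one "question the docs" round trip."""
    ...


def benchmark(iterations: int = 100) -> float:
    """Run the workload in a tight loop and report throughput."""
    start = time.perf_counter()
    for _ in range(iterations):
        run_query()
    elapsed = time.perf_counter() - start
    throughput = iterations / elapsed
    print(f"{iterations} iterations in {elapsed:.2f}s -> {throughput:.1f} per second")
    return throughput


if __name__ == "__main__":
    benchmark()
```

A single throughput number like this is enough to tell whether a change moved us by 10% or by 10x.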
2. Analysis (single-core)
Run the benchmarks and look at resource utilization.
Run perf and make some graphs (see attached), or do the analysis ourselves using `pstats`. This will show where the wait states are (in the attached document, look at the bottom right-hand corner for an example).
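For the `pstats` route, a minimal sketch might look like this, assuming the `benchmark()` loop from the earlier sketch has been saved somewhere importable (the module name here is purely illustrative):

```python
import cProfile
import pstats

from benchmark_script import benchmark  # hypothetical module holding the loop above

profiler = cProfile.Profile()
profiler.enable()
benchmark()
profiler.disable()

# Sort by cumulative time: functions that block on I/O or locks show up near
# the top even though they burn few CPU cycles themselves.
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(20)
```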
Theoretically there is always a way to eliminate wait states, or at least make them very small, but some of them may turn out to be beyond our control without too much work. I am sure we can do significantly better.
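As one illustration of the kind of fix that shrinks wait states: if the profile shows the loop blocked on network or disk reads, overlapping those calls is often enough. A sketch with a thread pool, where `fetch_one` is a hypothetical stand-in for whatever blocking call our code actually makes:

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_one(item: str) -> bytes:
    """Hypothetical stand-in for a blocking network or disk read."""
    ...


items: list[str] = []  # whatever the workload actually needs to fetch

# Sequential: total wall time is the sum of every individual wait.
# results = [fetch_one(item) for item in items]

# Overlapped: the waits happen concurrently, so wall time tends towards
# the slowest single call rather than the sum of all of them.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fetch_one, items))
```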
3. Analysis (multi-core)
We already know that our unit tests only use a single core, for a series of complicated reasons which we are working to rectify. That's a separate task.
More important is our production code, and for that we need to have our benchmark.
The main program already has facilities for parallelizing work to other cores, so it may be that once we improve single-core utilization, this step only needs us to tune our usage of the other cores.
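To check how well the other cores are actually being used while the benchmark runs, per-core sampling with psutil is enough for a first look (a sketch, not part of the codebase):

```python
import psutil


def sample_cores(interval: float = 1.0, samples: int = 10) -> None:
    """Print per-core utilization; run this alongside the benchmark."""
    for _ in range(samples):
        per_core = psutil.cpu_percent(interval=interval, percpu=True)
        busy = sum(1 for pct in per_core if pct > 50)
        print(f"per-core %: {per_core}  (cores above 50%: {busy}/{len(per_core)})")


if __name__ == "__main__":
    sample_cores()
```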
4. How much work is it?
Almost none to get started. We already know how to make perf graphs and will productionize that.
We need to make "the benchmark", which could just be "question the docs" running in a tight loop with a counter. At that point we have some basic idea of how well or badly we are doing in production.
Tackling, for example, the issue of completely parallelizing the tests will take considerably more time, and can happen in parallel with other work.