Validity of using parallel in core._parallel_rolling_func #783
Replies: 10 comments
-
@earthgecko Thank you for writing this up. Please allow me some time to review. In the meantime, considering the short length of your time series, I wonder if using the (nearly) pure…

Note that we really only support direct usage of the functions provided in our user-facing API and…

Note: I will be back with more comments after I've had time to review your points above.
-
@earthgecko Before I am able to help, I have the following questions/comments: …
-
Hi @seanlaw, thanks for the reply; I shall try to answer your questions as best I can.
The purpose or function of aamp there is beyond my understanding at this point.
To create the problem: in master, in `core.py`, add `cache=True` to the `core._parallel_rolling_func` njit decorator, as sketched below.
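The original snippet was not preserved on this page; below is a minimal sketch of the kind of change being described. The signature and body are my approximations (the real `_parallel_rolling_func` in `stumpy/core.py` differs); the added `cache=True` kwarg is the only point:

```python
import numpy as np
from numba import njit, prange


# Hypothetical approximation of core._parallel_rolling_func, NOT the verbatim
# stumpy source: a parallel njit function that receives another njit-compiled
# function as `func`. The modification under test is the added cache=True.
@njit(parallel=True, cache=True)
def _parallel_rolling_func(func, a, w):
    # Apply `func` to every rolling window of width `w` across `a`
    out = np.empty(len(a) - w + 1)
    for i in prange(len(out)):
        out[i] = func(a[i : i + w])
    return out
```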
Then start a jupyter notebook or a Python terminal and run `stumpy.stump`, along these lines:
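The exact commands were also stripped from this page; presumably something like the following (the window size `m=50` is a placeholder of mine, not from the original):

```python
import time

import numpy as np
import stumpy  # the modified build with cache=True on the njit decorator

T_A = np.random.rand(1004)  # matches the series length mentioned below

start = time.time()
mp = stumpy.stump(T_A, m=50)  # first call: njit compiles (~13 s) and caches
print("first call:", time.time() - start)

start = time.time()
mp = stumpy.stump(T_A, m=50)  # subsequent calls: ~0.01 s
print("second call:", time.time() - start)
```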
Now start a new Python terminal, or restart the notebook kernel, and do the same again: the cached compilation is loaded and the segfault/kernel death occurs. To fix the problem, in master, in `core.py`, leave `cache=True` on the `core._parallel_rolling_func` njit decorator but remove `parallel=True`, roughly as follows:
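i.e., the decorator ends up as something like this (again a sketch of the same hypothetical shape, not the verbatim source):

```python
import numpy as np
from numba import njit


# The "fix" described: cache=True kept, parallel=True removed, so the loop
# now compiles (and caches) as a plain serial njit function.
@njit(cache=True)
def _parallel_rolling_func(func, a, w):
    out = np.empty(len(a) - w + 1)
    for i in range(len(out)):
        out[i] = func(a[i : i + w])
    return out
```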
Purge the numba cache (sketch below), then run the same test again; this time the jit cache files will be created, and they will work with the restarted kernel/new process too.
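A sketch of the purge step, assuming numba's default on-disk cache location (`.nbi` index and `.nbc` payload files inside the `__pycache__` directories next to the cached module's source):

```python
import pathlib

import stumpy

# Delete stumpy's numba cache files so the next run recompiles from scratch
pkg_dir = pathlib.Path(stumpy.__file__).parent
for f in pkg_dir.glob("__pycache__/*.nb[ic]"):
    f.unlink()
```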
-
Thank you @earthgecko! This is helpful. I will find some time to see if I can reproduce the issue.
-
@earthgecko I was able to reproduce the issue with a minimum-reproducible-example that does not rely on STUMPY:
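The example itself was not preserved here; what follows is a hedged reconstruction of the kind of MRE being described, a cached, parallel njit function that takes another njit function as an argument:

```python
# repro.py -- hypothetical reconstruction, not the original code.
# First run in a fresh process: compiles, caches, and prints fine.
# Second run in a *new* process: the on-disk cached version is loaded
# and (with numba builds of that era) the call segfaults.
import numpy as np
from numba import njit, prange


@njit(cache=True)
def window_sum(x):
    return np.sum(x)


@njit(parallel=True, cache=True)
def rolling_apply(func, a, w):
    out = np.empty(len(a) - w + 1)
    for i in prange(len(out)):
        out[i] = func(a[i : i + w])
    return out


if __name__ == "__main__":
    a = np.random.rand(1_000)
    print(rolling_apply(window_sum, a, 50)[:5])
```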
I learned that while functions are first-class citizens in Python and can be passed around to other functions, sadly, this currently causes issues with numba's caching… Would you mind pulling the latest commit and checking if this solves your problem? Does the segfault/kernel restart go away?
-
In fact, you don't need to get/spawn a new process. If you want to cache either…

The first time you do this, they'll be slow, but they will be cached in your…

Finally, to force that cache to be cleared from…

Let me know if that makes sense and if I can help explain anything conceptually.
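Reading between the truncations, the workflow being described is probably something like the following. Assumptions of mine: a stumpy build whose njit decorators carry `cache=True` (as discussed above), plus numba's real `NUMBA_CACHE_DIR` environment variable; none of this is verbatim from the original comment:

```python
import os

# Must be set before numba is first imported (stumpy imports numba)
os.environ["NUMBA_CACHE_DIR"] = "/tmp/numba_cache"

import numpy as np
import stumpy

T = np.random.rand(1004)
stumpy.stump(T, m=50)  # slow once, then cached on disk
stumpy.aamp(T, m=50)   # same for the non-normalized variant

# Forcing the cache to be cleared is then just deleting that directory
import shutil
shutil.rmtree("/tmp/numba_cache", ignore_errors=True)
```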
-
Hi @seanlaw, thanks very much for the efforts, very much appreciated, especially on a Saturday! Apologies for the distraction to your weekend. The only part of this that you need to know today is that I can confirm that 45eb3f6 has the desired effects and works 👍 Please do not spend any more time on it at this stage; you have done more than enough helping to clarify lots of things on this issue, which I appreciate is well outside the current scope of stumpy.

Below is some feedback, but it can wait as it is just the results of some tests and some thoughts thereabouts, so come back to it at a later date (not a weekend day)... if you can ;)

You are still reading? (I hope it is not Sunday 15 Jan 2023; if so, close this tab.)

Firstly, I am not suggesting that implementing caching requires your attention in any way at this point, in terms of requesting that the ability be added to stumpy. These are just some experiments and notes that may be useful in some way in the future, even if just for my own reference as I move forward.

I have tested it directly, and I also took the time to port those changes to the stripped-down version I was/am testing with that has…

I was going to ask if…

Just some feedback regarding your suggestion relating to using the…
However, it does not create cache files for all the other njit functions that are used with…

The first execution of… The second execution of…

I appreciate that this is somewhat semantics; however, I do not think that the…

This indeed does produce all the same numba cache files for all the functions called by… Granted, it is a solution without any changes to stumpy; however, it is quite opaque and obfuscated for the user. Just pointing out that, although this…

In addition to that, the path in the helper functions can be variable depending on how both numba and stumpy were run. The…

Although at the moment stumpy is not dealing with the numba cache, a few questions have been raised, and these are not totally invalid queries relating to the ability to use numba caching; there are some upsides to it, and in some use cases these can be very important aspects.

All that said, I would suggest that if you do consider adding any ability to use numba caching, just go down a… I am going to continue down that route myself for the time being, as I am also going to port over from using matrixprofile/mass_ts to the stumpy implementation (at some point, after I determine how to convert those functions), so I may want numba caching in that too. So I will fork and try a…

Enjoy your day (Sunday?) @seanlaw
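For reference, the warm-up trick alluded to above probably looks something like this sketch (the helper name and input sizes are mine, not a stumpy API, and it only writes cache files if the njit decorators have `cache=True`):

```python
import numpy as np
import stumpy


def _warm_up_njit_cache(m=3):
    # Call the user-facing functions once on tiny inputs so that every njit
    # function they reach gets compiled and written to numba's on-disk cache.
    T = np.random.rand(16 * m)
    stumpy.stump(T, m=m)
    stumpy.aamp(T, m=m)
```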
-
Thanks @earthgecko, this is a labor of love. Now that I understand caching a little bit better, I've added an (untested, purely experimental)…

You can also get a list of the cached functions via the snippet below, and the cached functions (stored, by numba's default, in the `__pycache__` directories alongside the source files) can be cleared as well:
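The inline snippets were lost in extraction; a hedged usage sketch, assuming the experimental module is `stumpy.cache` with private helpers along these lines (treat every name here as unstable and unverified):

```python
from stumpy import cache

cache._enable()  # opt stumpy's njit functions into numba caching
# ... call stumpy.stump / stumpy.aamp as usual; compilations get cached

print(cache._get_cache())  # list the cached function files (assumed helper)

cache._clear()  # delete the on-disk cache (assumed helper)
```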
Note that: …
Please feel free to ask any questions and provide any feedback. Again, this is purely experimental and should not be relied upon but, if it works, it's currently straightforward enough that I likely won't remove it (no guarantees though!). Use it at your own risk!
-
@seanlaw thanks, I shall do 👍
-
Moving this to a discussion 👍
-
Hi
This is not a bug report per se rather asking question or just FYI.
This is a difficult issue to report because it is quite difficult to describe.
Note this is has only been tested/found with the
stumpy.stump
function,however it is relevant in many other places too via
core.preprocess_non_normalized
It appears that the use of
parallel=True
incore._parallel_rolling_func
doesnot have the desired effect in all circumstances.
Although there is no facility to invoke `cache=True` on the njit decorators in stumpy, there are use cases where `cache=True` is an absolute requirement (I shall update #699). So if modifications are made to allow for numba caching, it reveals the following: on the first run, when `stumpy.stump` is called in a process/thread, it will compile and run fine (FYI with `len(T_A) == 1004`), with the njit compilation on that first run taking around 13 seconds or so. The numba cache files are created, and each subsequent call of the `stumpy.stump` function in the same process/thread with the same `T_A` will run super fast with no issues, in something like 0.009974628977943212 seconds. All good.
The problem arises when another process is invoked that imports stumpy and loads the numba cached jit files. At this point stumpy imports fine, but when the `stumpy.stump` function is called, in jupyter the kernel dies, and in a Python terminal a `Segmentation fault (core dumped)` is encountered. As we are all aware, debugging with numba can be quite difficult and laborious :)
However, if `parallel=True` is simply commented out of the `core._parallel_rolling_func` njit decorator, it works fine. I went through them all one by one (in the STRIPPED-down version), disabling both `parallel` and `fastmath`, all individually, and tested, which eventually revealed `core._parallel_rolling_func` to be the culprit.
I am not certain why this is the case. Perhaps, when caching is in play, the `func` parameter is the problem: what func is numba compiling it for, and how does it know what func will be passed to it? However, my understanding of what can and cannot be achieved with numba is not that extensive. That said, I did try to modify the function to actually use the func name, and that had the same kernel-died/Segmentation-fault result, so perhaps not. For example:
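The example was stripped from this page; a hedged guess at the shape of the modification, binding a concrete function by name instead of accepting `func` as a parameter:

```python
import numpy as np
from numba import njit, prange


# Hypothetical variant that hard-codes the rolling function rather than
# taking it as a parameter -- per the above, this still produced the same
# segfault when the cached version was loaded in a fresh process.
@njit(parallel=True, cache=True)
def _parallel_rolling_nanstd(a, w):
    out = np.empty(len(a) - w + 1)
    for i in prange(len(out)):
        out[i] = np.nanstd(a[i : i + w])
    return out
```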
As I said, a difficult one to describe. However, although it may run during first compilation, the fact that the use of parallel in `core._parallel_rolling_func` is breaking when compiled and cached does raise the question: is it valid to use in this function?

That is one of the advantages of `cache=True`: I find it really does validate njit functions. Anything that is not correct will definitely always break when the cached version is attempted to be loaded. It is quite handy for development testing in that way, I find (for me at least) :)
I must further caveat this with the fact that I tested on a STRIPPED-down version, which I stripped down to trace this bug. That version only has:

- `__init__.py` - ONLY the imports required in core.py, stump.py and aamp.py
- `core.py` - ONLY the functions required in stump.py and aamp.py
- `aamp.py`
- `stump.py`

Mainly because stump is all I currently need at the moment, but I must have `cache=True`; otherwise I would have to consider stepping back to matrixprofile-foundation/matrixprofile, but I would have to maintain that myself, seeing as they have binned it now :) But it did load and run in under 1 second, and having to wait between 13 and 17 seconds to run stumpy.stump is not an option, whereas 0.009974628977943212 seconds is awesome!
So I am not sure what to do with this, other than advise you of the findings. I have applied the change to a full v1.11.1 version, modified to allow for caching, which allows `cache=True` to be set on all core, stump and aamp njit decorators, and that works too, as long as `parallel` is not passed. Once again, only tested with `stumpy.stump`.