Change the integration method #39
Changed the integration method away from the impulse method. Other things learned: peak gradient mode does not work on the tofts_v6 DRO. Also, multiplying your signal conversion by a scalar does not seem to affect fitting, which begs an obvious question: why then is relaxivity in the signal conversion equation? Also note the new testing data folder, Tofts_DROs.
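For readers following along, the two-parameter Tofts model and the two integration strategies discussed in this thread can be sketched in Python. This is an illustrative reimplementation with a made-up gamma-variate AIF and sample timing, not PkModeling's actual C++ code:

```python
import numpy as np

def tofts_convolution(ktrans, ve, t, cp):
    """Two-parameter Tofts model via discrete convolution (rectangle rule).

    Ct(t) = Ktrans * integral_0^t Cp(tau) * exp(-kep * (t - tau)) dtau,
    with kep = Ktrans / Ve. O(n^2) in the number of time points.
    """
    kep = ktrans / ve
    dt = t[1] - t[0]                      # assumes uniform sampling
    kernel = np.exp(-kep * t)
    return ktrans * dt * np.convolve(cp, kernel)[:len(t)]

def tofts_recursive(ktrans, ve, t, cp):
    """Same model via a one-pass recursion, exact for piecewise-constant Cp.

    Each step decays the running integral by exp(-kep*dt) and adds the
    current Cp sample's contribution: O(n) instead of O(n^2).
    """
    kep = ktrans / ve
    dt = t[1] - t[0]
    decay = np.exp(-kep * dt)
    ct = np.zeros_like(cp)
    for n in range(1, len(cp)):
        ct[n] = ct[n - 1] * decay + ktrans * cp[n] * (1.0 - decay) / kep
    return ct

# Hypothetical population-like AIF: a simple gamma-variate bolus.
t = np.arange(0, 300.0, 2.0) / 60.0       # 2 s sampling, in minutes
cp = 5.0 * (t / 0.5) * np.exp(1.0 - t / 0.5)
ct_conv = tofts_convolution(0.35, 0.5, t, cp)
ct_rec = tofts_recursive(0.35, 0.5, t, cp)
```

The recursive form is exact when Cp is treated as piecewise constant and needs only one pass over the time series, which is why it tends to be faster than the convolution.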
PkSolver/PkSolver.h
Outdated
ArrayType VeTerm;
VeTerm = -Ktrans/Ve*Time;
ValueType deltaT = Time(1) - Time(0);
@AndrewBeers can you fix the indentation to be consistent with the rest of the code?
@AndrewBeers can you please test with real data and confirm there is no regression? When I test with a prostate DCE dataset, there is a dramatic change in the result: most of the pixels are around 0.1, while before the change there was more contrast (I can't tell if this is scaling or not, since Slicer DataProbe clamps at 1 decimal digit for pixel value probing). I used default parameters with the population AIF. The test DCE series I used is here: http://slicer.kitware.com/midas3/item/126198. Ktrans map screenshots follow: one with the current master, one after your change.
Hm, interesting. What parameters for PkModeling did you use? Below are some screenshots that I got from that data and my build. The first is the Ktrans map for a manually set bolus arrival time 10 frames in; the second is the Ktrans map for the gradient-determined bolus arrival time option with the same settings. The gradient BAT did not perform too well on the phantom, but still gives semi-reasonable results below. I don't know the correct parameters for a prostate case; I just copied over some numbers I used to test the phantom. It's also clear to me that both have problems with dead pixels, which might be overcome with pre-processing steps that we've yet to implement in PkModeling. This is especially obvious in the gradient BAT method. The high-Ktrans area in the top-right of the image also goes to a value of 5 in my case, which likely means that the Ktrans value is exploding to some very high value in that section and is then clamped to a maximum of 5. It would be an interesting area to dissect. A sample curve from one of the pixels is pasted below; it looks fairly normal. Perhaps this is a consequence of using a population AIF in this case. I can do some more testing soon with the old module to see if it gets similar results!
I am pretty sure I tried with the default parameters. I will test again later today.
Made my code's indentation match the rest of the code.
There is another interesting point about PkModeling (either newly introduced with my edit or pre-existing; I haven't tested that yet). Those dead voxels I referenced earlier have values of NaN. I assumed this was a problem with random, high-noise curves that the algorithm failed to fit. However, I ran a test using a custom AIF (extracted from a label map in the image), and got the result below. Those streaks of NaN values suggest to me that the algorithm might use the output values of the previous voxel as initial parameters for the next fit (a.k.a. 'warm fitting'). This creates streaks of contiguous NaN values, since initial starting parameters of [NaN, NaN] for [Ve, Ktrans] are unlikely to fit to anything other than [NaN, NaN] again. This had happened previously in our Python code when we used warm fitting of initial parameters. I haven't been able to find evidence of this possible warm-fitting problem in the code yet, but I'm still wrapping my head around the overall structure of the program. Another thing to test and investigate before releasing.
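A toy illustration of the suspected warm-fitting behavior (the `toy_fit` stand-in below is hypothetical, not PkModeling's optimizer):

```python
import numpy as np

def toy_fit(curve, init):
    """Stand-in for a per-voxel optimizer: a NaN anywhere in the initial
    guess or in the data poisons the result, as a gradient-based fit would."""
    if not (np.all(np.isfinite(init)) and np.all(np.isfinite(curve))):
        return np.array([np.nan, np.nan])
    return init * 0.9 + 0.05          # pretend we converged near the guess

curves = [np.ones(10) for _ in range(6)]
curves[2][0] = np.nan                 # a single corrupt voxel mid-scan

params = np.array([0.1, 0.2])         # cold start for the first voxel only
results = []
for c in curves:
    params = toy_fit(c, params)       # warm start: seed with previous result
    results.append(params)
# Voxels 2..5 all come out NaN even though only voxel 2 was corrupt,
# producing contiguous NaN streaks like those seen in the Ktrans maps.
```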
I don’t recall whether warm fitting is used. But it is the type of optimization we would have used.
Jim Miller, Senior Principal Scientist, Software Sciences and Analytics, GE Research
Previous indentation edits were out of sync with other portions of the code.
Added a few lines of code to catch NaN values during fitting and replace them with new initial parameters. This stops NaN values from overwhelming the image as a result of warm fitting.
A random value is now picked during fitting if the fit runs into a NaN value.
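A minimal sketch of that guard in Python (the bounds and helper name are illustrative; PkModeling's actual fix lives in the C++ fitting loop):

```python
import numpy as np

rng = np.random.default_rng(0)

def safe_warm_start(prev_params, bounds):
    """Use the previous voxel's fitted parameters as the next initial guess,
    unless they contain NaN/inf, in which case draw a fresh random start
    inside the parameter bounds."""
    if np.all(np.isfinite(prev_params)):
        return prev_params
    lo, hi = bounds
    return rng.uniform(lo, hi)

# Illustrative bounds for [ktrans, ve]; not PkModeling's actual limits.
bounds = (np.array([0.0, 0.0]), np.array([5.0, 1.0]))
ok = safe_warm_start(np.array([0.3, 0.2]), bounds)        # passed through
fixed = safe_warm_start(np.array([np.nan, 0.2]), bounds)  # re-randomized
```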
Hello! Here are some other interesting observations. I tested the prostate data file on the old version of PkModeling. It does not have those dead voxels. It also seems to have consistently lower Ktrans values across the board, but similar relative intensities. I had previously thought that these dead voxels were caused by unfittable noise, but on closer inspection it doesn't seem like that's the case. The dead voxels seem mostly random: run the algorithm twice and they will show up in different spots, and it is hard to discern visually what is so different about the signal intensity curves they are derived from. On the other hand, they seem to cluster in certain parts of the scan, or certain slices, so there might be something going on there. Something to think about! I've been prototyping a noise-removal strategy using eigenvalues and PCA in Python, but I'm not sure it applies to whatever is going on with PkModeling. The curves CAN be fit; they just don't get fit sometimes.
These dead voxels in the figure above are irrelevant. There is no contrast uptake in the bone or in the air; those correspond to fitting noise. I am sure anything can be fit, but I am also absolutely sure there is no gain or glory in trying to fit those dead voxels.
@AndrewBeers: @speled started working on evaluating this. Really sorry for the delay, and again for not finding time to do this myself.
A quick retrofit to a different ITK class. Will need to take care to clean code after an integration method is decided.
Hello! @speled @fedorov @kalpathy I've spent some quality time with PkModeling over the past two days and have learned some interesting things. Sorry for the delay in uploading; it always seemed like there was another test to run, and then the 3-day weekend came. The commit above changes the fitting method for proof of principle, although that may not be the best way to continue, as documented below. Most of the following tests take place on the v6 and v9 Tofts phantoms, respectively.

Preprocessing Methods

The motivation for editing PkModeling in the first place was that our group at MGH had developed an off-hand Matlab script that performed very well on the Tofts v9 noisy phantom compared to existing software packages, in a recent study/competition soon to be submitted for publication. The idea was to apply lessons learned from that program to PkModeling to get it up to speed. There may have been a few things that gave our script such good performance on the noisy phantom (e.g. integration method choice, fitting algorithm choice), but after some more analysis it seems like we probably performed the best simply because we decided to apply a Gaussian blur to the Tofts DRO before analyzing it. This made sense to me at the time, as inspecting the noisy curves as they came in (as well as inspecting curves in actual DCE volumes from our brain data) convinced me that it would be difficult to retrieve an accurate Ktrans from any one voxel. It also made sense given that the intensities of most voxels across many imaging modalities are correlated with neighboring voxels. The question with blurring is whether it destroys more information than it recovers. I've run a brief test on our lab's (QTIM @ MGH) double-baseline study, where ~45 pre-treatment glioblastoma patients came in for two visits 3-4 days apart without any intervening treatment. Seeing as their Ktrans values should be similar between the two baselines, the repeatability of Ktrans values derived from a certain DCE algorithm should be a good indicator of that algorithm's efficacy. Testing our Matlab script with a Gaussian blur with a sigma of 0 (no blur), 0.65, and 1.35, I found that baseline-to-baseline repeatability of median Ktrans was higher in the blurred cases. This suggests that the results from the phantom may carry over into patient data. There are likely smarter ways to do de-noising than simply applying a blur on axial slices as I've done, but for now this may be a good start when working with patient data. We are currently exploring PCA analysis of DCE curves within a volume to help extract the signal from the noise, but I haven't yet found a way to do that which performs significantly better on the Tofts phantoms than a plain Gaussian blur. All of the following analyses on the noisy data were done with a Gaussian blur with a sigma of 1. Using unblurred data does not give very good results, and I have found it makes it difficult to compare different algorithms. Screenshots below show a sample voxel curve before and after blurring.

Integration Method

We started by changing the integration method, which the comments above document. That gave PkModeling perfect accuracy on the noiseless tofts_v6 DCE phantom; previously PkModeling had struggled with the lowest Ve column. However, it decreased accuracy on the noisy phantoms in all situations except when Ve is very low (<= 0.01). This makes sense, seeing as the only section the noiseless phantom had trouble with is the low-Ve section. The new integration method seems to be much faster than the old one. This makes sense to me, as the old method used convolution, which from my understanding should take O(n^2), where n is the length of the time series being analyzed, whereas the new method should be more like O(n). I'm not totally confident in that explanation, but I also don't think it matters too much in the grand scheme of things, because the run times seem pretty similar for time series under 100 points. Changing the integration method, but not the fitting algorithm, leads to seemingly randomly distributed NaN or otherwise blown-up values on both the phantom data and patient data. I had discussed this earlier in the GitHub chain and thought that it only happened in areas with low signal, but based on my experiments with the phantoms it seems to happen everywhere. Clusters of NaN values tend to be near each other, but totally random from run to run of the algorithm. I'm not sure what causes them, but I do know that the L-M algorithm combined with the new integration method is not very deterministic, and that this did not happen with the old integration method. This is disconcerting in general (each run produces slightly different values), but may also give insight into the blow-up problem. It could be that some starting points, and some ways of traversing the fit, lead to NaN values, and that some voxels are just unlucky enough to get those starting points. It doesn't explain why NaN values tend to loosely cluster, which is why I had suspected warm fitting. Regardless, something to think about. As discussed below, changing the fitting algorithm to Nelder-Mead/Simplex/"Amoeba" makes this problem disappear.

Changing the Fitting Algorithm

The next step was to change PkModeling's fitting algorithm. We had previously used the Nelder-Mead (a.k.a. Simplex, a.k.a. Amoeba) algorithm in our Matlab code, and some small experiments I had done in Python showed me that most other algorithms, including PkModeling's (Levenberg-Marquardt), did not fare well on the noisy data. This was a more involved change, because the two optimizers are different ITK classes and so couldn't easily be swapped for one another. However, after puzzling through it last week, I seem to have an implementation that works for fitting Ktrans and Ve. How it propagates to the rest of PkModeling's features has yet to be investigated, and certainly will need to be. Changing the fitting algorithm did not noticeably change results on the noisy phantom for the old integration method, but actually made results worse on the noiseless phantom for the old integration method. With the new integration method, it has perfect performance on the noiseless phantom, but inherits the new integration method's slightly poorer performance on the noisy phantom. For reasons I don't quite understand, it solved the problem of dead voxels with the new integration method (perhaps it just naturally does not fit to NaN values?). It also made the new integration method deterministic, which is reassuring for the purposes of assessing performance.

Conclusions

This seems to suggest that for noisy data (i.e. real-world data) we should favor the old/existing method, as it does slightly better there. However, for purposes of validation, it's a little disconcerting that the old integration method doesn't get perfect values on the noiseless phantom, which is currently being used to assess other DCE scripts (e.g. http://ac.els-cdn.com/S1936523314800193/1-s2.0-S1936523314800193-main.pdf?_tid=2edb0db6-f798-11e6-8437-00000aab0f26&acdnat=1487614547_b2d80746fe3111587fcbe7a8499532fa). Unfortunately, no combination of the integration and fitting methods described here has both perfect accuracy on the noiseless phantom and the best performance on the noisy data. I'm unsure if this dynamic is also present in our Python and Matlab versions of the same script. One reassuring thing is that both the old and new methods look like they would probably have won the original competition that QTIM participated in, given the added blurring for pre-processing. Thus, we may be splitting hairs deciding between the two. From this perspective, it may be best to just get something that works on the noiseless phantom if that's an emerging standard. Summary table copied below:

T1Mapping

T1Mapping does not seem to work for the Tofts phantoms. It usually creates NaN values on roughly half or all of the phantom. This happens on both new and old PkModeling; we'll have to figure out why. There's a chance it could be some pre-processing problem on my end in creating a NRRD file of the T1 map, so I will check on that and get back to you with some real patient data. Screenshots below in the meantime, from two consecutive runs on a noisy phantom with a T1 map.

Automatic Bolus Arrival Time Method

I haven't done too many rigorous tests on the automatic bolus arrival time method, but it does not get usable results on either the noiseless or noisy phantoms. Jayashree is interested in using PCA to find the bolus arrival time, but I'm not sure it should be used if a BAT can otherwise be determined. Screenshot of the noiseless phantom copied below.

Future Steps

As far as measuring Ve goes, our Matlab script and PkModeling (which seem to be equivalent) were only comparable in accuracy to the other submitted software packages. It might be worth looking into how to better fit Ve.

Best,
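The Gaussian pre-blur used throughout the tests above can be sketched with SciPy, assuming the DCE series is stored as a (x, y, z, t) NumPy array and blurring is applied only in-plane on axial slices (helper name and shapes are illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_dce_axial(volume_4d, sigma=1.0):
    """Gaussian-blur each axial slice of a 4D DCE series shaped (x, y, z, t),
    leaving the slice (z) and time (t) axes untouched. sigma=1 matches the
    setting used on the noisy phantom tests described above."""
    return gaussian_filter(volume_4d, sigma=(sigma, sigma, 0.0, 0.0))

rng = np.random.default_rng(42)
clean = np.ones((16, 16, 4, 10))                      # toy uniform phantom
noisy = clean + rng.normal(0.0, 0.2, size=clean.shape)
smoothed = blur_dce_axial(noisy, sigma=1.0)
```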
Good afternoon! @kalpathy @fedorov @speled This is just a quick comment. One thing I hadn't had the chance to explain is why changing the integration method makes PkModeling work on the perfect phantom when it hadn't before. I wanted to add some quick screenshots from a test I ran in Python to contextualize the difference. All of the images below have the same Ktrans value, 0.35, which is the highest value in the phantom. They have different Ve: 0.01, 0.1, and 0.5 (first to last, respectively). The blue line is the observed values, the red line is the values generated with the new integration method using the correct Ktrans/Ve parameters, and the green line is the values generated using the old integration method. You'll notice that the difference between the red+blue lines and the green line increases as Ve decreases. This means that the old integration method deviates from the observed/generated values at low Ve. You can click on any image to view it larger.
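The trend in those screenshots can be reproduced numerically: the gap between a rectangle-rule convolution and the exact piecewise-constant recursion widens as Ve shrinks, because kep = Ktrans/Ve grows until the exponential kernel decays within a single sample. A sketch with a toy AIF (assumed timings; not PkModeling's code):

```python
import numpy as np

def discretization_gap(ktrans, ve, dt=2.0 / 60.0, t_end=5.0):
    """Max relative gap between a rectangle-rule convolution and the exact
    piecewise-constant recursion for the two-parameter Tofts model."""
    t = np.arange(0.0, t_end, dt)
    cp = 5.0 * (t / 0.5) * np.exp(1.0 - t / 0.5)    # toy bolus AIF
    kep = ktrans / ve
    conv = ktrans * dt * np.convolve(cp, np.exp(-kep * t))[:len(t)]
    decay = np.exp(-kep * dt)
    rec = np.zeros_like(cp)
    for n in range(1, len(cp)):
        rec[n] = rec[n - 1] * decay + ktrans * cp[n] * (1.0 - decay) / kep
    return np.max(np.abs(conv - rec)) / np.max(rec)

# The relative gap widens dramatically at the phantom's lowest Ve.
gap_low = discretization_gap(0.35, 0.01)
gap_high = discretization_gap(0.35, 0.5)
```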
As discussed at the QIICR call today, @AndrewBeers will expose both the integration and fitting method as options to the module.
@AndrewBeers how is your progress exposing these options as module parameters? When do you expect this task to be completed?
@fedorov Hi Andrey. I haven't had a chance to work on it yet this week, but am going to take a few hours now to get started. I suppose it should only be a matter of updating the UI. I'll keep you posted, and will post an intermediate commit later if there's more work to be done. Best,
integration method. This option still does not apply to the three-parameter model, although I'm not sure there's any reason it shouldn't. The description of what the integration method means is not very clear at the moment; it would be worth drafting an intuitive one-sentence explanation and/or some good documentation. Next step: adding a radio button for the fitting algorithm used. This may be more involved because of the need to integrate two separate ITK classes. Other distant ideas to keep in mind: adding pre-processing options, such as blurring and PCA reduction.
PkSolver/PkSolver.h
Outdated
else if (m_ModelType == TOFTS_2_PARAMETER)
  {
  if (m_IntegrationType == "Convolutional")
Can you update your editor settings to use 2 spaces for indentation?
accomplish this, I have basically duplicated huge segments of code at several points in the program. However, this still doesn't work, because some functions expect a certain class type (LMCostFunction and AmoebaCostFunction, for example) and I don't know enough about C++ classes and inheritance to get around that yet. Hopefully, all of this code duplication will be rendered unnecessary by some smart coding. Seeking help from other C++ developers (Andrey, Steve) soon.
but still have questions about how to do sub/superclasses.
but now have a faulting program with no obvious solution. This may have crossed the boundary into more trouble than it is worth.
Note: check LM and recursive to make sure it works in literally any circumstance.
The timer is broken, but I think it may have been that way before. LM + Recursive still gets bad answers; there may be no way around it. Ready for presentation soon.
Choose Fitting Method Option Added
Hello! I just merged a side-branch I had to finalize the changes we had been discussing in the previous steps. These changes add two buttons to the main menu to choose between integration methods and between fitting algorithms. Simplex + Recursive is chosen by default, because it performs well on the As far as I can tell, there are still two unresolved issues:
Otherwise, I've tested on the prostate data, some brain DCE data from the public RIDER dataset, and the Tofts phantoms, and it seems to work all right. We discussed on our most recent call adding a continuous integration test to make sure that future PkModeling changes don't degrade performance on the Tofts phantoms (and maybe other phantoms too). Best,
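For intuition about the two fitting options now exposed in the menu, here is a sketch fitting a noisy two-parameter Tofts curve with SciPy's Levenberg-Marquardt and Nelder-Mead implementations (toy AIF, noise level, and starting guess are all assumptions, and this stands in for the ITK optimizers, not PkModeling's actual code):

```python
import numpy as np
from scipy.optimize import least_squares, minimize

def tofts(params, t, cp):
    """Two-parameter Tofts forward model via the recursive integration."""
    ktrans, ve = params
    ve_c = np.clip(ve, 1e-3, None)     # guard: optimizer may try ve <= 0
    kep = ktrans / ve_c
    decay = np.exp(-kep * (t[1] - t[0]))
    ct = np.zeros_like(cp)
    for n in range(1, len(cp)):
        # ktrans * (1 - decay) / kep simplifies to ve_c * (1 - decay)
        ct[n] = ct[n - 1] * decay + cp[n] * ve_c * (1.0 - decay)
    return ct

t = np.arange(0.0, 5.0, 2.0 / 60.0)            # 2 s sampling over 5 min
cp = 5.0 * (t / 0.5) * np.exp(1.0 - t / 0.5)    # toy bolus AIF
rng = np.random.default_rng(1)
observed = tofts([0.35, 0.2], t, cp) + rng.normal(0.0, 0.02, t.shape)

residual = lambda p: tofts(p, t, cp) - observed
# Levenberg-Marquardt (PkModeling's original optimizer):
lm = least_squares(residual, x0=[0.1, 0.1], method="lm")
# Nelder-Mead / Simplex / "Amoeba" (the new default):
nm = minimize(lambda p: np.sum(residual(p) ** 2), x0=[0.1, 0.1],
              method="Nelder-Mead")
```

On this clean toy problem both optimizers recover [Ktrans, Ve] near the true [0.35, 0.2]; the differences in the thread only emerge on the much noisier phantom and patient curves.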
@AndrewBeers thanks! I will need a lot more time to review this; there are a lot of changes, so for now just a few comments/questions throughout the code.
I also noticed there are already tests (see the add_test commands in CMakeLists.txt). Can you please confirm that the tests pass when you run 'make test'? The current tests use prostate data. It would be great if you could see how that is done and add one based on the DRO in a separate pull request.
Unfortunately, I don't think there is an option to block combinations of parameters. What we can do is to check for this specific combination in the code, and exit immediately with the error and log message without doing any computation. Although, ideally it would be nice to know why it generates NaNs.
@speled can you please test and compare with the 'master' branch?
@@ -18,23 +18,23 @@
 <longflag>T1Blood</longflag>
 <label>T1 Blood Value</label>
 <channel>input</channel>
-<default>1600</default>
+<default>1440</default>
why did you change this?
 </float>
 <float>
 <name>T1PreTissueValue</name>
 <description><![CDATA[T1 value for tissue.]]></description>
 <longflag>T1Tissue</longflag>
 <label>T1 Tissue Value</label>
 <channel>input</channel>
-<default>1597</default>
+<default>1000</default>
and this?
CLI/Testing/Cxx/CMakeLists.txt
Outdated
@@ -28,7 +28,7 @@ if(QINPROSTATE001)
 --compare ${QINPROSTATE001}/Baseline/phantom-ktrans.nrrd
 ${TEMP}/${testname}-ktrans.nrrd
 ModuleEntryPoint
---T1Tissue 1597
+--T1Tissue 1000
Oh, good - so we already have some tests!
Why did you change the parameter here as well? This is a prostate test dataset, so it definitely should correspond to prostate tissue T1!
@fedorov Good morning! This was actually an accident. Before I understood the XML file, I was trying to change the GUI default parameters to make testing easier, and mistakenly changed those bits of code. I can submit a quick commit soon reverting those changes. Thanks for catching that! And it looks like I now know where to look to make my own tests.
Reverted accidental parameter changes to Prostate tests.
@AndrewBeers sorry for such a delay in reviewing this. I did spend time going over this PR, and I have to say I don't have the resources and confidence to merge it as is, or to make suggestions on how to revise this specific PR to make it work. My main problem is that it has a convoluted history, commit messages that are not consistent with the actual changes being made, and just too many changes, many of which are not consistent with the stated purpose of this PR ("Change the integration method"). Would it be possible to review what was done in this PR, clean up the contributions (I understand you were learning while working on this), and submit a set of new PRs that are cleaner and more concise? I see that you made great progress while adding this functionality, and learned a lot about CLI development, PkModeling and git, so I think it should actually be a great opportunity to revise what was done and take your contribution to the next level! Before doing this, it would be good if you could review the style guidelines for creating meaningful commit messages: we aim to be consistent with https://www.slicer.org/wiki/Documentation/Nightly/Developers/Style_Guide#Commits. Here are specific comments for the individual commits:
Overall comment: it is not a good idea to make your changes to the master branch on your fork, as you do now (unless you want to abandon the original repository and are not interested in propagating the updates in the original repo to your fork). Instead, consider the following workflow:
Keep development localized, with branches/PRs focused on a specific and manageable topic. I hope this is not discouraging. I strongly believe this will help with the maintainability and troubleshooting of the code in the longer run.
@AndrewBeers to make it a bit easier and more concrete, I created a focused set of issues that should be addressed separately (perhaps in this order):
Would be great if you could start addressing them in this order; altogether they will address and supersede the functionality proposed in this current PR, which I am closing.
Hi Andrey, First of all, thank you for all of this advice! If only for my own sake, I would really value redoing the PR in a more structured/standard way, let alone how much more tractable it makes things on your end to review. Thanks for setting me up with the new PRs and linking me to the guidelines; I'll check them out and get cracking. I've been travelling these past few days and haven't had much time to review the code, but will get back to your specific PkModeling comments in their various PRs as soon as I have a chance to sit down and take a look. Thanks again for all your help on this module!
I changed the integration method used to generate 'predicted' curves in the fitting algorithm. The old method had good performance and is likely faster, but failed in situations with low Ve.
I have not verified this new version on anything but the tofts_v6 DRO yet, so it may be better to wait until we can verify that it looks passable on real-world data or at least a noisy DRO (tofts_v9).