
Change the integration method #39

Closed · wants to merge 18 commits

Conversation

@AnnaBeers (Contributor)

I changed the integration method used to generate 'predicted' curves in the fitting algorithm. The old method had good performance and is likely faster, but failed in situations with low Ve.

I have not verified this new version on anything but the tofts_v6 DRO yet, so it may be better to wait until we can confirm that it looks passable on real-world data, or at least on a noisy DRO (tofts_v9).

Changed the integration method away from the impulse method. Other things learned: peak gradient mode does not work on the tofts_v6 DRO. Also, multiplying your signal conversion by a scalar does not seem to affect fitting, which begs an obvious question: why then is relaxivity in the signal conversion equation? Also note the new testing data folder Tofts_DROs.
-    ArrayType VeTerm;
-    VeTerm = -Ktrans/Ve*Time;
+    ValueType deltaT = Time(1) - Time(0);
+    ArrayType VeTerm;
Collaborator:

@AndrewBeers can you fix the indentation to be consistent with the rest of the code?

@fedorov (Collaborator) commented Jan 13, 2017

@AndrewBeers can you please test with real data and confirm there is no regression?

When I test with a prostate DCE dataset, there is a dramatic change in the result - most of the pixels are around 0.1, whereas before the change there was more contrast (I can't tell if this is just scaling, since Slicer DataProbe clamps pixel value probing at 1 decimal digit).

I used default parameters with the population AIF. The test DCE series I used is here: http://slicer.kitware.com/midas3/item/126198.
The label is here: qin-prostate-1-label.nrrd.zip

Ktrans map screenshot with the current master:

[screenshot 2017-01-13 16:21:02]

after your change:

[screenshot 2017-01-13 16:23:09]

@fedorov changed the title from "PkModeling Update" to "Change the integration method" on Jan 13, 2017
@AnnaBeers (Contributor Author)

Hm, interesting.. What parameters did you put into PkModeling? Below are some screenshots that I got from that data with my build.

Below is the ktrans map for a manually-set bolus arrival time at 10 frames in.

[image]

[image]

This is the ktrans map for the gradient-determined bolus arrival time option with the same settings. The gradient BAT did not perform too well on the phantom, but still gives semi-reasonable results below.

[image]

What parameters did you use? I don't know the correct parameters for a prostate case - I just copied over some numbers I used to test the phantom.

It's also clear to me that both have problems with dead pixels, which might be overcome with pre-processing steps that we've yet to implement in PkModeling. This is especially obvious in the gradient BAT method. The high-ktrans area in the top-right of the image also goes to a value of '5' in my case, which likely signals that the ktrans value is exploding to some very high value in that section and then being clamped to a max value of 5. It would be an interesting area to dissect.. A sample curve from one of the pixels is pasted below - it looks fairly normal. Perhaps this is a consequence of using a population AIF in this case. I can do some more testing soon with the old module to see if it gets similar results!

[image]

@fedorov (Collaborator) commented Jan 17, 2017

I am pretty sure I tried with the default parameters. I will test again later today.

@AnnaBeers (Contributor Author)

Here is testing on mine with default parameters. I also just saw the comment about indentation; I will fix it in a moment.

BAT at 10 frames:

[image]

Gradient BAT:

[image]

and parameters:

[image]

Made my code's indentation match the rest of the code.
@AnnaBeers (Contributor Author)

There is another interesting point about PkModeling (either newly introduced with my edit or pre-existing, I haven't tested that yet). Those dead voxels I referenced earlier have values of 'nan'. I assumed this was a problem with random, high-noise curves that the algorithm failed to fit. However, I ran a test using a custom AIF (extracted from a label map in the image), and got the result below:

[image]

Those streaks of NaN values suggest to me that the algorithm might use the output values of the previous voxel as initial parameters for the next fit (i.e. 'warm fitting'). This creates streaks of contiguous 'nan' values, as initial starting parameters of [nan, nan] for [ve, ktrans] are unlikely to fit to anything other than [nan, nan] again. This had happened previously in our Python code when we used 'warm fitting' of initial parameters.

I haven't been able to find evidence of this possible warm fitting initial parameter problem in the code yet, but I'm still wrapping my head around the whole structure of the program. Another thing to test and investigate perhaps before releasing..

@millerjv (Owner) commented Jan 17, 2017 via email

Previous indent edits were out of sync with other portions of the code. Added a few lines of code to catch NaN values while fitting and replace them with new initial parameters. This stops NaN values from overwhelming the image because of warm-fitting.
A random value is now picked during fitting if fitting runs into a NaN value.
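
The commit messages above describe the fix at a high level. Here is a minimal sketch of the idea, with hypothetical names (PkModeling's actual fitting classes differ), assuming a warm start seeded from the previous voxel:

```cpp
#include <cmath>
#include <random>

// Hypothetical sketch, not PkModeling's actual API: if a fit (or a warm start
// carried over from the previous voxel) contains NaN, re-seed with fresh
// initial values instead of letting NaN propagate from voxel to voxel.
struct Params { double ktrans; double ve; };

inline bool IsNaNFit(const Params& p)
{
  return std::isnan(p.ktrans) || std::isnan(p.ve);
}

inline Params RandomSeed(std::mt19937& rng)
{
  // Plausible physiological ranges; the values here are purely illustrative.
  std::uniform_real_distribution<double> kt(0.01, 1.0), ve(0.01, 0.9);
  return { kt(rng), ve(rng) };
}

// Usage inside a hypothetical per-voxel loop:
//   Params seed = previousVoxelResult;           // warm start
//   if (IsNaNFit(seed)) seed = RandomSeed(rng);  // catch NaN, re-seed
//   result = FitVoxel(curve, seed);              // run the optimizer
```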
@AnnaBeers (Contributor Author)

Hello!

Here are some other interesting comments on this...

I tested the prostate data file on the old version of PkModeling. It does not have those dead voxels. It also seems to have consistently lower ktrans values across the board, but has similar relative intensities.

I had previously thought that these dead voxels were caused by unfittable noise, but upon closer inspection it doesn't seem like that's the case. The dead voxels seem mostly random - run the algorithm twice and they will show up in different spots, and it is hard to discern visually what is so different about the signal intensity curves they are derived from. On the other hand, they seem to cluster in certain parts of the scan, or certain slices, so there might be something going on there.. Something to think about!

I've been prototyping a noise-removal strategy using eigenvalues and PCA in Python, but I'm not sure it applies to whatever is going on with PkModeling. The curves CAN be fit, they just don't get fit sometimes..

[image]

@fedorov (Collaborator) commented Feb 2, 2017

These dead voxels in the figure above are irrelevant. There is no contrast uptake in the bone or in the air. Those correspond to fitting noise. I am sure anything can be fit, but I am also absolutely sure there is no gain or glory in trying to fit those dead voxels.

@fedorov (Collaborator) commented Feb 16, 2017

@AndrewBeers: @speled started working on evaluating this. Really sorry for the delay, and again for not finding time to do this myself....

A quick retrofit to a different ITK class. Will need to take care to clean up the code after an integration method is decided.
@AnnaBeers (Contributor Author)

Hello! @speled @fedorov @kalpathy

I've spent some quality time with PkModeling over the past two days, and have learned some interesting things. Sorry for the delay in uploading - it always seemed like there was another test to run, and then the 3-day weekend came.. The commit above changes the fitting method as a proof of principle, although that may not be the best way to continue, as documented below. Most of the following tests were run on the tofts v6 (noiseless) and v9 (noisy) phantoms.

Preprocessing Methods

The motivation for editing PkModeling in the first place was that our group at MGH had developed an off-hand Matlab script that performed very well on the tofts v9 noisy phantom compared to existing software packages, in a recent study/competition soon to be submitted for publication. The idea was to apply lessons learned from that program to PkModeling to get it up to speed. There may have been a few things that gave our script such good performance on the noisy phantom (e.g. integration method choice, fitting algorithm choice), but after some more analysis it seems like we probably performed the best simply because we decided to apply a Gaussian blur to the Tofts DRO before analyzing it.

This made sense to me at the time, as inspecting the noisy curves as they came in (as well as inspecting curves in actual DCE volumes from our brain data) convinced me that it would be difficult to retrieve an accurate ktrans from any one voxel. It also made sense given that the intensities of most voxels across many imaging modalities are correlated with neighboring voxels. The question with blurring is whether it destroys more information than it recovers. I've run a brief test on our lab's (QTIM @ MGH) double-baseline study, where ~45 pre-treatment glioblastoma patients came in for two visits separated by 3-4 days without any intervening treatment. Seeing as their ktrans values should be similar between the two baselines, the repeatability of ktrans values derived from a certain DCE algorithm should be a good indicator of that algorithm's efficacy. Testing our Matlab script with a Gaussian blur with a sigma of 0 (no blur), 0.65, and 1.35, I found that baseline-to-baseline repeatability for median ktrans was higher in the blurred cases. This suggests that the results from the phantom may carry over into patient data.

There are likely 'smarter' ways to do de-noising than simply applying a blur on axial slices as I've done, but for now this may be a good start when working with patient data. We are currently exploring PCA analysis on DCE curves within a certain volume to assist in extracting the signal from the noise, but I haven't yet found a way to do it that performs significantly better on the Tofts phantoms than a plain Gaussian blur.

All of the following analyses on the noisy data were done with a Gaussian blur with a sigma of 1. Using unblurred data does not give very good results, which I have found makes it difficult to compare different algorithms. Screenshots below show a sample voxel curve before and after blurring.

[image]

[image]
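
For reference, a minimal sketch of how such a pre-blur could be applied to each time point with stock ITK (the filter is standard ITK; the wrapper function is made up, and a slice-wise 2-D filter would match the axial blurring described above more closely than this 3-D version):

```cpp
#include "itkImage.h"
#include "itkSmoothingRecursiveGaussianImageFilter.h"

// Sketch only: blur one 3-D time point (sigma = 1, as in the analyses above)
// before handing it to the fitting code. Not part of PkModeling.
using VolumeType = itk::Image<float, 3>;

VolumeType::Pointer BlurTimePoint(VolumeType* input, double sigma = 1.0)
{
  using BlurFilter = itk::SmoothingRecursiveGaussianImageFilter<VolumeType, VolumeType>;
  auto blur = BlurFilter::New();
  blur->SetInput(input);
  blur->SetSigma(sigma);  // sigma is in physical units (mm), not voxels
  blur->Update();
  return blur->GetOutput();
}
```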

Integration Method

We started by changing the integration method, which the above comments document. That gave PkModeling perfect accuracy on the noiseless tofts_v6 DCE phantom - previously, PkModeling had struggled with the lowest-Ve column. However, it decreased accuracy on the noisy phantoms in all situations except when Ve is very low (<= .01). This makes sense, seeing as the low-Ve section was the only part of the noiseless phantom that PkModeling had trouble with.

The new integration method also seems to be much faster than the old one. This makes sense to me: the old method used convolution, which as I understand it should take O(n^2) time, where n is the length of the time series being analyzed, whereas the new method should be more like O(n). I'm not totally confident in that explanation, and I don't think it matters too much in the grand scheme of things, because the running times look pretty similar for time series shorter than ~100 points.
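
To make the complexity claim concrete, here is a sketch of the two strategies for the standard Tofts integral Ct(t) = Ktrans * integral of AIF(tau) * exp(-(Ktrans/Ve)(t - tau)) dtau, assuming uniform sampling and a simple rectangle rule (PkModeling's actual discretizations differ):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// O(n^2): direct discrete convolution of the AIF with the exponential kernel.
std::vector<double> ToftsConvolutional(const std::vector<double>& aif,
                                       double ktrans, double ve, double dt)
{
  const double kep = ktrans / ve;
  std::vector<double> ct(aif.size(), 0.0);
  for (std::size_t i = 0; i < aif.size(); ++i)
  {
    double sum = 0.0;
    for (std::size_t j = 0; j <= i; ++j)  // inner loop over the full history
      sum += aif[j] * std::exp(-kep * double(i - j) * dt) * dt;
    ct[i] = ktrans * sum;
  }
  return ct;
}

// O(n): exp(-kep*(t_i - t_j)) = exp(-kep*dt) * exp(-kep*(t_{i-1} - t_j)), so
// the running integral can be updated with one multiply per time point.
std::vector<double> ToftsRecursive(const std::vector<double>& aif,
                                   double ktrans, double ve, double dt)
{
  const double kep = ktrans / ve;
  const double decay = std::exp(-kep * dt);
  std::vector<double> ct(aif.size(), 0.0);
  double integral = 0.0;
  for (std::size_t i = 0; i < aif.size(); ++i)
  {
    integral = integral * decay + aif[i] * dt;  // O(1) update per sample
    ct[i] = ktrans * integral;
  }
  return ct;
}
```

Under this shared discretization the two functions return identical curves; the speed difference comes from the per-point cost, and the accuracy differences discussed above come from the different discretizations the two real implementations use.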

Changing the integration method -- but not the fitting algorithm -- leads to seemingly randomly distributed NaN or otherwise blown-up values on both the phantom data and patient data. I discussed this earlier in this thread and thought it only happened in areas with low signal, but based on my experiments with the phantoms it seems to happen everywhere. Clusters of NaN values tend to be near each other, but are totally random from run to run of the algorithm. I'm not sure what causes them, but I do know that the L-M algorithm combined with the new integration method is not very deterministic, and that this did not happen with the old integration method. This is disconcerting in general (each run produces slightly different values), but may also give insight into the blow-up problem. It could be that some starting points, and some ways of traversing the fit, lead to NaN values, and that some voxels are just unlucky enough to get those starting points. That doesn't explain why NaN values tend to loosely cluster, which is why I had suspected warm-fitting.

Regardless, something to think about. As discussed below, changing the fitting algorithm to Nelder-Mead/Simplex/"Amoeba" makes this problem disappear.

Changing the Fitting Algorithm

The next step was to change PkModeling's fitting algorithm. We had previously used the Nelder-Mead (a.k.a. Simplex, a.k.a. Amoeba) algorithm in our Matlab code, and some small experiments I had done in Python showed me that most other algorithms, including PkModeling's (Levenberg-Marquardt), did not fare well on the noisy data. This was a more involved change, because the two optimizers are different ITK classes and so couldn't easily be swapped out for one another. However, after puzzling through it last week, I seem to have an implementation that works for the purposes of fitting ktrans and ve. How it propagates to the rest of PkModeling's features has yet to be investigated, and certainly will need to be..
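
The incompatibility comes from the cost-function interface: in ITK, itk::LevenbergMarquardtOptimizer minimizes a MultipleValuedCostFunction (a vector of per-time-point residuals), while itk::AmoebaOptimizer minimizes a SingleValuedCostFunction (a single scalar). A skeletal sketch of the two base classes involved (the real LMCostFunction/AmoebaCostFunction in this PR differ in detail):

```cpp
#include "itkMultipleValuedCostFunction.h"
#include "itkSingleValuedCostFunction.h"

// Levenberg-Marquardt consumes a residual vector, one entry per time point:
class LMCostFunction : public itk::MultipleValuedCostFunction
{
public:
  unsigned int GetNumberOfParameters() const override { return 2; }  // Ktrans, Ve
  unsigned int GetNumberOfValues() const override;                   // # time points
  MeasureType GetValue(const ParametersType& p) const override;      // residual vector
  void GetDerivative(const ParametersType& p, DerivativeType& d) const override;
};

// Amoeba (Nelder-Mead) consumes one scalar, e.g. the sum of squared residuals,
// so the cost function has to derive from a different base class:
class AmoebaCostFunction : public itk::SingleValuedCostFunction
{
public:
  unsigned int GetNumberOfParameters() const override { return 2; }
  MeasureType GetValue(const ParametersType& p) const override;      // scalar SSE
  void GetDerivative(const ParametersType&, DerivativeType&) const override {}
};
```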

Changing the fitting algorithm did not noticeably change results on the noisy phantom for the old integration method, but actually made results worse on the noiseless phantom for the old integration method.

With the new integration method, it has perfect performance on the noiseless phantom, but has inherited the new integration method's slightly poorer performance on the noisy phantom. For reasons I don't quite understand, it solved the problem of dead voxels with the new integration method (perhaps it just naturally does not fit to NaN values?). It also made the new integration method deterministic, which is reassuring for the purposes of assessing performance.

Conclusions..

This seems to suggest that for noisy (i.e. real-world) data we should favor the old/existing integration method. However, for purposes of validation, it's a little disconcerting that the old integration method doesn't get perfect values on the noiseless phantom, which is currently being used to assess other DCE scripts (e.g. http://ac.els-cdn.com/S1936523314800193/1-s2.0-S1936523314800193-main.pdf?_tid=2edb0db6-f798-11e6-8437-00000aab0f26&acdnat=1487614547_b2d80746fe3111587fcbe7a8499532fa). Unfortunately, there isn't a combination of the integration and fitting methods described here that has both perfect accuracy on the noiseless phantom and the best performance on the noisy data. I'm unsure whether this dynamic is also present in our Python and Matlab versions of the same script.

One reassuring thing is that, with the added blurring for pre-processing, both the old and the new method look like they would probably have won the original competition that QTIM participated in. Thus, we may be splitting hairs in deciding between the two. From this perspective, it may be best to just get something that works on the noiseless phantom, if that's an emerging standard..

Summary table copied below:

[image: summary table]

T1Mapping

T1Mapping does not seem to work for the tofts phantoms. It usually creates NaN values on roughly half, or all, of the phantom. This happens on both new and old PkModeling - we'll have to figure out why.. There's a chance it could be a pre-processing problem on my end in creating an NRRD file of the T1 map, so I will check on that and get back to you with some real patient data. In the meantime, screenshots below are from two consecutive runs on a noisy phantom with a T1 map.

[image]
[image]

Automatic Bolus Arrival Time Method

I haven't run too many rigorous tests on the automatic bolus arrival time method, but it does not get usable results on either the noiseless or noisy phantoms. Jayashree is interested in using PCA to find the bolus arrival time, but I'm not sure it should be used if a BAT can otherwise be determined. A screenshot of the noiseless phantom is copied below.

[image]
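
For context, the "peak gradient" style of BAT detection mentioned earlier amounts to something like the following sketch (hypothetical, not PkModeling's actual implementation): take the time point with the steepest rise in the signal.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of gradient-based bolus-arrival-time detection: return
// the index with the largest forward difference. Noise can easily fool this,
// which is consistent with its poor behavior on the phantoms noted above.
std::size_t PeakGradientBAT(const std::vector<double>& signal)
{
  std::size_t bat = 0;
  double maxGrad = 0.0;
  for (std::size_t i = 0; i + 1 < signal.size(); ++i)
  {
    const double grad = signal[i + 1] - signal[i];
    if (grad > maxGrad)
    {
      maxGrad = grad;
      bat = i;
    }
  }
  return bat;
}
```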

Future Steps

As far as measuring Ve goes, our Matlab script and PkModeling (which seem to be equivalent) were only about as accurate as the other submitted software packages. It might be worth looking into how to better fit Ve.

Best,
Andrew

@AnnaBeers (Contributor Author)

Good afternoon! @kalpathy @fedorov @speled

This is just a quick comment. One thing I hadn't had the chance to explain is why changing the integration method makes PkModeling work on the noiseless phantom where it hadn't before. I wanted to add some quick screenshots from a test I ran in Python to contextualize the difference.

All of the images below have the same ktrans value, 0.35, which is the highest value in the phantom. They have different Ve values: 0.01, 0.1, and 0.5 (organized first to last, respectively). The blue line is the observed values, the red line is the values generated with the new integration method using the correct ktrans/ve parameters, and the green line is the values generated using the old integration method.

You'll notice that the difference between the red+blue lines and the green lines increases as Ve decreases. This means that the old integration method deviates from the observed/generated values at low Ve. (Intuitively, low Ve means a large kep = Ktrans/Ve, so the exponential kernel decays quickly relative to the sampling interval and is harder to discretize accurately.) You can click on any image to view it larger.

[image: high_ve_high_ktrans]
[image: mid_ve_high_ktrans]
[image: low_ve_high_ktrans]

@fedorov (Collaborator) commented Mar 2, 2017

As discussed at the QIICR call today, @AndrewBeers will expose both the integration and fitting method as options to the module.

@fedorov (Collaborator) commented Mar 10, 2017

@AndrewBeers how is your progress exposing these options as module parameters?

When do you expect this task to be completed?

@AnnaBeers (Contributor Author)

@fedorov Hi Andrey.

I haven't had a chance to work on it yet this week, but am going to take a few hours now to get started. I suppose it should only be a matter of updating the UI - I'll keep you posted, and will post an intermediate commit later if there's more work to be done.

Best,
Andrew

Added a radio button for choosing the integration method. This option still does not apply to the three-parameter model, although I'm not sure if there's any reason it shouldn't. The description of what the integration method means is not very clear currently - it would be worth drafting an intuitive one-sentence explanation and/or some good documentation. Next step: adding a radio button for the fitting algorithm used. This may be more involved because of the need to integrate two separate ITK classes. Other, more distant ideas to keep in mind: adding pre-processing options, such as blurring and PCA reduction.
else if (m_ModelType == TOFTS_2_PARAMETER)
{

if (m_IntegrationType == "Convolutional")
Collaborator:

Can you update your editor settings to use 2 spaces for indentation?

To accomplish this, I have basically duplicated huge segments of code at several points in the program. However, this still doesn't work, because some functions expect a certain class type (LMCostFunction and AmoebaCostFunction, for example) and I don't know enough about C++ classes and inheritance to get around that yet. Hopefully, all of this code duplication will be rendered unnecessary by some smart coding. Seeking help from other C++ developers (Andrey, Steve) soon.
but still have questions about how to do sub/superclasses.
but now have a faulting program with no obvious solution. This may have crossed the boundary into more trouble than it is worth..
Note: check LM and recursive to make sure it works in literally any circumstance.
The timer is broken, but I think it may have been that way before. LM + Recursive still gets bad answers; maybe no way around it. Ready for presentation soon.
Choose Fitting Method Option Added
@AnnaBeers (Contributor Author)

@fedorov @speled @kalpathy

Hello!

I just merged a side-branch I had to finalize the changes we had been discussing in the previous steps.

These changes add two buttons to the main menu to choose between integration methods and between fitting algorithms. Simplex + Recursive is chosen by default, because it performs well on the noiseless phantom.

As far as I can tell, there are still two unresolved issues:

  • Changing the integration method has no effect on the three-parameter model, because I don't know much about the three-parameter model, or whether the formula change that applied to the two-parameter model can sensibly be transferred over.
  • The combination of the LM fitting algorithm + the Recursive integration method seems to fit a whole lot of NaN values, for reasons I don't understand. If there were some way to reactively disable that combination in the PkModeling menu so that it couldn't be selected, that might be nice.. It would also be good to figure out why it fits NaN/0 values in the first place.

Otherwise, I've tested on the Prostate data, some Brain DCE data from the RIDER public dataset, and the Tofts phantoms, and it seems to work alright. We discussed on our most recent call adding a Continuous Integration test to make sure that future PkModeling changes don't mess up our performance on the Tofts phantoms (and maybe other phantoms too).

Best,
Andrew

@fedorov (Collaborator) left a comment

@AndrewBeers thanks! I will need a lot more time to review this -- a lot of changes, so for now just a few comments/questions throughout the code.

I also noticed there are already tests (see the add_test commands in CMakeLists.txt). Can you please confirm that the tests pass when you run 'make test'? The current tests use prostate data. It would be great if you could see how that is done, and add one based on the DRO in a separate pull request.

Unfortunately, I don't think there is an option to block combinations of parameters. What we can do is to check for this specific combination in the code, and exit immediately with the error and log message without doing any computation. Although, ideally it would be nice to know why it generates NaNs.
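
A minimal sketch of such a fail-fast check, using the m_IntegrationType member visible in the diff earlier in this PR plus a hypothetical m_FittingType (actual member names may differ):

```cpp
// Sketch only: reject the known-bad combination before any computation.
// m_FittingType is a hypothetical member; m_IntegrationType appears in the diff.
if (m_FittingType == "LevenbergMarquardt" && m_IntegrationType == "Recursive")
{
  itkExceptionMacro(<< "LM fitting combined with Recursive integration "
                    << "is known to produce NaN fits; choose another combination.");
}
```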

@speled can you please test and compare with the 'master' branch?

@@ -18,23 +18,23 @@
     <longflag>T1Blood</longflag>
     <label>T1 Blood Value</label>
     <channel>input</channel>
-    <default>1600</default>
+    <default>1440</default>
Collaborator:

why did you change this?

     </float>
     <float>
       <name>T1PreTissueValue</name>
       <description><![CDATA[T1 value for tissue.]]></description>
       <longflag>T1Tissue</longflag>
       <label>T1 Tissue Value</label>
       <channel>input</channel>
-      <default>1597</default>
+      <default>1000</default>
Collaborator:

and this?

@@ -28,7 +28,7 @@ if(QINPROSTATE001)
     --compare ${QINPROSTATE001}/Baseline/phantom-ktrans.nrrd
       ${TEMP}/${testname}-ktrans.nrrd
     ModuleEntryPoint
-    --T1Tissue 1597
+    --T1Tissue 1000
Collaborator:

Oh, good - so we already have some tests!

Why did you change the parameter here as well? This is a prostate test dataset, so it definitely should correspond to prostate tissue T1!

Contributor Author:

@fedorov Good morning! This was actually an accident. Before I understood the XML file, I was trying to change the GUI default parameters to make testing easier, and mistakenly changed those bits of code. I can submit a quick commit soon reverting those changes back. Thanks for catching that! And it looks like I now know where to look to make my own tests..

@fedorov (Collaborator) commented May 3, 2017

@AndrewBeers sorry for such a delay in reviewing this.

I did spend time going over this PR, and I have to say I don't have the resources or confidence to merge it as is, or to make suggestions on how to revise this specific PR to make it work. My main problem is that it has a convoluted history, commit messages that are not consistent with the actual changes being made, and there are just too many changes, many of which are not consistent with the stated purpose of this PR ("Change the integration method").

Would it be possible to review what was done in this PR, clean up the contributions (I understand you were learning while working on this), and submit a set of new PRs that are cleaner and more concise? I see that you made great progress while adding this functionality, and learned a lot about CLI development, PkModeling and git, so I think it should actually be a great opportunity to revise what was done and take your contribution to the next level!

Before doing this, it will be good if you could review the style guidelines for creating meaningful commit messages: we aim to be consistent with https://www.slicer.org/wiki/Documentation/Nightly/Developers/Style_Guide#Commits.

Here are specific comments for the individual commits:

  • f010ecd#diff-b86f20af8d568be803df341092d17d6c
    ◦ separate the addition of the DRO into its own commit
    ◦ give the DRO file a better name
    ◦ include in the commit message what version of the DRO it is and where it was downloaded from
    ◦ add a test and baseline that actually exercise this DRO
    ◦ make sure from the beginning that your editor settings use indentation consistent with the code style (2 spaces and soft tabs)
  • 3ee8cdc
    ◦ can you provide more details about your hypothesis about warm fitting? From what I see in the code, initial values are fixed: https://github.com/millerjv/PkModeling/blob/master/PkSolver/PkSolver.cxx#L73-L74. If there is no evidence for warm fitting, let's not confuse the developer with these messages referring to it.
    ◦ all in all, I don't see how this commit helps, since it misleadingly assigns arbitrary values for Ktrans and Ve instead of NaN
  • cfd0488
    ◦ this one I don't get at all - you are assigning a random value to the output map pixel?! Goes into the same bucket as the previous commit
  • 6466688
    ◦ indentation is messed up again
    ◦ these changes should go together with the later commits on the same topic
  • fbf24df
    ◦ you changed the default values not only in the test, but in the xml files; this should not be done (or should be justified)
    ◦ these changes should go together with the later commits on the same topic
  • 08fae37
    ◦ in this commit you changed the indentation of all of the files, which makes it extremely difficult to review; if you want to fix indentation, make a separate PR doing just that, confirming no other changes to the code were made. It does not belong in a PR changing the functionality of the code.
    ◦ for this commit and the next few: consolidate your experiments and debugging into the final set of changes to the code

Overall comment: it is not a good idea to make your changes to the master branch on your fork, as you do now (unless you want to abandon the original repository and are not interested in propagating the updates in the original repo to your fork). Instead, consider the following workflow:

  1. Fork "clean" version of master to your account
  2. To propose a PR: make a branch from the master, give it some intuitive short name, make changes and commit to that branch, make a PR
  3. If master branch changed since the time you made a branch, update master locally, use "git rebase" on your PR branch to sync, resolve conflicts manually if necessary; use "git push -f" to force-push updated branch to github

Keep development localized with branches/PRs focused on a specific and manageable topic.

I hope this is not discouraging. I strongly believe this will help with the maintainability and troubleshooting of the code in the longer run.

@fedorov (Collaborator) commented May 3, 2017

@AndrewBeers to make it a bit easier and more concrete, I created a focused set of issues that should be addressed separately (perhaps in this order):

It would be great if you could start addressing them in this order; altogether they will address and supersede the functionality proposed in this current PR, which I am closing.

@fedorov closed this May 3, 2017
@AnnaBeers (Contributor Author)

Hi Andrey,

First of all -- thank you for all of this advice! If only for my own sake, I would really value redoing the PR in a more structured/standard way, to say nothing of how much more tractable it makes review on your end. Thanks for setting me up with the new PRs and linking me to the guidelines -- I'll check them out and get cracking.

I've been travelling these past few days and haven't had much time to review the code, but will get back to your specific PkModeling comments in their various PRs as soon as I have a chance to sit down and take a look. Thanks again for all your help on this module!
