-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inverse table improvements #388
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, @daviesje. Just a few small comments and I think we'll be gtm.
SAMPLE_METHOD: int, optional | ||
The sampling method to use in the halo sampler when calculating progenitor populations: | ||
0: Mass-limited CMF sampling, where samples are drawn until the expected mass is reached | ||
1: Number-limited CMF sampling, where we select a number of halos from the Poisson distribution | ||
and then sample the CMF that many times | ||
2: Sheth et al 1999 Partition sampling, where the EPS collapsed fraction is sampled (gaussian tail) | ||
and then the condition is updated using the conservation of mass. | ||
3: Parkinsson et al 2008 Binary split model as in DarkForest (Qiu et al 2021) where the EPS merger rate | ||
is sampled on small internal timesteps such that only binary splits can occur. | ||
NOTE: Sampling from the density grid will ALWAYS use number-limited sampling (method 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make the python-side parameter either a string or an enum? This is more explicit. It's harder to read code that just has SAMPLE_METHOD=2
rather than SAMPLE_METHOD=samplers.partition
or SAMPLE_METHOD='partition'
src/py21cmfast/inputs.py
Outdated
AVG_BELOW_SAMPLER: bool, optional | ||
When switched on, an integral is performed in each cell between the minimum source mass and SAMPLER_MIN_MASS, | ||
effectively placing the average halo population in each HaloBox cell below the sampler resolution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if it's not true? Those masses are just ignored totally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I would imagine that most of the time SAMPLER_MIN_MASS
would be set low enough so that the averaging is unnecessary, and that this would be for saving memory on larger boxes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool -- I think a sentence describing that would be helpful
src/py21cmfast/inputs.py
Outdated
HALOMASS_CORRECTION: float, optional | ||
This provides a corrective factor to the mass-limited (SAMPLE_METHOD==0) sampling, which multiplies the | ||
expected mass from a condition by this number. | ||
This also is used in the partition (SAMPLE_METHOD==2) sampler, multiplying sigma(M) of each sample drawn. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the usefulness of this option? Why is its default 0.9?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mass-limited sampler has a slight bias toward having too many halos, this bias is independent of delta or halo mass, so this factor is a correction to that. The value of 0.9, multiplying the expected collapsed mass from each descendant works well for the default timestep factor of 1.02. Since the partition method also uses a single correction factor (Mcquinn+ 2007 lowers sigma_8 slightly, I've implemented a correction in nu) we re-use the parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Then I think this should be documented here (i.e. probably don't touch this unless you have good reason, and these are the values it should take in these circumstances)
//Pending a serious deep-dive into this algorithm, I will force DexM to use the fitted parameters to the | ||
// Sheth-Tormen mass function (as of right now, We do not even reproduce EPS results) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this comment still true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it would take some time / code archaeology to fully understand why the excursion set halo finder gets the results it does, and find suitable corrections for any given CMF.
src/py21cmfast/src/Stochasticity.c
Outdated
//fudge factor for assuming that internal lagrangian volumes are independent | ||
exp_M *= user_params_stoc->HALOMASS_CORRECTION; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment and code are unclear to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the implementation of the above correction in the mass-limited sampling. One of the approximations in our sample is that we sample the final CMF in an uncorrelated way, assuming each sampled halo is independent of the others. This means that the CMF doesn't change after each sample, like it does in the partition or binary split methods. This results in a bias where the last halo sampled is on average larger.
I'll change the comment to something more generic and expand the explanation in inputs.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
def _get_enum_property(self, prop, enum_list, propname=""): | ||
""" | ||
Retrieve a value for a property with defined enum list (see UserParams._power_models etc.). | ||
|
||
Arguments | ||
--------- | ||
prop: the hidden attribute to find in the enum list | ||
enum_list: the list of parameter value strings | ||
propname: the name of the property (for error messages) | ||
|
||
Returns | ||
------- | ||
The index of prop within the parameter list, corresponding to it's integer value in | ||
the C backend | ||
""" | ||
# if it's a string we grab the index of the list | ||
if isinstance(prop, str): | ||
val = enum_list.index(prop.upper()) | ||
# otherwise it's a number so we leave it alone | ||
else: | ||
val = prop | ||
|
||
try: | ||
val = int(val) | ||
except (ValueError, TypeError) as e: | ||
raise ValueError(f"{val} is an invalid value for {propname}") from e | ||
|
||
if not 0 <= val < len(enum_list): | ||
raise ValueError(f"HMF must be an int between 0 and {len(enum_list) - 1}") | ||
|
||
return val | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before the merge, do you think this is a good way to implement the enums? I found trying to work with the python enum.IntEnum a bit more cumbersome
This PR mainly contains improvements to the HMF tables used in the halo sampler.
For the inverse CDF tables used in the Sampling:
The Sheth-Tormen conditional mass function has been fixed and optimized (fixing #370), The Delos+24 conditional mass function is available for the default case but does not perform well in the sampler due to the small timestep.
Alternate sampling methods (Sheth+1999 Partition and Parkinson+08 binary split) have been fixed (#374), they are less conforming to the given CMF but as far as I can tell they are working.
Old global parameters have been removed and many sampler related global parameters have been moved to the input structures.