`prepare_sim` is using a lot of memory right now. In particular, `do_Menv_from_tree()` uses 100+ GB per process with default settings on an AbacusSummit sim. I think the reason is the `r_outer` tree query:

`abacusutils/abacusnbody/hod/prepare_sim.py`, line 304 at `229fe9a`
I think the problem started in #78, when the default `r_outer` went from 5 to 10. But I can't tell from the PR why that value was changed. @SandyYuan, @boryanah: do you know if we need 10 Mpc/h as the outer radius?
For context, we're trying to compute the sum of the masses of neighbor halos in two radius apertures and then take the difference. The inner aperture seems okay; it's the outer one that uses a ton of memory saving all the indices. It finds over 2 billion matches, which should only be ~16 GB as packed int64 indices, but the query returns Python lists of Python ints, and at roughly 28 bytes per int object plus an 8-byte pointer per list slot, hitting over 100 GB is plausible.
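For concreteness, here is a minimal sketch of this two-aperture pattern. This is not the actual code at line 304; `menv_two_apertures`, `pos`, `mass`, and `boxsize` are illustrative names:

```python
import numpy as np
from scipy.spatial import cKDTree

# Sketch only -- not the actual prepare_sim.py implementation.
# pos: (N, 3) halo positions; mass: (N,) halo masses;
# boxsize enables periodic boundary handling in the tree.
def menv_two_apertures(pos, mass, r_inner=5.0, r_outer=10.0, boxsize=None):
    tree = cKDTree(pos, boxsize=boxsize)
    # query_ball_point returns, per halo, a Python list of neighbor
    # indices. For r_outer the total across all halos is ~2e9 entries,
    # each a full Python int object -- this is the memory hot spot.
    inner = tree.query_ball_point(pos, r_inner)
    outer = tree.query_ball_point(pos, r_outer)
    m_inner = np.array([mass[idx].sum() for idx in inner])
    m_outer = np.array([mass[idx].sum() for idx in outer])
    return m_outer - m_inner  # environment mass in the annulus
```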
Really, the ideal algorithm wouldn't even store the indices: we would just add up the mass on the fly as we encounter each valid pair during the tree query. But `scipy.spatial.KDTree` doesn't seem to support that. Maybe another library does? Or maybe we could use a different Menv metric?

@epaillas, feel free to jump in here too, since this is probably related to #143 (although it isn't explicitly tied to whether one is using halo light cones).
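One stopgap that stays within scipy: query in chunks and accumulate the mass sums immediately, so only one chunk's index lists are alive at a time and peak memory scales with `chunk_size` rather than with the full ~2 billion matches. A hedged sketch, using the same illustrative names as above:

```python
import numpy as np
from scipy.spatial import cKDTree

# Workaround sketch: not a true on-the-fly sum, but bounds peak memory
# by freeing each chunk's index lists before the next chunk is queried.
def menv_chunked(pos, mass, r_inner=5.0, r_outer=10.0,
                 boxsize=None, chunk_size=100_000):
    tree = cKDTree(pos, boxsize=boxsize)
    menv = np.empty(len(pos))
    for start in range(0, len(pos), chunk_size):
        sl = slice(start, start + chunk_size)
        inner = tree.query_ball_point(pos[sl], r_inner, workers=-1)
        outer = tree.query_ball_point(pos[sl], r_outer, workers=-1)
        for i, (idx_in, idx_out) in enumerate(zip(inner, outer)):
            menv[start + i] = mass[idx_out].sum() - mass[idx_in].sum()
        # this chunk's index lists go out of scope here
    return menv
```

And if a plain neighbor *count* were an acceptable Menv proxy, `query_ball_point(..., return_length=True)` skips building the index lists entirely.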