
Add ability to specify maximum memory a pony program is allowed to use #3282

Open: wants to merge 5 commits into main
Conversation

dipinhora (Contributor):

Prior to this commit, a Pony program would keep growing in memory use
until the OS was unable to allocate more virtual memory.

This commit adds a new --ponymaxmem command line argument to limit
the maximum amount of dynamically allocated memory a Pony program is
allowed to use before the runtime refuses to allocate any more. The
limit is enforced in MB and does not exactly match the RSS of a
process on Linux, due to memory allocated by the process outside of
the Pony pool allocator. Additionally, GC is now triggered for actors
if the application is close to having allocated all of the memory it
is allowed and does not have much free in the pool allocator itself.
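
The refusal and pressure checks described above can be sketched roughly like this. This is a minimal illustration: the 95% pressure figure comes from the PR discussion, but the names (max_mem_bytes, total_alloc, alloc_allowed, mem_pressure) and structure are hypothetical, not the PR's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

// Illustrative globals: in the real runtime these would be configured from
// --ponymaxmem and updated by the pool allocator.
static size_t max_mem_bytes = 0;  // 0 means no limit (--ponymaxmem unset)
static size_t total_alloc = 0;    // bytes currently handed out by the pool

// Would allocating `size` more bytes stay within the configured limit?
static bool alloc_allowed(size_t size)
{
  if(max_mem_bytes == 0)
    return true;
  return (total_alloc + size) <= max_mem_bytes;
}

// Memory pressure: the program has used more than 95% of what it is
// allowed, so actors should GC even before their usual next_gc threshold.
static bool mem_pressure(void)
{
  if(max_mem_bytes == 0)
    return false;
  return total_alloc > ((max_mem_bytes / 100) * 95);
}
```

When alloc_allowed returns false the runtime would print the error shown below and abort, and mem_pressure is what pushes actors into early GC near the limit.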


The following program:

actor Main
  new create(env: Env) =>
    do_stuff()

  fun do_stuff() =>
    var i: USize = 200
    while i > 0 do
      let z = Array[U8].init(1, 1024*1024)
      i = i - 1
    end

Uses about 210 MB when run:

vagrant@ubuntu-bionic:~/dhp2$ /usr/bin/time ./maxmemtest
0.05user 0.17system 0:00.19elapsed 111%CPU (0avgtext+0avgdata 209136maxresident)k
0inputs+0outputs (0major+51810minor)pagefaults 0swaps

Running the same program limited to 100 MB with the new --ponymaxmem option results in an error once it tries to allocate more than it is allowed:

vagrant@ubuntu-bionic:~/dhp2$ /usr/bin/time ./maxmemtest --ponymaxmem 100
ERROR: Memory limit reached! Allocating 32768 would use more than 104857600 allowed!
Command terminated by signal 6
0.05user 0.08system 0:00.46elapsed 30%CPU (0avgtext+0avgdata 103672maxresident)k
0inputs+0outputs (0major+25472minor)pagefaults 0swaps

@@ -524,7 +525,12 @@ void ponyint_heap_used(heap_t* heap, size_t size)

 bool ponyint_heap_startgc(heap_t* heap)
 {
-  if(heap->used <= heap->next_gc)
+  // don't run GC if there's no heap used
+  if(heap->used == 0)
Member:
is this really needed?

Contributor Author:

Yes and no. The check for heap->used == 0 needs to be done for when ponyint_pool_mem_pressure() == true. I put it as a separate check, but I can instead add it later on as part of the ponyint_pool_mem_pressure() condition.

Member:

Ah, right that makes sense.

Member:

If this were to remain as is, I think an explanation of why it's needed would be important. This also makes me favor the "do this as capping next_gc" approach I put forth.

     return false;

+  // don't run GC if haven't reached next threshold and aren't under memory pressure
+  if((heap->used <= heap->next_gc) && !ponyint_pool_mem_pressure())
Member:

What if, instead of tackling it in this fashion, we capped next_gc?

This might be a bad idea as it overloads the meaning of next_gc; OTOH, it literally is what we are doing here.

Contributor Author:

I'm not sure I follow. How exactly would we cap next_gc in relation to what ponyint_pool_mem_pressure() checks for?

Member:

Instead of using "pool_mem_pressure": when setting a new next_gc value, don't allow it to be set to more than the value which currently triggers true from ponyint_pool_mem_pressure.

Contributor Author:

Ah, okay. Thanks for clarifying. I'm not sure that accomplishes the same thing. It is very likely that some actors will have their next_gc set before other actors allocate more memory, so that next_gc ends up above the value that triggers true from ponyint_pool_mem_pressure for those actors.

Member:

you are right. my idea makes no sense. i blame the migraine.

Contributor Author:

migraine no good. 8*/

+  {
+    // Make GC happen if pony has allocated more than 95% of the memory allowed
+    // and pool_block_header cache is less than 5% of memory allowed
+    // TODO: this currently does not account for any free blocks of memory tracked
Member:

I don't think this should be a TODO.

I think a note that it doesn't take the free blocks into account is fine.

TODO says "we should change this", which I don't believe is the purpose of this comment.

I think noting how it works is fine.

Contributor Author:

I can change it to a note. The reason I put it as a TODO is the "maybe it should?" part. I think the TODO captures the open question of whether those blocks should be accounted for or not.

Member:

I think we should make a decision on whether we want to account for free blocks as part of this PR, if that is what you mean.

Contributor Author:

Yes, if we can make a decision as part of the PR that would be great. If not, I would want the TODO as a reminder that the decision needs to be made.

@SeanTAllen (Member):

I'm not sure about this.

Reason:

The OS provides ways to set a maximum for memory usage for a process. Why "duplicate" that in the Pony runtime?

I do like "once we hit X amount of memory, GC", so it's more like an attempt to stay within a limit and less like an OOM killer or OS functionality for limiting memory usage.

OTOH, the JVM does have a max heap size and will exit with an out-of-memory error if it is exceeded, so it's not totally unheard of; but I believe that was introduced before OSes had the level of resource control that exists now.
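
The OS-level mechanism alluded to here can be exercised with setrlimit(2). A hedged sketch, assuming Linux: RLIMIT_AS caps the whole virtual address space, which is close to, but not the same as, the pool-allocator limit this PR adds (it also covers stacks, code, and mappings outside the pool).

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/resource.h>

// Cap the process's virtual address space, then check that a single
// allocation larger than the cap is refused by malloc.
static bool limited_alloc_fails(size_t limit_bytes, size_t alloc_bytes)
{
  struct rlimit rl = { .rlim_cur = limit_bytes, .rlim_max = limit_bytes };

  // After this call, mmap/brk requests that would push the address space
  // past limit_bytes fail, so malloc returns NULL instead of growing.
  if(setrlimit(RLIMIT_AS, &rl) != 0)
    return false;

  void* p = malloc(alloc_bytes);
  bool failed = (p == NULL);
  free(p);
  return failed;
}
```

Unlike the runtime's limit, exceeding RLIMIT_AS surfaces as failed allocations (or SIGSEGV on stack growth) rather than a descriptive runtime error.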

@dipinhora (Contributor Author):

That's a valid point. I added it because it was something I felt would be useful.

I have no issues with not including this if the decision is that the OS functionality is sufficient and shouldn't be duplicated in pony.

@SeanTAllen (Member):

I have no strong opinions either way @dipinhora. I think it would be good to collect feedback from others. I would lean (slightly) towards "don't have the abort functionality".

@aturley (Member) commented Aug 20, 2019:

I like the idea of giving the user a way to specify a point where we increase the GC activity. I don't know if we need to tie it to killing the process. I'd be in favor of removing the process killing part and leaving that as something that can be handled by limiting the resources at the OS level or container level. If we need to add the killing back in for systems that don't support specifying resource limits then maybe we could do that as a separate flag.

@slfritchie (Contributor):

I don't have a lot of experience with the setrlimit(2) categories that are relevant to/overlap with the intent of the limit in this PR, e.g., https://linux.die.net/man/2/setrlimit with RLIMIT_DATA, RLIMIT_RSS, and RLIMIT_STACK.

One thing that's missing (to my naïve eyes) is a limit on the number of anonymous mmap(2) regions, which is another common method of allocating memory for general user-space app use. AFAIK there isn't a setrlimit(2) category for this style of memory allocation.

@dipinhora (Contributor Author):

@slfritchie I don't either. I'm not sure if there was a specific concern you wanted to raise, but I wasn't able to understand it. Can you please clarify?

@slfritchie (Contributor):

In the weekly Pony meeting today, @SeanTAllen wondered if existing setrlimit and perhaps other container'ish resource limit mechanisms were sufficient to limit memory growth in the same way that the JVM's heap size limit is (mostly) an effective ceiling. The anonymous mmap'ed regions are a thing that AFAIK don't have an OS-enforced limit, so ... my guess for answering Sean would be "no, they aren't sufficient".

@dipinhora (Contributor Author):

Thanks for clarifying. I'm not sure how anonymous mmap and RSS interact in detail, but my naive understanding is that pages only get counted as part of RSS once they are faulted in for read/write; before that, they are only counted as part of the virtual size.

@sylvanc (Contributor) commented Aug 27, 2019:

Two points:

  • @slfritchie 's man page link is great because it identifies that doing this with rlimit is actually tricky, as none of the categories are the same as heap block usage.
  • This introduces an atomic operation on every block allocation, which is a runtime cost across all programs, whether or not they use the feature.

@dipinhora (Contributor Author):

@sylvanc Minor clarification:

  • atomic operations are only added on each block allocation the first time a block is allocated by a program. Subsequent allocations of a block after it is freed do not add atomic operations (but do add a branch).

Yes, this does add overhead for all programs regardless of whether they utilize the feature or not.

Maybe it should be a compile time option instead? Or, if an additional branch in the block allocation code is acceptable, there can be a runtime check to only do the atomic operation if this option is specified on the command line.
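
The trade-off under discussion (an always-on atomic versus a cheap branch guarding it) looks roughly like this; track_mem and mem_allocated are illustrative names, not the PR's actual identifiers:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static bool track_mem = false;            // set once from the command line
static _Atomic size_t mem_allocated = 0;  // shared across scheduler threads

// Called on block allocation. Programs that never pass --ponymaxmem pay
// only a well-predicted branch; the atomic read-modify-write (and its
// cache-line traffic) happens only when tracking was requested.
static void account_block_alloc(size_t size)
{
  if(track_mem)
    atomic_fetch_add_explicit(&mem_allocated, size, memory_order_relaxed);
}
```

Since track_mem never changes after startup, the branch is effectively free on modern branch predictors, which is why the discussion below expects no noticeable impact.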

@sylvanc (Contributor) commented Sep 3, 2019:

The compile-time option means needing an alternate libponyrt available to compile against, which is of course do-able, but seems unfortunate (and not sustainable for multiple features like this).

I suspect an additional branch on block allocation would not have a noticeable performance impact, so that seems like an attractive option. Do we have a performance test that might stress block allocation?

@dipinhora (Contributor Author):

PR updated to add in the branch that decides whether tracking of allocated memory should occur. Additionally, added two more command line arguments, --ponyaggressivegcthreshold and --ponyaggressivegcfactor, for kicking in aggressive GC thresholds when memory use passes a set threshold.
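
A hedged sketch of what "aggressive GC past a threshold" could mean: below the threshold an actor's next_gc grows by the normal factor, and above it by a smaller, more aggressive one. The option names come from the comment above; the function name, identifiers, and exact formula here are illustrative only.

```c
#include <stddef.h>

// Pick the growth factor for an actor's next GC threshold: once total
// memory use passes the aggressive-GC threshold, a smaller factor makes
// every actor collect sooner.
static double gc_factor(size_t total_used, size_t aggressive_threshold,
                        double normal_factor, double aggressive_factor)
{
  if(aggressive_threshold != 0 && total_used > aggressive_threshold)
    return aggressive_factor;
  return normal_factor;
}
```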


Disclaimer: Not run in a controlled environment.

The following are some comparisons using message-ubench that in theory illustrate something or other about the impact of the additional branch due to the memory tracking changes in this PR (although I have no clue what given the noisy environment).

The original message-ubench with ponyc master:

./message-ubench.original.master --report-count 25 --ponypin

# pingers 8, report-interval 10, report-count 25, initial-pings 5
time,run-ns,rate
1567923537.831455251,999410362,15768113
1567923538.830009460,998001308,16650860
1567923539.829964619,995972323,15723735
1567923540.829443845,998967628,17240416
1567923541.828359036,998161950,14621754
1567923542.827840234,999021842,17555190
1567923543.826687780,998758589,15799816
1567923544.826394552,999357373,17709873
1567923545.826382060,999490410,15232401
1567923546.825423673,998701847,17018812
1567923547.824139393,995645062,16304457
1567923548.823595807,999345046,15702659
1567923549.823088050,999396036,14357331
1567923550.821969654,998078196,14748542
1567923551.822014243,999984275,16638983
1567923552.821238461,998504728,14841336
1567923553.821026302,998782140,15400060
1567923554.819725190,998588660,14633975
1567923555.825985232,1005873526,15703745
1567923556.818499526,992446854,16484741
1567923557.817232981,997908947,16690358
1567923558.816937187,998404316,15588583
1567923559.815680985,992680578,17740076
1567923560.814622435,997595916,15530830
1567923561.815129923,999823663,18032593

The original message-ubench with the runtime changes from this PR:

./message-ubench.original.memlimit --report-count 25 --ponypin

# pingers 8, report-interval 10, report-count 25, initial-pings 5
time,run-ns,rate
1567923562.830545819,1000229771,13268720
1567923563.829260792,997864981,16935438
1567923564.828925257,998926921,15163030
1567923565.828532311,998642270,17390240
1567923566.828558061,999520507,15583276
1567923567.826564021,997546419,17930991
1567923568.826635911,999975272,14117814
1567923569.831023743,1004038641,18315219
1567923570.825282601,993902403,14243993
1567923571.824250961,998168194,17253984
1567923572.823242503,998112097,16149923
1567923573.834709910,1010592324,15418487
1567923574.822190220,987356093,16084250
1567923575.820996689,998090276,15144689
1567923576.820824315,999521829,18006007
1567923577.820010424,998758076,14975282
1567923578.818739704,998595583,16279405
1567923579.818635942,999783035,14036615
1567923580.817207456,998448864,16518856
1567923581.816828957,999218409,15777874
1567923582.815802499,998528334,16439202
1567923583.815528429,998938954,14843651
1567923584.814986853,998592437,17489379
1567923585.814686041,998746000,14023844
1567923586.813651837,998421952,15158054

The original message-ubench with the runtime changes from this PR with --ponymaxmem set:

./message-ubench.original.memlimit --report-count 25 --ponypin --ponymaxmem 4000

# pingers 8, report-interval 10, report-count 25, initial-pings 5
time,run-ns,rate
1567923587.819962035,999676261,13251967
1567923588.818776184,998418010,15778953
1567923589.818551979,999629498,16095546
1567923590.817445527,997834291,15015457
1567923591.817023314,998575204,15665724
1567923592.816748156,998794138,15343411
1567923593.816026801,998892395,17351001
1567923594.816100900,999292208,12344015
1567923595.814308126,997452090,16618209
1567923596.813890614,999469350,14501167
1567923597.813342459,999332486,17124226
1567923598.812147154,998472649,13478288
1567923599.811889649,998797744,17168845
1567923600.810615098,997797656,15329448
1567923601.810978434,1000248778,15952020
1567923602.809578162,997640232,15382972
1567923603.809045266,999346715,14537417
1567923604.807775486,998601316,16490266
1567923605.808048935,999408060,16022236
1567923606.807229372,999058390,16290202
1567923607.805959253,998466844,14460792
1567923608.805343263,999257383,16672081
1567923609.804010392,998551871,14748875
1567923610.803687959,998981054,16941862
1567923611.803098465,999250881,15446606

The modified message-ubench from this PR with ponyc master:

./message-ubench.modified.master --report-count 25 --ponypin

# pingers 8, report-interval 10, report-count 25, initial-pings 5, send-array false, array-size 512, hold-array-pings 10
time,run-ns,rate
1567923612.808367754,999778084,14491241
1567923613.808412159,999954899,15597978
1567923614.807970373,999210937,15895717
1567923615.807290119,998480609,16112585
1567923616.806371083,998592622,15940753
1567923617.805301197,997915147,16737521
1567923618.804818532,998747872,15555259
1567923619.803442086,998477651,16791076
1567923620.802990117,999441132,14661680
1567923621.802295369,998704113,17108301
1567923622.802244181,999591573,13372280
1567923623.800731815,997973971,17241555
1567923624.800244898,999393712,14903067
1567923625.799043322,998468255,16732013
1567923626.799874410,1000650542,13775535
1567923627.797405603,996797302,16318917
1567923628.797410128,999889613,15484476
1567923629.797443805,999903559,15992685
1567923630.795404343,997623460,15811444
1567923631.795934353,999807947,14545580
1567923632.795061715,998336816,16673955
1567923633.793812922,998547599,14328041
1567923634.793373419,999046967,16694662
1567923635.792297226,998838712,15162995
1567923636.791651680,995994818,16395877

The modified message-ubench from this PR with the runtime changes from this PR:

./message-ubench.modified.memlimit --report-count 25 --ponypin

# pingers 8, report-interval 10, report-count 25, initial-pings 5, send-array false, array-size 512, hold-array-pings 10
time,run-ns,rate
1567923637.823488511,999494834,13525261
1567923638.822945565,998145820,15446394
1567923639.821513329,996951538,14435883
1567923640.821045834,999393754,14154800
1567923641.821165097,999983836,13832578
1567923642.827406469,1005634765,14196142
1567923643.819697278,991565138,14620801
1567923644.818144730,997481820,14599167
1567923645.817621752,998619139,13398203
1567923646.817307180,998711827,15936472
1567923647.815919925,998068287,14853885
1567923648.830176624,1013277596,15801351
1567923649.815156913,984702746,13958477
1567923650.814110788,998824057,15376613
1567923651.814120619,999885491,16097099
1567923652.812682976,998429106,15445074
1567923653.812230579,999457198,16515370
1567923654.812362191,999336735,13247302
1567923655.810938819,998061169,16598014
1567923656.810399108,999044915,14411145
1567923657.809168791,998630414,16566727
1567923658.808641844,999353823,13559280
1567923659.807768074,999009111,15932333
1567923660.806751701,998231448,14749849
1567923661.806306470,999489029,15448960

The modified message-ubench from this PR with the runtime changes from this PR with --ponymaxmem set:

./message-ubench.modified.memlimit --report-count 25 --ponypin --ponymaxmem 4000

# pingers 8, report-interval 10, report-count 25, initial-pings 5, send-array false, array-size 512, hold-array-pings 10
time,run-ns,rate
1567923662.816948716,999773572,13503388
1567923663.817069868,999982869,14120704
1567923664.815611434,998409863,15170565
1567923665.815994203,1000230352,14842151
1567923666.814556362,998319209,15039930
1567923667.813526142,997357702,15247304
1567923668.813379868,999701409,16120229
1567923669.813055721,999617516,13679441
1567923670.812067858,998450937,11920865
1567923671.810534109,997893378,13304460
1567923672.810590308,999941424,14744304
1567923673.814418830,1003709814,13185540
1567923674.823680420,1009102039,16208000
1567923675.808800219,985006293,14358442
1567923676.809327985,1000364294,16843222
1567923677.806782946,997348346,14930390
1567923678.810077139,1002287391,16806720
1567923679.805650163,995453364,14570189
1567923680.804705816,998927150,14774421
1567923681.804195916,999376316,14547321
1567923682.802972315,998183383,16005351
1567923683.802613845,995071443,15286248
1567923684.801345173,991703006,15184498
1567923685.801403502,999220030,16371559
1567923686.800544230,998347362,15141388

The modified message-ubench from this PR with ponyc master sending arrays as part of the pings:

./message-ubench.modified.master --report-count 25 --ponypin --send-array --array-size 1025

# pingers 8, report-interval 10, report-count 25, initial-pings 5, send-array true, array-size 1025, hold-array-pings 10
time,run-ns,rate
1567923687.816683456,999963031,2216329
1567923688.816203591,984519816,2027309
1567923689.816006206,993437609,2158388
1567923690.814522419,997571833,2001368
1567923691.816125652,1000634257,2311686
1567923692.812471566,996061904,1978026
1567923693.812206193,999207411,2265433
1567923694.811266878,998070968,1896740
1567923695.810755478,995919215,2342799
1567923696.810912345,995037346,1927940
1567923697.809135721,997571185,2186228
1567923698.808892232,989469172,2006365
1567923699.807646010,998323157,2138039
1567923700.807360932,998857182,2013513
1567923701.806587963,998744519,2132971
1567923702.805561718,998413410,2118608
1567923703.805765516,1000020049,1862906
1567923704.804475845,998240665,2247857
1567923705.803769365,998241023,1960340
1567923706.803181053,998639429,2235450
1567923707.802997400,998696836,1925430
1567923708.801151830,997175434,2205759
1567923709.801150913,999818498,1913114
1567923710.800478602,999185885,2240674
1567923711.799538445,998936178,2057453

The modified message-ubench from this PR with the runtime changes from this PR sending arrays as part of the pings:

./message-ubench.modified.memlimit --report-count 25 --ponypin --send-array --array-size 1025

# pingers 8, report-interval 10, report-count 25, initial-pings 5, send-array true, array-size 1025, hold-array-pings 10
time,run-ns,rate
1567923712.811718267,1003431922,2211201
1567923713.807045382,994271913,2120498
1567923714.806931956,999771882,2022754
1567923715.805534080,998480079,2135201
1567923716.805522719,999100023,2134641
1567923717.804772977,998393243,2292197
1567923718.804320352,999163200,1934062
1567923719.803640111,998708850,2346029
1567923720.802875298,998400285,1949067
1567923721.801398909,998321287,2265917
1567923722.801113300,999619192,2081758
1567923723.800534852,998660796,2266624
1567923724.800041722,998720487,2197806
1567923725.798591191,997671859,2157325
1567923726.798223719,999209199,2407710
1567923727.798097785,999009191,1975289
1567923728.797120622,998256031,2341300
1567923729.795783910,998263751,2096190
1567923730.795818776,999281314,2180225
1567923731.794711373,998024070,1836437
1567923732.794235099,999333859,1986473
1567923733.793577431,998788194,2025087
1567923734.792237772,998077643,2309181
1567923735.791773316,998778510,2120484
1567923736.798435345,1006280508,2183924

The modified message-ubench from this PR with the runtime changes from this PR sending arrays as part of the pings with --ponymaxmem set:

./message-ubench.modified.memlimit --report-count 25 --ponypin --send-array --array-size 1025 --ponymaxmem 4000

# pingers 8, report-interval 10, report-count 25, initial-pings 5, send-array true, array-size 1025, hold-array-pings 10
time,run-ns,rate
1567923737.829375226,1000053280,2210369
1567923738.836320270,1006034556,2226500
1567923740.139200581,1301514760,2361002
1567923740.832443569,692323410,2014269
1567923741.826867550,993678545,2355149
1567923742.826149293,998305940,2026812
1567923743.824710029,997437169,2327756
1567923744.824499056,998821714,2072710
1567923745.824561826,999278981,2355538
1567923746.823055731,998338904,2104714
1567923747.822076312,998052628,2176996
1567923748.822192888,999688066,2188871
1567923749.820619164,997617405,2176871
1567923750.820572317,998992790,2191364
1567923751.819728790,998353918,2049618
1567923752.818327524,998092288,2245298
1567923753.818290002,999749353,2132566
1567923754.816819041,998375826,2267647
1567923755.817331969,1000393194,2075978
1567923756.816444276,998654884,2388446
1567923757.815863874,999276085,2172916
1567923758.814453376,996575120,2275918
1567923759.813744193,999118334,2223516
1567923760.812867050,994060077,2156403
1567923761.811918961,998925116,2269623

This commit adds two new command line arguments to allow for
enabling aggressive GC when the program is using more than a set
threshold of memory. The new command line options are
`--ponyaggressivegcthreshold` and `--ponyaggressivegcfactor`.
This commit enhances the message-ubench example application to be
able to send arrays of arbitrary sizes as part of pings from one
actor to another. It also includes the ability to specify how long
a receiving actor should hold on to arrays before freeing them for
GC.