pg crashes when a process both monitors and joins a pg group #7625

Closed
zzydxm opened this issue Sep 5, 2023 · 3 comments · Fixed by #7659

zzydxm commented Sep 5, 2023

Describe the bug
The pg server crashes when a process that both monitors and joins a group exits. To reproduce:

pg:start_link(), spawn(fun() -> pg:monitor(a), pg:join(a,self()) end).

=ERROR REPORT==== 5-Sep-2023::09:05:49.097826 ===
** Generic server pg terminating
** Last message in was {'DOWN',#Ref<0.1929657013.2338848804.134906>,process,
                               <0.107.0>,normal}
** When Server state == {state,pg,
                               #{<0.107.0> =>
                                     {#Ref<0.1929657013.2338848804.134908>,
                                      [a]}},
                               #{},#{},
                               #{#Ref<0.1929657013.2338848804.134906> =>
                                     {<0.107.0>,a}},
                               #{a =>
                                     [{<0.107.0>,
                                       #Ref<0.1929657013.2338848804.134906>}]}}
** Reason for termination ==
** {{case_clause,{{#Ref<0.1929657013.2338848804.134908>,[a]},#{}}},
    [{pg,handle_info,2,[{file,"pg.erl"},{line,407}]},
     {gen_server,try_handle_info,3,[{file,"gen_server.erl"},{line,1077}]},
     {gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,1165}]},
     {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}

<0.107.0>
=CRASH REPORT==== 5-Sep-2023::09:05:49.098435 ===
  crasher:
    initial call: pg:init/1
    pid: <0.106.0>
    registered_name: pg
    exception error: no case clause matching
                     {{#Ref<0.1929657013.2338848804.134908>,[a]},#{}}
      in function  pg:handle_info/2 (pg.erl, line 407)
      in call from gen_server:try_handle_info/3 (gen_server.erl, line 1077)
      in call from gen_server:handle_msg/6 (gen_server.erl, line 1165)
    ancestors: [<0.105.0>,<0.88.0>,<0.70.0>,<0.65.0>,<0.69.0>,<0.64.0>,
                  kernel_sup,<0.47.0>]
    message_queue_len: 1
    messages: [{'DOWN',#Ref<0.1929657013.2338848804.134908>,process,
                          <0.107.0>,normal}]
    links: [<0.105.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 376
    stack_size: 28
    reductions: 7576
  neighbours:
    neighbour:
      pid: <0.105.0>
      registered_name: []
      initial_call: {erlang,apply,2}
      current_function: {io,execute_request,3}
      ancestors: [<0.88.0>,<0.70.0>,<0.65.0>,<0.69.0>,<0.64.0>,kernel_sup,
                  <0.47.0>]
      message_queue_len: 0
      links: [<0.88.0>,<0.106.0>]
      trap_exit: false
      status: waiting
      heap_size: 987
      stack_size: 27
      reductions: 4407
      current_stacktrace: [{io,execute_request,3,[{file,"io.erl"},{line,607}]},
                  {shell,exprs,7,[{file,"shell.erl"},{line,787}]},
                  {shell,eval_exprs,7,[{file,"shell.erl"},{line,736}]},
                  {shell,eval_loop,4,[{file,"shell.erl"},{line,721}]}]
** exception error: no case clause matching {{#Ref<0.1929657013.2338848804.134908>,[a]},#{}}
     in function  pg:handle_info/2 (pg.erl, line 407)
     in call from gen_server:try_handle_info/3 (gen_server.erl, line 1077)
     in call from gen_server:handle_msg/6 (gen_server.erl, line 1165)
     in call from proc_lib:init_p_do_apply/3 (proc_lib.erl, line 241)

Expected behavior
pg should not crash

Affected versions
All versions that include the new pg

Additional context
Adding one more case clause here should work: https://github.com/erlang/otp/blob/master/lib/kernel/src/pg.erl#L407

@zzydxm zzydxm added the bug Issue is reported as a bug label Sep 5, 2023

max-au commented Sep 7, 2023

Thanks for reporting, let me come up with a test case and a fix!

@IngelaAndin IngelaAndin added the team:VM Assigned to OTP team VM label Sep 8, 2023

max-au commented Sep 15, 2023

I have a test case now. If the same process joins a group and also starts monitoring a group (or a scope), it works fine. But when such a process terminates without leaving all groups (or stopping monitoring), pg crashes, because it assumes that a process is either a monitor or a joined process, never both.

The fix is technically trivial: upon getting a {'DOWN', ...} message, check both monitors and locally joined processes. It may incur a small performance cost. @zzydxm, would you be able to test it at scale?
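
As a rough illustration of checking both tables on 'DOWN', here is a minimal sketch. It is not the actual patch from #7659: the module name pg_down_sketch, the function handle_down/3, and the map-based state are all hypothetical stand-ins for pg's real state record, and only two of its tables are modelled.

```erlang
-module(pg_down_sketch).
-export([handle_down/3, demo/0]).

%% Hypothetical, simplified state (NOT the real pg #state{} record):
%%   local          :: #{pid() => {JoinRef :: reference(), [Group]}}
%%   group_monitors :: #{MonRef :: reference() => {pid(), Group}}

%% Handle {'DOWN', MRef, process, Pid, _} for a process that may be
%% a group member, a group monitor, or both.
handle_down(MRef, Pid, #{local := Local, group_monitors := GroupMon} = State) ->
    case maps:take(Pid, Local) of
        {{MRef, _Groups}, NewLocal} ->
            %% MRef is already bound, so this clause matches only the
            %% reference created at join time: drop the membership.
            State#{local := NewLocal};
        _ ->
            %% Either Pid never joined, or it joined under a different
            %% reference and this 'DOWN' belongs to its monitor
            %% subscription. The missing equivalent of this clause is
            %% what produced the case_clause error in the report.
            State#{group_monitors := maps:remove(MRef, GroupMon)}
    end.

%% One process that is both joined and monitoring; the monitor's
%% 'DOWN' is handled first, as in the crash report.
demo() ->
    Pid = self(),
    JoinRef = make_ref(),
    MonRef = make_ref(),
    S0 = #{local => #{Pid => {JoinRef, [a]}},
           group_monitors => #{MonRef => {Pid, a}}},
    S1 = handle_down(MonRef, Pid, S0),
    handle_down(JoinRef, Pid, S1).
```

Calling pg_down_sketch:demo() should leave both maps empty, whichever of the two references is handled first. The extra lookup in the fall-through clause is the kind of small per-message cost mentioned above.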

michalmuskala commented

Could this be solved by using custom monitor flags? That way you could know which part to check on message receipt.
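
One way to read this suggestion, assuming "custom monitor flags" means the {tag, Term} option of erlang:monitor/3 (available since OTP 24), is sketched below. The module name tagged_monitor_sketch and the tags {'DOWN', joined} and {'DOWN', monitor} are made up for illustration. This is not how pg is implemented.

```erlang
-module(tagged_monitor_sketch).
-export([demo/0]).

%% Monitor the same process twice with different user-defined tags, so
%% the receiver can tell from the message itself which table to check.
demo() ->
    Pid = spawn(fun() -> receive stop -> ok end end),
    %% The tag replaces the default 'DOWN' atom in the monitor message.
    _JoinRef = erlang:monitor(process, Pid, [{tag, {'DOWN', joined}}]),
    _MonRef = erlang:monitor(process, Pid, [{tag, {'DOWN', monitor}}]),
    Pid ! stop,
    %% Sort the result so it does not depend on delivery order.
    lists:sort([wait_down(Pid), wait_down(Pid)]).

wait_down(Pid) ->
    receive
        {{'DOWN', Kind}, _Ref, process, Pid, _Reason} -> Kind
    after 1000 ->
        timeout
    end.
```

With tags like these, a handler could dispatch on Kind directly instead of probing both the membership and the monitor tables for every message.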
