-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Andoird] ANR if app closed and opened too quickly #7567
Comments
Can you put more logcat, before there should be relevant sdl message ... |
The problem happens with |
By shut down, I mean |
Exiting that's where the main() ends, and tells too "finish()" the activity:
otherwise, maybe, this is just an issue with phone gles2 driver... |
I just had the idea to put a I will do some more digging to find out if it's my code, or something in SDL and come back with a repo, or bug fix. |
I can confirm it's an SDL(3) issue. If you
|
Oh yes, and this is what comes through logcat:
|
ok. but you change your bug, before was Indeed this doesn't work, like in other platforms, because the android java surface view is still there and is connected to the egl.
so that you see a "surfaceDestroyed()" message in you log (nothing remains connected). if you think the initial bug is the same as your 2nd, you can add a global variable counter to check that. You display and increment the variable at each start of the main() and see the value. (but that shouldn't happen! unless you have different activity configuration). |
I didn't change the bug, it still happens if you exit Other platforms, this works, so it's a bug specific to the Android platform. I'm not adding code to work around the bug, especially not a delay because that will create ANRs, which is bad behaviour in Google Play. I'm pretty sure this can be easily fixed in the SDL java code, where the source of the problem is. |
You're also saying the error doesn't occur at the same time. Can you add a global variable counter that you display and increment at the start of you main(). |
The code I pasted reproduces the same error, just with a slightly different result - the error's coming back from logcat at the same. The counter will be at 2, because it happens after the first exit and re-start if you open the app quick enough. |
To clarify, SDL is getting stuck, and produces an ANR if restarted too quickly. It gets stuck in There's an interesting comment in this method's source code which might be relevant. It might be related to the bug.
|
not sure if this is relevant, we don't use the looper directly, but the activity.... and what is the "quit" in this situation, onDestroy() ? if so nothing is done after this. This is an android comment for android internals.
strange. because I tried it, several times, and i could restart the main() with this. Maybe the timing is different (try to increase). for this test, the relevant information is to check the logcat and search for "onSurfaceDestroyed" and see it if arrive before you reach the "again:" label. SDL_MinimizeWindow should trigger indirectly "onSurfaceDestroyed".
Sorry to ask, but here this is really not clear to me. "counter will be at 2". did you check that ? or you assume it ? the goal is to tell if the main() is re-started in the same process, or in a new one. (this is has to be tested without the "again:" label + goto). |
The bug is clear and easy to reproduce with the latest source using the android-project template. There's not really anything more I can add. |
Just to clarify further facepalm: I didn't assume the counter will be 2, I know it will. Global / static data does not get destroyed on Android when main() exits. You can see this by the position of the green box that slides to the right. It will persist, even when you return to the app an hour later, unless you force-close the app, or the Android system decides to garbage collect. I've just tested this on an old Nexus 7 and the game loop continues, but the event loop locks up. I did have to add |
This is an apk for aarch64 I created, using the code from #7567 (comment) and my android cmake pr (+ printing of a counter value at the start of main). I don't see a ANR on a OnePlus 5, after more then 500 restarts. issue_7567.c#include <SDL3/SDL_main.h>
#include <SDL3/SDL.h>
static int counter = 0;
int main(int argc, char** argv) {
again:
SDL_Log("At start of main. counter=%d", counter);
counter++;
SDL_SetHint(SDL_HINT_ANDROID_TRAP_BACK_BUTTON, "1");
if(SDL_Init(SDL_INIT_VIDEO)) {
SDL_ShowSimpleMessageBox(SDL_MESSAGEBOX_ERROR, "SDL_Init", SDL_GetError(), 0);
return 1;
}
SDL_Window* w = SDL_CreateWindowWithPosition( "", SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED, 640, 480, SDL_WINDOW_RESIZABLE);
if(!w) {
SDL_ShowSimpleMessageBox(SDL_MESSAGEBOX_ERROR, "SDL_CreateWindow", SDL_GetError(), 0);
return 1;
}
SDL_Renderer *r = SDL_CreateRenderer(w, 0, SDL_RENDERER_PRESENTVSYNC);
if (!r) {
SDL_ShowSimpleMessageBox(SDL_MESSAGEBOX_ERROR, "SDL_CreateRenderer", SDL_GetError(), 0);
return 1;
}
SDL_bool running = SDL_TRUE;
while (running) {
SDL_SetRenderDrawColor(r, 0, 0, 255, SDL_ALPHA_OPAQUE);
SDL_RenderClear(r);
static SDL_FRect box = { 0,0,50,50 };
SDL_SetRenderDrawColor(r, 0, 255, 0, SDL_ALPHA_OPAQUE);
SDL_RenderFillRect(r, &box);
SDL_RenderPresent(r);
SDL_PumpEvents();
SDL_Event event;
while (SDL_PeepEvents(&event, 1, SDL_GETEVENT, SDL_EVENT_FIRST, SDL_EVENT_LAST) == 1) {
if(event.type == SDL_EVENT_KEY_UP || event.type == SDL_EVENT_FINGER_UP)
running = SDL_FALSE;
}
if(box.x != 300)
box.x++;
else
box.x = 0;
}
SDL_DestroyRenderer(r);
SDL_DestroyWindow(w);
SDL_Quit();
/* SDL_Delay(8000); /* It still crashes even with a delay to let the java-side catch up */
goto again;
} |
@madebr thanks for the cooperation ! 2/ the label "again:" bug. eg: Init() Quit() ... Init() Quit(). maybe that can work .. but that's an uncommon scenario |
@AntTheAlchemist |
I can reliably reproduce the error when removing the And start/stop it using:
For me, it always fails in the 3rd iteration. When stuck, you can unstuck it with:
This forces a restart of the program. source.c#include <SDL3/SDL_main.h>
#include <SDL3/SDL.h>
int main(int argc, char** argv) {
SDL_SetHint(SDL_HINT_ANDROID_TRAP_BACK_BUTTON, "1");
if(SDL_Init(SDL_INIT_VIDEO)) {
SDL_ShowSimpleMessageBox(SDL_MESSAGEBOX_ERROR, "SDL_Init", SDL_GetError(), 0);
return 1;
}
SDL_Window* w = SDL_CreateWindowWithPosition( "", SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED, 640, 480, SDL_WINDOW_RESIZABLE);
if(!w) {
SDL_ShowSimpleMessageBox(SDL_MESSAGEBOX_ERROR, "SDL_CreateWindow", SDL_GetError(), 0);
return 1;
}
SDL_Renderer *r = SDL_CreateRenderer(w, 0, SDL_RENDERER_PRESENTVSYNC);
if (!r) {
SDL_ShowSimpleMessageBox(SDL_MESSAGEBOX_ERROR, "SDL_CreateRenderer", SDL_GetError(), 0);
return 1;
}
SDL_bool running = SDL_TRUE;
while (running) {
SDL_SetRenderDrawColor(r, 0, 0, 255, SDL_ALPHA_OPAQUE);
SDL_RenderClear(r);
static SDL_FRect box = { 0,0,50,50 };
SDL_SetRenderDrawColor(r, 0, 255, 0, SDL_ALPHA_OPAQUE);
SDL_RenderFillRect(r, &box);
SDL_RenderPresent(r);
SDL_PumpEvents();
SDL_Event event;
while (SDL_PeepEvents(&event, 1, SDL_GETEVENT, SDL_EVENT_FIRST, SDL_EVENT_LAST) == 1) {
if(event.type == SDL_EVENT_KEY_UP || event.type == SDL_EVENT_FINGER_UP)
running = SDL_FALSE;
}
if(box.x != 300)
box.x++;
else
box.x = 0;
}
SDL_DestroyRenderer(r);
SDL_DestroyWindow(w);
SDL_Quit();
return 0;
} |
@madebr are you reproducing the error with the patch from the PR ? |
I modified the adb command to repeatedly start/stop the app to:
Then, it does not block anymore. The app keeps the global static state though: the green block does not move left after a close/start cycle. When testing manually, the behavior is not good: as slouken wrote in #7585, you need to re-start the app 5 times before the activity shows up. And then too, the global state remains. |
I have updated the PR so that it indicate the count the main is called (eg: 3c703f2 ) |
FYI, the real fix is to add a hint to call exit() after main() returns, and default that to true. It should be a hint, because some applications properly handle the Android life cycle and expect the activity to be reentered. This may need special handling to survive SDL_Quit(), possibly implemented as a callback that sets a global variable. |
@madebr I am not sure how you get the global state to remain.
I am curious, about the method, and I also would like to know it calls onDestroy() in the logcat ! @slouken
if i do the quick sequence close-restart, then:
|
I've updated #7585 |
I locally merged #7581 into #7585, and recorded a video of me opening the app, closing, and reopening it by clicking 5 times.
repro.mp4 |
so onDestroy() is called, but the static native data remains ? |
I just keep tapping on my phone screen (or clicking in the scrcpy window). Here's a screen cast: |
I've looked at the video, the process id, remains the same across all the video (20391) (except at the very start). Now the patch use system.exit(0). so it should really exits. (before, only finish() that call onDestroy()). also: |
SDLEntryActivity is not started in the video: it is only used when started through a shortcut. Repeating the above test procedure makes logcat return a new process id, and resets the test executable. Video + logcat of running:
out.mp4 |
@1bsyl I was certain I had reported back on this before now, maybe elsewhere. Apologies if I haven't already & for the delay. I did test, and it does seem to have fixed the issue from the user's perspective. I have been re-entering apps without issue for a long time. I was careful not to leave global / static data behind that would disrupt a re-start of the app, simply by not assuming global data defaults when it's restarted on Android. It was kind of useful to maintain the state of the app sometimes. I could see it tripping up a lot of unaware developers though. Are we sure there won't be any side-effects as a result of this fix? |
I think the fix assumes the I haven't tested this theory. So this might not be true. |
@AntTheAlchemist thanks for the feedback ! yes, you gave some feedback, but it's better when it's re-actualized the side effect is almost as @madebr say. Now by default, it only lets you run the SDLActivity once, with the same PID. Just to make clear:
When is this an issue ?
|
All very comprehensive, thank you. The only remaining thing I can think of is will any of the waiting for things to end / restart spark another ANR? I have an SDL3 app live on Play Store that's being punished for exceeding the bad behaviour for ANRs (I'll post a separate bug for this), especially in |
Here, with the PR, there is no "wait" for things to end / restart. So I don´t believe this can add more ANR. In fact, I hope this could reduce the ANR rate on android. Because, if you re-create the activity with a previous -random- state, then this is an undefined behavior, leading to ANR (event sent / lock) or Crash My experience about android anr/crashes:
both are under the threshold. crash rate is far below, but the anr rate is quite close to the threshold. On TV devices ... it cannot run AdMob ? I have disable HID, so I cannot tell. maybe there are some ANR there, that add to the Admob ones, so that it exceeds threshold. ( btw, after updating your app, I think you need to wait some time, eg 2 weeks to have so relevant new rates.) |
@AntTheAlchemist. Not sure, if this is feasible. but for instance, the video of madebr, prove the same pid is always use there. SDLActivity is always reopened with same PID. maybe because it use other activity to instrument the SDLActivty (you were using scrcpy ? ) |
I've applied dfd80f3 and everything seems to be ok. The '.MainActivity.onDestroy' ANR has disappeared from Google Play Console reports - which was responsible for sending the ANR rate many times over the bad behaviour limit. And of course there's no glitching when restarting the app too quickly. Thanks everyone! |
@AntTheAlchemist Thanks for the feedback, I was recently wondering how it goes !! thanks also for reporting this ! can we leave it open, so that you also refresh the status, in a couple again ? |
@1bsyl this update has only been live for 5 days, but the ANRs would appear after 1 day, so I'm confident we won't see any onDestroy related ANRs. 600 new downloads a day gives a good sample. I was averaging 8 ANRs a day, now it's dropped to 3-4 and still dropping daily. I'm still over the bad behaviour threshold (0.2%, needs to be under 0.04%?), but that's caused by joystick mutex locking (e.g: It's also worth noting that because people will be testing game controllers on their game-console TV set-up, most of my installs (over 50%) are on Android TV devices, as opposed to phone. My ANRs seem to be something specific to how Android TV works. I've updated about 20 different versions trying to tackle the ANRs and strangely they're just going up and up until your activity destroying patch. I can revisit here later, but I'm confident this works. All I will say is that there may be another hidden issue that caused the activity to not perfectly be re-used without destruction, so using the hint to keep the activity alive may still be a problem for those that need it. |
Great! Thanks for your feedback! :) I haven't forgotten your joystick ANRs, but I don't know what the underlying cause might be. Can you ping the other issue with an updated stack trace and I'll come back and look when I get a chance? |
I'll put together some info that should help point in the right direction, leave it with me. It should be a lot easier now activity is being renewed. I may need to acquire an Android TV box and see if I can trigger the ANR with the profiler running - might be when controllers are physically plugged in and out of a USB port on the box. Bare with :-) |
Hey @AntTheAlchemist ! I am currently testing this recently ... btw, another, question: |
Hi @1bsyl ! Crashes are very rare. There are some null pointer crashes in the Java code - will list them in another post. They should be easy to fix with a null check. The destroy-activity-on-close update has greatly reduced my ANRs, but I'm still way over bad behaviour. I only had 1 rare one 19 days ago in the update. Looking at the other SDL activities, there is one doing a I see there has been a recent hid / libusb update to SDL3. I'd like to see if it makes any difference. Yes, I am seeing multiple SDLTread in the stack trace, with the thread numbers going quite high, eg: 463. The main culprit is I gave Sam admin access to my Google Play Console so he could view the stack traces. I can do that, or paste a bunch of collapsible plain-text stack traces here. |
ok ! |
@AntTheAlchemist Also, if you plane to do a new release, could you add some smart trace in SDLActivity: replace in SDLActivity.java: by:
this can give some insight, like: (the longest thread name for me in gp console, so far is 35 chars) |
@AntTheAlchemist
and |
Also, maybe an ANR with nativeKeyDown/Up: #7832 |
After pushing, some update to gather some more information about the duplicated SDLThread... I've inversed the string and got: clearly android bug. timestamp couldn't be identical nor truncated. |
I suspect some of the ANRs (and crashes) are caused by the Android system, and the app gets the blame. I've seen reports showing SQL_Lite code. I'm not even using SQL. My ANRs are decreasing slightly in my latest live production. I wasn't in time to include #378e33b though. If you give me a couple more weeks, I'll have some nice fresh stack logs (which I'll post in the other issue) that'll help identify the culprit. |
what's #378e33b ? |
Sorry, I'm not very good with this Github lark. |
ok ! (copy paste is fine !). I am currently testing it. |
#7859: Android: also protect Hat/Joy/PadDown,Up so they are not sent without window |
@AntTheAlchemist : I meant, you don't need for your next update to change the SDLThread java name. (Unless you've idea about what to include to gather more information) |
**Update: skip to the last message to get to the conclusion and avoid the waffle **
If the Java side of the app hasn't finished shutting down properly when re-opened, it all goes a bit FUBAR when you re-open the app. You have to be pretty quick with your fingers to trigger this, but still shouldn't happen? Might be obvious / simple to fix?
Looking at Logcat, I'm seeing these messages come up, which might be a clue:
The text was updated successfully, but these errors were encountered: