Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Android 8 LoD syncing requires app restart and device reboot #11816

Closed
radinamatic opened this issue Jan 30, 2024 · 14 comments
Closed

Android 8 LoD syncing requires app restart and device reboot #11816

radinamatic opened this issue Jan 30, 2024 · 14 comments
Assignees
Labels
OS: Android TODO: needs decisions Design or specifications are necessary

Comments

@radinamatic
Copy link
Member

Observed behavior

LoD created on a tablet with Android 8 by going through the on-my-own setup, and then merging the account with one already on the Windows 7 server. For more than an hour LoD was syncing correctly (on one occasion it seemed to have stalled, but resumed with app restart), but eventually stopped, and app restart was not able to make it resume. Only after the full tablet reboot it eventually started to sync again. According to the Last synced information in Coach page on the server, the culprit might be the error in the LoD logs at 20:45.

The other 2 LoDs, with Android 12 & 13 respectively (one of whom has also been created by merging the on-my-own account), have been successfully (resuming) syncing for 4-5 days.

Errors and logs

Server and LoD logs and dbs

Expected behavior

Syncing should resume without the need to reboot the device.

User-facing consequences

Steps to reproduce

Context

  • Kolibri version: 0.16b13
  • Operating system: Windows 7 (server) and Android 8 LoD
@radinamatic
Copy link
Member Author

Possibly related to #11556 ?

@bjester
Copy link
Member

bjester commented Jan 30, 2024

the culprit might be the error in the LoD logs at 20:45

@radinamatic what is this error you mention? I don't see it myself. Also, what was the timeline of events right before you rebooted the device? How long do you recall waiting before rebooting?

@bjester bjester self-assigned this Jan 30, 2024
@radinamatic
Copy link
Member Author

I may be miss-interpreting the logs, but this is approximately the time when the server reports the last sync:

2024-01-29 20:45:00.033  8380-8406  libEGL                  org.learningequality.Kolibri         W  EGLNativeWindowType 0x7bb078c010 disconnect failed
2024-01-29 20:45:01.563  8380-8380  Destroy                 org.learningequality.Kolibri         I  end of app
2024-01-29 20:45:01.564  8380-8380  Process                 org.learningequality.Kolibri         I  Sending signal. PID: 8380 SIG: 9
2024-01-29 20:45:15.408  9506-9530  WM-ForceStopRunnable    org.learningequality.Kolibri         D  Found unfinished work, scheduling it.
2024-01-29 20:45:15.422  9484-9484  Settings                org.learningequality.Kolibri         W  mValues not put! needsGenerationTracker: true currentGeneration: -1 name: show_password value: null
2

When I see more than 5 minutes since the last sync on the device I'm testing, I usually try to interact with it first: make some progress on an assignment, browse another library, try to download a resource, etc. If the syncing does not resume for another up to 5 minutes, I then restart the app, and try some more interactions on it. If even that does not manage to make the syncing resume, it usually means that the server reports more than 15 minutes since the last sync, and then I proceed to reboot the LoD.

@bjester
Copy link
Member

bjester commented Jan 30, 2024

So this type of log message is usually associated with a forced restart of the app:

2024-01-29 20:45:01.563  8380-8380  Destroy                 org.learningequality.Kolibri         I  end of app

What follows is a timeline, from what I can gather from the logs:

  1. 20:44:49 - The last successful sync occurs
  2. 20:45:01 - The app is force restarted, seems to have no bearing on its function
  3. 20:45:48 - The next SoUD sync starts as it was supposed to, 60 seconds after the last
  4. 10 minutes pass-- Android Work Manager gives tasks 10 minutes to complete-- the app seems to go idle for some time during this
  5. 20:55:48 - The SoUD task that started 10 minutes earlier is interrupted, along with others
  6. 20:57:00 - The app is woken up after 3 minutes without a server request
  7. 20:58:55 - The app is force restarted
  8. 20:59:19 - The app registers the Windows device through Zeroconf, starting the chain that will lead to SoUD syncing
  9. 21:00:24 - The device is rebooted before the syncing can restart

Now, ideally the 10 minutes that pass during (4) never happen, specifically with the tasks becoming stuck. Although, this had nothing to do with the syncing mechanism itself but rather is more than likely an issue with python-for-android. Based off the logs, it's also something we've already seen leading to our commitment to switch to chaquopy. So I feel confiden this will be resolved by learningequality/kolibri-installer-android#197, which isn't prioritized for 0.16

@pcenov
Copy link
Member

pcenov commented Jan 31, 2024

Hi @bjester posting here the logs of a device that stopped syncing to the Windows server and would not resume syncing regardless of the number of times I restarted both the app and the server.

Android13.zip
windows-server.zip

The other 2 devices that were also connected to the server are still syncing correctly.

@radinamatic
Copy link
Member Author

Another take (user import from scratch) on my side with the same Huawei Android 8 tablet (and this time with the steps to disable battery optimization features, as @rtibbles reminded me), ended the same way as for @pcenov above: only the initial sync was performed, learner device received the class assignments, but all their progress never managed to sync back to the server, despite app restart, device reboot, or several server and VM restarts.

Logs and DBs

@pcenov
Copy link
Member

pcenov commented Feb 6, 2024

Hi @bjester - here are the logs from a device that was not able to resume syncing for quite some time (after both app and server restarts) and then started syncing again:

LOGS

@pcenov
Copy link
Member

pcenov commented Feb 14, 2024

Adding the logs for an Android phone on which I've installed the latest 0.16.0-rc0 app. Again the app was syncing correctly, then I left it running in the background for several hours and when I picked it up it seemed impossible to resume syncing even after restarting both the server and the app several times. Eventually the app started syncing though.
LOGS

@rtibbles
Copy link
Member

The main novel thing I am seeing in here that I've not seen in other logs is this:

WARNING 2024-02-14 10:07:45,480 zeroconf Exception occurred:
Traceback (most recent call last):
  File "/home/runner/work/kolibri/kolibri/python-for-android/build/python-installs/kolibri/arm64-v8a/kolibri/dist/zeroconf.py", line 2346, in send

@bjester's plan to migrate our Android device detection to use the native Android functionality should presumably resolve this.

@rtibbles
Copy link
Member

The other notable errors here seem to be due to database contention - either database locked errors on the default database, or other issues with writing on the task database.

The other exceptions I am seeing are repeated network connection timeouts and disconnects, which I would assume are caused by server restarts?

@pcenov
Copy link
Member

pcenov commented Apr 24, 2024

Hi @bjester I've not been able to replicate this syncing stoppage with the build from learningequality/kolibri-installer-android#216
Tested with 3 Samsung phone devices running on Android, 7.1, 8.0 and 13.

@bjester
Copy link
Member

bjester commented Apr 24, 2024

Thanks @pcenov! I will go ahead and merge that PR. I'll keep this issue open for a bit to see if @radinamatic wants attempt reproducing it, but this seems like great news 🎉

@bjester
Copy link
Member

bjester commented Apr 24, 2024

Ah I just noticed your comment @pcenov on that PR. Great news!

@marcellamaki
Copy link
Member

Closing this out as it seems like this has been resolved! 🎉 We can open a new issue if anything comes up in further testing in 0.17 QA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OS: Android TODO: needs decisions Design or specifications are necessary
Projects
None yet
Development

No branches or pull requests

6 participants