WSOD on Moto G power 2021 #1025

Closed
shankari opened this issue Nov 1, 2023 · 23 comments

Comments

@shankari
Contributor

shankari commented Nov 1, 2023

A user recently reported a WSOD (white screen of death) on a Moto G Power 2021 (see attached).

screen-20231031-220346.mp4
@shankari
Contributor Author

shankari commented Nov 1, 2023

This is very weird because I tested on a Moto G Play 2021 and it works fine.
And the Android WebView version is more recent on the Power:

Moto G Play

  • Last updated Oct 7
  • version 117.0.5938.153

Moto G Power

  • Last updated 3 days ago
  • version 118.0.5993.80

This makes me wonder whether the issue is with this particular user's data.

@JGreenlee

One of my test phones is a Moto G Power 2021 and has had no issues with this release.

If there's some kind of aberration in this user's data, maybe somewhere on the phone we make a faulty assumption about it. We should try the user's OPcode in a devapp.

@JGreenlee

Unable to reproduce in the devapp on either dev or prod JS.
Using master, I logged in with the OPcode and saw nothing unusual. The latest trip is from today, Nov 1.

@JGreenlee

Now I am wondering if it is specific to this user's installation of the app. Maybe a bad value in local or native storage?

@shankari
Contributor Author

shankari commented Nov 1, 2023

@JGreenlee can you try with this OPcode on your Motorola G Power test phone? You can restore back to your regular OPcode later. If you still can't reproduce, I will turn on the developer options on the user's phone and try to get an OS-level dump.

@JGreenlee

> @JGreenlee can you try with this opcode on your motorola G Power test phone? You can restore back to your regular opcode later. If you still can't reproduce, I will turn on the developer options on the user phone and try to get an OS level dump

Everything is still normal on my G Power with the user's OPcode.

@shankari
Contributor Author

shankari commented Nov 5, 2023

I tried a couple of times to get the error from the bug report, but I don't see anything.
The error occurs around 9:58 PM (21:58 in the logs).

screen-20231101-215808.mp4

And in the logs, I see:

11-01 21:57:56.174  1000  1930  5466 I ActivityTaskManager: START u0 {act=android.intent.action.MAIN cat=[android.intent.category.LAUNCHER] flg=0x10200000 cmp=gov.nrel.cims.openpath/.MainActivity bnds=[561,1350][696,1547]} from uid 10174
11-01 21:57:56.177  1000  1930  5466 D ActivityTaskManager: getPreferredLaunchDisplay userset_displayid false current displayId -1
11-01 21:57:56.177  1000  1930  5466 I chatty  : uid=1000(system) Binder:1930_1D identical 1 line
11-01 21:57:56.182  1000  1930  5466 D ActivityTaskManager: getPreferredLaunchDisplay userset_displayid false current displayId -1
...
11-01 21:58:12.620  1000  1930  5466 V WindowManager: Changing focus of displayId=0 to Window{15335fc u0 NotificationShade} from Window{7cf05d6 u0 gov.nrel.cims.openpath/gov.nrel.cims.openpath.MainActivity}
11-01 21:58:20.563  1000  1930  5466 D CompatibilityChangeReporter: Compat change id reported: 136219221; UID 10352; state: ENABLED
...
11-01 21:59:42.974  1000  1930  5466 W UserManagerService: Do not support app clone feature.
11-01 21:59:47.121  1000  1930  5466 I ProcessStatsService: Added stats: 2023-11-01-14-45-46, over +3h58m36s146ms
11-01 21:59:47.137  1000  1930  5466 I ProcessStatsService: Added stats: 2023-11-01-08-33-42, over +6h12m3s790ms
11-01 21:59:47.155  1000  1930  5466 I ProcessStatsService: Added stats: 2023-10-31-19-40-33, over +12h53m8s938ms

I will try one more time, using logcat in real time.

@shankari
Contributor Author

shankari commented Nov 5, 2023

Here are the final recording and logs. This time I ran logcat locally and was able to get even more logs. But I don't see any errors, and I also don't see any logs indicating where we are in the loading process.

However, I also don't think we have many logs any more. We removed a lot of them during the rewrite since using debuggers is easier, but that makes it hard to find issues that are not reproducible in the debugger. (I know I flagged this in a previous commit as a future fix, and maybe the time to fix that is now.)

There are exactly two log statements outside of diary/services.js now, so it is very hard to figure out what is going on:

openpath-phone kshankar$ grep -rc Logger.log www/js/diary | grep -v ":0"
www/js/diary/addressNamesHelper.ts:1
www/js/diary/services.js:26
www/js/diary/LabelTab.tsx:1

The last logs that I can see are:

11-05 15:22:37.234  1930  1957 I ActivityManager: Start proc 11640:com.google.android.webview:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:1/u0i949 for  {gov.nrel.cims.openpath/org.chromium.content.app.SandboxedProcessService0:1}
11-05 15:22:41.218  1930  1950 I LaunchCheckinHandler: Displayed gov.nrel.cims.openpath/.MainActivity,wp,ca,6495
11-05 15:31:07.772 22734 27268 I System.out: data = {"approval_date":"2016-07-14","protocol_id":"2014-04-
11-05 15:31:07.802  1930  1950 I WindowManager: SURFACE hide Surface(name=Splash Screen gov.nrel.cims.openpath)/@0x99f32b1 on display:0
11-05 15:31:07.816 22734 27268 I System.out: About to execute query SELECT data FROM userCache WHERE key = 'data_collection_consented_protocol' AND type = 'local-storage' AND write_ts >= 0.0 AND write_ts <= 1.699227067816E12 ORDER BY write_ts DESC
11-05 15:31:08.244 22734 27188 D SERVER  : Handling local request: https://localhost/dist/5d42b4e60858731e7b65.ttf
11-05 15:31:08.325 22734 27268 I System.out: data = {"approval_date":"2016-07-14","protocol_id":"2014-04-6267"}
...
11-05 15:31:18.718  1930  1944 V WindowManager: Changing focus of displayId=0 to Window{15335fc u0 NotificationShade} from Window{49f7871 u0 gov.nrel.cims.openpath/gov.nrel.cims.openpath.MainActivity}

video: https://github.com/e-mission/e-mission-docs/assets/2423263/9217dd46-d0d1-41a5-b8c5-16525af28fad
logs: logcat.log.gz

@shankari
Contributor Author

shankari commented Nov 5, 2023

@JGreenlee can we go through and bulk up on the diary logs? We want to see something on the order of the 20+ logs seen in the diary services. Then I can try to update on this phone and see what is going on. I suspect that a reinstall might fix it, but then we will never know what (if anything) caused this and won't be able to guard against it.

@shankari
Contributor Author

shankari commented Nov 9, 2023

@JGreenlee @Abby-Wheelis can one of you bulk up on the logs in the label screen (and potentially the dashboard screen), including making sure that every top-level call is in a try/catch block? That should help us get a better handle on these intermittent WSODs.

Once those are added, I can push out a new release and see what the logs show.
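
A minimal sketch of what "every top-level call in a try/catch block, with logs" could look like. The names `LogFn` and `withLogging` are hypothetical and are not the app's actual Logger API; the default logger here is plain `console.log`.

```typescript
// Hypothetical logging wrapper; `LogFn` and `withLogging` are illustrative
// names, not part of the app's actual Logger API.
type LogFn = (message: string) => void;

// Runs `fn`, logging entry, exit, and any thrown error under `label`.
// On failure it returns `fallback` so the caller can still render something.
function withLogging<T>(label: string, fn: () => T, fallback: T, log: LogFn = console.log): T {
  log(`${label}: start`);
  try {
    const result = fn();
    log(`${label}: done`);
    return result;
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    log(`${label}: ERROR ${detail}`);
    return fallback;
  }
}

// Usage: a top-level call that fails still leaves a trace in the logs.
const labelOptions = withLogging<string[]>(
  'loadLabelOptions',
  () => {
    throw new Error('label options unavailable');
  },
  [],
);
console.log(labelOptions.length);
```

The start/done pair is what would show how far loading got when no error is thrown at all, which is the gap in the logcat output above.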

@shankari
Contributor Author

I deployed the new release for this user and got the following behavior.

Trips but no labels, even on the detail screen, and even after reloading.
[Three screenshots]

@shankari
Contributor Author

After seeing this, I wonder if the issue is with the cached config. This user installed the app a very long time ago, so they might have an old version of the config in the local storage. Maybe there's some backwards compat code that was removed during the rewrite?

If that is the case, we should have displayed an error instead of just silently catching and ignoring it. Uploading logs internally...

@shankari
Contributor Author

Also, it looks like we keep reloading data.
The video below shows:

  • we are on label screen
  • we go to profile (reload, "Reading unprocessed trips")
  • we go back to label screen (reloads completely)
  • we go to profile (reload, "Reading unprocessed trips")
screen-20231111-164052.mp4

The logs for this should be in the internal upload as well

@shankari
Contributor Author

I also have an issue in which the logs are not getting shared properly to Nextcloud. It did work with Google Drive, but the share to Nextcloud just returned immediately. @the-bay-kay, can you install Nextcloud on your test Android phone and check this out?

@shankari
Contributor Author

> I also have an issue in which the logs are not getting shared properly to nextcloud.

I take this back; it just took a really long time to upload. It is there now.

@shankari
Contributor Author

shankari commented Nov 12, 2023

Uploaded userCacheDB as well. The stored config can be retrieved as below.

$ sqlite3 /tmp/userCacheDB
SQLite version 3.37.0 2021-12-09 01:34:53
Enter ".help" for usage hints.
sqlite> .tables
android_metadata  userCache         userCacheError
sqlite> select distinct(key) from userCache
   ...> ;
prompted-auth
config/consent
data_collection_consented_protocol
config/app_ui_config
intro_done
foodCompare
CONFIG_PHONE_UI
CURR_GEOFENCE_LOCATION
connection_settings
stats/client_time
stats/client_nav_event
background/battery
statemachine/transition
diary/trips-2023-08-09
diary/trips-2023-08-10
diary/trips-2023-08-13
diary/trips-2023-08-15
sqlite> select * from userCache where key == "config/app_ui_config";

@JGreenlee

Something else I noticed is that there is no Dashboard tab, even though this user is supposed to be on open-access, which is a MULTILABEL configuration. We would expect the Dashboard to be hidden on non-MULTILABEL configurations.

So I am fairly certain that the stored config for this user is missing the trip-labels option. It's neither ENKETO nor MULTILABEL, so the UI doesn't know what to do and just shows no buttons at all.
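
To illustrate the failure mode described above, here is a small sketch. The config shape follows the key names used in this thread (`survey_resources`, `trip-labels`); the real schema may name them differently, and the button names are made up for the example.

```typescript
// Config shape assumed from this thread; the real schema may differ.
type StoredConfig = { survey_resources?: { 'trip-labels'?: string } };

// Two explicit branches and no default: an absent setting matches neither.
// The returned button names are placeholders for illustration only.
function labelButtonsFor(config: StoredConfig): string[] {
  const mode = config.survey_resources?.['trip-labels'];
  if (mode === 'ENKETO') return ['survey'];
  if (mode === 'MULTILABEL') return ['mode', 'purpose'];
  return []; // neither branch matched, so nothing is rendered
}

// A stored config without the section yields no buttons at all.
console.log(labelButtonsFor({}).length);
```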

@JGreenlee

Here is the usercache entry of config/app_ui_config for that user:

{
  "version": 1,
  "ts": 1655143472,
  "server": {
    "connectUrl": "https://open-access-openpath.nrel.gov/api/",
    "aggregate_call_auth": "user_only"
  },
  "intro": {
    "program_or_study": "study",
    "program_admin_contact": "K. Shankari ([email protected])",
    "deployment_partner_name": "National Renewable Energy Laboratory (NREL)",
    "translated_text": {
      "en": {
        "deployment_name": "Open Access Study",
        "summary_line_1": "enables people to track their travel modes and <b>measure</b> their associated energy use and emissions",
        "summary_line_2": "makes <b>aggregated</b> data on mode shares, trip frequencies, and carbon footprints available via a public dashboard",
        "summary_line_3": "serves as a <b>control group</b> while evaluating behavior change of programs.",
        "short_textual_description": "Transportation is the largest contributor to US Greenhouse Gas GHG emissions. As the nation's premier facility for energy-efficient transportation R&amp;D solutions, NREL’s sustainable transportation and mobility research takes a whole-system, human-centric approach to solve complex energy challenges and slash greenhouse gas emissions. As part of this approach we would like to collect long-term travel behavior data that tells us how, where and why people travel.",
        "why_we_collect": "NREL can use this information to understand how travel patterns change in response to federal, state and local interventions. The interventions can include programs, incentives and infrastructure changes. Since this data will be collected for a long time, changes can be directly observed instead of inferred from other data sources. This data can also serve as a control of unmodified behavior for programs that target a small sample of the population."
      },
      "es": {
        "deployment_name": "Estudio de acceso abierto",
        "summary_line_1": "permite a las personas realizar un seguimiento de sus modos de viaje y <b>medir</b> su uso de energía y emisiones asociadas",
        "summary_line_2": "hace que los datos <b>agregados</b> sobre modos compartidos, frecuencias de viaje y huellas de carbono estén disponibles a través de un tablero público",
        "summary_line_3": "sirve como un <b>grupo de control</b> al evaluar el cambio de comportamiento de los programas",
        "short_textual_description": "El transporte es el mayor contribuyente a las emisiones de gases de efecto invernadero de EE. UU. Como la principal instalación del país para soluciones de investigación y desarrollo de transporte eficiente en energía, la investigación de movilidad y transporte sostenible de NREL adopta un enfoque de sistema completo centrado en el ser humano para resolver desafíos energéticos complejos y reducir las emisiones de gases de efecto invernadero. Como parte de este enfoque, nos gustaría recopilar datos de comportamiento de viaje a largo plazo que nos digan cómo, dónde y por qué viaja la gente.",
        "why_we_collect_es": "NREL puede usar esta información para comprender cómo cambian los patrones de viaje en respuesta a las intervenciones federales, estatales y locales. Las intervenciones pueden incluyen programas, incentivos y cambios de infraestructura. Dado que estos datos serán recopilarse durante mucho tiempo, los cambios se pueden observar directamente en lugar de inferido de otras fuentes de datos. Estos datos también pueden servir como control de comportamiento no modificado para programas que se dirigen a una pequeña muestra de la población."
      }
    }
  },
  "display_config": { "use_imperial": true },
  "profile_controls": {
    "support_upload": false,
    "trip_end_notification": false
  },
  "name": "open-access",
  "joined": {
    "route": "join_study",
    "label": "open-access",
    "source": "github"
  }
}

It does not have survey_resources, which is where trip-labels would go.

@JGreenlee

We do have a backwards compat filler for this (https://github.com/e-mission/e-mission-phone/blob/922a62b7c2601f195bfe8df54654986135e99b25/www/js/config/dynamicConfig.ts#L33) but it is only applied when a config is being downloaded.

If a config has already been stored with missing values (which is the case for this user because they have been using the app for a long time), the backwards compat is not doing anything.
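
A rough sketch of the gap described above, and of closing it by running the same fill on both paths. Everything here is hypothetical: the function names, the in-memory `Map` standing in for the phone's storage, and the MULTILABEL default are assumptions for illustration, not the code at the linked line.

```typescript
// Shape and key names follow this thread; the real schema may differ.
type AppConfig = { name: string; survey_resources?: { 'trip-labels': string } };

// Hypothetical filler: adds the section only when it is absent.
// MULTILABEL as the default is an assumption for this sketch.
function fillSurveyDefaults(config: AppConfig): AppConfig {
  if (config.survey_resources) return config;
  return { ...config, survey_resources: { 'trip-labels': 'MULTILABEL' } };
}

// In-memory stand-in for the phone's key-value storage.
const storage = new Map<string, string>();
const CONFIG_KEY = 'config/app_ui_config';

// Download path: the fill runs before the config is stored.
function storeDownloadedConfig(raw: AppConfig): void {
  storage.set(CONFIG_KEY, JSON.stringify(fillSurveyDefaults(raw)));
}

// Load path: the fill runs again, so configs stored long ago are covered too.
function loadStoredConfig(): AppConfig | null {
  const json = storage.get(CONFIG_KEY);
  if (json === undefined) return null;
  return fillSurveyDefaults(JSON.parse(json) as AppConfig);
}
```

Because the filler returns the config unchanged when the section is present, applying it on every load is harmless for configs that are already complete.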

@JGreenlee

Potential ways to proceed:
a) Have this backwards compat be applied not only to new configs being downloaded, but also stored configs.
b) Instead of specifically checking if MULTILABEL exists, check for the nonexistence of ENKETO. If ENKETO isn't there, assume it will be MULTILABEL
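
Option (b) could look roughly like the following. This is only a sketch with a hypothetical function name, and the config shape again follows the key names used in this thread.

```typescript
type LabelMode = 'ENKETO' | 'MULTILABEL';
// Shape assumed from this thread; the real schema may differ.
type ConfigLike = { survey_resources?: { 'trip-labels'?: string } };

// Treat anything that is not explicitly ENKETO as MULTILABEL,
// including a config with no survey section at all.
function resolveLabelMode(config: ConfigLike): LabelMode {
  return config.survey_resources?.['trip-labels'] === 'ENKETO' ? 'ENKETO' : 'MULTILABEL';
}

console.log(resolveLabelMode({})); // falls back to MULTILABEL
```

One trade-off with this variant is that a misspelled or unknown value would also be treated as MULTILABEL rather than being reported.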

@shankari
Contributor Author

shankari commented Nov 12, 2023

I don't think this was the cause of the WSOD, because I didn't have a WSOD and I have been on NREL commute, which still doesn't have the survey_resources tag, for a while:
https://github.com/e-mission/nrel-openpath-deploy-configs/blob/main/configs/nrel-commute.nrel-op.json

The most recent update seems to have fixed the WSOD (although I am still not sure why) and introduced this new issue.
I guess the root cause of the WSOD might be lost in the mists of time unless there's an error we have caught and printed in the logs.

For this issue, we had implemented (a) before the rewrite (I knew I was careful about that):
https://github.com/e-mission/e-mission-phone/blob/0f7caef51dc140540926ff3a671dbc67a76dfad5/www/js/config/dynamic_config.js#L133

For now, we should restore it.

The long-term pattern for such issues is to change the config on the server, potentially using e-mission/op-admin-dashboard#75. We can then handle stored configs through lazy migration in parallel. After sufficient time (say, one year), we can remove the migration code path on the phone.

We can write that up as our config policy and include it in the config repo README and/or the FAQ.
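
The lazy migration mentioned above could be organized as a list of small, removable steps that run when a stored config is read and write the result back once. This is a sketch only: the `Migration` interface, `migrateOnRead`, and the `persist` callback are hypothetical names, and the MULTILABEL default is an assumption.

```typescript
type JsonConfig = Record<string, unknown>;

// One entry per migration, so a step can be deleted once it has aged out.
interface Migration {
  description: string;
  needed: (config: JsonConfig) => boolean;
  apply: (config: JsonConfig) => JsonConfig;
}

const MIGRATIONS: Migration[] = [
  {
    description: 'add survey_resources with a default trip-labels value',
    needed: (config) => config['survey_resources'] === undefined,
    apply: (config) => ({ ...config, survey_resources: { 'trip-labels': 'MULTILABEL' } }),
  },
];

// Applies any pending migrations and persists only if something changed.
function migrateOnRead(stored: JsonConfig, persist: (config: JsonConfig) => void): JsonConfig {
  let current = stored;
  let changed = false;
  for (const migration of MIGRATIONS) {
    if (migration.needed(current)) {
      current = migration.apply(current);
      changed = true;
    }
  }
  if (changed) persist(current);
  return current;
}
```

Writing the migrated config back means each installation pays the cost once, and removing the code path after the grace period only affects installations that never opened the app in that window.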

@JGreenlee

I think it's probable that this was the cause of the WSOD. If survey_resources was missing, this line would have caused an error:
https://github.com/e-mission/e-mission-phone/blob/922a62b7c2601f195bfe8df54654986135e99b25/www/js/diary/LabelTab.tsx#L57

That line does not exist anymore in service_rewrite_2023 since #1086
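
The linked line is not reproduced here. As a sketch of the general mechanism, assuming it read a nested key from the config without a guard, an unguarded read on a parsed config throws at runtime, while optional chaining returns `undefined`. Function names are hypothetical.

```typescript
// Parsed JSON is untyped, so the compiler cannot flag the missing section.
const storedConfig: any = JSON.parse('{"name": "open-access"}');

// Unguarded nested read: throws if survey_resources is absent.
function readLabelModeUnsafe(config: any): string {
  return config.survey_resources['trip-labels'];
}

// Guarded read: yields undefined instead of throwing.
function readLabelModeSafe(config: any): string | undefined {
  return config.survey_resources?.['trip-labels'];
}

let failure: unknown = null;
try {
  readLabelModeUnsafe(storedConfig);
} catch (err) {
  failure = err;
}
console.log(failure instanceof TypeError);
console.log(readLabelModeSafe(storedConfig));
```

An uncaught error of this kind during rendering would be consistent with a blank screen, and replacing the read with a guarded one would be consistent with the later "trips but no labels" behavior.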


> I didn't have a WSOD and I have been on NREL commute, which still doesn't have the survey_resources tag, for a while

Your installation doesn't currently show the same behavior as the aforementioned user's, does it? You have probably logged out and back in at some point, causing the backwards compat to apply, while for that user it never did.

@shankari moved this from Tasks completed to Tasks completed in last release in OpenPATH Tasks Overview on Feb 26, 2024