Merge to main repo #905

Merged

Conversation

shankari
Contributor

@shankari shankari commented Apr 17, 2023

This includes the upgrades to the server code (#900)
and the changes to support the composite trip/place timeline (#895)

shankari and others added 30 commits January 4, 2023 18:03
At a high level, we add two more data keys:
- trip_addition_input
- place_addition_input

The trip_addition_input is just a tripuserinput-type object.
The place_addition_input is a new, parallel, placeuserinput type object

+ adding them in the timeseries and all the formatters

Testing done:

Survey appears on the server

```
2023-01-04 17:13:17,118:DEBUG:123145411158016:Updated result for user = 90ba8f99-ca3a-4c12-bbc6-8e1bd074ba65, key = manual/trip_addition_input, write_ts = 1672881166.827849 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('63b6242de642ad2a393c21d0'), 'ok': 1.0, 'updatedExisting': False}
```

Survey is in the usercache and does have start and end ts

```
>>> list(edb.get_usercache_db().find({"metadata.key": "manual/trip_addition_input"}))[1]
{'_id': ObjectId('63b6242de642ad2a393c21d0'), 'metadata': {'time_zone': 'America/Los_Angeles', 'plugin': 'none', 'write_ts': 1672881166.827849, 'platform': 'ios', 'read_ts': 0, 'key': 'manual/trip_addition_input', 'type': 'message'}, 'user_id': UUID('90ba8f99-ca3a-4c12-bbc6-8e1bd074ba65'), 'data': {'end_ts': 1470941549.534, 'xmlResponse': '<a88RxBtE3jwSar3cwiZTdn xmlns:jr="http://openrosa.org/javarosa" xmlns:orx="http://openrosa.org/xforms" id="a88RxBtE3jwSar3cwiZTdn">\n          <start>2023-01-04T17:11:52.268-08:00</start>\n        <end>2023-01-04T17:11:52.270-08:00</end>\n          <group_hg4zz25>\n            <Date>2016-08-11</Date>\n            <Start_time>11:43:00.000-08:00</Start_time> \n            <End_time>11:53:00.000-08:00</End_time>\n          <Activity_Type>personal_care_activities</Activity_Type>\n            <Personal_Care_activities>option_1</Personal_Care_activities>\n            <Employment_related_a_Education_activities/>\n            <Domestic_activities/>\n            <Recreation_and_leisure/>\n            <Voluntary_work_and_care_activities/>\n            <Other/>\n          </group_hg4zz25>\n          <meta>\n            <instanceID>uuid:5f1ebc00-60ad-4462-bf22-ed5a915ca3a8</instanceID>\n          </meta>\n        </a88RxBtE3jwSar3cwiZTdn>', 'label': 'Answered', 'jsonDocResponse': {'a88RxBtE3jwSar3cwiZTdn': {'meta': {'attr': {}, 'instanceID': 'uuid:5f1ebc00-60ad-4462-bf22-ed5a915ca3a8'}, 'attr': {'id': 'a88RxBtE3jwSar3cwiZTdn', 'xmlns:jr': 'http://openrosa.org/javarosa', 'xmlns:orx': 'http://openrosa.org/xforms'}, 'end': '2023-01-04T17:11:52.270-08:00', 'group_hg4zz25': {'attr': {}, 'Personal_Care_activities': 'option_1', 'Voluntary_work_and_care_activities': '', 'Employment_related_a_Education_activities': '', 'Date': '2016-08-11', 'End_time': '11:53:00.000-08:00', 'Start_time': '11:43:00.000-08:00', 'Activity_Type': 'personal_care_activities', 'Domestic_activities': '', 'Recreation_and_leisure': '', 'Other': ''}, 'start': 
'2023-01-04T17:11:52.268-08:00'}}, 'start_ts': 1470940950.700465, 'name': 'TimeUseSurvey', 'version': 9}}
```

Run the pipeline; no errors

```
2023-01-04T17:30:11.922696-08:00**********UUID 90ba8f99-ca3a-4c12-bbc6-8e1bd074ba65: moving to long term**********
...
2023-01-04 17:30:12,039:DEBUG:4594421248:module_name = emission.net.usercache.formatters.ios.trip_addition_input
2023-01-04 17:30:12,042:DEBUG:4594421248:Timestamp conversion: 1672881166.827849 -> 1672881166.827849 done
...
Got error 'AttrDict' instance has no attribute 'currState' while saving entry AttrDict({'_id': ObjectId('63b6242de642ad2a393c21d7'), 'metadata': {'time_zone': 'America/Los_Angeles', 'plugin': 'none', 'write_ts': 1672881197.030221, 'platform': 'ios', 'read_ts': 0, 'key': 'statemachine/transition', 'type': 'message'}, 'user_id': UUID('90ba8f99-ca3a-4c12-bbc6-8e1bd074ba65'), 'data': {'curr_state': 'STATE_START', 'transition': 'T_EXITED_GEOFENCE', 'ts': 1672881197}}) -> None
Got error 'AttrDict' instance has no attribute 'currState' while saving entry AttrDict({'_id': ObjectId('63b6242de642ad2a393c21da'), 'metadata': {'time_zone': 'America/Los_Angeles', 'plugin': 'none', 'write_ts': 1672881197.0337992, 'platform': 'ios', 'read_ts': 0, 'key': 'statemachine/transition', 'type': 'message'}, 'user_id': UUID('90ba8f99-ca3a-4c12-bbc6-8e1bd074ba65'), 'data': {'curr_state': 'STATE_ONGOING_TRIP', 'transition': 'T_TRIP_ENDED', 'ts': 1672881197}}) -> None
2023-01-04T17:30:12.062962-08:00**********UUID 90ba8f99-ca3a-4c12-bbc6-8e1bd074ba65: updating incoming user inputs**********
```

Usercache only has entry from yesterday (does not have the entry from today)

```
>>> list(edb.get_usercache_db().find({"metadata.key": "manual/trip_addition_input"}))
[{'_id': ObjectId('63b4d85ae642ad2a393bf727'), 'metadata': {'time_zone': 'America/Los_Angeles', 'plugin': 'none', 'write_ts': 1672796195.641211, 'platform': 'ios', 'read_ts': 0, 'key': 'manual/trip_addition_input', 'type': 'message'}, 'user_id': UUID('90ba8f99-ca3a-4c12-bbc6-8e1bd074ba65'), 'data': {'version': 9, 'label': 'Answered', 'jsonDocResponse': {'end': '2023-01-03T17:36:18.229-08:00', 'Date': '2023-01-03', 'End_time': '18:36:00.000-08:00', 'Start_time': '17:36:00.000-08:00', ...}}]
```

Timeseries has only one entry from today

```
>>> list(edb.get_timeseries_db().find({"metadata.key": "manual/trip_addition_input"}))
[{'_id': ObjectId('63b6242de642ad2a393c21d0'), 'user_id': UUID('90ba8f99-ca3a-4c12-bbc6-8e1bd074ba65'), 'metadata': {'time_zone': 'America/Los_Angeles', 'plugin': 'none', 'write_ts': 1672881166.827849, 'platform': 'ios', 'read_ts': 0, 'key': 'manual/trip_addition_input', 'type': 'message', 'write_local_dt': {'year': 2023, 'month': 1, 'day': 4, 'hour': 17, 'minute': 12, 'second': 46, 'weekday': 2, 'timezone': 'America/Los_Angeles'}, 'write_fmt_time': '2023-01-04T17:12:46.827849-08:00'}, 'data': {'end_ts': 1470941549.534, 'xmlResponse': '<a88RxBtE3jwSar3cwiZTdn xmlns:jr="http://openrosa.org/javarosa" xmlns:orx="http://openrosa.org/xforms" id="a88RxBtE3jwSar3cwiZTdn">\n          <start>2023-01-04T17:11:52.268-08:00</start>\n        <end>2023-01-04T17:11:52.270-08:00</end>\n          <group_hg4zz25>\n            <Date>2016-08-11</Date>\n            <Start_time>11:43:00.000-08:00</Start_time> \n            <End_time>11:53:00.000-08:00</End_time>\n          <Activity_Type>personal_care_activities</Activity_Type>\n            <Personal_Care_activities>option_1</Personal_Care_activities>\n            <Employment_related_a_Education_activities/>\n            <Domestic_activities/>\n            <Recreation_and_leisure/>\n            <Voluntary_work_and_care_activities/>\n            <Other/>\n          </group_hg4zz25>\n          <meta>\n            <instanceID>uuid:5f1ebc00-60ad-4462-bf22-ed5a915ca3a8</instanceID>\n          </meta>\n        </a88RxBtE3jwSar3cwiZTdn>', 'label': 'Answered', 'jsonDocResponse': {'a88RxBtE3jwSar3cwiZTdn': {'meta': {'attr': {}, 'instanceID': 'uuid:5f1ebc00-60ad-4462-bf22-ed5a915ca3a8'}, 'attr': {'id': 'a88RxBtE3jwSar3cwiZTdn', 'xmlns:jr': 'http://openrosa.org/javarosa', 'xmlns:orx': 'http://openrosa.org/xforms'}, 'end': '2023-01-04T17:11:52.270-08:00', 'group_hg4zz25': {'attr': {}, 'Personal_Care_activities': 'option_1', 'Voluntary_work_and_care_activities': '', 'Employment_related_a_Education_activities': '', 'Date': '2016-08-11', 'End_time': 
'11:53:00.000-08:00', 'Start_time': '11:43:00.000-08:00', 'Activity_Type': 'personal_care_activities', 'Domestic_activities': '', 'Recreation_and_leisure': '', 'Other': ''}, 'start': '2023-01-04T17:11:52.268-08:00'}}, 'start_ts': 1470940950.700465, 'name': 'TimeUseSurvey', 'version': 9, 'start_local_dt': {'year': 2016, 'month': 8, 'day': 11, 'hour': 11, 'minute': 42, 'second': 30, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'start_fmt_time': '2016-08-11T11:42:30.700465-07:00', 'end_local_dt': {'year': 2016, 'month': 8, 'day': 11, 'hour': 11, 'minute': 52, 'second': 29, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'end_fmt_time': '2016-08-11T11:52:29.534000-07:00'}}]
```
Summary of changes:
- change the tripuserinput to have an enum "status" of ACTIVE or DELETED to support multi-inputs
- change the confirmed trip data model to support a `trip_addition` list
- change the matching code to support both use cases
    - if we expect a single user input per trip and want to take the most recent, we continue the current handling
    - if we expect multiple user inputs per trip and only choose the ACTIVE ones, we append or delete based on status
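
The multi-input branch above can be sketched roughly as follows. This is a minimal illustration on plain dicts, not the actual matcher code: the function name `apply_trip_addition`, the location of `status` and `match_id` under `data`, and the pairing rule for deletions are assumptions made for the example.

```python
def apply_trip_addition(confirmed_trip, ui):
    """Append an ACTIVE input to the trip, or drop the entries a DELETED input cancels."""
    data = confirmed_trip["data"]
    # Backwards compatibility: create the list if the trip predates this field.
    # (Assumes the field, when present, already holds a list.)
    additions = data.setdefault("trip_addition", [])
    # Inputs without a status are assumed to be ACTIVE.
    status = ui["data"].get("status", "ACTIVE")
    if status == "ACTIVE":
        additions.append(ui)
    elif status == "DELETED":
        # Hypothetical pairing rule: a DELETED input cancels entries with the same match_id.
        data["trip_addition"] = [
            ta for ta in additions
            if ta["data"].get("match_id") != ui["data"].get("match_id")
        ]
    else:
        raise ValueError("unexpected status %r" % status)
    return confirmed_trip
```

The single-input case is unaffected by this sketch; it would keep picking the most recent input as before.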

TODO/Notes:
- the current code only works for trips, not places; will need to be expanded to support places
- the current code only works for incoming inputs; will need to be expanded to support labels for draft trips
- the current code is backwards compatible; so it creates the `trip_addition` field for the confirmed trip if it doesn't exist, and it assumes that inputs without a status are ACTIVE. We may want to change them once we have changed the UI to match

Testing done:
- Unit tests pass
- After running the pipeline with a trip addition, we get

```
2023-01-05 08:54:18,438:DEBUG:4819070464:Saving entry Entry({'_id': ObjectId('63b4c5425f9b032a64764293'), 'user_id': UUID('90ba8f99-ca3a-4c12-bbc6-8e1bd074ba65'), 'metadata': {'key': 'analysis/confirmed_trip', 'platform': 'server', 'write_ts': 1672791362.761187, 'time_zone': 'America/Los_Angeles', 'write_local_dt': {'year': 2023, 'month': 1, 'day': 3, 'hour': 16, 'minute': 16, 'second': 2, 'weekday': 1, 'timezone': 'America/Los_Angeles'}, 'write_fmt_time': '2023-01-03T16:16:02.761187-08:00'}, 'data': {'source': 'DwellSegmentationTimeFilter', 'end_ts': 1470941549.534, 'end_local_dt': {'year': 2016, 'month': 8, 'day': 11, 'hour': 11, 'minute': 52, 'second': 29, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'end_fmt_time': '2016-08-11T11:52:29.534000-07:00', 'end_loc': {'type': 'Point', 'coordinates': [-122.272921, 37.8062295]}, 'raw_trip': ObjectId('63b4c5405f9b032a64764161'), 'start_ts': 1470940950.700465, 'start_local_dt': {'year': 2016, 'month': 8, 'day': 11, 'hour': 11, 'minute': 42, 'second': 30, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'start_fmt_time': '2016-08-11T11:42:30.700465-07:00', 'start_loc': {'type': 'Point', 'coordinates': [-122.2726109, 37.8059891]}, 'duration': 598.8335349559784, 'distance': 611.5515222116015, 'start_place': ObjectId('63b4c5425f9b032a6476426a'), 'end_place': ObjectId('63b4c5425f9b032a6476426b'), 'cleaned_trip': ObjectId('63b4c5415f9b032a64764202'), 'inferred_labels': [], 'inferred_trip': ObjectId('63b4c5425f9b032a64764285'), 'expectation': {'to_label': True}, 'confidence_threshold': 0.55, 'expected_trip': ObjectId('63b4c5425f9b032a6476428e'), 'user_input': {}, 'trip_addition': [Entry({'_id': ObjectId('63b6242de642ad2a393c21d0'), 'user_id': UUID('90ba8f99-ca3a-4c12-bbc6-8e1bd074ba65'), 'metadata': {'time_zone': 'America/Los_Angeles', 'plugin': 'none', 'write_ts': 1672881166.827849, 'platform': 'ios', 'read_ts': 0, 'key': 'manual/trip_addition_input', 'type': 'message', 'write_local_dt': {'year': 2023, 'month': 1, 'day': 
4, 'hour': 17, 'minute': 12, 'second': 46, 'weekday': 2, 'timezone': 'America/Los_Angeles'}, 'write_fmt_time': '2023-01-04T17:12:46.827849-08:00'}, 'data': {'end_ts': 1470941549.534, 'xmlResponse': '...', 'label': 'Answered', 'jsonDocResponse': {'a88RxBtE3jwSar3cwiZTdn': {'meta': {'attr': {}, 'instanceID': 'uuid:5f1ebc00-60ad-4462-bf22-ed5a915ca3a8'}, 'attr': {'id': 'a88RxBtE3jwSar3cwiZTdn', 'xmlns:jr': 'http://openrosa.org/javarosa', 'xmlns:orx': 'http://openrosa.org/xforms'}, 'end': '2023-01-04T17:11:52.270-08:00', 'group_hg4zz25': {'attr': {}, 'Personal_Care_activities': 'option_1', 'Voluntary_work_and_care_activities': '', 'Employment_related_a_Education_activities': '', 'Date': '2016-08-11', 'End_time': '11:53:00.000-08:00', 'Start_time': '11:43:00.000-08:00', 'Activity_Type': 'personal_care_activities', 'Domestic_activities': '', 'Recreation_and_leisure': '', 'Other': ''}, 'start': '2023-01-04T17:11:52.268-08:00'}}, 'start_ts': 1470940950.700465, 'name': 'TimeUseSurvey', 'version': 9, 'start_local_dt': {'year': 2016, 'month': 8, 'day': 11, 'hour': 11, 'minute': 42, 'second': 30, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'start_fmt_time': '2016-08-11T11:42:30.700465-07:00', 'end_local_dt': {'year': 2016, 'month': 8, 'day': 11, 'hour': 11, 'minute': 52, 'second': 29, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'end_fmt_time': '2016-08-11T11:52:29.534000-07:00'}})]}}) into timeseries
```
…trip was created

In e-mission@2129075
we added support for matching inputs that came in after the trip was created

However, it is also possible that the input came in before the trip was processed.
In that case, we need to find all input matches for the trip and fill it in when the trip is created

This is currently implemented in `final_candidate` for the single most recent match
We add a new function `get_not_deleted_candidates` that is similar, and write unit tests for both of them.

We then add a simple wrapper `get_additions_for_trip_object`, similar to `get_user_input_for_trip_object`
and use it to fill in the `confirmed_trip_dict["data"]["trip_addition"]` while creating confirmed trips
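
As a rough illustration of the "not deleted" filtering, here is a minimal sketch on plain dicts. The real `get_not_deleted_candidates` has a different signature (it also takes a filter function); the name `not_deleted_candidates_sketch` and the `data.status` / `data.match_id` layout are assumptions for the example.

```python
def not_deleted_candidates_sketch(candidates):
    """Return the ACTIVE candidates that were not cancelled by a DELETED candidate."""
    # Inputs without a status are assumed to be ACTIVE.
    active = [c for c in candidates
              if c["data"].get("status", "ACTIVE") == "ACTIVE"]
    # Collect the ids that some DELETED input refers to.
    deleted_ids = {c["data"].get("match_id") for c in candidates
                   if c["data"].get("status") == "DELETED"}
    # A list comprehension yields [] (never None) when nothing survives.
    return [c for c in active if c["data"].get("match_id") not in deleted_ids]
```

With an ACTIVE entry, a DELETED entry for the same id, and a second ACTIVE entry with a different id, only the last one is returned.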

Testing done:
- unit tests pass
In a7ee02c, we implemented and unit tested
`get_not_deleted_candidates`. I naively assumed that the super simple
wrapper for it, `get_additions_for_trip_object`, would work correctly.

Unfortunately, since it wasn't tested, it really didn't work.
There were some syntax errors that broke other unit tests.

Fixed them, and added a simple unit test to ensure that this functionality
continues to work.

Testing done:
- new unit test works
- one of the previous broken tests also works now
- The actual code changes are fairly trivial
- In the tests, change all `match_id` to `_id` in the fake data

A couple of more significant test changes:
- before this, we could not have two elements with the same `_id` in the database,
so the bulk_insert in `testGetAdditionsForTripObjects` would fail after the
first entry, and the result would be the first entry.
    - Now that we can have separate `match_id` entries, all 3 entries are
      inserted and the first two cancel each other. This results in the
      **third** entry being the result
    - Since we no longer specify the `_id`, the database adds the `_id` field.
      Our fake data does not have an `_id`, so we need to delete the field;
      otherwise the extra field will cause the equality check on the results to fail
Matching for new user inputs broke (regression due to
e-mission@a7ee02c)

The main issue was that we were filling in `trip_addition` with None but only checking for existence.
More details at:
e-mission/e-mission-docs#840 (comment)

Fixing it in two different ways:
- Returning `[]` instead of None from `get_not_deleted_candidates` if there are no filtered candidates
  This will ensure that we don't have a None value
- check for both None and existence. This will ensure that even if we do have a None value, we can recover gracefully
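
The second fix can be illustrated with a small sketch. The helper name `ensure_addition_list` is hypothetical; the point is only that `key in dict` is true even when the stored value is None, so an existence check alone does not protect the later `.append`.

```python
def ensure_addition_list(trip_data, key="trip_addition"):
    """Return a usable list for `key`, creating it if it is missing or None."""
    # `key in trip_data` is True even when the value is None, so check both.
    if key not in trip_data or trip_data[key] is None:
        trip_data[key] = []
    return trip_data[key]
```

After this call, `trip_data["trip_addition"].append(ui)` is safe for trips where the field is missing, None, or already a list.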

Testing done:

Original failure:

```
2023-01-19T09:58:57.198828-08:00**********UUID e20f451e-c619-46c8-a189-54310231bce5: updating incoming user inputs**********
Error while matching incoming user inputs, timestamp is unchanged
Traceback (most recent call last):
  File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 19, in match_incoming_user_inputs
    last_user_input_done = match_incoming_inputs(user_id, time_query)
  File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 46, in match_incoming_inputs
    handle_multi_non_deleted_match(confirmed_trip, ui)
  File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 70, in handle_multi_non_deleted_match
    confirmed_trip["data"]["trip_addition"].append(ui)
AttributeError: 'NoneType' object has no attribute 'append'
```

New failure (updating incoming), no `match_id`

```
2023-01-19T09:59:26.680014-08:00**********UUID e20f451e-c619-46c8-a189-54310231bce5: updating incoming user inputs**********
Error while matching incoming user inputs, timestamp is unchanged
Traceback (most recent call last):
  File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 19, in match_incoming_user_inputs
    last_user_input_done = match_incoming_inputs(user_id, time_query)
  File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 46, in match_incoming_inputs
    handle_multi_non_deleted_match(confirmed_trip, ui)
  File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 72, in handle_multi_non_deleted_match
    after_del_list = [ta for ta in confirmed_trip["data"]["trip_addition"] if ta["match_id"] != ui["match_id"]]
  File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 72, in <listcomp>
    after_del_list = [ta for ta in confirmed_trip["data"]["trip_addition"] if ta["match_id"] != ui["match_id"]]
KeyError: 'match_id'
```

New failure (creating objects), no `match_id`

```
2023-01-19T14:47:59.883681-08:00**********UUID e20f451e-c619-46c8-a189-54310231bce5: creating confirmed objects **********
Error while creating confirmed objects, timestamp is unchanged
Traceback (most recent call last):
  File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 81, in create_confirmed_objects
    last_expected_trip_done = create_confirmed_trips(user_id, time_query)
  File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 110, in create_confirmed_trips
    esdt.get_additions_for_trip_object(ts, tct)
  File "/Users/kshankar/e-mission/gis_branch_tests/emission/storage/decorations/trip_queries.py", line 191, in get_additions_for_trip_object
    return get_not_deleted_candidates(valid_user_input(ts, trip_obj), potential_candidates)
  File "/Users/kshankar/e-mission/gis_branch_tests/emission/storage/decorations/trip_queries.py", line 168, in get_not_deleted_candidates
    not_deleted_active = [efpc for efpc in all_active_list if efpc["match_id"] not in all_deleted_id]
  File "/Users/kshankar/e-mission/gis_branch_tests/emission/storage/decorations/trip_queries.py", line 168, in <listcomp>
    not_deleted_active = [efpc for efpc in all_active_list if efpc["match_id"] not in all_deleted_id]
KeyError: 'match_id'
```
…his other location as well

Fixes a regression caused by
e-mission@4c6d198
To create confirmedplace.py, I also created expectedplace.py and inferredplace.py.
These files are not needed, since places do not need inferred and expected objects, so I am removing them.
pipelinestate.py
- Added a 17th ENUM to the PipelineStages named "CREATE_PLACE_OBJECTS"

reset.py
- Added a Pipeline stage to the stages_with_fuzz array for CREATE_PLACE_OBJECTS

common.py
- Added a "create_place_objects" function after all the other pipeline stage functions

intake_stage.py
- Added a stage for the intake to store the pipeline time for CREATE_PLACE_OBJECTS, and call the create_place_object function

confirmedplace.py
- Removed expected_place and primary_section because @shankari  said in our last meeting that the confirmedplace object does not need these

matcher.py
- Created the create_place_objects function which follows the same format as create_confirmed_objects, but only for trips. It uses the same existing mark_confirmed_object_creation_done/failed functions, and the same get_time_range_for_confirmed_object_creation
- Added a create_confirmed_places function which follows the same format as create_confirmed_trips; I removed any data which does not exist in the confirmed_places emission/core/wrapper/confirmedplace.py data model
-- additions matching function genericized for both trips and places
- this new stage in the pipeline will create composite trip objects, after confirmed objects have been created
Per #4 (comment) we will rename this to confirmed_object since it does not specifically need to be a trip
Per #4 (comment) we will remove inferred_labels, expectation, inferred_primary_mode, and user_input since confirmed_place will not need that for now
…g.debug

Per #4 (comment) we will remove Logging.info and replace with Logging.debug statements
- we will keep user_input
- 'cleaned place' will be used since there is no 'expected place'
- clarify comments/ log statements
shankari added 27 commits April 11, 2023 20:03
Before the composite trip changes, we retrieved confirmed trips for display.
So if/when the confirmed trips were updated, we would see them on the next phone retrieval.
In other words, we displayed the terminal output of the pipeline.

However, with the composite trip code, the terminal output is the composite trip.
But we update the confirmed trips as part of the matching.
So if we get new user inputs or update the confirmed place exit timestamps,
they will not be reflected in the display. We are no longer displaying the
terminal output of the pipeline, we have another display layer on top.

So when we update the confirmed objects, we need to figure out which
composite objects match them and update those as well.

We generally don't include forward references in our data objects, only backward references.
We already have a reference from the composite trip to the associated end_confirmed_place
We also add a reference to the associated confirmed_trip

Then, when we update a confirmed object, we find the related composite object
and update it as well.
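
A minimal in-memory sketch of this propagation, assuming the composite trip stores the confirmed trip's id under `data.confirmed_trip` and embeds the end place under `data.end_confirmed_place`. The function name, the `kind` parameter and the exact field layout are assumptions for illustration; the real code works against the database.

```python
def propagate_to_composite(composite_trips, confirmed_obj, kind):
    """Copy an updated confirmed trip/place into the composite trips that refer to it.

    Returns the ids of the composite trips that were touched.
    """
    touched = []
    for ct in composite_trips:
        data = ct["data"]
        if kind == "trip" and data.get("confirmed_trip") == confirmed_obj["_id"]:
            # Follow the backward reference from composite trip to confirmed trip.
            data["additions"] = confirmed_obj["data"].get("additions", [])
            data["user_input"] = confirmed_obj["data"].get("user_input", {})
            touched.append(ct["_id"])
        elif kind == "place":
            end_place = data.get("end_confirmed_place") or {}
            if end_place.get("_id") == confirmed_obj["_id"]:
                # Replace the embedded copy of the end place with the updated one.
                data["end_confirmed_place"] = confirmed_obj
                touched.append(ct["_id"])
    return touched
```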

This fixes: e-mission/e-mission-docs#880 (comment)
e-mission/e-mission-docs#880 (comment)
e-mission/e-mission-docs#880 (comment)
e-mission/e-mission-docs#880 (comment)

Testing done:

- Loaded data, no additions
- Added place additions via UI
- Pushed changes
- Re-ran pipeline

Additions match

```
>>> pd.json_normalize(all_places)["data.additions"]
0                                                   []
1                                                   []
2    [{'_id': 64361902f5622167bcbdf190, 'user_id': ...
3    [{'_id': 64361902f5622167bcbdf1c2, 'user_id': ...
4    [{'_id': 64361902f5622167bcbdf208, 'user_id': ...
5    [{'_id': 64361902f5622167bcbdf246, 'user_id': ...
6                                                   []
7                                                   []
8                                                   []
9                                                   []
```

Also in the composite trips

```
>>> pd.json_normalize(all_ct)["data.end_confirmed_place.data.additions"]
0                                                   []
1    [{'_id': 64361902f5622167bcbdf190, 'user_id': ...
2    [{'_id': 64361902f5622167bcbdf1c2, 'user_id': ...
3    [{'_id': 64361902f5622167bcbdf208, 'user_id': ...
4    [{'_id': 64361902f5622167bcbdf246, 'user_id': ...
5                                                   []
6                                                   []
7                                                   []
8                                                   []
```

Added trip inputs and additions via UI
Pushed changes
Re-ran pipeline; both show up in the confirmed trips

```
>>> pd.json_normalize(all_ct)["data.additions"]
0                                                   []
1                                                   []
2                                                   []
3                                                   []
4    [{'_id': 64361c6bf5622167bcbe08cf, 'user_id': ...
5    [{'_id': 64361c6bf5622167bcbe0921, 'user_id': ...
6                                                   []
7                                                   []
8                                                   []

>>> for ct in all_ct:
...     print(ct["data"]["user_input"])
...
{}
{}
{}
{}
{'trip_user_input':...}
{'trip_user_input':...}
{'trip_user_input':...}
{}
{}
```
First set of tests for `updateConfirmedAndComposite`
The tests are coded as sub-tests of one giant test so that we don't have to
load data and run the pipeline multiple times.
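
The "sub-tests of one giant test" pattern looks roughly like the sketch below. The class, helper names and data are hypothetical stand-ins; the expensive setup (loading data and running the pipeline) is replaced by building a small list.

```python
import unittest

class TestUpdateSketch(unittest.TestCase):
    def setUp(self):
        # Stand-in for the expensive step (loading data and running the pipeline).
        self.trips = [{"data": {"user_input": {}}} for _ in range(3)]

    def testUpdates(self):
        # One real test method; each helper is a "sub-test" sharing the same setup.
        self._checkSetUserInput()
        self._checkOtherTripsUntouched()

    def _checkSetUserInput(self):
        self.trips[1]["data"]["user_input"] = {"trip_user_input": "placeholder"}
        self.assertIn("trip_user_input", self.trips[1]["data"]["user_input"])

    def _checkOtherTripsUntouched(self):
        # Runs against the state left behind by the previous helper.
        self.assertEqual(self.trips[0]["data"]["user_input"], {})
        self.assertEqual(self.trips[2]["data"]["user_input"], {})
```

Because the helpers do not start with `test`, the runner discovers only one test, so `setUp` runs once. The tradeoff is that the first failing helper stops the remaining ones.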

```
testSetConfirmedTripUserInput DONE
.
----------------------------------------------------------------------
Ran 1 test in 32.686s

OK
```
…ite trip list

The old diary function is retained until we remove the diary code
We use subcommands since we only need the date for the diary and not for the composite trip list
(for which we just download all trips)
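
A sketch of the subcommand layout using `argparse`. The subcommand names `diary` and `composite_trips` and the argument names are hypothetical; only the idea (a date argument on one subcommand and not the other) comes from the description above.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Save ground truth for unit tests")
    sub = parser.add_subparsers(dest="command", required=True)

    # The diary needs a date, since it covers a single day.
    diary = sub.add_parser("diary", help="save the diary for one day")
    diary.add_argument("date", help="day to retrieve, e.g. 2016-08-11")
    diary.add_argument("file_name")

    # The composite trip list downloads all trips, so no date is needed.
    composite = sub.add_parser("composite_trips", help="save all composite trips")
    composite.add_argument("file_name")
    return parser
```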

Testing done:
- Generated composite trip lists for unit tests
- we were filling in `confirmed_start_place["additions"]` but it should
  actually be in `data`
+ Also improve a log message to help debug this in the future
… trips

- Load data for the 4th
- Run pipeline
- Load inputs for the 5th
- Run pipeline (they all match to the last place)
- Load data for the 5th
- Run pipeline (they all spread out)
- Load inputs (including trip inputs) for the 4th
- Run pipeline (they all match)

This essentially reproduces
e-mission/e-mission-docs#880 (comment)

Screenshots of the various test stages:
e-mission/e-mission-docs#880 (comment)
- we should get the confirmed place and composite trip count before starting to create composite trips
    otherwise the counts will be steadily increasing
- we should only create the start place or the end place for each trip, not both
    - creating both will lead to duplicate trips
    - we handle this by checking whether the confirmed place corresponding to a cleaned place already
        exists and fixing it
- fixed some start/end mismatches caused by copy pasting
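
The "check whether the confirmed place already exists" step can be sketched as a get-or-create lookup. This is an in-memory illustration only; the function name and the `data.cleaned_place` back-reference are assumptions, and the real code queries the timeseries instead of a list.

```python
def get_or_create_confirmed_place(confirmed_places, cleaned_place_id):
    """Return the confirmed place for a cleaned place, creating it only if missing."""
    for cp in confirmed_places:
        if cp["data"]["cleaned_place"] == cleaned_place_id:
            # The end place of the previous trip is the start place of this one,
            # so reuse it instead of creating a duplicate.
            return cp
    new_cp = {"data": {"cleaned_place": cleaned_place_id, "additions": []}}
    confirmed_places.append(new_cp)
    return new_cp
```

For two consecutive trips A to B and B to C, this yields three confirmed places rather than four.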

Testing done:
- Reset to an older version of the code
- Ran the pipeline
- Moved ahead to this code
- Re-ran the pipeline
- verified that the object counts were correct
- verified that the resulting composite trips matched the expected values from fb6a982

e-mission/e-mission-docs#880 (comment)
e-mission/e-mission-docs#880 (comment)
e-mission/e-mission-docs#880 (comment)
e-mission/e-mission-docs#880 (comment)
e-mission/e-mission-docs#880 (comment)
e-mission/e-mission-docs#880 (comment)
…eation

Multiple fixes to link the confirmed trip timeline correctly + update composite trips properly
This fixes
e-mission/e-mission-docs#880 (comment)

For the fix, we pass in whether the time type should be forced or inferred.
It is currently forced since all the inputs (trip input, trip addition, place addition) use `data.start_ts`.
If we do support place inputs in the future, we need to decide whether to use start/end or enter/exit
and change this code appropriately

Testing done:
- Added an automated test for this scenario
- Stepped through the test and confirmed that it worked properly
e-mission/e-mission-docs#880 (comment)
- Automated test passes
Because of e-mission/e-mission-docs#880 (comment)
we weren't actually trying to match all entries from the last place properly.
After fixing it, while running the existing test, we actually executed
`_get_next_cleaned_timeline_entry`, which failed because `starting_trip` was not a field in `Cleanedplace`

The check should not have been `tl_entry.data.end_place is not None` and
`tl_entry.data.starting_trip is not None`, which generate key errors, but
rather, `"end_place" in tl_entry.data`.
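
A small sketch of the membership check, using plain dicts as stand-ins for the wrapper objects. The helper name `next_link_field` is hypothetical; the field names are the ones mentioned above.

```python
def next_link_field(entry_data):
    """Return the name of the field that links to the next timeline entry, if any."""
    # Membership tests never raise, unlike reading a field that does not exist.
    if "end_place" in entry_data:
        return "end_place"        # trip-like entry: next is its end place
    if "starting_trip" in entry_data:
        return "starting_trip"    # place-like entry: next is the trip that starts here
    return None                   # e.g. the last place in the timeline
```

With a plain dict, `entry_data["starting_trip"]` on an entry without that field raises `KeyError`, which is the kind of failure the original `is not None` checks ran into.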

If we didn't find a matching cleaned trip, we also needed to check if we found
some untracked time instead

With these changes, the test passes
This makes it easier to debug, since we can then see the state of the system in the UI, re-run the pipeline with logs, etc.

If the flag is true, we also register the UUID as `automated_tests` to make it easier to access via the UI

```
SKIP_TEARDOWN=1 ./e-mission-py.bash emission/tests/analysisTests/intakeTests/TestPipelineRealData.py TestPipelineRealData.testCompositeTripIncremental
```
- while returning early with no locations
- while verifying the origin key in the unit tests

+ change the hardcoded `analysis/confirmed_place` to `esda.CONFIRMED_PLACE_KEY`
per #13 (comment)
…buntu

Ubuntu 18 is deprecated
https://github.blog/changelog/2022-08-09-github-actions-the-ubuntu-18-04-actions-runner-image-is-being-deprecated-and-will-be-removed-by-12-1-22/

And since we are removing it, we might as well add tests against the OSX
environment to ensure that the dev environment is kept up to date
So that the action fails when the test fails

We do ensure that the webserver exit code is the exit code for the action

```
$ docker-compose -f setup/docker-compose.tests.yml up --exit-code-from web-server
```

But without this change, the exit code of the webserver is the exit code of the teardown command

```
web-server_1  | Ran 327 tests in 1168.905s
web-server_1  |
web-server_1  | FAILED (failures=5)
web-server_1  | Found pc_trip 2016-12-12T09:19:09.784000-08:00
web-server_1  | None
web-server_1  | Found pc_trip 2016-12-12T18:54:58.134886-08:00
web-server_1  | Removing environment from
web-server_1  |
web-server_1  | Remove all packages in environment /root/miniconda-23.1.0/envs/emissiontest:
web-server_1  |
setup_web-server_1 exited with code 0
```
We tried to add a test using an OSX manual install, but it failed with the error below.
So we can only continue to test on Linux for now.

```
Run supercharge/[email protected]
  with:
    mongodb-version: 4.4.0
Error: Container action is only supported on Linux
```
Before this, we returned EPOCH_MAXIMUM for the timestamp in case the end_ts was
None, as would happen for the last place in the timeline.

However, the first place in the timeline will have `enter_ts` as None. We
handle this by setting it to EPOCH_MINIMUM (aka 0) in both locations where we
were using EPOCH_MAXIMUM.
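
A minimal sketch of the sentinel handling for the first place, assuming places are plain dicts with an optional `enter_ts`. The helper name is hypothetical; `EPOCH_MINIMUM = 0` is taken from the description above.

```python
EPOCH_MINIMUM = 0

def enter_ts_or_minimum(place_data):
    """Return the place's enter_ts, or EPOCH_MINIMUM for the first place in the timeline."""
    ts = place_data.get("enter_ts")
    # Compare against None explicitly so that a real timestamp is never replaced.
    return EPOCH_MINIMUM if ts is None else ts
```

Using this as a sort or comparison key puts the first place (no `enter_ts`) before every place with a real timestamp.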

+ new logging statement to help debug this issue; currently commented out

Testing done:
Previously failing test passes e-mission#895 (comment)

This fixes
e-mission#895 (comment)
By ensuring that we used "format str" % args
instead of "format str", args
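
The difference between the two forms is easy to see with `print`, which does not interpolate its arguments. This is only an illustration; the message text is made up and the actual call site in the fix may be a different function.

```python
import io

def render_print(*args):
    """Return what print(*args) would write, without the trailing newline."""
    buf = io.StringIO()
    print(*args, file=buf)
    return buf.getvalue().rstrip("\n")

template = "expected %d entries, got %d"
# "format str" % args: the values are substituted before the call.
interpolated = render_print(template % (3, 2))
# "format str", args: the template and the tuple are printed side by side.
not_interpolated = render_print(template, (3, 2))
```

Here `interpolated` is `expected 3 entries, got 2`, while `not_interpolated` is `expected %d entries, got %d (3, 2)`. Passing the arguments separately only works with APIs that do the formatting themselves, such as the `logging` functions.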
In the hope that it can resolve this surprising inconsistency
e-mission#895 (comment)
Since the docker CI is configured to use `db` with `DB_HOST`, we won't see the
`storage not configured` message. Changing it to ensure that we can pass in
both environments
Add server-side support for storing trip and place additions
…into gis-based-mode-detection

CONFLICT (content): Merge conflict in emission/analysis/intake/cleaning/clean_and_resample.py
- `resample`
- `_get_timezone`
- `_get_tz_ranges`
are in master, not in GIS

Where are they in GIS?
Moved out to
https://github.com/e-mission/e-mission-server/commits/gis-based-mode-detection/emission/analysis/intake/location_utils.py
Changes to master are at
e-mission@a6ac020
Ported over the change to the `location_utils.py` as well

CONFLICT (content): Merge conflict in emission/tests/analysisTests/intakeTests/TestPipelineRealData.py

Two conflicts similar to this

```
<<<<<<< HEAD
        with open("emission/tests/data/real_examples/shankari_2016-independence_day.alt.ground_truth") as gfp:
            ground_truth = json.load(gfp, object_hook=bju.object_hook)
=======
        with open("emission/tests/data/real_examples/shankari_2016-independence_day.ground_truth") as gfp:
            ground_truth = bju.loads(gfp.read(), json_options = bju.LEGACY_JSON_OPTIONS.with_options(uuid_representation= UuidRepresentation.PYTHON_LEGACY))
>>>>>>> 022182f
```

Fixing them by reading the alt ground truth in the new way.
Although honestly this is super clunky and we should change everything to use json.load with the lambda
Failures:
- Several instances of

```
TypeError: 'Collection' object is not callable. If you meant to call the 'remove' method on a 'Collection' object it is failing because no such method exists.
```

Failing tests:

```
ERROR: testAirOverrideHack (analysisTests.modeinferTests.TestPipeline.TestPipeline)
ERROR: testConvertPredictedProbToMap (analysisTests.modeinferTests.TestPipeline.TestPipeline)
ERROR: testEntirePipeline (analysisTests.modeinferTests.TestPipeline.TestPipeline)
ERROR: testFeatureGenWithOnePoint (analysisTests.modeinferTests.TestPipeline.TestPipeline)
ERROR: testGenerateFeatureMatrixAndIds (analysisTests.modeinferTests.TestPipeline.TestPipeline)
ERROR: testPredictedProb (analysisTests.modeinferTests.TestPipeline.TestPipeline)
ERROR: testSavePredictionsStep (analysisTests.modeinferTests.TestPipeline.TestPipeline)
ERROR: testSelectFeatureIndicesStep (analysisTests.modeinferTests.TestPipeline.TestPipeline)
```

These are all from the same lower level trace

```
  File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/classification/inference/mode/reset.py", line 57, in del_objects_after
    result = edb.get_analysis_timeseries_db().remove(del_query)
  File "/Users/kshankar/miniconda-23.1.0/envs/emission/lib/python3.9/site-packages/pymongo/collection.py", line 3213, in __call__
    raise TypeError(
TypeError: 'Collection' object is not callable. If you meant to call the 'remove' method on a 'Collection' object it is failing because no such method exists.
```

Fix: replaced `remove` with `delete_many`
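
For reference, a sketch of the replacement call. `Collection.remove` was removed in PyMongo 4, and `delete_many(filter)` returns a result object with a `deleted_count` attribute. The helper name `delete_matching` and the `FakeCollection` stand-in (used so the sketch runs without a MongoDB server) are not part of the codebase.

```python
from types import SimpleNamespace

def delete_matching(collection, del_query):
    """Delete every document matching del_query and return how many were removed."""
    result = collection.delete_many(del_query)
    return result.deleted_count

class FakeCollection:
    """In-memory stand-in that only supports exact-match delete_many."""
    def __init__(self, docs):
        self.docs = list(docs)

    def delete_many(self, query):
        keep = [d for d in self.docs
                if not all(d.get(k) == v for k, v in query.items())]
        deleted = len(self.docs) - len(keep)
        self.docs = keep
        return SimpleNamespace(deleted_count=deleted)
```

With a real `pymongo` collection, the same `delete_matching(collection, del_query)` call applies unchanged.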

- Few inconsistent results in new tests

```
FAIL: testCompositeTripIncremental (analysisTests.intakeTests.TestPipelineRealData.TestPipelineRealData)
self.assertEqual(len(ct['data']['locations']), len(et['data']['locations']))
AssertionError: 88 != 86

FAIL: testCompositeTripIncrementalLastPlaceMatches (analysisTests.intakeTests.TestPipelineRealData.TestPipelineRealData)
self.assertEqual(ct['data']['start_ts'], et['data']['start_ts'])
AssertionError: 1681523387.7296667 != 1681523302.4223254
```

Fix: created `alt` ground truth files and used them instead

- Existing failures in existing tests

```
        FAIL: testAllTimestampMetrics (netTests.TestMetricsConfirmedTrips.TestMetrics)
        AssertionError: 4108.083511703452 != 4305.02678 within 3 places (196.943268296548 difference)
```

and similar failures in the same test

Fix: replaced with the new values since they are off by a very small amount

```
-        self.assertAlmostEqual(user_met_dist_result[0]["label_bike"], 4305.02678, places=3)
+        self.assertAlmostEqual(user_met_dist_result[0]["label_bike"], 4108.08351, places=3)

-        self.assertAlmostEqual(user_met_old_spd_result[0]["label_bike"], 2.24535722467578, places=3)
+        self.assertAlmostEqual(user_met_old_spd_result[0]["label_bike"], 2.195459, places=3)

-        self.assertAlmostEqual(user_met_spd_result[0]["label_bike"], 2.24535722467578, places=3)
+        self.assertAlmostEqual(user_met_spd_result[0]["label_bike"], 2.195459, places=3)
```

Testing done:
all tests pass
@shankari shankari changed the title Merge to master Merge to main repo Apr 17, 2023
@shankari shankari merged commit 66d9340 into e-mission:gis-based-mode-detection Apr 17, 2023