Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

💩 🔧 Enable suspending data collection for certain subgroups #1018

Merged
merged 1 commit into from
Feb 7, 2025

Conversation

shankari
Copy link
Contributor

@shankari shankari commented Feb 7, 2025

This revises the initial implementation in
#1016
of a hack for e-mission/e-mission-docs#1104

The request in the issue was to implement the ability to suspend and resume data collection.

The previous implementation checked for suspended subgroups in the auth method, and used the request.path to determine whether it was a usercache/put call or not. This was needed because, by design, we only see the opcode in the auth module; the rest of the code works only with UUIDs

The problem with that approach was that if the subgroup was suspended, we returned a UUID of None, which triggered a 403 error, which, in turn, caused the phone app to detect a failed call, so it did not delete the entries locally.

#1016 (comment)
#1016 (comment)

We fix this by returning the non-random parts of the opcode (at least the subgroup) in a context, and perform the validation downstream (in the putIntoCache method). So if the subgroup is suspended, putIntoCache just returns early instead of storing the values.

Note that this is also a bit of a hack, given that the way we handle the dynamic auth right now is to reset the global auth_method!! So we can't use a check for auth_method == 'dynamic' anywhere outside resolve_auth and have
to use the existence of the dynamic config as the check for whether the auth method was dynamic or not

Testing done:

  • TestWebserver.py passes
  • TestAuthSelection.py does not need any changes since we made sure to be backwards compatible

When trying to push to a suspended subgroup, we get

2025-02-07 11:10:26,127:DEBUG:6362165248:START POST /usercache/put
2025-02-07 11:10:26,127:DEBUG:6362165248:Called userCache.put
2025-02-07 11:10:26,189:DEBUG:6362165248:methodName = skip, returning <class 'emission.net.au
th.skip.SkipMethod'>
2025-02-07 11:10:26,189:DEBUG:6362165248:Using the skip method to verify id token nrelop_dev-
emulator-study_default_123 of length 37
2025-02-07 11:10:26,190:DEBUG:6362165248:subgroup default found in list ['test', 'default']
2025-02-07 11:10:26,190:DEBUG:6362165248:retContext = {'subgroup': 'default', 'user_id': UUID
('eb4a7aae-f2d4-4480-ba85-c568a45591b5')}
2025-02-07 11:10:26,191:DEBUG:6362165248:user_uuid {'subgroup': 'default', 'user_id': UUID('e
b4a7aae-f2d4-4480-ba85-c568a45591b5')}
2025-02-07 11:10:26,191:DEBUG:6362165248:suspended_subgroups=['default']
2025-02-07 11:10:26,191:INFO:6362165248:Received put message for subgroup default in suspende
d_subgroups=['default'], ignoring
2025-02-07 11:10:26,191:DEBUG:6362165248:END POST /usercache/put eb4a7aae-f2d4-4480-ba85-c568

When trying to push to a non-suspended subgroup, we get

2025-02-07 11:26:13,627:DEBUG:12901707776:START POST /usercache/put
2025-02-07 11:26:13,628:DEBUG:12901707776:Called userCache.put
2025-02-07 11:26:13,631:DEBUG:12901707776:methodName = skip, returning <class 'emission.net.auth.skip.SkipMethod'>
2025-02-07 11:26:13,631:DEBUG:12901707776:Using the skip method to verify id token nrelop_dev-emulator-study_test_123 of length 34
2025-02-07 11:26:13,634:DEBUG:12901707776:subgroup test found in list ['test', 'default']
2025-02-07 11:26:13,634:DEBUG:12901707776:retContext = {'subgroup': 'test', 'user_id': UUID('feb70456-abf4-444b-8848-1515fc3470cf')}
2025-02-07 11:26:13,634:DEBUG:12901707776:user_uuid {'subgroup': 'test', 'user_id': UUID('feb70456-abf4-444b-8848-1515fc3470cf')}
2025-02-07 11:26:13,634:DEBUG:12901707776:suspended_subgroups=['default']
2025-02-07 11:26:13,644:DEBUG:12901707776:Updated result for user = feb70456-abf4-444b-8848-1515fc3470cf, key = stats/client_time, write_ts = 1738956332.243175 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('67a65e5546756f8244f024f6'), 'ok': 1.0, 'updatedExisting': False}
2025-02-07 11:26:13,646:DEBUG:12901707776:Updated result for user = feb70456-abf4-444b-8848-1515fc3470cf, key = stats/client_time, write_ts = 1738956332.246808 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('67a65e5546756f8244f024f8'), 'ok': 1.0, 'updatedExisting': False}
2025-02-07 11:26:13,857:DEBUG:12901707776:Updated result for user = feb70456-abf4-444b-8848-1515fc3470cf, key = statemachine/transition, write_ts = 1738956373.598038 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('67a65e5546756f8244f025cd'), 'ok': 1.0, 'updatedExisting': False}
2025-02-07 11:26:13,858:DEBUG:12901707776:Updated result for user = feb70456-abf4-444b-8848-1515fc3470cf, key = background/battery, write_ts = 1738956373.5991821 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('67a65e5546756f8244f025cf'), 'ok': 1.0, 'updatedExisting': False}
2025-02-07 11:26:13,859:DEBUG:12901707776:Updated result for user = feb70456-abf4-444b-8848-1515fc3470cf, key = stats/client_nav_event, write_ts = 1738956373.607026 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('67a65e5546756f8244f025d1'), 'ok': 1.0, 'updatedExisting': False}
2025-02-07 11:26:13,860:DEBUG:12901707776:END POST /usercache/put feb70456-abf4-444b-8848-1515fc3470cf 0.2325270175933838

@shankari
Copy link
Contributor Author

shankari commented Feb 7, 2025

@JGreenlee @TeachMeTW once this change is on production, we can discuss if we want to continue with this hack on the server, or just go to the principled fix, in which case, most of these 💩 changes can be reverted.

This revises the initial implementation in
e-mission#1016
of a hack for e-mission/e-mission-docs#1104

The request in the issue was to implement the ability to suspend and resume
data collection.

The previous implementation checked for suspended subgroups in the auth method,
and used the `request.path` to determine whether it was a `usercache/put` call or not
This was needed because, by design, we only see the opcode in the auth module;
the rest of the code works only with UUIDs

The problem with that approach was that if the subgroup was suspended, we
returned a UUID of None, which triggered a 403 error, which, in turn, caused
the phone app to detect a failed call, so it did not delete the entries locally.

e-mission#1016 (comment)
e-mission#1016 (comment)

We fix this by returning the non-random parts of the opcode (at least the
subgroup) in a context, and perform the validation downstream (in the
`putIntoCache` method). So if the subgroup is suspended, `putIntoCache` just
returns early instead of storing the values.

Note that this is also a bit of a hack, given that the way we handle the
dynamic auth right now is to reset the global `auth_method`!! So we can't use
a check for `auth_method == 'dynamic'` anywhere outside `resolve_auth` and have
to use the existence of the dynamic config as the check for whether the auth
method was dynamic or not

Testing done:
- TestWebserver.py passes
- TestAuthSelection.py does not need any changes since we made sure to be
  backwards compatible

When trying to push to a suspended subgroup, we get

```
2025-02-07 11:10:26,127:DEBUG:6362165248:START POST /usercache/put
2025-02-07 11:10:26,127:DEBUG:6362165248:Called userCache.put
2025-02-07 11:10:26,189:DEBUG:6362165248:methodName = skip, returning <class 'emission.net.au
th.skip.SkipMethod'>
2025-02-07 11:10:26,189:DEBUG:6362165248:Using the skip method to verify id token nrelop_dev-
emulator-study_default_123 of length 37
2025-02-07 11:10:26,190:DEBUG:6362165248:subgroup default found in list ['test', 'default']
2025-02-07 11:10:26,190:DEBUG:6362165248:retContext = {'subgroup': 'default', 'user_id': UUID
('eb4a7aae-f2d4-4480-ba85-c568a45591b5')}
2025-02-07 11:10:26,191:DEBUG:6362165248:user_uuid {'subgroup': 'default', 'user_id': UUID('e
b4a7aae-f2d4-4480-ba85-c568a45591b5')}
2025-02-07 11:10:26,191:DEBUG:6362165248:suspended_subgroups=['default']
2025-02-07 11:10:26,191:INFO:6362165248:Received put message for subgroup default in suspende
d_subgroups=['default'], ignoring
2025-02-07 11:10:26,191:DEBUG:6362165248:END POST /usercache/put eb4a7aae-f2d4-4480-ba85-c568
```

When trying to push to a non-suspended subgroup, we get

```
2025-02-07 11:26:13,627:DEBUG:12901707776:START POST /usercache/put
2025-02-07 11:26:13,628:DEBUG:12901707776:Called userCache.put
2025-02-07 11:26:13,631:DEBUG:12901707776:methodName = skip, returning <class 'emission.net.auth.skip.SkipMethod'>
2025-02-07 11:26:13,631:DEBUG:12901707776:Using the skip method to verify id token nrelop_dev-emulator-study_test_123 of length 34
2025-02-07 11:26:13,634:DEBUG:12901707776:subgroup test found in list ['test', 'default']
2025-02-07 11:26:13,634:DEBUG:12901707776:retContext = {'subgroup': 'test', 'user_id': UUID('feb70456-abf4-444b-8848-1515fc3470cf')}
2025-02-07 11:26:13,634:DEBUG:12901707776:user_uuid {'subgroup': 'test', 'user_id': UUID('feb70456-abf4-444b-8848-1515fc3470cf')}
2025-02-07 11:26:13,634:DEBUG:12901707776:suspended_subgroups=['default']
2025-02-07 11:26:13,644:DEBUG:12901707776:Updated result for user = feb70456-abf4-444b-8848-1515fc3470cf, key = stats/client_time, write_ts = 1738956332.243175 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('67a65e5546756f8244f024f6'), 'ok': 1.0, 'updatedExisting': False}
2025-02-07 11:26:13,646:DEBUG:12901707776:Updated result for user = feb70456-abf4-444b-8848-1515fc3470cf, key = stats/client_time, write_ts = 1738956332.246808 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('67a65e5546756f8244f024f8'), 'ok': 1.0, 'updatedExisting': False}
2025-02-07 11:26:13,857:DEBUG:12901707776:Updated result for user = feb70456-abf4-444b-8848-1515fc3470cf, key = statemachine/transition, write_ts = 1738956373.598038 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('67a65e5546756f8244f025cd'), 'ok': 1.0, 'updatedExisting': False}
2025-02-07 11:26:13,858:DEBUG:12901707776:Updated result for user = feb70456-abf4-444b-8848-1515fc3470cf, key = background/battery, write_ts = 1738956373.5991821 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('67a65e5546756f8244f025cf'), 'ok': 1.0, 'updatedExisting': False}
2025-02-07 11:26:13,859:DEBUG:12901707776:Updated result for user = feb70456-abf4-444b-8848-1515fc3470cf, key = stats/client_nav_event, write_ts = 1738956373.607026 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('67a65e5546756f8244f025d1'), 'ok': 1.0, 'updatedExisting': False}
2025-02-07 11:26:13,860:DEBUG:12901707776:END POST /usercache/put feb70456-abf4-444b-8848-1515fc3470cf 0.2325270175933838
```
@shankari shankari force-pushed the suspend_certain_subgroups branch from 9b5131a to 788ffaf Compare February 7, 2025 20:20
@shankari shankari merged commit 72e92a8 into e-mission:master Feb 7, 2025
5 checks passed
@shankari
Copy link
Contributor Author

shankari commented Feb 7, 2025

On the server, I now see

2025-02-07T14:33:05.238-08:00 2025-02-07 22:33:05,238:DEBUG:139912475055680:suspended_subgroups=['temp']
2025-02-07T14:33:05.238-08:00 2025-02-07 22:33:05,238:INFO:139912475055680:Received put message for subgroup temp in suspended_subgroups=['temp'], ignoring

And on the phone, I see a "successful" push

16013,1738967585.136,2025-02-07T14:33:05.136000-08:00,BuiltinUserCache : Result count = 356
16018,1738967585.546,2025-02-07T14:33:05.546000-08:00,CommunicationHelper : Got response org.apache.http.message.BasicHttpResponse@4c3ab75 with statusHTTP/1.1 200 OK
16019,1738967585.565,2025-02-07T14:33:05.565000-08:00,BuiltinUserCache : Clearing entries for timequery 1.738793776E9 < write_ts < 1.738902464E9

And no cached entries when I next check.

3366,1738799958.856,2025-02-05T15:59:18.856000-08:00,TripDiaryStateMachineRcvr : START PERIODIC ACTIVITY
3382,1738799958.874,2025-02-05T15:59:18.874000-08:00,"BuiltinUserCache : While searching for regex of last transition, got 0 results"
3385,1738799958.876,2025-02-05T15:59:18.876000-08:00,BuiltinUserCache : Last trip end was at -1.0
3386,1738799958.877,2025-02-05T15:59:18.877000-08:00,"BuiltinUserCache : We don't have a completed trip, so we don't want to push anything yet"

@shankari
Copy link
Contributor Author

shankari commented Feb 7, 2025

I'm going to unsuspend temp so I can verify the non-suspended case once more, but then this is going to production

shankari added a commit to e-mission/nrel-openpath-deploy-configs that referenced this pull request Feb 7, 2025
This reverts commit 5f54292.
So that we can re-verify the common case before deploying to production
e-mission/e-mission-server#1018 (comment)
@shankari
Copy link
Contributor Author

shankari commented Feb 8, 2025

After unsuspending temp, I took a trip to run some errands. The push while on the errand failed due to lack of data.

18619,1738976991.436,2025-02-07T17:09:51.436000-08:00,BuiltinUserCache : Result count = 261
18684,1738977026.246,2025-02-07T17:10:26.246000-08:00,ServerSyncAdapter : IO Error javax.net.ssl.SSLPeerUnverifiedException: Cannot verify hostname: openpath-stage.nrel.gov while posting converted trips to JSON

But it succeeded after I returned home

19965,1738978237.716,2025-02-07T17:30:37.716000-08:00,BuiltinUserCache : Result count = 364
19989,1738978241.637,2025-02-07T17:30:41.637000-08:00,CommunicationHelper : Got response org.apache.http.message.BasicHttpResponse@8e413dc with statusHTTP/1.1 200 OK
19992,1738978241.666,2025-02-07T17:30:41.666000-08:00,BuiltinUserCache : Deleted 366 non-document entries
19998,1738978241.673,2025-02-07T17:30:41.673000-08:00,"ServerSyncAdapter : Push complete, now pulling"

Then today morning, we went up from 2 trips to 5 trips

16931,1738969668.615,2025-02-07T15:07:48.615000-08:00,"js : Timeline: readCompositePromise resolved with 2 trips;  
      readUnprocessedPromise resolved with 0 trips"

20786,1739033654.305,2025-02-08T08:54:14.305000-08:00,"js : Timeline: readCompositePromise resolved with 5 trips; 
      readUnprocessedPromise resolved with 0 trips"

On the server,

2025-02-07T14:33:05.238-08:00	
2025-02-07 22:33:05,238:INFO:139912475055680:Received put message for subgroup temp in suspended_subgroups=['temp'], ignoring
	
2025-02-07T17:30:10.166-08:00
2025-02-08 01:30:10,165:DEBUG:140046154872384:suspended_subgroups=[]
	
2025-02-07T17:30:37.818-08:00
2025-02-08 01:30:37,817:DEBUG:140046144382528:suspended_subgroups=[]

Moving this to production on nrel commute and open access for now, will move to production on the other deployments tonight.

@shankari
Copy link
Contributor Author

shankari commented Feb 9, 2025

verified that this is working properly with my personal phone

	
2025-02-08T14:47:51.830-08:00 2025-02-08 22:47:51,830:DEBUG:139889452979776:START POST /usercache/put
2025-02-08T14:47:51.830-08:00 2025-02-08 22:47:51,830:DEBUG:139889452979776:Called userCache.put
2025-02-08T14:47:51.831-08:00 2025-02-08 22:47:51,831:DEBUG:139889452979776:Old-style study, expecting token without a subgroup...
2025-02-08T14:47:51.831-08:00 2025-02-08 22:47:51,831:DEBUG:139889452979776:methodName = skip, returning <class 'emission.net.auth.skip.SkipMethod'>
2025-02-08T14:47:51.831-08:00 2025-02-08 22:47:51,831:DEBUG:139889452979776:Using the skip method to verify id token [redacted] of length 36
2025-02-08T14:47:51.842-08:00 2025-02-08 22:47:51,842:DEBUG:139889452979776:Old-style study, expecting token without a subgroup...
2025-02-08T14:47:51.842-08:00 2025-02-08 22:47:51,842:DEBUG:139889452979776:retUUID = {'subgroup': None, 'user_id': UUID('[redacted]')}
2025-02-08T14:47:51.842-08:00 2025-02-08 22:47:51,842:DEBUG:139889452979776:user_uuid {'subgroup': None, 'user_id': UUID('[redacted]')}
2025-02-08T14:47:51.842-08:00 2025-02-08 22:47:51,842:DEBUG:139889452979776:suspended_subgroups=[] 
2025-02-08T14:47:51.883-08:00 2025-02-08 22:47:51,883:DEBUG:139889452979776:Updated result for user = [redacted], key = config/app_ui_config, write_ts = 1692318184.717 = {'n': 1, 'nModified': 0, 'ok': 1.0, 'operationTime': Timestamp(1739054871, 1), 'updatedExisting': True}
2025-02-08T14:47:51.893-08:00 2025-02-08 22:47:51,893:DEBUG:139889452979776:Updated result for user = [redacted], key = config/consent, write_ts = 1692318248.839 = {'n': 1, 'nModified': 0, 'ok': 1.0, 'operationTime': Timestamp(1739054871, 1), 'updatedExisting': True}
2025-02-08T14:47:51.904-08:00 2025-02-08 22:47:51,904:DEBUG:139889452979776:Updated result for user = [redacted], key = stats/client_time, write_ts = 1739053988.084 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('67a7df1761091682c9723c8e'), 'ok': 1.0, 'operationTime': Timestamp(1739054871, 1), 'updatedExisting': False}
2025-02-08T14:47:51.915-08:00 2025-02-08 22:47:51,915:DEBUG:139889452979776:Updated result for user = [redacted], key = stats/client_time, write_ts = 1739053988.363 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('67a7df1761091682c9723c8f'), 'ok': 1.0, 'operationTime': Timestamp(1739054871, 1), 'updatedExisting': False}
2025-02-08T14:47:51.924-08:00 2025-02-08 22:47:51,924:DEBUG:139889452979776:Updated result for user = [redacted], key = stats/client_time, write_ts = 1739053988.368 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('67a7df1761091682c9723c90'), 'ok': 1.0, 'operationTime': Timestamp(1739054871, 1), 'updatedExisting': False}

I can't verify anything further downstream because the pipeline is stuck but this is working properly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant