Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Classifier: Remove the faulty session id code #6499

Merged
merged 4 commits into from
Dec 10, 2024

Conversation

goplayoutside3
Copy link
Contributor

@goplayoutside3 goplayoutside3 commented Nov 27, 2024

Package

lib-classifier

Linked Issue and/or Talk Post

In response to discussion in: #6493 (comment)
Will link corresponding PFE PR here

Describe your changes

  • While investigating Standardize third party libraries for generating string hash and unique ids #6493, I found that the "generate a new session id after 5 minutes of inactivity" logic does not work in the classifier. As mentioned in that Issue, session id is attached to classification metadata and can be used for analysis of anonymous users contributing classifications.
  • In this PR I removed the "5 minutes" logic because its original condition of comparing a Date() string to a Date.now() number was never met, and therefore the code never worked as intended.

How to Review

  • Anytime a classification is submitted, a session storage object with { id } is added to your browser's session storage.
  • As long as your browser tab or window is not closed, that id is attached to the classification's metadata (this is all existing behavior, I'm simply removing the faulty code and unit tests).

Checklist

PR Creator - Please cater the checklist to fit the review needed for your code changes.
PR Reviewer - Use the checklist during your review. Each point should be checkmarked or discussed before PR approval.

General

  • Tests are passing locally and on Github
  • Documentation is up to date and changelog has been updated if appropriate
  • You can yarn panic && yarn bootstrap or docker-compose up --build and FEM works as expected
  • FEM works in all major desktop browsers: Firefox, Chrome, Edge, Safari (Use Browserstack account as needed)

General UX

Example Staging Project: i-fancy-cats

  • All pages of a FEM project load: Home Page, Classify Page, and About Pages
  • Can submit a classification
  • Can sign-in and sign-out
  • The component is accessible

Bug Fix

  • The PR creator has listed user actions to use when testing if bug is fixed
  • The bug is fixed
  • Unit tests are added or updated

@goplayoutside3 goplayoutside3 marked this pull request as ready for review November 27, 2024 18:29
@@ -1,19 +1,12 @@
import hash from 'hash.js'

const storage = window.sessionStorage || window.localStorage
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why the option for localStorage is here. This feature is supposed to be a session id, which refreshes just like a browser session. If researchers are using session ids to analyze anonymous users, then don't use localStorage as a backup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. ✔️ Agreed, in this scenario a fallback is actually a bad option that leads to inconsistent/unpredictable behaviour. Removing localStorage is a good idea.
  2. ❓ I'm wondering if the rest of the session.js needs to be made more robust. I just realised a lot of the code in this file does things like window.sessionStorage and storage.getItem(), instead of window?.sessionStorage and storage?.getItem(), which may inexplicably crash the process if window or window.sessionStorage is ever unavailable. (Question for Shaun: when would such a scenario happen? Browser security settings? Incognito mode? "Do not track" enabled?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the classifier runs only in the browser, the chance of window being undefined is lower than components of app-project, for example, and the entire classifier is also lazy loaded. I'm not sure if user browser settings could impact window 🤔 In a test env we've mocked window.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sessionStorage will throw a SecurityException if it violates browser privacy settings, so you should catch that.

https://developer.mozilla.org/en-US/docs/Web/API/Window/sessionStorage#exceptions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be able to throw that error by setting the browser to block www.zooniverse.org in an ad blocker or in cookie preferences.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

@@ -24,18 +17,7 @@ const sessionUtils = {

getSessionID () {
const stored = (storage.getItem('session_id')) ? JSON.parse(storage.getItem('session_id')) : this.generateSessionID()
let { id, ttl } = stored

if (ttl < Date.now()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition was comparing a Date() string to a Date.now() number and therefore never met. It was copied directly from PFE's codebase, so I'll be doing the same cleanup over there.

@coveralls
Copy link

coveralls commented Nov 27, 2024

Coverage Status

coverage: 77.394% (+4.6%) from 72.785%
when pulling 6de108a on remove-faulty-session-id
into ad8d5be on master.

@eatyourgreens
Copy link
Contributor

Here's a little pen that, I think, shows that the original comparison is always false.

https://codepen.io/eatyourgreens/pen/gbYOqXV

generateSessionID () {
const sha2 = hash.sha256()
const id = sha2.update(`${Math.random() * 10000}${Date.now()}${Math.random() * 1000}`).digest('hex')
const ttl = this.fiveMinutesFromNow()
const stored = { id, ttl }
const stored = { id }
try {
storage.setItem('session_id', JSON.stringify(stored))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[rambling][minor] Curiously, this means that we're storing session ID in an unnecessary "object layer", i.e. session_id = "{ id: \"1234\" }" instead of session_id = "1234". Keeping it this way is fine by me, though. (And even if we did change it to the latter format, I wonder what'd happen if a user with an existing session ID refreshes their page after we push the changes. 🤔 )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember that this stored object is also going to be read by PFE, so anything written to storage here must be readable by PFE.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PFE might also expect the stored object to have a ttl property, come to think of it. I assume PFE reads the session when you go to Talk, or your collections etc.

Copy link
Contributor

@eatyourgreens eatyourgreens Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If PFE does generate a new session when you go to Talk, that would be a breaking change, so it's worth testing that case.

PFE code and FEM code running in the same tab, on the same origin, uses the same session ID in production.

@shaunanoordin shaunanoordin self-assigned this Dec 3, 2024
Copy link
Member

@shaunanoordin shaunanoordin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review (Ongoing)

Package: lib-classifier

This PR removes some code in our session ID-generating function. The removed portion was never working, so no actual changes in website behaviour are expected.

  • Previously:
    • In theory, a session ID (for a browser tab/window) should be generated when the page loads first classification is submitted (see ❗ note below on FEM vs PFE), and should be re-generated after 5 minutes of inactivity.
    • In practice, a session ID (f.a.b.t.w.) is generated when the page loads first classification is submitted and remains throughout the lifetime of the browser window/tab.
      • (...unless localStorage somehow kicks in, then it's anybody's game. This is an extremely unlikely scenario, though.)
  • With this PR:
    • A session ID is generated when the page loads first classification is submitted and remains throughout the lifetime of the browser window/tab.
    • The README has been updated accordingly. (Note: there is a separate effort to inform project owners and researchers of the actual vs expected behaviour.)

Code changes look good, I just need to run some tests to confirm.

Dev Notes

❗ Important note:

  • On FEM, the session ID (f.a.b.t.w.) is ONLY generated on the Classify page, when a classification is submitted for the first time.
  • On PFE, the session ID (f.a.b.t.w.) is generated as soon as you open any PFE page.
  • This behaviour should be fine if Session ID only matters to classifications, which I think they are.

Additional notes, to illustrate the issue that Delilah uncovered, with some examples:

  • fiveMinutesFromNow() generates a Date.
  • That date is stored as a String that looks like "2024-12-03T15:59:48.883Z"
  • Date.now() generates a number like 1733241013923
  • So "2024-12-03T15:59:48.883Z" < 1733241013923 => false.
  • In fact, "(any string that's not a simple number that JS can auto-convert)" < ANY_NUMBER => false, always. (Same is true for >, <=, >=, == btw)

Testing

Testing ongoing with macOS + Chrome, on localhost, on lib-classifier

  • Run lib-classifier via yarn dev
  • Open Classifier in a new tab (Tab 1). Submit a Classification. Expect to see a new session ID in Chrome's session storage for this tab.
  • Close Tab 1.
  • Open Classifier in a new tab (Tab 2). Submit a yadda yadda, expect to see a new session ID, different from Tab 1.
  • Without closing Tab 2, open Tab 3. Submit blah blah, expect to see a new session ID in Tab 3, while Tab 2's session ID should remain the same.

Status

Testing ongoing, should be done soon.

Copy link
Member

@shaunanoordin shaunanoordin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review (Complete)

All tests pass - functionality checks out, and behaviour is working as expected. LGTM! 👍

@github-actions github-actions bot added the approved This PR is approved for merging label Dec 3, 2024
@eatyourgreens
Copy link
Contributor

In fact, "(any string that's not a simple number that JS can auto-convert)" < ANY_NUMBER => false, always. (Same is true for >, <=, >=, == btw)

Non-numeric strings are converted to NaN when a string is coerced to a number. Comparisons with NaN are always false.

@goplayoutside3 goplayoutside3 merged commit d051d9c into master Dec 10, 2024
7 checks passed
@goplayoutside3 goplayoutside3 deleted the remove-faulty-session-id branch December 10, 2024 21:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved This PR is approved for merging
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants