Adds exact journal matches to historical analysis #65

JPrevost · 2024-07-24T15:19:03Z

Why are these changes being introduced:

Understanding our ability to detect search intent over time is core to our ability to know how we are doing

Relevant ticket(s):

https://mitlibraries.atlassian.net/browse/TCO-54

How does this address that need:

Adds new field to track exact journal matches to Metrics::Algorithms
Updates Metrics::Algorithms to run journal exact matches in addition to the existing StandardIdentifer matches. This included a refactor to better support multiple match types.
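
The refactored Metrics::Algorithms code is not shown in this conversation, so the following is only a minimal sketch of one way to count several match types in a single pass. The MATCHERS table, the JOURNALS list, the regular expressions, and count_matches are all hypothetical names made up for illustration; none of them are taken from the TACOS codebase.

```ruby
require 'set'

# Hypothetical journal list; the real application would look titles up elsewhere.
JOURNALS = Set.new(['nature', 'science', 'the lancet']).freeze

# Hypothetical matchers, one per match type. Each returns true or false.
MATCHERS = {
  doi: ->(phrase) { phrase.match?(%r{\b10\.\d{4,9}/\S+}) },
  pmid: ->(phrase) { phrase.match?(/\bPMID:\s*\d+/i) },
  journal_exact: ->(phrase) { JOURNALS.include?(phrase.strip.downcase) }
}.freeze

# Returns a hash of match type => number of phrases that matched that type.
def count_matches(phrases, matchers = MATCHERS)
  counts = matchers.keys.to_h { |type| [type, 0] }
  phrases.each do |phrase|
    matchers.each do |type, matcher|
      counts[type] += 1 if matcher.call(phrase)
    end
  end
  counts
end

counts = count_matches(['Nature', '10.1000/xyz123', 'PMID: 12345', 'climate change'])
```

With this shape, supporting another match type would mean adding one entry to the table rather than another branch in the counting loop.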

Document any side effects to this change:

Journal exact matching is not guaranteed to indicate user search intent, because many journal names are also common words. When we build out our validation workflows, we'll be able to understand what percentage of these matches are definitely journals and what percentage is ambiguous. At that point we can likely update our algorithm to drop some of the more ambiguous detections.

jazairi

Exciting feature, thanks for this! While I agree that the current iteration of exact title matching may not be predictive of user intent, I think this data will help us better understand what our users are searching for (and how to refine the algorithm).

jazairi · 2024-07-30T20:54:26Z

test/models/metrics/algorithms_test.rb

@@ -38,6 +39,11 @@ class Algorithms < ActiveSupport::TestCase
    assert aggregate.pmid == 1
  end

+  test 'journal exact counts are included in monthly aggregation' do
+    aggregate = Metrics::Algorithms.new.generate(DateTime.now)
+    assert aggregate.journal_exact == 1


I worry a bit that this type of testing will lead to issues similar to those we've seen in ETD (i.e., adding a fixture breaks a bunch of tests). Not requesting a change, partly because I don't have any good suggestions and partly because I'd love to get this ticket to done.

JPrevost · 2024-07-31

Yeah, you are definitely correct. As more fixtures are added, this test could fail.

I don't have a great idea that doesn't involve just not using fixtures or reimplementing the algorithm that calculates matches in the test itself... neither of which is really acceptable.

Thanks for pointing this out as I do think it's important for us to acknowledge when these "future" problems are being introduced.
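
To make the concern concrete, here is a small plain-Ruby sketch of how an absolute count assertion can break when unrelated fixture data is added later. Everything in it is hypothetical: the phrase arrays stand in for fixtures, and journal_exact_count and delta_after_adding are illustrative helpers rather than code from this PR. The relative check at the end is only one possible direction and was not something this thread settled on.

```ruby
# Hypothetical stand-in for the journal exact match count over stored phrases.
def journal_exact_count(phrases, journals)
  phrases.count { |phrase| journals.include?(phrase.strip.downcase) }
end

journals = ['nature', 'science']
fixtures = ['Nature', 'climate change']

# Absolute assertion: true only while the fixture set stays exactly as it is.
absolute_ok = journal_exact_count(fixtures, journals) == 1

# Someone later adds a fixture that happens to be a journal title.
fixtures_later = fixtures + ['Science']
absolute_still_ok = journal_exact_count(fixtures_later, journals) == 1

# Relative check: measure only the change caused by data the test adds itself.
def delta_after_adding(phrases, journals, new_phrase)
  before = journal_exact_count(phrases, journals)
  after = journal_exact_count(phrases + [new_phrase], journals)
  after - before
end
```

In this sketch the absolute check stops holding once the extra fixture exists, while the delta stays at 1 for either fixture set. Whether a relative check fits a monthly aggregation like the one tested above is an open question.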

mitlib temporarily deployed to tacos-api-pipeline-pr-65 July 24, 2024 15:21 Inactive

JPrevost requested review from matt-bernhardt and jazairi and removed request for matt-bernhardt July 24, 2024 16:58

jazairi self-assigned this Jul 30, 2024

jazairi approved these changes Jul 30, 2024

JPrevost force-pushed the tco54-historical-snapshot-journals branch from e43ae42 to 1e0fdc1 Compare July 31, 2024 12:44

JPrevost merged commit eec3780 into main Jul 31, 2024
1 check passed

JPrevost deleted the tco54-historical-snapshot-journals branch July 31, 2024 12:46


JPrevost commented Jul 24, 2024

jazairi left a comment

jazairi Jul 30, 2024

JPrevost Jul 31, 2024
