Adds exact journal matches to historical analysis #65
Exciting feature, thanks for this! While I agree that the current iteration of exact title matching may not be predictive of user intent, I think this data will help us better understand what our users are searching for (and how to refine the algorithm).
@@ -38,6 +39,11 @@ class Algorithms < ActiveSupport::TestCase
    assert aggregate.pmid == 1
  end

  test 'journal exact counts are included in monthly aggregation' do
    aggregate = Metrics::Algorithms.new.generate(DateTime.now)
    assert aggregate.journal_exact == 1
I worry a bit that this type of testing will lead to issues similar to those we've seen in ETD (i.e., adding a fixture breaks a bunch of tests). Not requesting a change, partly because I don't have any good suggestions and partly because I'd love to get this ticket to done.
Yeah, you are definitely correct. As more fixtures are added, this test could fail.
I don't have a great idea that doesn't involve either dropping fixtures or reimplementing the match-calculating algorithm in the test itself... neither of which is really acceptable.
Thanks for pointing this out as I do think it's important for us to acknowledge when these "future" problems are being introduced.
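To make the concern above concrete, here is a minimal plain-Ruby sketch of how a hard-coded count becomes coupled to the fixture set. Everything in it is hypothetical: `JOURNALS`, `journal_exact_count`, and the sample terms stand in for the project's fixtures and matching logic, which are not shown in this thread.

```ruby
# Hypothetical stand-in for the journal table; not the project's data.
JOURNALS = ['nature', 'science'].freeze

# Hypothetical stand-in for the exact-match logic: count the terms whose
# normalized text equals a known journal title.
def journal_exact_count(terms, journals = JOURNALS)
  terms.count { |term| journals.include?(term.strip.downcase) }
end

# The fixture set as it exists when the test is first written.
fixture_terms = ['Nature', '10.1000/xyz123', 'climate change']
count_before = journal_exact_count(fixture_terms)

# A later, unrelated change adds one more fixture that happens to match.
fixture_terms_later = fixture_terms + ['Science']
count_after = journal_exact_count(fixture_terms_later)

# An assertion of the form `count == 1` holds for the first set only.
puts "before: #{count_before}, after: #{count_after}"
```

The assertion is correct for the data it was written against; it fails later only because the shared data changed, which is the failure mode described above.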
Why are these changes being introduced:

* Understanding our ability to detect search intent over time is core to our ability to know how we are doing

Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/TCO-54

How does this address that need:

* Adds new field to track exact journal matches to Metrics::Algorithms
* Updates Metrics::Algorithms to run journal exact matches in addition to the existing StandardIdentifer matches. This included a refactor to better support multiple match types.

Document any side effects to this change:

Journal exact matching is not guaranteed to be an indicator of user search intent because Journal names are also common words in many cases. When we build out our validation workflows, we'll be able to understand what percentage of these types of matches are definitely Journals and what percentage is ambiguous. We can likely update our algorithm to drop some of the more ambiguous detections at that point.
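As a rough illustration of the idea in this commit message, the sketch below runs several match types over the same search terms and tallies one count per type. It is not the code in Metrics::Algorithms: `KNOWN_JOURNALS`, `MATCHERS`, `aggregate_matches`, and the PMID pattern are all assumptions made up for the example.

```ruby
require 'set'

# Hypothetical journal titles; the real data lives in the application.
KNOWN_JOURNALS = Set['nature', 'science', 'cell'].freeze

# One lambda per match type, keyed by the field it would populate.
# The PMID pattern is an assumed simplification, not the project's regex.
MATCHERS = {
  pmid: ->(term) { term.match?(/\Apmid:?\s*\d+\z/i) },
  journal_exact: ->(term) { KNOWN_JOURNALS.include?(term.strip.downcase) }
}.freeze

# Return a hash of match type => number of terms that matched it.
def aggregate_matches(terms, matchers = MATCHERS)
  matchers.transform_values do |matcher|
    terms.count { |term| matcher.call(term) }
  end
end

terms = ['PMID: 12345', 'Nature', 'cell', 'machine learning']
counts = aggregate_matches(terms)
puts counts.inspect
```

Note that `'cell'` counts as a journal match even though the searcher may have meant the ordinary word. That is the ambiguity described under side effects: an exact title match records that the text equals a journal name, not that the user was looking for the journal.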