Validate ISBNs detected via regex #171

@JPrevost JPrevost commented Jan 14, 2025

Why are these changes being introduced:

  • It is not possible to validate detected ISBNs via regex alone
  • We are seeing false positives for ISBNs in production that are invalid
  • We validate ISSNs, but had not gone back and added this logic to ISBN

Relevant ticket(s):

How does this address that need:

  • Follows the pattern to strip detected ISBNs in a way similar to ISSNs
  • Follows the ISBN-10 and ISBN-13 validation specifications
  • Updates the tests for valid ISBNs so that they actually use valid ISBNs (previously, the ISBN-10s were incorrectly converted to ISBN-13s)
  • Adds additional ISBN-13 examples to test positive validations
  • Adds additional ISBN-13 examples to test negative validations
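
A minimal sketch of the strip-then-validate idea described in the list above, assuming the standard ISBN-10 (mod 11) and ISBN-13 (mod 10) check digit rules. The method names are hypothetical and are not the methods added in this PR.

```ruby
# Hypothetical helpers for illustration; not the code from this branch.

# Remove hyphens and whitespace so only the ISBN characters remain.
def strip_isbn(candidate)
  candidate.gsub(/[-\s]/, '').upcase
end

# ISBN-10: weights run from 10 down to 1 and the weighted sum must be
# divisible by 11. A final 'X' stands for the value 10.
def valid_isbn10?(isbn)
  return false unless isbn.match?(/\A\d{9}[\dX]\z/)

  sum = isbn.chars.each_with_index.sum do |char, index|
    value = char == 'X' ? 10 : char.to_i
    value * (10 - index)
  end
  (sum % 11).zero?
end

# ISBN-13: weights alternate 1, 3, 1, 3, ... and the weighted sum must be
# divisible by 10. Only the 978 and 979 prefixes are accepted here.
def valid_isbn13?(isbn)
  return false unless isbn.match?(/\A97[89]\d{10}\z/)

  sum = isbn.chars.each_with_index.sum do |char, index|
    char.to_i * (index.even? ? 1 : 3)
  end
  (sum % 10).zero?
end

# Dispatch on length after stripping separators.
def valid_isbn?(candidate)
  isbn = strip_isbn(candidate)
  case isbn.length
  when 10 then valid_isbn10?(isbn)
  when 13 then valid_isbn13?(isbn)
  else false
  end
end

valid_isbn?('0-306-40615-2')  # => true
valid_isbn?('9780132413771')  # => false
```

A regex can confirm that a candidate has the right shape, but only the checksum step rejects a well-formed string with the wrong final digit.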

Document any side effects to this change:

  • Removed the optional "ISBN: " prefix from ISBN detection. It didn't feel useful and had to be stripped out before validation, which suggested to me that it isn't part of the ISBN and shouldn't be detected.
  • Updated yardopts to include private methods. We do a lot of important work in private methods. While we shouldn't call them from outside of the instance, having them in the docs feels better, as some of our docs were effectively making classes look like black boxes without this change.
  • Some of our test ISBN-13s were invalid. The ISBN-10 list was run through an external ISBN-10 to ISBN-13 converter. Because all of those start with the 978 prefix, I manually added some valid ISBN-13 examples to the tests that start with the other valid prefix (979).
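
For reference, a sketch of how an ISBN-10 maps to an ISBN-13 under the standard rules: prepend 978 to the first nine digits and recompute the check digit, since the ISBN-10 check digit does not carry over. The helper names are hypothetical and this is not necessarily how the external converter mentioned above works.

```ruby
# Hypothetical helpers for illustration only.

# Compute the ISBN-13 check digit for a 12-digit string using the
# alternating 1/3 weights.
def isbn13_check_digit(first_twelve)
  sum = first_twelve.chars.each_with_index.sum do |char, index|
    char.to_i * (index.even? ? 1 : 3)
  end
  (10 - (sum % 10)) % 10
end

# Convert a separator-free ISBN-10 to ISBN-13. The old check digit
# (the tenth character, possibly 'X') is dropped and a new one computed.
def isbn10_to_isbn13(isbn10)
  body = "978#{isbn10[0, 9]}"
  "#{body}#{isbn13_check_digit(body)}"
end

isbn10_to_isbn13('0306406152') # => "9780306406157"
```

A conversion like this can only ever produce 978-prefixed values, which is why 979 examples have to be added by hand.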

Example query: when run in prod, this shows a detected ISBN that then fails the details lookup because it isn't actually valid. In the PR build, or locally with this branch, the invalid ISBN is not detected.

{
  logSearchEvent(
    searchTerm: "Artin, M. Algebra (2nd Edition). Addison Wesley, 2010. ISBN: 9780132413771"
    sourceSystem: "playground"
  ) {
    phrase
    categories {
      name
      confidence
    }
    detectors {
      standardIdentifiers {
        kind
        value
        details {
          linkResolverUrl
        }
      }
    }
  }
}
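
As a worked check of why the ISBN in this query is rejected, the following sketch applies the standard ISBN-13 check digit rule to it directly. The variable names are illustrative and this is not the code path used by the detector.

```ruby
# Worked example: verify the check digit of the ISBN from the query.
isbn = '9780132413771'
digits = isbn.chars.map(&:to_i)

# Weight the first twelve digits 1, 3, 1, 3, ... and add them up.
weighted_sum = digits.first(12).each_with_index.sum do |digit, index|
  digit * (index.even? ? 1 : 3)
end

# The check digit is whatever brings the total to a multiple of 10.
expected_check = (10 - (weighted_sum % 10)) % 10
actual_check = digits.last

puts "weighted sum:   #{weighted_sum}"   # 100
puts "expected check: #{expected_check}" # 0
puts "actual check:   #{actual_check}"   # 1
```

The first twelve digits require a check digit of 0, but the search term ends in 1, so the string matches the ISBN-13 pattern while failing validation.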

Developer

Ticket(s)

https://mitlibraries.atlassian.net/browse/TCO-###

Accessibility

  • ANDI or Wave has been run in accordance with our guide and
    all issues introduced by these changes have been resolved or opened
    as new issues (link to those issues in the Pull Request details above)
  • There are no accessibility implications to this change

Documentation

  • Project documentation has been updated, and yard output previewed
  • No documentation changes are needed

ENV

  • All new ENV is documented in README.
  • All new ENV has been added to Heroku Pipeline, Staging and Prod.
  • ENV has not changed.

Stakeholders

  • Stakeholder approval has been confirmed
  • Stakeholder approval is not needed

Dependencies and migrations

NO dependencies are updated

NO migrations are included

Reviewer

Code

  • I have confirmed that the code works as intended.
  • Any CodeClimate issues have been fixed or confirmed as
    added technical debt.

Documentation

  • The commit message is clear and follows our guidelines
    (not just this pull request message).
  • The documentation has been updated or is unnecessary.
  • New dependencies are appropriate or there were no changes.

Testing

  • There are appropriate tests covering any new functionality.
  • No additional test coverage is required.

Why are these changes being introduced:

* It is not possible to validate detected ISBNs via regex alone
* We are seeing false positives for ISBNs in production that are invalid
* We validate ISSNs, but had not gone back and added this logic to ISBN

Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/TCO-114

How does this address that need:

* Follows the pattern to strip detected ISBNs in a way similar to ISSNs
* Follows the ISBN-10 and ISBN-13 validation specifications
* Updates the tests for valid ISBNs so that they actually use valid ISBNs
  (previously, the ISBN-10s were incorrectly converted to ISBN-13s)
* Adds additional ISBN-13 examples to test positive validations
* Adds additional ISBN-13 examples to test negative validations

Document any side effects to this change:

* Removed the optional "ISBN: " prefix from ISBN detection. It didn't feel
  useful and had to be stripped out before validation, which suggested to
  me that it isn't part of the ISBN and shouldn't be detected.
* Updated yardopts to include private methods. We do a lot of important
  work in private methods. While we shouldn't call them from outside of
  the instance, having them in the docs feels better as some of our docs
  were effectively making Classes look like black boxes without this
  change.
* Some of our test ISBN-13s were invalid. The ISBN-10 list was run
  through an external ISBN-10 to ISBN-13 converter. Because all of those
  start with the 978 prefix, I manually added some valid ISBN-13 examples
  to the tests that start with the other valid prefix (979).
@mitlib mitlib temporarily deployed to tacos-api-pipeline-pr-171 January 14, 2025 15:49 Inactive