Validate ISBNs detected via regex #171

JPrevost · 2025-01-14T15:47:59Z

Why are these changes being introduced:

* It is not possible to validate detected ISBNs with a regex alone
* We are seeing false positives in production: strings detected as ISBNs that are not valid ISBNs
* We already validate ISSNs, but had not gone back and added equivalent logic for ISBNs
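To illustrate why a pattern match is not enough: a simple ISBN-13 regex (hypothetical; this is not the pattern TACOS actually uses) matches the invalid ISBN from the example query below just as readily as the corrected one, because the two differ only in their check digit.

```ruby
# Hypothetical, deliberately simple ISBN-13 pattern (not the app's actual regex).
ISBN13_PATTERN = /\b97[89]\d{10}\b/

query = 'Artin, M. Algebra (2nd Edition). Addison Wesley, 2010. ISBN: 9780132413771'
match = query[ISBN13_PATTERN]
puts match # prints 9780132413771

# Both strings match; only a check-digit calculation can tell them apart.
puts '9780132413771'.match?(ISBN13_PATTERN) # prints true (invalid check digit)
puts '9780132413770'.match?(ISBN13_PATTERN) # prints true (valid check digit)
```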

Relevant ticket(s):

https://mitlibraries.atlassian.net/browse/TCO-114

How does this address that need:

* Strips detected ISBNs before validation, following the pattern already used for ISSNs
* Implements ISBN-10 and ISBN-13 validation (check-digit calculation) as defined in their specifications
* Updates the tests for valid ISBNs so that they actually use valid ISBNs (previously, the ISBN-10s had been incorrectly converted to ISBN-13s)
* Adds more ISBN-13 examples to test positive validation
* Adds more ISBN-13 examples to test negative validation
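A minimal Ruby sketch of the strip-then-validate flow described above, assuming candidates are normalized by removing spaces and hyphens. The method names (`strip_isbn`, `valid_isbn10?`, `valid_isbn13?`, `valid_isbn?`) are hypothetical and are not the methods added in this PR. The check-digit rules are the standard ones: ISBN-10 uses weights 10 down to 1 and the sum must be divisible by 11 (a final `X` counts as 10); ISBN-13 uses alternating weights 1 and 3 and the sum must be divisible by 10.

```ruby
# Hypothetical helpers; names are illustrative, not taken from this PR.

# Normalize a detected candidate: drop spaces and hyphens, and upcase so a
# lowercase "x" check character is treated the same as "X".
def strip_isbn(candidate)
  candidate.gsub(/[\s-]/, '').upcase
end

# ISBN-10: nine digits plus a check character (0-9 or X).
# Weights run from 10 down to 1; the weighted sum must be divisible by 11.
def valid_isbn10?(isbn)
  return false unless isbn.match?(/\A\d{9}[\dX]\z/)

  sum = 0
  isbn.each_char.with_index do |char, index|
    value = char == 'X' ? 10 : char.to_i
    sum += value * (10 - index)
  end
  (sum % 11).zero?
end

# ISBN-13: a 978 or 979 prefix plus ten more digits.
# Weights alternate 1, 3, 1, 3, ...; the weighted sum must be divisible by 10.
def valid_isbn13?(isbn)
  return false unless isbn.match?(/\A97[89]\d{10}\z/)

  sum = 0
  isbn.each_char.with_index do |char, index|
    sum += char.to_i * (index.even? ? 1 : 3)
  end
  (sum % 10).zero?
end

# Dispatch on length after stripping; anything else is rejected.
def valid_isbn?(candidate)
  isbn = strip_isbn(candidate)
  case isbn.length
  when 10 then valid_isbn10?(isbn)
  when 13 then valid_isbn13?(isbn)
  else false
  end
end

valid_isbn?('978-0-13-241377-0') # => true
valid_isbn?('9780132413771')     # => false (check digit should be 0)
valid_isbn?('080442957x')        # => true (X check character)
```

Restricting ISBN-13s to the 978/979 prefixes mirrors the two prefixes mentioned in the test changes; a check-digit-only validator would also accept other 13-digit numbers such as non-book EANs.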

Document any side effects to this change:

* Removed the optional "ISBN: " prefix from the ISBN detection pattern. It didn't seem useful, and it had to be stripped out before validation anyway, which suggested that it isn't part of the ISBN and shouldn't be detected.
* Updated .yardopts to include private methods. We do a lot of important work in private methods; although they shouldn't be called from outside an instance, documenting them is better, because without this change some of our docs effectively made classes look like black boxes.
* Some of our test ISBN-13s were invalid. To correct them, the ISBN-10 list was run through an external ISBN-10 to ISBN-13 converter. Because all of the converted values start with the 978 prefix, I manually added some valid ISBN-13 examples to the tests that start with the other valid prefix (979).
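For reference, converting an ISBN-10 to an ISBN-13 means dropping the old check character, prefixing 978, and recomputing the check digit with the ISBN-13 weights; prepending 978 while keeping the old check character usually yields an invalid ISBN-13. Every converted value therefore starts with 978, because 979-prefixed ISBN-13s have no ISBN-10 equivalent. The helper below is a hypothetical sketch, not code from this PR or from the external converter.

```ruby
# Hypothetical ISBN-10 -> ISBN-13 converter (illustrative only).
def isbn10_to_isbn13(isbn10)
  digits = isbn10.gsub(/[\s-]/, '')
  raise ArgumentError, 'expected 10 characters' unless digits.length == 10

  # Drop the ISBN-10 check character and prepend the 978 prefix.
  core = "978#{digits[0, 9]}"

  # Recompute the check digit with ISBN-13 weights (1, 3, 1, 3, ...).
  sum = 0
  core.each_char.with_index { |char, index| sum += char.to_i * (index.even? ? 1 : 3) }
  check = (10 - (sum % 10)) % 10

  "#{core}#{check}"
end

isbn10_to_isbn13('0132413779')    # => "9780132413770"
isbn10_to_isbn13('0-306-40615-2') # => "9780306406157"
```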

Example query: run in prod, this shows a detected ISBN that then fails the details lookup, because it isn't actually valid. In the PR build, or locally on this branch, the invalid ISBN is not detected.

```graphql
{
  logSearchEvent(
    searchTerm: "Artin, M. Algebra (2nd Edition). Addison Wesley, 2010. ISBN: 9780132413771"
    sourceSystem: "playground"
  ) {
    phrase
    categories {
      name
      confidence
    }
    detectors {
      standardIdentifiers {
        kind
        value
        details {
          linkResolverUrl
        }
      }
    }
  }
}
```
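As a worked check of the example query, the standard ISBN-13 check digit computed from the first twelve digits of 9780132413771 (d_1 to d_12 = 9, 7, 8, 0, 1, 3, 2, 4, 1, 3, 7, 7) comes out as 0, while the query's ISBN ends in 1:

```math
\begin{aligned}
S &= \sum_{i=1}^{12} w_i d_i, \qquad w_i = \begin{cases} 1 & i \text{ odd} \\ 3 & i \text{ even} \end{cases} \\
  &= 9 + 21 + 8 + 0 + 1 + 9 + 2 + 12 + 1 + 9 + 7 + 21 = 100 \\
c &= (10 - S \bmod 10) \bmod 10 = 0 \neq 1
\end{aligned}
```

So the ISBN in the query fails validation, whereas 9780132413770 would pass.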

Developer

Ticket(s)

https://mitlibraries.atlassian.net/browse/TCO-###

Accessibility

ANDI or Wave has been run in accordance with our guide and
all issues introduced by these changes have been resolved or opened
as new issues (link to those issues in the Pull Request details above)
There are no accessibility implications to this change

Documentation

Project documentation has been updated, and yard output previewed
No documentation changes are needed

ENV

All new ENV is documented in README.
All new ENV has been added to Heroku Pipeline, Staging and Prod.
ENV has not changed.

Stakeholders

Stakeholder approval has been confirmed
Stakeholder approval is not needed

Dependencies and migrations

NO dependencies are updated

NO migrations are included

Reviewer

Code

I have confirmed that the code works as intended.
Any CodeClimate issues have been fixed or confirmed as
added technical debt.

Documentation

The commit message is clear and follows our guidelines
(not just this pull request message).
The documentation has been updated or is unnecessary.
New dependencies are appropriate or there were no changes.

Testing

There are appropriate tests covering any new functionality.
No additional test coverage is required.


mitlib temporarily deployed to tacos-api-pipeline-pr-171 January 14, 2025 15:49 Inactive

Use casecmp instead of downcase

1f8d59c

JPrevost deployed to tacos-api-pipeline-pr-171 January 14, 2025 15:53

JPrevost requested review from jazairi and matt-bernhardt January 14, 2025 15:59
