[Bug] Sanitizing institution data harvested from GrSciColl breaks functionality #1982

Open
mickley opened this issue Dec 17, 2024 · 5 comments
mickley (Collaborator) commented Dec 17, 2024

In the institution editor, all institution data is now sanitized with htmlspecialchars() before it is displayed. See:

$instArr = $instManager->cleanOutArray($instArr);

That seems reasonable. However, when I wrote the code to harvest data from GBIF's GrSciColl API, I incorporated several GrSciColl/Index Herbariorum fields, plus links to Index Herbariorum and GrSciColl, into the notes field using some HTML. Sanitizing with htmlspecialchars() unfortunately breaks that functionality. See:

// Add taxonomic coverage to notes, if included

Here's what it should look like:

image

Here's what the sanitizing does:

image
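
To illustrate the effect in text form, here is a minimal sketch, assuming cleanOutArray() ends up passing each value through htmlspecialchars() as described above. The notes string and the example.org URL are made up for illustration; the markup that the harvester actually writes may differ.

```php
<?php
// Hypothetical notes value with an embedded link (not the real harvested markup).
$notes = 'Index Herbariorum: <a href="https://example.org/ih/ABC" target="_blank">ABC</a>';

// Escaping turns the markup characters into entities, so the browser
// shows the tag as literal text instead of rendering a clickable link.
$escaped = htmlspecialchars($notes, ENT_QUOTES, 'UTF-8');

echo $escaped, PHP_EOL;
// Index Herbariorum: &lt;a href=&quot;https://example.org/ih/ABC&quot; target=&quot;_blank&quot;&gt;ABC&lt;/a&gt;
```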

@Atticus29: There are different ways this could be fixed: sanitizing everything except the notes, allowing limited HTML, adding more institution editor fields to avoid this need, etc. Up to you as to what works best.
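
One of these options, sanitizing everything except the notes while allowing limited HTML there, might look roughly like the sketch below. The function name sanitizeInstitutionForDisplay(), the allowed-tag list, and the field names are assumptions for illustration and are not part of the existing code. Note that strip_tags() filters tags but leaves attributes untouched, so a stricter HTML filter would be needed if the notes can contain untrusted input.

```php
<?php
// Hypothetical helper, not part of the existing code base.
function sanitizeInstitutionForDisplay(array $instArr, string $allowedNoteTags = '<a><b><i><br>'): array
{
    $clean = [];
    foreach ($instArr as $field => $value) {
        if (!is_string($value)) {
            // Leave nulls and numbers untouched.
            $clean[$field] = $value;
        } elseif ($field === 'notes') {
            // Keep a small set of tags and strip every other tag.
            // Caveat: attributes on the allowed tags are NOT filtered.
            $clean[$field] = strip_tags($value, $allowedNoteTags);
        } else {
            // All other fields are fully escaped.
            $clean[$field] = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
        }
    }
    return $clean;
}

$row = sanitizeInstitutionForDisplay([
    'institutionName' => 'Herbarium <A&B>',
    'notes' => '<a href="https://example.org/ih/ABC">ABC</a><script>alert(1)</script>',
]);
echo $row['institutionName'], PHP_EOL; // Herbarium &lt;A&amp;B&gt;
echo $row['notes'], PHP_EOL;           // <a href="https://example.org/ih/ABC">ABC</a>alert(1)
```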

mickley changed the title from "Bug when sanitizing institution data harvested from GrSciColl" to "[Bug] Sanitizing institution data harvested from GrSciColl breaks functionality" on Dec 17, 2024
mickley (Collaborator, Author) commented Jan 2, 2025

Another note in relation to this:
The notes field in the Institutions table has been changed back to 250 characters, which will cut off this content.
I had it set as VARCHAR(19000), perhaps it should be TEXT.

egbot added a commit that referenced this issue Jan 7, 2025
- Remove sanitization of outbound notes content to avoid interfering with the embedded HTML tags that are added when GrSciColl info is appended to this field.
- Remove sanitization notes that were only meant to communicate with the internal team when the sanitization was originally added.
Resolves the following issue, in part: #1982
@egbot egbot mentioned this issue Jan 7, 2025
20 tasks
egbot (Member) commented Jan 7, 2025

I've submitted a hotfix patch that removes outbound sanitization of the notes field, which should partially resolve the issue above. It should reach the master branch within the week, most likely.

egbot (Member) commented Jan 7, 2025

> Another note in relation to this: The notes field in the Institutions table has been changed back to 250 characters, which will cut off this content. I had it set as VARCHAR(19000), perhaps it should be TEXT.

The new feature you added that grabs institutional data directly from GBIF and IH is awesome. Love it. I wonder if we should add a new field for the additional content you put into the notes field. Maybe a field called dynamicProperties (TEXT) where we store the data as a JSON object (e.g. key/value pairs), which can be output as HTML with the core values sanitized. Just throwing out ideas. Thoughts?
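
A rough sketch of how such a field could work is shown below, assuming a hypothetical dynamicProperties TEXT column. The function names, property keys, and example.org URLs are invented for illustration and do not exist in the code base. The idea is that only plain values are stored, and the HTML wrapper is generated at display time after every value has been escaped.

```php
<?php
// Hypothetical helpers for a dynamicProperties TEXT column (names are assumptions).

// Encode harvested key/value pairs as a JSON object for storage.
function encodeDynamicProperties(array $props): string
{
    $json = json_encode($props, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
    return $json === false ? '{}' : $json;
}

// Render stored JSON as HTML; every key and value is escaped before output.
function renderDynamicProperties(?string $json): string
{
    if ($json === null || $json === '') {
        return '';
    }
    $props = json_decode($json, true);
    if (!is_array($props)) {
        return ''; // Invalid JSON: show nothing rather than raw text.
    }
    $html = '';
    foreach ($props as $key => $value) {
        if (!is_scalar($value)) {
            continue; // This sketch only handles flat key/value pairs.
        }
        $label = htmlspecialchars((string)$key, ENT_QUOTES, 'UTF-8');
        $text = htmlspecialchars((string)$value, ENT_QUOTES, 'UTF-8');
        // Only http(s) values are turned into links.
        if (preg_match('#^https?://#i', (string)$value)) {
            $text = '<a href="' . $text . '" target="_blank" rel="noopener">' . $text . '</a>';
        }
        $html .= '<div><b>' . $label . ':</b> ' . $text . '</div>';
    }
    return $html;
}

$stored = encodeDynamicProperties([
    'taxonomicCoverage' => 'Vascular <plants>',
    'grscicollUrl' => 'https://example.org/grscicoll/ABC',
]);
echo renderDynamicProperties($stored), PHP_EOL;
```

Because the stored values never contain markup, the display code can escape everything unconditionally, which avoids the conflict between sanitizing and embedded HTML described in this issue.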

egbot (Member) commented Jan 8, 2025

Keeping this open until we deal with the next step, which I've tagged as an enhancement.

mickley (Collaborator, Author) commented Jan 14, 2025

> Another note in relation to this: The notes field in the Institutions table has been changed back to 250 characters, which will cut off this content. I had it set as VARCHAR(19000), perhaps it should be TEXT.
>
> The new feature you added that grabs institutional data directly from GBIF and IH is awesome. Love it. I wonder if we should add a new field for the additional content you put into the notes field. Maybe a field called dynamicProperties (TEXT) where we store the data as a JSON object (e.g. key/value pairs), which can be output as HTML with the core values sanitized. Just throwing out ideas. Thoughts?

@egbot That sounds fine to me. I think it would be more flexible as well, e.g., if more information is available in the future. From a collections standpoint, it's quite nice to have this data when preparing shipments, and not to have to go looking for it elsewhere.

GregoryPost added a commit that referenced this issue Jan 16, 2025
* Minor editor fix

- Remove head from editor because it caused more disruptions than benefits, owing to variations in the portals' central CSS styling. Best to integrate the head into the new redesign of the occurrence editor.

* Update hotfix version (3.1.6)

* closes #1954 Fixes use of hidden button and button submit value

Fix transfer taxa form so that the action is submitted in the request

* Fixing Login Form CSS closes #1975

# Issue #1975

# Summary
The input id for email was "login", so the CSS rule `#login {float: right}`
was accidentally applied to it, causing some odd behavior.
Changed the id of the input so the collision wouldn't happen and bumped the
width of the form so that it displays correctly.

* resolve merge conflict

* Occurrence Profile bugs

- Adjust editor permissions check so that it includes the creator of general research collections (observerUid = active user)
- Avoid double sanitation of identifier and collector

* Update geolocate.php (#1996)

Fixes minor typo that occurs in error message

* hotfix - protection

- If collid input is a number + single quote, assume it's an SQL injection attempt and set the value to 0, which returns nothing, rather than putting a load on the server

* Institution Sanitation issue

- Remove sanitization of outbound notes content to avoid interfering with the embedded HTML tags that are added when GrSciColl info is appended to this field.
- Remove sanitization notes that were only meant to communicate with the internal team when the sanitization was originally added.
Resolves the following issue, in part: #1982

* API Annotation Bug

- Fix issue with missing recordID field from SQL statement definition

* added the changes to hotfix

* Bug adding image

- If user is null, user verification code incorrectly checks to see if there is a user with an empty string username or email. Thus, add code that skips checking user table if login details are an empty value
- Don't add empty strings to database. Keep them as null values.
- Comment out user verification check. Just test to make sure it's a number.

* remove associations changes

* Closes #2040 Sorts By Sciname within family (#2052)

# Issue #2040

# Summary
Adds extra sort conditions so that records are sorted by sciname after
being sorted by family.

* Closes #2049 Fixes typo on globals variable

# Issues #2049

# Summary
Fixes typo for `IMAGE_ROOT_PATH` and `IMAGE_ROOT_URL` global variables.
Note this will be overwritten in the coming 3.2 changes with the
multimedia changes so maybe it would be worth merging.

* Fix country synonyms, some U.S. states, add U.S. state abbreviations (#2059)

* Closes #2064 Fixing String Number multiply

# Issues #2064

# Summary
Adds an is_numeric check on the 'page' request variable so that the
`$pageNumber` variable is always a number

* Update geolocate.php per Nelson's suggestion (#2066)

See #1702 (comment)

Co-authored-by: Edward Gilbert

* Update db_schema_patch-3.1.sql

- Explicitly set the index for omoccurrences.locality to a length of 100, thus avoiding the DB setting it to a larger default length that is neither needed nor practical.
Addresses issue: #2050

* removed arrow functions and union types

* replaced str_contains with strpos

---------

Co-authored-by: Edward Gilbert
Co-authored-by: MuchQuak
Co-authored-by: Logan Wilt
Co-authored-by: atticus29
Co-authored-by: Lindsay Walker
Co-authored-by: Nikita Salikov
Co-authored-by: NikitaSalikov
Co-authored-by: Katie Pearson
@GregoryPost GregoryPost mentioned this issue Jan 16, 2025
20 tasks