Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

index update CLI command shouldn't take into account dynamic mapping fields #229

Open
slint opened this issue Apr 10, 2024 · 1 comment
Open

Comments

@slint
Copy link
Member

slint commented Apr 10, 2024

The invenio index update ... CLI command is meant for performing backwards-compatible updates on existing index mappings. To prevent any mistakes, it also checks for backwards-incompatible changes (i.e. field removals, renames/moves, etc.) by comparing the JSON of the existing mapping on the cluster against the new one.

This works for simple cases, but when e.g. there are dynamic_templates that match nested paths, the JSON diff will falsely think that there are field removals suggested by the new mapping.

As a simple example, imagine the following mapping:

{
  "mappings": {
    "dynamic_templates": [
      {
        "i18n_title": {
          "path_match": "*.title.*",
          "match_mapping_type": "object",
          "mapping": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword" }
            }
          }
        }
      }
    ],
    "properties": {
      "title": { "type": "object", "dynamic": "true" },
      "org": {
        "type": "object",
        "properties": {
          "title": { "type": "object", "dynamic": "true" }
        }
      }
    }
  }
}

This means that one can index a document like:

{
  "title": {
    "en": "Dataset",
    "de": "Datensatz",
  },
  "org": {
    "title": {
      "en": "University of Berlin",
      "de": "Universität Berlin"
    }
  }
}

Which would generate the keyword type mapping for title.en, title.de, org.title.en, org.title.de.

If the we would e.g. add a new top-level field to the mapping, like "year": {"type": "date"}, when running the invenio index update ... command, we would get an error, since the differ thinks that the existing title.en, title.de, etc. fields were "removed" by the new mapping.

The solution would be to parse the dynamic_templates[].path_match field and filter out all dictdiffer paths that match them. We should also take into account unmatch directives that exclude some of the fields.

@slint slint self-assigned this Aug 1, 2024
@slint slint moved this to In progress in Sprint Q1/2025 Aug 1, 2024
@carlinmack carlinmack moved this from In progress to To release 🤖 in Sprint Q1/2025 Aug 1, 2024
@carlinmack carlinmack moved this from To release 🤖 to In progress in Sprint Q1/2025 Aug 1, 2024
@slint slint moved this from In progress to In review in Sprint Q1/2025 Aug 2, 2024
@slint slint removed their assignment Aug 2, 2024
@slint slint moved this from In review to Ready in Sprint Q1/2025 Aug 5, 2024
@slint
Copy link
Member Author

slint commented Aug 12, 2024

Diffing mappings is actually not that important, or something we should be doing on the application side. That is because if a new mapping is incompatible, Elastic/OpenSearch will anyways return a 400 error and explain what the incompatibility is. Keeping up to date with the different rules that explain why a mapping is incompatible or not is quite difficult, so it might be worth to entirely drop the diffing feature.

For now we've added a --check/--no-check flag to the invenio index update ... command, so that one can disable the diff check entirely: #232

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Backlog 😴
Development

Successfully merging a pull request may close this issue.

1 participant