Support struct field as indexed column #213

@dai-chen (Collaborator) commented Jan 4, 2024

Description

  1. To preserve the nested structure of subfields instead of flattening them, store the root StructType and look up fields by traversing that structure.
  2. During query rewrite, match the struct column name against the index expression.
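
The lookup in item 1 can be illustrated with a minimal sketch. This is not the code in this PR: the nested dict is a hypothetical stand-in for a Spark StructType, and `resolve_field` is an illustrative name. It assumes field names contain no dots.

```python
from typing import Optional

# Hypothetical stand-in for a root StructType: each key is a field name,
# each value is either a leaf type name (str) or a nested struct (dict).
schema = {
    "struct_col": {
        "field1": {"subfield": "string"},
        "field2": "int",
    }
}


def resolve_field(root: dict, path: str) -> Optional[str]:
    """Walk the nested schema along a dotted path.

    Returns the leaf type name, or None if the path does not exist
    or stops at a struct rather than a leaf field.
    """
    node = root
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node if isinstance(node, str) else None


print(resolve_field(schema, "struct_col.field1.subfield"))
print(resolve_field(schema, "struct_col.field2"))
```

Because the root structure is kept intact, a dotted name such as struct_col.field1.subfield is resolved one level at a time rather than being looked up in a flattened column list.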

TODO

To support arbitrary index expressions, the index expression string needs to be analyzed both when creating an index and during query rewrite. A PoC is done here but requires more changes: #232. This is deferred until it's required.

Testing

I'm unable to verify the skipping index data in the IT because of issue #233, so I checked it manually while debugging the IT:

curl "localhost:56262/flint_spark_catalog_default_nested_field_table_skipping_index/_search?pretty"
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "flint_spark_catalog_default_nested_field_table_skipping_index",
        "_id" : "a66a1b7b17d545a9248d8a236b2d1c01a358b4a1",
        "_score" : 1.0,
        "_source" : {
          "file_path" : "file:///.../nested_field_table/part-00000-0f065a88-7d52-4e68-a8e3-d39e389221be-c000.json",
          "MinMax_struct_col.field2_0" : 123,
          "MinMax_struct_col.field2_1" : 456,
          "struct_col.field1.subfield" : [
            "value1",
            "value2"
          ]
        }
      },
      {
        "_index" : "flint_spark_catalog_default_nested_field_table_skipping_index",
        "_id" : "1ae12dc8aa4215668c0f4da81f5312ef21987568",
        "_score" : 1.0,
        "_source" : {
          "file_path" : "file:///.../nested_field_table/part-00000-f86ea4eb-19d7-4d60-bf76-61167537a2ad-c000.json",
          "MinMax_struct_col.field2_0" : 789,
          "MinMax_struct_col.field2_1" : 789,
          "struct_col.field1.subfield" : [
            "value3"
          ]
        }
      }
    ]
  }
}
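
As a rough illustration of how index documents like the two above could be used to skip files, here is a minimal sketch. It is not Flint's implementation. It assumes that the `_0` and `_1` suffixes hold the minimum and maximum of the column per file, which the values above suggest but the PR does not state, and the helper names are hypothetical. File paths are copied as shown, including the elided part.

```python
# The two _source documents from the search response above.
docs = [
    {
        "file_path": "file:///.../nested_field_table/part-00000-0f065a88-7d52-4e68-a8e3-d39e389221be-c000.json",
        "MinMax_struct_col.field2_0": 123,
        "MinMax_struct_col.field2_1": 456,
        "struct_col.field1.subfield": ["value1", "value2"],
    },
    {
        "file_path": "file:///.../nested_field_table/part-00000-f86ea4eb-19d7-4d60-bf76-61167537a2ad-c000.json",
        "MinMax_struct_col.field2_0": 789,
        "MinMax_struct_col.field2_1": 789,
        "struct_col.field1.subfield": ["value3"],
    },
]


def files_by_min_max(index_docs, column, value):
    """Keep files whose [min, max] range for the column could contain value.

    Assumption: '<prefix>_0' is the minimum and '<prefix>_1' the maximum.
    """
    lo_key = f"MinMax_{column}_0"
    hi_key = f"MinMax_{column}_1"
    return [d["file_path"] for d in index_docs if d[lo_key] <= value <= d[hi_key]]


def files_by_value_set(index_docs, column, value):
    """Keep files whose recorded value list for the column contains value."""
    return [d["file_path"] for d in index_docs if value in d[column]]


print(files_by_min_max(docs, "struct_col.field2", 456))
print(files_by_value_set(docs, "struct_col.field1.subfield", "value3"))
```

Under these assumptions, a filter such as struct_col.field2 = 456 would only need to read the first file, and a filter on struct_col.field1.subfield = 'value3' only the second.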

Issues Resolved

#194

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dai-chen added the bug label Jan 4, 2024
@dai-chen self-assigned this Jan 4, 2024
@dai-chen added the 0.2 label Jan 5, 2024
@dai-chen marked this pull request as ready for review January 23, 2024 23:21
@dai-chen merged commit 851ade3 into opensearch-project:main Feb 7, 2024
4 checks passed
@dai-chen deleted the support-struct-field-as-indexed-column branch February 7, 2024 18:24