[pkg/ottl] flatten() function does not resolve attribute key conflicts when flattening #35793

odubajDT · 2024-10-15T12:14:45Z

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

The flatten() function is not able to flatten a map, which has conflicting keys after flattening. In the current state, the latest processed key in the map takes precedence. An example is

{
  "name": "test",
  "address": {
    "street": "first",
    "house": 1234
  },
  "address.street": 2
}

which will result in

{
  "name": "test",
  "address.street": 2,
  "address.house": 1234,
}

and therefore we have data loss

Describe the solution you'd like

I would like to introduce another optional parameter of type boolean, which will enable/disable the resolution of conflicts by adding a suffix for the conflicting keys, similar to how flatten() handles slices. The definition might look like this

flatten(target, Optional[prefix], Optional[depth], Optional[resolve_conflict])

Where the 3rd parameter is type boolean and if set to false the behavior stays as it is. If set to true, it will keep all the conflicting keys and use a suffix to distinguish them. A solution for the mentioned example would be

{
  "name": "test",
  "address.street.00": "first",
  "address.house": 1234,
  "address.street.01": 2,
}

We can use a prefix 0 to the suffix to distinguish the array conversions and map key conflict resolutions

Describe alternatives you've considered

No response

Additional context

No response

The text was updated successfully, but these errors were encountered:

github-actions · 2024-10-15T12:15:01Z

Pinging code owners:

pkg/ottl: @TylerHelmuth @kentquirk @bogdandrutu @evan-bradley

See Adding Labels via Comments if you do not have permissions to add labels yourself.

joaopgrassi · 2024-10-15T12:19:37Z

Would also be nice to keep the "original" attribute as is and only add the suffix to the new ones that arrived after flattening. This way it is clear which one was the original vs modified.

odubajDT · 2024-10-16T12:50:33Z

created a first version of solution how this can potentially work #35831. Please consider it as a draft of an idea and not a PR ready for review.

Happy to hear about your feedback!

evan-bradley · 2024-10-17T13:30:10Z

I agree we should have a conflict resolution option available for this function, thanks for opening a draft PR to look over.

Can we offer more than just a suffixing strategy? I can think of a few others that we should offer:

Merge the two keys and put the values in a list.
First wins
Last wins (we should keep this as the default for now)

odubajDT · 2024-10-17T13:37:50Z

I agree we should have a conflict resolution option available for this function, thanks for opening a draft PR to look over.

Can we offer more than just a suffixing strategy? I can think of a few others that we should offer:
1. Merge the two keys and put the values in a list.

Interesting idea, but doesn't this go against the functionality of flatten()? Since the function is actually converting slice to a flat map, it seems to me that this will go against the principle of the function

2. First wins

3. Last wins (we should keep this as the default for now)

This might be valid, but how can you put an order on a map (unordered datatype)? Here we might be able to choose between the deeper/shallower value, but ordering might be problematic, since it will be non-deterministic.

evan-bradley · 2024-10-18T13:13:38Z

Interesting idea, but doesn't this go against the functionality of flatten()? Since the function is actually converting slice to a flat map, it seems to me that this will go against the principle of the function

I see what you mean, I had forgotten slices are flattened as well. In that case, I think we just need to make sure that suffixing keeps slices in mind, for cases like this:

{
  "name": "test",
  "address": {
    "street": "first",
    "house": 1234
  },
  "address.street": ["fifth", "avenue"]
}

This might be valid, but how can you put an order on a map (unordered datatype)? Here we might be able to choose between the deeper/shallower value, but ordering might be problematic, since it will be non-deterministic.

That's a fair point, but pdata maps are actually essentially just wrappers around slices to key-value objects, so there is technically a deterministic ordering. Maybe providing options for preferring deeper/shallower values is a better approach. If you don't provide first-wins/last-wins options, I think we should deprecate the current behavior.

odubajDT · 2024-10-21T14:17:09Z

Interesting idea, but doesn't this go against the functionality of flatten()? Since the function is actually converting slice to a flat map, it seems to me that this will go against the principle of the function

I see what you mean, I had forgotten slices are flattened as well. In that case, I think we just need to make sure that suffixing keeps slices in mind, for cases like this:
{
  "name": "test",
  "address": {
    "street": "first",
    "house": 1234
  },
  "address.street": ["fifth", "avenue"]
}
This might be valid, but how can you put an order on a map (unordered datatype)? Here we might be able to choose between the deeper/shallower value, but ordering might be problematic, since it will be non-deterministic.

That's a fair point, but pdata maps are actually essentially just wrappers around slices to key-value objects, so there is technically a deterministic ordering. Maybe providing options for preferring deeper/shallower values is a better approach. If you don't provide first-wins/last-wins options, I think we should deprecate the current behavior.

Ok let's get some conflict examples solved here, so I can proceed with the implementation.

Use case with conflicting maps

"address": map[string]any{
	"street": "first",
	"house":  int64(1234),
},
"address.street": "second",

results in

"address.street":   "first",
"address.house":    int64(1234),
"address.street.1": "second",

Use case with conflicting slices

"address": map[string]any{
	"street": []any{"first"},
	"house":  int64(1234),
},
"address.street": []any{"second"},

results in

"address.street.0":   "first",
"address.house":      int64(1234),
"address.street.0.1": "second",

Use case with conflicting slices and maps

"address": map[string]any{
	"street": map[string]any{
		"number": "first",
	},
	"house": int64(1234),
},
"address.street": map[string]any{
	"number": []any{"second", "third"},
},
"address.street.number": "fourth",

results in

"address.street.number":   "first",
"address.house":           int64(1234),
"address.street.number.0": "second",
"address.street.number.1": "third",
"address.street.number.2": "fourth",

Would you expect a different behavior in any of the use-cases? Especially the third use-case would need some additional attention, since I am not sure, if the information that parameters were an array is important in the resulting outcome. If not, we can just bump the suffix counter without actually preserving the information in any way.

Thanks!

odubajDT · 2024-10-25T10:11:33Z

Ok, looked at a little bit different approach how to resolve conflicts between slices and maps, please have a look at the PR.

If you agree, I can have a look at the potential first/last wins (only when using no conflicts resolution - here it makes no sense, since we do not loose any data)

Thanks!

@evan-bradley @TylerHelmuth

odubajDT · 2024-11-05T08:11:47Z

cc @evan-bradley @TylerHelmuth

TylerHelmuth · 2024-11-12T20:08:51Z

Is there a solution instead where we can tell user's "if you want to avoid conflicts, use a prefix" and document that if there is still a conflict the value that was already at the root wins? I don't really like the suffix solution as it seems to get pretty messy with edge cases.

evan-bradley · 2024-11-12T21:29:23Z

Is there a solution instead where we can tell user's "if you want to avoid conflicts, use a prefix" and document that if there is still a conflict the value that was already at the root wins?

I wouldn't mind seeing if we can achieve this functionality without too much complexity if it means we can provide an option that guarantees no data is lost.

I don't really like the suffix solution as it seems to get pretty messy with edge cases.

The part that makes me feel more comfortable with this is that the conflict resolution will just put in a key that increments the largest numbered suffix already at the root. I think if we stick to that simple rule, we can keep the complexity to a minimum.

odubajDT · 2024-11-15T06:02:45Z

Is there a solution instead where we can tell user's "if you want to avoid conflicts, use a prefix" and document that if there is still a conflict the value that was already at the root wins?

I wouldn't mind seeing if we can achieve this functionality without too much complexity if it means we can provide an option that guarantees no data is lost.

I don't really like the suffix solution as it seems to get pretty messy with edge cases.

The part that makes me feel more comfortable with this is that the conflict resolution will just put in a key that increments the largest numbered suffix already at the root. I think if we stick to that simple rule, we can keep the complexity to a minimum.

This is what the solution currently looks like. May I continue with polishing the implementation, adding tests and preparing the PR for a review? Or do you have a different solution in mind to resolve this issue?

odubajDT · 2024-12-09T11:44:19Z

Is there any issue with the solution proposed? Should we continue with it or try to find a different approach?

odubajDT added enhancement New feature or request needs triage New item requiring triage labels Oct 15, 2024

github-actions bot added the pkg/ottl label Oct 15, 2024

odubajDT mentioned this issue Oct 16, 2024

[pkg/ottl] enhance flatten converter to resolve attribute key conflicts #35831

Closed

evan-bradley removed the needs triage New item requiring triage label Oct 17, 2024

github-actions bot mentioned this issue Oct 22, 2024

Weekly Report: 2024-10-15 - 2024-10-22 #35905

Closed

TylerHelmuth changed the title ~~flatten() function does not resolve attribute key conflicts when flattening~~ [pkg/ottl] flatten() function does not resolve attribute key conflicts when flattening Nov 15, 2024

TylerHelmuth added the priority:p2 Medium label Nov 15, 2024

odubajDT linked a pull request Jan 2, 2025 that will close this issue

[pkg/ottl] enhance flatten editor to resolve attribute key conflicts #37006

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[pkg/ottl] flatten() function does not resolve attribute key conflicts when flattening #35793

[pkg/ottl] flatten() function does not resolve attribute key conflicts when flattening #35793

odubajDT commented Oct 15, 2024

github-actions bot commented Oct 15, 2024

joaopgrassi commented Oct 15, 2024

odubajDT commented Oct 16, 2024 •

edited

Loading

evan-bradley commented Oct 17, 2024

odubajDT commented Oct 17, 2024

evan-bradley commented Oct 18, 2024

odubajDT commented Oct 21, 2024

odubajDT commented Oct 25, 2024

odubajDT commented Nov 5, 2024

TylerHelmuth commented Nov 12, 2024

evan-bradley commented Nov 12, 2024

odubajDT commented Nov 15, 2024

odubajDT commented Dec 9, 2024

[pkg/ottl] flatten() function does not resolve attribute key conflicts when flattening #35793

[pkg/ottl] flatten() function does not resolve attribute key conflicts when flattening #35793

Comments

odubajDT commented Oct 15, 2024

Component(s)

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Describe alternatives you've considered

Additional context

github-actions bot commented Oct 15, 2024

joaopgrassi commented Oct 15, 2024

odubajDT commented Oct 16, 2024 • edited Loading

evan-bradley commented Oct 17, 2024

odubajDT commented Oct 17, 2024

evan-bradley commented Oct 18, 2024

odubajDT commented Oct 21, 2024

odubajDT commented Oct 25, 2024

odubajDT commented Nov 5, 2024

TylerHelmuth commented Nov 12, 2024

evan-bradley commented Nov 12, 2024

odubajDT commented Nov 15, 2024

odubajDT commented Dec 9, 2024

odubajDT commented Oct 16, 2024 •

edited

Loading