Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/ottl] flatten() function does not resolve attribute key conflicts when flattening #35793

Open
odubajDT opened this issue Oct 15, 2024 · 13 comments · May be fixed by #37006
Open

[pkg/ottl] flatten() function does not resolve attribute key conflicts when flattening #35793

odubajDT opened this issue Oct 15, 2024 · 13 comments · May be fixed by #37006
Labels
enhancement New feature or request pkg/ottl priority:p2 Medium

Comments

@odubajDT
Copy link
Contributor

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

The flatten() function is not able to flatten a map, which has conflicting keys after flattening. In the current state, the latest processed key in the map takes precedence. An example is

{
  "name": "test",
  "address": {
    "street": "first",
    "house": 1234
  },
  "address.street": 2
}

which will result in

{
  "name": "test",
  "address.street": 2,
  "address.house": 1234,
}

and therefore we have data loss

Describe the solution you'd like

I would like to introduce another optional parameter of type boolean, which will enable/disable the resolution of conflicts by adding a suffix for the conflicting keys, similar to how flatten() handles slices. The definition might look like this

flatten(target, Optional[prefix], Optional[depth], Optional[resolve_conflict])

Where the 3rd parameter is type boolean and if set to false the behavior stays as it is. If set to true, it will keep all the conflicting keys and use a suffix to distinguish them. A solution for the mentioned example would be

{
  "name": "test",
  "address.street.00": "first",
  "address.house": 1234,
  "address.street.01": 2,
}

We can use a prefix 0 to the suffix to distinguish the array conversions and map key conflict resolutions

Describe alternatives you've considered

No response

Additional context

No response

@odubajDT odubajDT added enhancement New feature or request needs triage New item requiring triage labels Oct 15, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@joaopgrassi
Copy link
Member

Would also be nice to keep the "original" attribute as is and only add the suffix to the new ones that arrived after flattening. This way it is clear which one was the original vs modified.

@odubajDT
Copy link
Contributor Author

odubajDT commented Oct 16, 2024

created a first version of solution how this can potentially work #35831. Please consider it as a draft of an idea and not a PR ready for review.

Happy to hear about your feedback!

@evan-bradley evan-bradley removed the needs triage New item requiring triage label Oct 17, 2024
@evan-bradley
Copy link
Contributor

I agree we should have a conflict resolution option available for this function, thanks for opening a draft PR to look over.

Can we offer more than just a suffixing strategy? I can think of a few others that we should offer:

  1. Merge the two keys and put the values in a list.
  2. First wins
  3. Last wins (we should keep this as the default for now)

@odubajDT
Copy link
Contributor Author

I agree we should have a conflict resolution option available for this function, thanks for opening a draft PR to look over.

Can we offer more than just a suffixing strategy? I can think of a few others that we should offer:

1. Merge the two keys and put the values in a list.

Interesting idea, but doesn't this go against the functionality of flatten()? Since the function is actually converting slice to a flat map, it seems to me that this will go against the principle of the function

2. First wins

3. Last wins (we should keep this as the default for now)

This might be valid, but how can you put an order on a map (unordered datatype)? Here we might be able to choose between the deeper/shallower value, but ordering might be problematic, since it will be non-deterministic.

@evan-bradley
Copy link
Contributor

Interesting idea, but doesn't this go against the functionality of flatten()? Since the function is actually converting slice to a flat map, it seems to me that this will go against the principle of the function

I see what you mean, I had forgotten slices are flattened as well. In that case, I think we just need to make sure that suffixing keeps slices in mind, for cases like this:

{
  "name": "test",
  "address": {
    "street": "first",
    "house": 1234
  },
  "address.street": ["fifth", "avenue"]
}

This might be valid, but how can you put an order on a map (unordered datatype)? Here we might be able to choose between the deeper/shallower value, but ordering might be problematic, since it will be non-deterministic.

That's a fair point, but pdata maps are actually essentially just wrappers around slices to key-value objects, so there is technically a deterministic ordering. Maybe providing options for preferring deeper/shallower values is a better approach. If you don't provide first-wins/last-wins options, I think we should deprecate the current behavior.

@odubajDT
Copy link
Contributor Author

Interesting idea, but doesn't this go against the functionality of flatten()? Since the function is actually converting slice to a flat map, it seems to me that this will go against the principle of the function

I see what you mean, I had forgotten slices are flattened as well. In that case, I think we just need to make sure that suffixing keeps slices in mind, for cases like this:

{
  "name": "test",
  "address": {
    "street": "first",
    "house": 1234
  },
  "address.street": ["fifth", "avenue"]
}

This might be valid, but how can you put an order on a map (unordered datatype)? Here we might be able to choose between the deeper/shallower value, but ordering might be problematic, since it will be non-deterministic.

That's a fair point, but pdata maps are actually essentially just wrappers around slices to key-value objects, so there is technically a deterministic ordering. Maybe providing options for preferring deeper/shallower values is a better approach. If you don't provide first-wins/last-wins options, I think we should deprecate the current behavior.

Ok let's get some conflict examples solved here, so I can proceed with the implementation.

  1. Use case with conflicting maps
"address": map[string]any{
	"street": "first",
	"house":  int64(1234),
},
"address.street": "second",

results in

"address.street":   "first",
"address.house":    int64(1234),
"address.street.1": "second",
  1. Use case with conflicting slices
"address": map[string]any{
	"street": []any{"first"},
	"house":  int64(1234),
},
"address.street": []any{"second"},

results in

"address.street.0":   "first",
"address.house":      int64(1234),
"address.street.0.1": "second",
  1. Use case with conflicting slices and maps
"address": map[string]any{
	"street": map[string]any{
		"number": "first",
	},
	"house": int64(1234),
},
"address.street": map[string]any{
	"number": []any{"second", "third"},
},
"address.street.number": "fourth",

results in

"address.street.number":   "first",
"address.house":           int64(1234),
"address.street.number.0": "second",
"address.street.number.1": "third",
"address.street.number.2": "fourth",

Would you expect a different behavior in any of the use-cases? Especially the third use-case would need some additional attention, since I am not sure, if the information that parameters were an array is important in the resulting outcome. If not, we can just bump the suffix counter without actually preserving the information in any way.

Thanks!

@odubajDT
Copy link
Contributor Author

Ok, looked at a little bit different approach how to resolve conflicts between slices and maps, please have a look at the PR.

If you agree, I can have a look at the potential first/last wins (only when using no conflicts resolution - here it makes no sense, since we do not loose any data)

Thanks!

@evan-bradley @TylerHelmuth

@odubajDT
Copy link
Contributor Author

odubajDT commented Nov 5, 2024

cc @evan-bradley @TylerHelmuth

@TylerHelmuth
Copy link
Member

Is there a solution instead where we can tell user's "if you want to avoid conflicts, use a prefix" and document that if there is still a conflict the value that was already at the root wins? I don't really like the suffix solution as it seems to get pretty messy with edge cases.

@evan-bradley
Copy link
Contributor

Is there a solution instead where we can tell user's "if you want to avoid conflicts, use a prefix" and document that if there is still a conflict the value that was already at the root wins?

I wouldn't mind seeing if we can achieve this functionality without too much complexity if it means we can provide an option that guarantees no data is lost.

I don't really like the suffix solution as it seems to get pretty messy with edge cases.

The part that makes me feel more comfortable with this is that the conflict resolution will just put in a key that increments the largest numbered suffix already at the root. I think if we stick to that simple rule, we can keep the complexity to a minimum.

@odubajDT
Copy link
Contributor Author

Is there a solution instead where we can tell user's "if you want to avoid conflicts, use a prefix" and document that if there is still a conflict the value that was already at the root wins?

I wouldn't mind seeing if we can achieve this functionality without too much complexity if it means we can provide an option that guarantees no data is lost.

I don't really like the suffix solution as it seems to get pretty messy with edge cases.

The part that makes me feel more comfortable with this is that the conflict resolution will just put in a key that increments the largest numbered suffix already at the root. I think if we stick to that simple rule, we can keep the complexity to a minimum.

This is what the solution currently looks like. May I continue with polishing the implementation, adding tests and preparing the PR for a review? Or do you have a different solution in mind to resolve this issue?

@TylerHelmuth TylerHelmuth changed the title flatten() function does not resolve attribute key conflicts when flattening [pkg/ottl] flatten() function does not resolve attribute key conflicts when flattening Nov 15, 2024
@TylerHelmuth TylerHelmuth added the priority:p2 Medium label Nov 15, 2024
@odubajDT
Copy link
Contributor Author

odubajDT commented Dec 9, 2024

Is there any issue with the solution proposed? Should we continue with it or try to find a different approach?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request pkg/ottl priority:p2 Medium
Projects
None yet
4 participants