Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix static converters #2270

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Fix static converters #2270

wants to merge 5 commits into from

Conversation

wildum
Copy link
Contributor

@wildum wildum commented Dec 12, 2024

PR Description

#2262 I discovered a few bugs with the static converters:

  • the spanmetrics connector is starting a goroutine on start. If the component is not stopped, it will panic. We don't have this panic on main because the component is not started. With the fix above, the component is started on creation.
  • the azure static converter is always adding two auth methods but only one is allowed
  • the otelcol.exporter.prometheus is not using the default value for the gc_frequency which result in an error when the value is not provided

As explained in the comment in testing.go, we need to run the controller to cleanup the component that are starting go routines on Start. But we can't do it for all components because some have config errors and others panic. Because these are only static converters tests, I decided to add the workaround that the controller will run only when the config contains a spanmetrics connector for now. LMK if you're ok with it or if you want to explore alternatives.

PR Checklist

  • CHANGELOG.md updated
  • Tests updated
  • Config converters updated

@wildum wildum requested a review from a team as a code owner December 12, 2024 13:48
RefreshInterval: time.Duration(sdConfig.RefreshInterval),
ResourceGroup: sdConfig.ResourceGroup,
ProxyConfig: common.ToProxyConfig(sdConfig.HTTPClientConfig.ProxyConfig),
FollowRedirects: sdConfig.HTTPClientConfig.FollowRedirects,
EnableHTTP2: sdConfig.HTTPClientConfig.EnableHTTP2,
TLSConfig: *common.ToTLSConfig(&sdConfig.HTTPClientConfig.TLSConfig),
}

// Only one auth method is allowed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user supplied a config with both auth methods, isn't that technically a problem with the config and not with the converters? I'm ok for this check to remain, but not sure if it's overkill to check for invalid configuration in the converter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that setting two auth methods is possible in static mode because you must use the authentication_method field to specify your method (default OAuth). The problem here is that no matter which auth method the user would set, the converter would always create the two auth blocks, resulting in an invalid config.

ForwardTo: forwardTo,
AddMetricSuffixes: defaultArgs.AddMetricSuffixes,
ResourceToTelemetryConversion: defaultArgs.ResourceToTelemetryConversion,
}

// Override default only if > 0 because GCFrequency of 0 is not allowed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, do we really need to check if the incoming config is invalid? Also, if we set a different GC frequency than what the user specified, we're technically not converting their exact config anymore. Maybe it'd make more sense to just refuse to convert the config, or to log a warning.

But IMO we could just convert it as it is and let Alloy fail when it loads it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dug deeper into this one because I could not understand why it would be set to 0 by default and I found out that when we moved to Alloy we transformed some static components into empty shells: https://github.com/grafana/alloy/pull/55/files#diff-1b127540731f7403cbe5211de129eddfc35579d88d6bcbc371520642a9aa5486. The problem with this component is that the default values were set in the newRemoteWriteExporter() function. This StaleTime default value is actually supposed to be 15 minutes. I got rid of the check in the converter and set the correct value in the yaml component. This way, if you convert a config that uses the default yaml value, it will be correctly converted to 15m in Alloy.

},
EnableCommunityComps: true,
})
err = f.LoadSource(cfg, nil, "")

// This is a bit of an ugly workaround but the spanmetrics connector is starting a go routine in its start function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is referring to the start function of OTel Collector components? Do we call this when we call Alloy'sNew() function for a component? I think those New() functions aren't meant to have side effects such as opening files and starting goroutines. They should only do this once Run is called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we often have the pattern that the New() function is calling the Update function and this creates some side-effects but only for the tests. Currently we start the Otel components in the Run() function and that's where the race problems appeared. This change: #2262 is making the otel components start in the New() function, which is solving the problem at runtime but it can be a bit more annoying for testing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing this out... I added a comment to the other PR. New is not supposed to have such side effects.

@ptodev ptodev self-assigned this Dec 12, 2024
Copy link
Contributor

This PR has not had any activity in the past 30 days, so the needs-attention label has been added to it.
If you do not have enough time to follow up on this PR or you think it's no longer relevant, consider closing it.
The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your PR will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity.
Thank you for your contributions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants