Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: File conflicts are not handled correctly #231

Open
1 task done
niklastreml opened this issue Dec 9, 2024 · 4 comments
Open
1 task done

Bug: File conflicts are not handled correctly #231

niklastreml opened this issue Dec 9, 2024 · 4 comments
Assignees
Labels
area/target-manager Issues/PRs related to the TargetManager bug Something isn't working

Comments

@niklastreml
Copy link
Collaborator

niklastreml commented Dec 9, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

If a registration file remains in a gitlab repo, like after an unsuccessful shutdown, sparrow fails to re-register at the next boot. This happens, because we're POSTing the new file against the API, which gitlab views as a conflict, as it already exists. Gitlab returns a 400 error, and sparrow fails the initial registration.

Workaround

Until this is fixed, one can manually remove the old registration file from the repo, which should allow the sparrow instance to register again.

Expected Behavior

  1. For now we should print a proper log output that "a file with this name already exits".

Gitlab API returns a 400 with an error:

{
  "message": "A file with this name already exists"
} 
  1. A metric should show the registration status of the sparrow (registered or not). In case the registration failed the metric should be false/0.

Edit: We decided not to overwrite the file that is already present. This should also prevent from choosing the same name for different sparrows twice. We decided that the failing registration is the default behaviour for now.

Steps To Reproduce

  1. Create a test repo
  2. Commit a registration file for your sparrow
{"url":"http://my-sparrow.com","lastSeen":"2024-12-09T15:52:51.763660555Z"}
  1. Start a sparrow which would try to overwrite this file:
name: sparrow.caas-t02.telekom.de
targetManager:
  checkInterval: 1m
  enabled: true
  gitlab:
    baseUrl: https://gitlab.com
    projectId: 123456
  registrationInterval: 10m
  scheme: https
  type: gitlab
  unhealthyThreshold: 48h
  updateInterval: 5h
  1. Observe logs

Relevant logs and/or screenshots, environment information, etc.

{"time":"2024-12-09T15:04:37.426700882Z","level":"ERROR","source":{"function":"github.com/caas-team/sparrow/pkg/sparrow/targets.(*manager).register","file":"/home/runner/work/sparrow/sparrow/pkg/sparrow/targets/manager.go","line":180},"msg":"Failed to register global gitlabTargetManager","error":"request failed, status is 400 Bad Request"}
{"time":"2024-12-09T15:04:37.426726504Z","level":"WARN","source":{"function":"github.com/caas-team/sparrow/pkg/sparrow/targets.(*manager).Reconcile","file":"/home/runner/work/sparrow/sparrow/pkg/sparrow/targets/manager.go","line":100},"msg":"Failed to register self as global target","error":"request failed, status is 400 Bad Request"} 

Who can address the issue?

Any Dev

Anything else?

https://docs.gitlab.com/ee/api/repository_files.html#update-existing-file-in-repository

@niklastreml niklastreml added bug Something isn't working area/target-manager Issues/PRs related to the TargetManager labels Dec 9, 2024
@puffitos
Copy link
Collaborator

puffitos commented Dec 9, 2024

This was intended behavior, to be honest. The sparrow should remain simple and not try to assume anything about the backend repository, if it cannot register itself because of a sparrow with the same name. As you've mentioned, this cpuld indicate another concurrent sparrow running, which is something the user should first handle themselves.

The maintainers of the backend should keep their inventory in check. If we want to make sure that an unregistered sparrow is unhealthy, we should make that part of the healthiness endpoint and not make assumptions about the target repository. Of course, only if the targetManager is active.

I find the idea of a forceUpdate flag good middle ground, as it moves the responsibility to the user. Its implementation may vary heavily depending on the target backend, which may also be another can of worms we don't wanna open.

@niklastreml
Copy link
Collaborator Author

if we want to handle the case of multiple sparrows properly,we might want to make it easier for the sparrows to distinguish this case of multiple instances. The same issue could still arise now in the following case:

  • Sparrow1 registers as sparrow.com
  • Sparrow2 registers as sparrow.com => fails
  • User removes duplicate config
  • Sparrow2 creates new config, before Sparrow1 tries to update the file
    Now both Sparrow1 and Sparrow2 are happily writing to the same config, unaware of each other

I think this is highly unlikely to be honest. If we really want to detect this case properly, we might want to generate a random token on startup and include it in the registration. That way sparrow can detect, if the config has a token different from its own and log the case.

I'm gonna try to implement something this week, to allow a forceUpdate flag, so this is easier to fix for users. Although I do kind of worry, that this will be the kind of flag that is always going to be set to true, as it's easier

@niklastreml niklastreml self-assigned this Dec 10, 2024
@y-eight
Copy link
Member

y-eight commented Dec 11, 2024

I have updated the issue with the output of the discussion some minutes ago.

@niklastreml niklastreml removed their assignment Dec 12, 2024
@puffitos puffitos self-assigned this Dec 23, 2024
@puffitos
Copy link
Collaborator

Regarding the 400 from the POST request:

This shouldn't happen if users properly clean up their registration repos - the only case of this happening is when the scheme of the sparrow changes, since we save the file with just the DNS name and not the scheme in its name (the scheme came after the conception of the target manager). The target manager catches the case where the sparrow didn't deregister properly by checking for the presence of a registration file:

https://github.com/caas-team/sparrow/blob/92835cfcbcdded009e925495d1705ab3312a81b9/pkg/sparrow/targets/manager.go#L259-L264

In action:

time=18:51:18 level=DEBUG source=/Users/puffito/dev/repos/sparrow/pkg/sparrow/targets/remote/gitlab/gitlab.go:234 msg="Successfully fetched file list recursively" files=1
time=18:51:18 level=DEBUG source=/Users/puffito/dev/repos/sparrow/pkg/sparrow/targets/remote/gitlab/gitlab.go:151 msg="Successfully fetched file" file=dev.telekom.de.json
time=18:51:18 level=INFO source=/Users/puffito/dev/repos/sparrow/pkg/sparrow/targets/remote/gitlab/gitlab.go:105 msg="Successfully fetched all target files" files=1
time=18:51:18 level=DEBUG source=/Users/puffito/dev/repos/sparrow/pkg/sparrow/targets/manager.go:261 msg="Found self as global target" lastSeenMinsAgo=21.492964566666668

So this error can really only happen if someone changed their scheme from http -> https (for example) and now the sparrow can't find itself anymore and tries to reregister. It that case, the user should have either:

  • explained in an issue why the sparrow didn't deregister properly, so we can address it
  • clean up the old registration file themselves (the error is a 400 -> something's wrong on our side)

Just marshalling the error message here isn't the most sustainable solution here. It adds code which could still result in errors not being properly decoded into a BadRequest struct (see the various possible structures of a 400 Error in the official docs. ATM we're only getting

{
  "message": "A file with this name already exists"
} 

but who knows, what other kind of error message is possible in this case. Just reading the body in case of an error seems a bit simpler and it should provide the information needed for a non-technical user.

IMO, we shouldn't start decoding the Gitlab Errors. We're not a Gitlab client, we're a canary checker that offers some integration with a remote Gitlab repository as a state manager. Conveying the error code and the response body should be more than enough.

@puffitos puffitos mentioned this issue Dec 25, 2024
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/target-manager Issues/PRs related to the TargetManager bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants