Add ability to provide the agent-id in enroll API #4226

Open
blakerouse opened this issue Dec 17, 2024 · 5 comments · May be fixed by #4290
Assignees
Labels
Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

Comments

@blakerouse
Contributor

blakerouse commented Dec 17, 2024

Describe the enhancement:

Add the ability to specify the agent-id of the enrolling Elastic Agent.

Describe a specific use case for the enhancement or feature:

On serverless, an Elastic Agent is static, but its pod has no persistent storage, so it cannot keep its enrollment information across restarts of the Elastic Agent. Customers have reported the same problem: their integrations do not need persistent storage, and requiring the Elastic Agent to have it just for the enrollment information is not possible.

To provide a stable Elastic Agent entry in the agents list in Kibana, this would allow an Elastic Agent to enroll with the ID it wants to have. It would also replace an existing Elastic Agent that already has the same ID.

The newly enrolled Elastic Agent will replace the previous Elastic Agent, preventing it from communicating with the Fleet Server any more.

Describe any security issues:

This does open the possibility that a bad actor with the enrollment token and the ID of an Elastic Agent could enroll over the top of it and cut off the current Elastic Agent's communication, since the other Elastic Agent would become the newly communicating Elastic Agent.

To prevent this, an additional replace-token would be added to the enrollment API. This can be any unique value, and it is stored as a bcrypt hash on the Elastic Agent record. If an Elastic Agent is enrolled without this token, no other Elastic Agent is allowed to enroll with the same ID (trying to do so would error). If an Elastic Agent is enrolled with the replace token and it's the first enrollment, it enrolls successfully. On a second enrollment to replace the Elastic Agent, the exact same replace token must be provided; if it matches (compared against the bcrypt hash), the enrollment is treated as a replacement of the Elastic Agent and allowed to complete.
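
The replace-token rules above can be sketched roughly as follows. This is only an illustration, not Fleet Server code: `AgentRecord`, `EnrollError`, and `enroll` are hypothetical names, the in-memory dict stands in for the agent index, and PBKDF2 from the Python standard library is swapped in for bcrypt so the sketch runs without third-party packages. Rejecting a tokenless re-enrollment over a token-protected agent is my assumption; the description does not spell that case out.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from typing import Dict, Optional


class EnrollError(Exception):
    """Raised when an enrollment request must be rejected."""


@dataclass
class AgentRecord:
    agent_id: str
    # Salt and hash of the replace token; None when enrolled without one.
    token_salt: Optional[bytes] = None
    token_hash: Optional[bytes] = None


def _hash_token(token: str, salt: bytes) -> bytes:
    # Stand-in for bcrypt (assumption): PBKDF2 is used only because it
    # ships with the standard library.
    return hashlib.pbkdf2_hmac("sha256", token.encode("utf-8"), salt, 100_000)


def enroll(agents: Dict[str, AgentRecord], agent_id: str,
           replace_token: Optional[str] = None) -> AgentRecord:
    existing = agents.get(agent_id)
    if existing is not None:
        # Re-using an ID is only allowed when a token was stored on first
        # enrollment and the same token is presented again.
        if existing.token_hash is None or replace_token is None:
            raise EnrollError("agent ID is already enrolled")
        candidate = _hash_token(replace_token, existing.token_salt)
        if not hmac.compare_digest(candidate, existing.token_hash):
            raise EnrollError("replace token does not match")
    # First enrollment, or a verified replacement: write a fresh record.
    record = AgentRecord(agent_id)
    if replace_token is not None:
        record.token_salt = os.urandom(16)
        record.token_hash = _hash_token(replace_token, record.token_salt)
    agents[agent_id] = record
    return record
```

With this shape, an attacker who holds the enrollment token and the agent ID but not the replace token gets an error instead of displacing the running agent.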

@michel-laterman
Contributor

I think we already provide this through the enrollment_id in the API:

enrollment_id:
  type: string
  description: |
    The enrollment ID of the agent.
    To replace an agent on enroll fail.
    The existing agent with a matching enrollment_id will be deleted if it never checked in. The new agent will be enrolled with the enrollment_id.

It was added with #2655

@blakerouse
Contributor Author

@michel-laterman "The existing agent with a matching enrollment_id will be deleted if it never checked in." What if it has checked in?

@kpollich kpollich assigned kpollich and unassigned kpollich Jan 7, 2025
@michel-laterman
Contributor

@blakerouse and I had a brief conversation about this.

We've decided to add an ID field to enrollment requests that is distinct from the existing enrollment_id value.
If this field is used and it identifies an existing agent, that agent's current policy & existing API keys are used by the "new agent".
If the agent does not exist, it's treated as a new enrollment.

This is so that we don't break or get blocked on existing scale tests when delivering this feature; as a follow-up, we can see if we can make the scale tests just use the new ID value and deprecate enrollment_id (cc @juliaElastic).
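
As a rough sketch of the ID-field behaviour described above (illustrative only: `Agent`, `enroll_with_id`, and the in-memory dict are hypothetical, and I am assuming that an unknown ID is kept as the new agent's ID and that the policy for a new enrollment comes from the enrollment token):

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Agent:
    agent_id: str
    policy_id: str
    api_keys: List[str] = field(default_factory=list)


def enroll_with_id(agents: Dict[str, Agent], requested_id: Optional[str],
                   token_policy_id: str) -> Agent:
    if requested_id is not None and requested_id in agents:
        # Existing agent: the "new agent" takes over its current policy
        # and existing API keys rather than getting fresh ones.
        return agents[requested_id]
    # Unknown or missing ID: treated as a brand-new enrollment.
    agent_id = requested_id or str(uuid.uuid4())
    agent = Agent(agent_id, token_policy_id, [f"key-{uuid.uuid4().hex}"])
    agents[agent_id] = agent
    return agent
```

The existing enrollment_id path is left untouched in this sketch, which matches the intent of not disturbing the scale tests that rely on it.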

I've also looked a bit more into opamp and how it handles duplicate IDs. In short, this type of workflow (where we may have more than one pod that is "the same agent") isn't supported.
That's pretty clear from the implications of the section on duplicate WebSocket connections.

When sending a message, an agent can specify its own instance_uid value or request one from the server.
The server can also force agents to use a new instance_uid value at any time.

Additionally, AgentToServer messages are expected to be sequential (indicated by sequence_num) as a mechanism for detecting missed messages.

Supporting this workflow is something we'll need to handle once we start supporting opamp.
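
To illustrate why sequential sequence_num values clash with several pods acting as "the same agent", here is a small sketch of server-side gap detection. `SequenceTracker` is a hypothetical helper, not part of any OpAMP library, and the assumption that each pod keeps its own independent counter is mine.

```python
from typing import Dict


class SequenceTracker:
    """Tracks the last sequence_num seen per instance_uid."""

    def __init__(self) -> None:
        self._last: Dict[str, int] = {}

    def observe(self, instance_uid: str, sequence_num: int) -> bool:
        """Record a message; return True if it directly follows the last one."""
        last = self._last.get(instance_uid)
        self._last[instance_uid] = sequence_num
        # The first message from an instance has nothing to compare against.
        return last is None or sequence_num == last + 1


tracker = SequenceTracker()
# Two pods share one instance_uid but count their messages independently.
pod_a = [1, 2, 3]
pod_b = [1, 2]
interleaved = [pod_a[0], pod_b[0], pod_a[1], pod_b[1], pod_a[2]]
in_order = [tracker.observe("agent-1", n) for n in interleaved]
```

Every message from the second pod looks like a missed or repeated message to the server, even though neither pod actually lost anything.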

@blakerouse blakerouse linked a pull request Jan 8, 2025 that will close this issue
7 tasks
@jlind23
Contributor

jlind23 commented Jan 8, 2025

@nimarezainia do you think this is something we could piggyback on to migrate an Agent from one cluster to another using the same ID in the enroll command?

@blakerouse
Contributor Author

@jlind23 After our discussion of the security implications, I have added a section to the description about the additional agent-token API option for enrollment. Hopefully this implementation will alleviate those concerns.

@elastic/product-security Could you give the security implications a review?

4 participants