-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CDK refactoring support #705
base: main
Are you sure you want to change the base?
Conversation
text/0162-refactoring-support.md
Outdated
Not at the moment. One of the constraints we imposed on this feature is that it | ||
should work in any environment, including CI/CD pipelines, where there is no | ||
user to answer questions. Although we could easily extend this feature to | ||
include ambiguity resolution for the interactive case, it wouldn't transfer well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The core capability is identifying logical ids that changed and generating the refactor for them at run time. Could we add a capability to generate on a developer machine a mapping file that could be used during ci/cd deployment such that the developer can confirm mappings in advance and commit the mapping file to be picked up during CI/CD?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can, but I don't feel comfortable with this. It's just too hard for users to get this right. To create a mapping file, you need to predict what the current state of the environment will be. But when it comes the time to apply this mapping, if the state is not what you predicted, the result is undefined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @otaviomacedo - its too risky right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the risk if this file is explicitly provided by the user? My concern is less about CI/CD but that we are currently just flat out refusing to refactor any ambiguous resources.
state nor in the desired state. | ||
|
||
In particular, the logical ID won't match the CDK construct path, stored in the | ||
resource's metadata. This has consequences for the CloudFormation console, which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has consequences for the CloudFormation console, which will show a Tree view that is not consistent with the Flat view.
is this the only user-facing impact? if so, can user rerun the cdk commands and expect success? if not, do we have a workaround to fix cdk app in this state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can user rerun the cdk commands and expect success?
Yes, in the next run, there will be nothing to refactor, so it goes straight to deployment. If the deployment succeeds, the consistency is restored.
is this the only user-facing impact?
This is the most obvious one. But there is a more esoteric case I can think of:
- Developer renames a resource from "A" to "B", and also adds some new ones.
- Refactor succeeds.
- Deployment fails and the stack is rolled back.
- Before it has a chance to revert the refactor, the CLI gets interrupted.
- Developer decides to abandon that change, and reverts the code to the
previous state (before the change described in step 1). Now, in the code, the
resource is called "A", but in the deployed stack, it is called "B". - Some time later, some other developer, unaware of this discrepancy, makes a
change to the content of resource "A", believing this will lead to an update. - The CDK CLI won't detect this as a rename (because of the content change),
and will thus proceed to deployment, leading to the replacement of resource "B".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wont this also affect cdk diff
? It will show metadata changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. But that works to our advantage, then, doesn't it? It's a warning to developers that it's probably not safe to deploy. But, of course, they can still ignore it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're example was exactly what came to mind to me for risk here.
- occurring may sound like a super niche case, but in practice this kind of thing happening on a large team with CD pipelines is fairly regular occurrence. It's concerning when the outcome is potentially catastrophic.
How about the cli being smarter to detect the current cloud formation status? Something like if refactor is turned on and the cli sees that the current stack state is Update Rollback X, then check if refactor rollback is needed before continuing with new deploy?
Another option would be something like refactor plan being stored as state somewhere (cdk asset bucket?) and this state only gets deleted if the stack update completes or the stack update fails and the refactor rollback completes. If the deploy gets interrupted, on next deploy the cdk would detect the refactor state and complete the rollback before continuing with next deploy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is refactoring itself atomic, could some resources be refactored and others not be refactored? Could such a state be recovered from?
text/0162-refactoring-support.md
Outdated
|
||
There is still some work to be done to prove this system works in practice. But | ||
assuming it does, we could use it to expand the scope to which the automatic | ||
refactoring applies. Consider the case in which you want to rename a certain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah so the proposed in-scope refactoring solution will treat this case (rename a resource + update properties of the resource) a replacement not refactoring? would we print out a warning message for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah so the proposed in-scope refactoring solution will treat this case (rename a resource + update properties of the resource) a replacement not refactoring?
Yes.
would we print out a warning message for this case?
We could do that, but this case is not as clear-cut as it sounds. Unlike the in-scope case, there is no unambiguous way to differentiate between refactor and delete + creation. We would need to define some distance function between two resources, and empirically come up with a threshold for that distance. If there are pairs whose distance is below the threshold, we would print a warning. But it's possible, and something I'm interested in exploring, anyway. This could be a phase 3.
text/0162-refactoring-support.md
Outdated
|
||
✅ Stack refactor complete | ||
|
||
You can configure the refactoring behavior by passing the `--refactoring` flag |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is --refactoring
supposed to be given a value? I think the command reads nicely as cdk deploy --refactor
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm interpreting this to mean that you can configure refactoring behavior in lieu of the prompt, and the values you can send in to the --refactoring
command is EXECUTE_AND_DEPLOY
, or DEPLOY_WITHOUT_REFACTORING
, or QUIT
. but i think i'm filling in the blanks to get to that understanding; i'd rather the RFC be clear about the options.
I also assume that this is primarily a way to bypass the prompting of the user, which would benefit the toolkit-lib, is that right?
text/0162-refactoring-support.md
Outdated
```json | ||
{ | ||
"app": "...", | ||
"refactoring": "EXECUTE_AND_DEPLOY" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we already have similar configs like these that map to cdk deploy
options? Thinking about how would you configure this per stack(s).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes sense to configure this per stack. A refactor operation applies to a set of stacks at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kaizencc And I have plans to make every option available in cdk.json
under a well-known schema. The way this flag is currently proposed, it would be:
{
"app": "...",
"build": {
"refactoring": true
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactoring is a boolean command?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the proposal that every flag has to be boolean?
- Implement the display of the ambiguities to the user. | ||
- Add physical ID to resource equivalence. | ||
|
||
#### Phase 2 (application) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an implicit bake time here to get customer feedback on the dry-run? We probably shouldn't be doing this if the dry-run is not vetted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't thought about that. The splitting into these phases was more to de-risk the project and lead to smaller PRs. But I guess we can plan a bake time, as well. How long would be good, do you reckon?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 months? unless we have good engagement before that and feel confident.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a good use case for the --unstable
flag?
- Implement rollback. | ||
- Implement the progress bar to display the refactoring progress. | ||
- Implement feature flags. | ||
- Handle cross-stack references. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on that? Maybe in the high level design? Feels like this is a delicate and common case, given how easy CDK makes it to create those.
state nor in the desired state. | ||
|
||
In particular, the logical ID won't match the CDK construct path, stored in the | ||
resource's metadata. This has consequences for the CloudFormation console, which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wont this also affect cdk diff
? It will show metadata changes.
Co-authored-by: Momo Kornher <[email protected]>
Co-authored-by: Momo Kornher <[email protected]>
|
||
After refactoring the stack, the CLI will proceed with the deployment | ||
(assuming that is your choice). If the deployment fails, and CloudFormation | ||
rolls it back, the CLI will execute a second refactor, in reverse, to bring the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would CFN roll back?
text/0162-refactoring-support.md
Outdated
|
||
Since we are talking about Git, what about using each commit operation as a | ||
decision point? In this case, the source and target states would come from the | ||
synthesized cloud assemblies in the previous and current revision, respectively. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plenty of customers are doing dynamic stuff in their CDK code based on the environment it runs in (although we dont recommend that). For those customers, the real production cloud assembly cannot be produced by a developer.
└ Queue4 | ||
|
||
then the CLI will not be able to establish a 1:1 mapping between the old and new | ||
names. In this case, it will show you the ambiguity, and stop the deployment: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think we should allow the user to provide a resolution file in this case (or any case really) that explicitly maps these changes.
text/0162-refactoring-support.md
Outdated
|
||
### Is this a breaking change? | ||
|
||
No. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call-out: We need to implement this carefully for cdk deploy
otherwise it could accidentally break Cx.
|
||
In this phase we are going to implement the detection of resource moves, and | ||
show the user what changes are going to be made. The only new command available | ||
at this phase is `cdk refactor --dry-run`. Execution of this command without the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we using --unstable=refactor
as well?
**How it works** section. | ||
- `refactor`: automatically refactor and deploy. | ||
- `quit`: stop with a non-zero exit code. | ||
- `skip`: deploy without refactoring. This is the default value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this always be the default? I would expect most use cases not want this to be the default and either default to refactor or confirm; and choosing this as the default is to not affect existing behavior. Would the plan be to change the default to refactor in the next major version of cdk cli and this functionality is no longer experimental
|
||
The most straightforward alternative is to implement a simple wrapper around the | ||
CloudFormation API, and have the user provide all the parameters: which | ||
resources to move from which stack to which stack. But the CDK CLI can provide a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, a simple wrapper around the cloud formation stack refactor api does not solve the developer pain problem over the current solution. It might be cleaner than pinning logical ids so refactor can actually occur rather than have a forever mapping in repo, but it doesn't reduce developer cost in terms of management and cognitive load in any way if the cdk doesn't attempt to detect and perform refactors on its own.
alone. A corollary is that such resources can be renamed and have their | ||
properties updated at the same time, and still be considered equivalent. | ||
|
||
Otherwise, the digest is computed from its type, its own properties (that is, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this CFN type? So refactoring from one CDK construct to another that generated the same CFN type and properties would be possible?
|
||
### A. Equivalence between resources | ||
|
||
To detect which resources should be refactored, we need to indentify which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we don't check for resources that will be deleted and ones that will be created and find some near equivalency? Could we generate a warning, e.g. Lambda A is being deleted Lambda B is being created, one property difference is preventing refactoring.
case: there's no way to tell them which is which in case they both move to a | ||
different stack or get renamed. | ||
|
||
Indistinguishable resources in this sense are said to be _equivalent_ (see |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In software terms does it matter? if we use one twin or the other, they are identical and for software that would make them interchangeable. Perhaps what we are missing is what might make the CLI see them as identical when in fact they are not and thus not interchangeable? Perhaps some examples, I am guessing maybe some stateful resource might have same properties but actually be in a different state?
If you ever find yourself doing one of the following, you will benefit from | ||
stack refactoring support: | ||
|
||
- Renaming constructs, either intentionally or by mistake. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about moving from nearly equivalent constructs, perhaps an org created lambda construct to the cdk lib lambda construct and vice versa (also see my comment asking if we can detect near equivalence, in this case maybe the org version sets a default the user was not aware of, if we could make them aware they could change their code to explicitly match the default or maybe we could allow them to refactor and change the default)
### How it works | ||
|
||
When you run `cdk deploy`, the CLI will compare the templates in the cloud | ||
assembly with the templates in the deployed stack. If it detects that a resource |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how will we detect these changes, is it only gonna be based on property level differences as sometimes change in path can lead to change in default tag values, that can be considered a property change?
// Keys are the content address of the resources | ||
// Values are the location addresses | ||
{ | ||
"5e19886121239b7a": { // Moved and renamed -> goes to mapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will it throw ambiguity in case the rename in logicalID is within the same stack?
This is a request for comments about CDK Refactoring Support. See #162 for
additional details.
APIs are signed off by @{BAR_RAISER}.
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache-2.0 license