From 54ebe7ada370ab2af9c65eb7b37aa7fb070f0266 Mon Sep 17 00:00:00 2001 From: Otavio Macedo <288203+otaviomacedo@users.noreply.github.com> Date: Tue, 18 Feb 2025 17:27:51 +0000 Subject: [PATCH 01/21] WIP --- text/0162-refactoring-support.md | 278 +++++++++++++++++++++++++++++++ 1 file changed, 278 insertions(+) create mode 100644 text/0162-refactoring-support.md diff --git a/text/0162-refactoring-support.md b/text/0162-refactoring-support.md new file mode 100644 index 000000000..f9dbb8500 --- /dev/null +++ b/text/0162-refactoring-support.md @@ -0,0 +1,278 @@ +# CDK Refactoring Support + +- **Original Author(s):**: @otaviomacedo +- **Tracking Issue**: #162 +- **API Bar Raiser**: @{BAR_RAISER_USER} + +## Working Backwards + +AWS CloudFormation identifies resources by their logical ID. As a consequence, +if you change the logical ID of a resource after it has been deployed, +CloudFormation will create a new resource with the new logical ID and possibly +delete the old one. In any case, for stateful resources, this may cause +interruption of service or data loss. + +Historically, the advice has been to avoid changing logical IDs of resources. +However, this is not always possible, or goes against good engineering +practices. For example, you may have duplicated code across different CDK +applications, and you want to consolidate it into a single reusable construct +(usually referred to as an L3 construct). The very introduction of a new root +for the construct will lead to the renaming of the logical IDs of the resources +in that subtree. You may also need to move resources around in the tree to make +it more readable, or even between stacks to better isolate concerns. Not to +mention accidental renames, which are bound to happen. + +With the recent launch of CloudFormation's stack refactoring API, the CDK now +automatically detects these cases, and refactors the stack on your behalf. 
This +brings more flexibility for developers, and reduces the risk of accidental +changes that lead to resource renaming. + +### How it works + +When you run `cdk deploy`, the CLI will compare the template in the cloud +assembly with the template in the deployed stack. If it detects that a resource +has been renamed, it will automatically perform the refactoring, and then +proceed with the deployment. + +For example, suppose your CDK application has a single stack, called `MyStack`, +containing an S3 bucket, a CloudFront distribution and a Lambda function. The +construct tree (L1 resources omitted for brevity) looks like this: + + App + └ MyStack + └ Bucket + └ Distribution + └ Function + +Now suppose you want to make the following changes: + +- Rename the bucket from `Bucket` to the more descriptive `Origin`. +- Create a new L3 construct called `Website` that groups the bucket and the + distribution, to make this pattern reusable in different applications. +- Move the `Website` construct to a new stack called `Web`, for better + separation of concerns. +- Rename the original stack to `Service`, to better reflect its new specific + role in the application. + +The construct tree now looks like this: + + App + └ Web + └ Website + └ Origin + └ Distribution + └ Service + └ Function + +Even though none of the resources have changed, their paths have +(`MyStack/Bucket/Resource` to `Web/Website/Origin/Resource` etc.) Since the CDK +computes the logical IDs of the resources based on their path in the tree, all +three resources will have different logical IDs in the synthesized template. + +If you run `cdk deploy` now, the CLI will detect this change and automatically +refactor the stacks, by default. You will see the following output: + + Refactoring... + Creating stack refactor... 
+ + Prg | Time | Status | Resource Type | Old Logical ID | New Logical ID + ----|------------|----------------------|-------------------------------|------------------------------|----------------------------- + 0/3 | 2:03:17 PM | REFACTOR_IN_PROGRESS | AWS::S3::Bucket | MyStack.Bucket5766466B | Web.Bucket843D52FF + 0/3 | 2:03:17 PM | REFACTOR_IN_PROGRESS | AWS::CloudFront::Distribution | MyStack.DistributionE3BB089E | Web.Distribution7142E1F1 + 1/3 | 2:03:18 PM | REFACTOR_COMPLETE | AWS::S3::Bucket | MyStack.Bucket5766466B | Web.Bucket843D52FF + 1/3 | 2:03:18 PM | REFACTOR_IN_PROGRESS | AWS::Lambda::Function | MyStack.FunctionA5EA2BD8 | Service.Function8F0BB69B + 2/3 | 2:03:19 PM | REFACTOR_COMPLETE | AWS::CloudFront::Distribution | MyStack.DistributionE3BB089E | Web.DistributionE3BB089E + 3/3 | 2:03:20 PM | REFACTOR_COMPLETE | AWS::Lambda::Function | MyStack.FunctionA5EA2BD8 | Service.FunctionA5EA2BD8 + + ✅ Stack refactor complete + +In case you do want to replace the resources, you can override this default +behavior and skip the refactoring, by passing the `--skip-refactoring` flag to +the CLI, or by configuring this setting in the `cdk.json` file: + +```json +{ + "app": "...", + "skipRefactoring": true +} +``` + +### Ambiguity + +In some cases, the CLI may not be able to automatically refactor the stack. To +understand this, consider the following example, where there are two identical +resources, called `Queue1` and `Queue2`, in the same stack: + + App + └ Stack + └ Queue1 + └ Queue2 + +If they get renamed to, let's say, `Queue3` and `Queue4`, + + App + └ Stack + └ Queue3 + └ Queue4 + +Then the CLI will not be able to establish a 1:1 mapping between the old and new +names. 
In this case, it will ask you to confirm the changes: + + Resource Name Changes + ┌───┬──────────────────────┐ + │ │ Resource │ + ├───┼──────────────────────┤ + │ - │ Stack.Queue1A4198146 │ + │ │ Stack.Queue2B0BA5D32 │ + ├───┼──────────────────────┤ + │ + │ Stack.Queue3C7606C37 │ + │ │ Stack.Queue4D681F510 │ + └───┴──────────────────────┘ + + If you want to take advantage of automatic resource refactoring, avoid + renaming or moving multiple identical resources at the same time. + + If these changes were intentional, and you want to proceed with the + resource replacements, please confirm below. + + Do you wish to deploy these changes (y/n)? + +To skip this prompt, pass the `--ignore-ambiguous-renaming` flag to the CLI, or +configure this setting in the `cdk.json` file: + +```json +{ + "app": "...", + "ignoreAmbiguousRenaming": true +} +``` + +--- + +Ticking the box below indicates that the public API of this RFC has been +signed-off by the API bar raiser (the `status/api-approved` label was applied to +the RFC pull request): + +``` + +[ ] Signed-off by API Bar Raiser @xxxxx + +``` + +## Public FAQ + +### What are we launching today? + +An improvement to the CDK CLI `deploy` command, to protect against replacement +of resources that had their logical IDs modified. The CLI now detects these +cases automatically, and refactors the stack before the actual deployment. + +### Why should I use this feature? + +If you ever find yourself needing to do one of the following, you will benefit +from stack refactoring support: + +- Moving constructs between different stacks, either in the same or to a + different CDK application. +- Moving constructs within the same stack. This could be just for organization + purposes, or to create reusable components. +- Renaming a stack. +- Renaming a construct, intentionally or by mistake. + +## Internal FAQ + +> The goal of this section is to help decide if this RFC should be implemented. 
+> It should include answers to questions that the team is likely ask. Contrary +> to the rest of the RFC, answers should be written "from the present" and +> likely discuss design approach, implementation plans, alternative considered +> and other considerations that will help decide if this RFC should be +> implemented. + +### Why are we doing this? + +This feature was initially requested back in May 2020. It is one of the top 5 +most up-voted RFCs, with 94 thumbs-up. Despite being almost 5 years old, it has +continued to be highly discussed and debated for almost 5 years by CDK customers +and the community. The solution we currently provide, the `renameLogicalId` +method, is perceived by customers as a workaround, adding unnecessary cognitive +load for developers. Code refactoring is a fundamental job-to-be-done that +developers expect to be supported natively by the tool. + +In addition to this, the recent launch of CloudFormation's stack refactoring API +made it possible to support refactoring on the service side. We are building on +top of that API to bring a seamless experience to CDK users. + +### Why should we _not_ do this? + +> Is there a way to address this use case with the current product? What are the +> downsides of implementing this feature? + +### What is the technical solution (design) of this feature? + +On a very high level, this is what the refactoring algorithm does: + +First, it lists all the stacks (both local and deployed). Then it builds an +index of all resources from all stacks. This index maps the _content_ address +(hash) of each resource to all the _location_ addresses (stack name + logical +ID) they can be found in. Resources that have different locations in new stacks +compared to the old ones are considered to have been moved. For each of those, +it creates a mapping from the old location to the new one. 
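+
The indexing and move detection just described can be sketched as follows. This is a hypothetical illustration, not the CLI's actual code: the `Location`, `buildIndex` and `detectMoves` names are invented, and a SHA-256 digest of the serialized resource stands in for the content address (a real implementation would canonicalize the resource definition first, since `JSON.stringify` depends on key order):

```typescript
import { createHash } from "node:crypto";

// A resource's location address: stack name + logical ID.
interface Location { stackName: string; logicalId: string; }

// Content address of a resource: a digest of its definition. Illustrative
// only; assumes deterministic key order in the input objects.
function contentHash(resource: object): string {
  return createHash("sha256").update(JSON.stringify(resource)).digest("hex");
}

// Index mapping each content address to all locations where it occurs.
function buildIndex(
  stacks: Record<string, Record<string, object>>,
): Map<string, Location[]> {
  const index = new Map<string, Location[]>();
  for (const [stackName, resources] of Object.entries(stacks)) {
    for (const [logicalId, resource] of Object.entries(resources)) {
      const hash = contentHash(resource);
      const locations = index.get(hash) ?? [];
      locations.push({ stackName, logicalId });
      index.set(hash, locations);
    }
  }
  return index;
}

// A resource has moved if its content address appears at a different
// location in the local (synthesized) state than in the deployed state.
// Only 1:1 cases are unambiguous; anything else needs user confirmation.
function detectMoves(
  deployed: Map<string, Location[]>,
  local: Map<string, Location[]>,
): Array<{ from: Location; to: Location }> {
  const moves: Array<{ from: Location; to: Location }> = [];
  for (const [hash, oldLocs] of deployed) {
    const newLocs = local.get(hash);
    if (!newLocs) continue;
    if (oldLocs.length === 1 && newLocs.length === 1) {
      const from = oldLocs[0];
      const to = newLocs[0];
      if (from.stackName !== to.stackName || from.logicalId !== to.logicalId) {
        moves.push({ from, to });
      }
    }
  }
  return moves;
}
```

Resources whose content address has more than one source or destination are exactly the ambiguous cases discussed later in this RFC.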
+ +Since the CloudFormation API expects not only the mapping, but also the +templates in their final states, we need to compute those as well. This is done +by applying all the mappings locally, essentially simulating what CloudFormation +will eventually do. For example, if a mapping says that a resource has moved +from stack A with name Foo to stack B with name Bar, we will remove Foo from the +template for stack A, and add a new resource called Bar to the template for +stack B. + +At this point, if there is any ambiguity (more than one source or more than one +destination for a given mapping), it stops and asks the user whether they want +to proceed. + +Assuming there were no ambiguities, or the user wants to proceed anyway, it is +ready to call the API to actually perform the refactor, using the mappings and +templates built previously. + +As every AWS API call, refactoring is restricted to a given environment +(account and region). Given that one CDK app can have stacks for multiple +environments, the CLI will group the stacks by environment and perform the +refactoring separately in each one. Trying to move resources between stacks that +belong in different environments will result in an error. + + +### Is this a breaking change? + +No. + +### What alternative solutions did you consider? + +> Briefly describe alternative approaches that you considered. If there are +> hairy details, include them in an appendix. + +### What are the drawbacks of this solution? + +> Describe any problems/risks that can be introduced if we implement this RFC. + +### What is the high-level project plan? + +> Describe your plan on how to deliver this feature from prototyping to GA. +> Especially think about how to "bake" it in the open and get constant feedback +> from users before you stabilize the APIs. +> +> If you have a project board with your implementation plan, this is a good +> place to link to it. + +### Are there any open issues that need to be addressed later? 
+ +> Describe any major open issues that this RFC did not take into account. Once +> the RFC is approved, create GitHub issues for these issues and update this RFC +> of the project board with these issue IDs. + +## Appendix + +Feel free to add any number of appendices as you see fit. Appendices are +expected to allow readers to dive deeper to certain sections if they like. For +example, you can include an appendix which describes the detailed design of an +algorithm and reference it from the FAQ. + From 89e89807b338fcf2e53dd0ec661cdb5d4fd59949 Mon Sep 17 00:00:00 2001 From: Otavio Macedo <288203+otaviomacedo@users.noreply.github.com> Date: Wed, 19 Feb 2025 11:15:17 +0000 Subject: [PATCH 02/21] WIP --- text/0162-refactoring-support.md | 137 +++++++++++++++++-------------- 1 file changed, 76 insertions(+), 61 deletions(-) diff --git a/text/0162-refactoring-support.md b/text/0162-refactoring-support.md index f9dbb8500..5971ea315 100644 --- a/text/0162-refactoring-support.md +++ b/text/0162-refactoring-support.md @@ -4,35 +4,40 @@ - **Tracking Issue**: #162 - **API Bar Raiser**: @{BAR_RAISER_USER} +An improvement to the CDK CLI `deploy` command, to protect against replacement +of resources that had their logical IDs modified. The CLI now detects these +cases automatically, and refactors the stack before the actual deployment. + ## Working Backwards AWS CloudFormation identifies resources by their logical ID. As a consequence, if you change the logical ID of a resource after it has been deployed, CloudFormation will create a new resource with the new logical ID and possibly delete the old one. In any case, for stateful resources, this may cause -interruption of service or data loss. +interruption of service or data loss, or both. Historically, the advice has been to avoid changing logical IDs of resources. However, this is not always possible, or goes against good engineering practices. 
For example, you may have duplicated code across different CDK applications, and you want to consolidate it into a single reusable construct -(usually referred to as an L3 construct). The very introduction of a new root -for the construct will lead to the renaming of the logical IDs of the resources -in that subtree. You may also need to move resources around in the tree to make -it more readable, or even between stacks to better isolate concerns. Not to -mention accidental renames, which are bound to happen. - -With the recent launch of CloudFormation's stack refactoring API, the CDK now -automatically detects these cases, and refactors the stack on your behalf. This -brings more flexibility for developers, and reduces the risk of accidental -changes that lead to resource renaming. +(usually referred to as an L3 construct). The very introduction of a new node +for the L3 construct in the construct tree will lead to the renaming of the +logical IDs of the resources in that subtree. You may also need to move +resources around in the tree to make it more readable, or even between stacks to +better isolate concerns. Not to mention accidental renames, which have also +impacted customers in the past. + +To address all these problems, the CDK CLI now automatically detects these +cases, and refactors the stack on your behalf, using the new CloudFormation +stack refactoring API. This brings more flexibility for developers, and reduces +the risk of accidental changes that lead to resource renaming. ### How it works -When you run `cdk deploy`, the CLI will compare the template in the cloud -assembly with the template in the deployed stack. If it detects that a resource -has been renamed, it will automatically perform the refactoring, and then -proceed with the deployment. +When you run `cdk deploy`, the CLI will compare the templates in the cloud +assembly with the templates in the deployed stack. 
If it detects that a resource +has been moved or renamed, it will automatically perform the refactoring, and +then proceed with the deployment. For example, suppose your CDK application has a single stack, called `MyStack`, containing an S3 bucket, a CloudFront distribution and a Lambda function. The @@ -40,13 +45,14 @@ construct tree (L1 resources omitted for brevity) looks like this: App └ MyStack - └ Bucket - └ Distribution + ├ Bucket + ├ Distribution └ Function -Now suppose you want to make the following changes: +Now suppose you want to make the following changes, after having deployed it to +your AWS account: -- Rename the bucket from `Bucket` to the more descriptive `Origin`. +- Rename the bucket from `Bucket` to the more descriptive name `Origin`. - Create a new L3 construct called `Website` that groups the bucket and the distribution, to make this pattern reusable in different applications. - Move the `Website` construct to a new stack called `Web`, for better @@ -57,20 +63,21 @@ Now suppose you want to make the following changes: The construct tree now looks like this: App - └ Web - └ Website - └ Origin - └ Distribution + ├ Web + │ └ Website + │ ├ Origin + │ └ Distribution └ Service └ Function Even though none of the resources have changed, their paths have -(`MyStack/Bucket/Resource` to `Web/Website/Origin/Resource` etc.) Since the CDK -computes the logical IDs of the resources based on their path in the tree, all -three resources will have different logical IDs in the synthesized template. +(from `MyStack/Bucket/Resource` to `Web/Website/Origin/Resource` etc.) Since the +CDK computes the logical IDs of the resources based on their path in the tree, +all three resources will have different logical IDs in the synthesized template, +compared to what is already deployed. -If you run `cdk deploy` now, the CLI will detect this change and automatically -refactor the stacks, by default. 
You will see the following output: +If you run `cdk deploy` now, by default the CLI will detect this change and +automatically refactor the stacks. You will see the following output: Refactoring... Creating stack refactor... @@ -100,19 +107,19 @@ the CLI, or by configuring this setting in the `cdk.json` file: ### Ambiguity In some cases, the CLI may not be able to automatically refactor the stack. To -understand this, consider the following example, where there are two identical +understand this, consider the following example, where there are two _identical_ resources, called `Queue1` and `Queue2`, in the same stack: App └ Stack - └ Queue1 + ├ Queue1 └ Queue2 If they get renamed to, let's say, `Queue3` and `Queue4`, App └ Stack - └ Queue3 + ├ Queue3 └ Queue4 Then the CLI will not be able to establish a 1:1 mapping between the old and new @@ -181,13 +188,6 @@ from stack refactoring support: ## Internal FAQ -> The goal of this section is to help decide if this RFC should be implemented. -> It should include answers to questions that the team is likely ask. Contrary -> to the rest of the RFC, answers should be written "from the present" and -> likely discuss design approach, implementation plans, alternative considered -> and other considerations that will help decide if this RFC should be -> implemented. - ### Why are we doing this? This feature was initially requested back in May 2020. It is one of the top 5 @@ -209,21 +209,21 @@ top of that API to bring a seamless experience to CDK users. ### What is the technical solution (design) of this feature? -On a very high level, this is what the refactoring algorithm does: +High level description of the algorithm: -First, it lists all the stacks (both local and deployed). Then it builds an -index of all resources from all stacks. This index maps the _content_ address -(hash) of each resource to all the _location_ addresses (stack name + logical -ID) they can be found in. 
Resources that have different locations in new stacks -compared to the old ones are considered to have been moved. For each of those, -it creates a mapping from the old location to the new one. +First, it lists all the stacks: both local and deployed. Then it builds an index +of all resources from all stacks. This index maps the _content_ address +(physical ID or digest) of each resource to all the _location_ addresses (stack +name + logical ID) they can be found in. Resources that have different locations +in new stacks compared to the old ones are considered to have been moved. For +each of those, it creates a mapping from the old location to the new one. -Since the CloudFormation API expects not only the mapping, but also the +Since the CloudFormation API expects not only the mappings, but also the templates in their final states, we need to compute those as well. This is done -by applying all the mappings locally, essentially simulating what CloudFormation +by applying all the mappings locally, essentially emulating what CloudFormation will eventually do. For example, if a mapping says that a resource has moved -from stack A with name Foo to stack B with name Bar, we will remove Foo from the -template for stack A, and add a new resource called Bar to the template for +from stack A with name Foo, to stack B with name Bar, we will remove Foo from +the template for stack A, and add a new resource called Bar to the template for stack B. At this point, if there is any ambiguity (more than one source or more than one @@ -232,7 +232,7 @@ to proceed. Assuming there were no ambiguities, or the user wants to proceed anyway, it is ready to call the API to actually perform the refactor, using the mappings and -templates built previously. +templates computed previously. As every AWS API call, refactoring is restricted to a given environment (account and region). 
Given that one CDK app can have stacks for multiple @@ -240,7 +240,6 @@ environments, the CLI will group the stacks by environment and perform the refactoring separately in each one. Trying to move resources between stacks that belong in different environments will result in an error. - ### Is this a breaking change? No. @@ -252,22 +251,38 @@ No. ### What are the drawbacks of this solution? -> Describe any problems/risks that can be introduced if we implement this RFC. +See the open issues section below. ### What is the high-level project plan? -> Describe your plan on how to deliver this feature from prototyping to GA. -> Especially think about how to "bake" it in the open and get constant feedback -> from users before you stabilize the APIs. -> -> If you have a project board with your implementation plan, this is a good -> place to link to it. +See https://github.com/orgs/aws/projects/272. ### Are there any open issues that need to be addressed later? -> Describe any major open issues that this RFC did not take into account. Once -> the RFC is approved, create GitHub issues for these issues and update this RFC -> of the project board with these issue IDs. +This improved deployment experience actually consists of two separate steps, +behind the scenes: refactoring followed by deployment. And the whole workflow is +controlled by the CLI. As a result, this is not an atomic operation: it is +possible that the refactoring step succeeds, but before the CLI has a chance to +deploy, it gets interrupted, for whatever reason (computer crash, network +failures, etc.) In this case, the user will be left with a stack that is neither +in the original state nor in the desired state. + +In particular, the logical ID won't match the CDK construct path, stored in the +resource's metadata. This has consequences for the CloudFormation console, which +will show a Tree view that is not consistent with the Flat view. 
+
+Some possible solutions to consider, from more specific to more general:
+
+- CloudFormation to ignore changes in the `Metadata[aws:cdk:path]` resource
+  attribute in refactor operations.
+- CloudFormation to allow resource additions and deletions in refactor
+  operations.
+- Two-phase commit. The CLI could create the refactor and the changeset, and
+  then have a new command to execute both in a single atomic operation (let's
+  say, an `executeChangeSetAndRefactor()`).
+
+Since all these solutions depend on changes on the CloudFormation side, and this
+edge case is unlikely to happen, we are going to address it later.
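
The non-atomic window described above can be sketched like this. The interface and function names are invented for illustration; this is not the CLI's actual code:

```typescript
// Illustrative only: refactor and deploy are two separate service calls,
// with no transaction spanning them.
interface CloudFormationOps {
  executeRefactor(): void;  // stack refactoring: logical IDs change here
  executeChangeSet(): void; // regular deployment: templates and metadata change here
}

function deployWithRefactoring(cfn: CloudFormationOps): void {
  cfn.executeRefactor();
  // NON-ATOMIC WINDOW: a crash here leaves the stacks refactored but not
  // deployed. The new logical IDs are live, while Metadata[aws:cdk:path]
  // still describes the old construct tree.
  cfn.executeChangeSet();
}
```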
## Appendix -Feel free to add any number of appendices as you see fit. Appendices are -expected to allow readers to dive deeper to certain sections if they like. For -example, you can include an appendix which describes the detailed design of an -algorithm and reference it from the FAQ. - +### A. Ambiguity + +The only safe way to resolve ambiguity in cases such as renaming multiple +identical resources, is to ask the developer what their intent is. But what if +the developer is not present to answer questions (in a CI/CD pipeline, for +instance)? A necessary condition in this case is that the developer's intent has +been captured earlier, encoded as a mapping between resource locations, and +stored somewhere. + +But this is not sufficient. Note that every mapping is created from a pair of +source and target states, out of which the ambiguities arose. To be able to +safely carry a mapping over to other environments, two additional conditions +must be met: + +1. The source state on which a mapping is applied must be the same as the source + state where the mapping was captured. +2. The target state used to create the mapping should indeed be what the user + wants as a result. + +I am using the abstract term "state" here, but how could such a state be +instantiated in practice? Let's consider some options and see how they can fail +to satisfy the conditions above. + +First, we need to establish a point when the mapping is created (and the +developer is involved to resolve possible ambiguities). Let's call this the +"decision point". As a first attempt, let's try to use every deployment in the +development cycle as the decision point. In this solution, the development +account is the source state, and the cloud assembly to be deployed is the target +state. If any ambiguities were resolved, they are saved in a mapping file, under +version control. On every deployment, to other environments, the mapping file is +used to perform the refactoring. 
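
For illustration, such a mapping file for the queue renaming example from the Working Backwards section might look like this. The format and the account ID are purely hypothetical; no such file exists today:

```json
{
  "environments": ["aws://123456789012/us-east-1"],
  "mappings": {
    "Stack.Queue1A4198146": "Stack.Queue3C7606C37",
    "Stack.Queue2B0BA5D32": "Stack.Queue4D681F510"
  }
}
```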
+
+This sounds like it could work, but if the development environment is not in the
+same state as the one where the mapping is applied, condition 1 is violated. And
+if the developer fails, for whatever reason, to run a deployment against their
+environment before committing a change to the version control system (which I
+will henceforth assume is Git), condition 2 is violated.
+
+Since we are talking about Git, what about using each commit as a decision
+point? In this case, the source and target states would come from the
+synthesized cloud assemblies in the previous and current commit, respectively.
+We still have a mapping file, containing ambiguity resolutions, which are added
+to the commit, using a Git hook. For this solution to work, we need an
+additional constraint: that every revision produces a valid cloud assembly,
+which can also be enforced with a Git hook.
+
+Let's evaluate this solution in terms of the two conditions above. Because the
+developer no longer has a choice of which target state to use (or source
+state, for that matter), condition 2 is satisfied. But remember that the scope
+of the mapping file is the difference between two consecutive revisions. If the
+developer's local branch is multiple commits ahead of the revision that was
+deployed to production, the source state in production is not the same as the
+one in the mapping file, violating condition 1.
+
+#### Making history
+
+An improvement we can make is to implement an event sourcing system. Instead of
+storing a single mapping between two states, we store the whole history of the
+stacks. A **history** is a chain of **events** in chronological order. An event
+is a set of operations (create, update, delete, and refactor) on a set of
+stacks.
+
+The decision point remains the same, but now we append a new event to a
+version-controlled history file on every commit.
This event includes all creates, +updates and deletes, plus all refactors, whether they were automatically +detected or manually resolved. + +As with any event sourcing system, if we want to produce a snapshot of the +stacks at a given point in time, all we need to do is replay the events in +order, up to that point. We are now ready to state the key invariant of this +system: + +> **Invariant**: For every revision `r`, the cloud assembly synthesized from +> `r` is equal to the snapshot at `r`. + +In other words, the current state should be consistent with the history that led +up to that state. + +One final piece to add to the system: every environment should also have its own +history file, which should also maintain a similar invariant (through CFN hooks, +for example). Having all this in place, we can execute the following algorithm +on every deployment: + + --------------------------------- + Key: + H(E): environment history + H(A): application history + LCA: lowest common ancestor + --------------------------------- + + if H(E) is a prefix of H(A): + Compute the diff between H(A) and H(E); + Extract the mapping from the diff; + Apply the mapping to the stacks in the environment; + Deploy; + else: + Compute the diff from the LCA of H(A) and H(E); + Extract the mapping from the diff; + if the mapping is empty or the override flag is on: + Deploy; + else: + Error: source state doesn't match the mapping. + +For example, suppose the histories at play are (`*` denotes the current state): + + H(E) = e1 ◄── e2* + H(A) = e1 ◄── e2 ◄── e3 ◄── e4 + +Then the diff between them is `e3 ◄── e4`. If these events contain any refactor, +we just apply them, and then deploy the application. The resulting environment +history is the merge of the two: + + H(E) = e1 ◄── e2 ◄── e3 ◄── e4* + +Now suppose the histories involved are: + + H(E) = e1 ◄── e2 ◄── e3* + H(A) = e1 ◄── e2 ◄── e4 ◄── e5 + +In this case, `H(E)` is not a prefix of `H(A)`, but they share a common +ancestor. 
Their LCA is `e2`. Computing the diff from there we get `e4 ◄── e5`. +If there are no refactors to apply from this diff, we can go ahead and deploy +the application. Again, the new state results from the merge of `H(E)` and +`H(A)`: + + H(E) = e1 ◄── e2 ◄── e3 + ▲ + │ + └──── e4 ◄── e5* + +By default, if there are refactors to be done, this is considered an error, +because we can't guarantee that the refactor makes sense (let alone that this +was the developer's intent). But the application developer can decide to accept +the risk of replacement beforehand, by setting an override flag, in which case +we go ahead and deploy, but skip the refactoring step. The resulting history of +the environment is the same as in the diagram above. + +#### The future + +Once we have a system like this in place, we can expand the scope to which the +automatic refactoring applies. Consider the case in which you want to rename a +certain resource and, at the same time, make some minor changes, such as +adding or updating a couple of properties. This is another ambiguous case, +because it's not clear what the intent is: update with rename, or replacement? +But with the history system, we can detect such cases, interact with the +developer, and store the decision in the history. + +Since this historical model contains all the information about the state of the +stacks in an environment, it could also be used for other purposes. For example, +development tools could use the history to provide a "time machine" feature, +that allows developers to see the state of the infrastructure at any point in +time. CloudFormation itself could build on that, and provide a way to roll back +or forward to another state. Potentially, this could also help with drift +resolution (or prevention). 
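
The deployment-time decision procedure given in the algorithm above can be sketched as follows. This is a simplified model, not an actual implementation: events are reduced to an ID plus a list of refactors, and the `HistoryEvent` and `decide` names are invented:

```typescript
interface HistoryEvent { id: string; refactors: string[]; }

// Length of the longest common prefix of two histories; its last shared
// event is the lowest common ancestor (LCA).
function lcaIndex(a: HistoryEvent[], b: HistoryEvent[]): number {
  let i = 0;
  while (i < a.length && i < b.length && a[i].id === b[i].id) i++;
  return i;
}

type Decision =
  | { action: "refactor-then-deploy"; refactors: string[] }
  | { action: "deploy" }
  | { action: "error"; reason: string };

function decide(
  envHistory: HistoryEvent[],   // H(E)
  appHistory: HistoryEvent[],   // H(A)
  override = false,
): Decision {
  const lca = lcaIndex(envHistory, appHistory);
  const diff = appHistory.slice(lca);
  const refactors = diff.flatMap((e) => e.refactors);
  if (lca === envHistory.length) {
    // H(E) is a prefix of H(A): apply the mapping from the diff, then deploy.
    return { action: "refactor-then-deploy", refactors };
  }
  // Histories diverged after the LCA.
  if (refactors.length === 0 || override) {
    // No refactors to apply, or the developer accepted the risk of
    // replacement: deploy, skipping the refactoring step.
    return { action: "deploy" };
  }
  return { action: "error", reason: "source state doesn't match the mapping" };
}
```

Running this against the two worked examples above gives `refactor-then-deploy` for the prefix case and `deploy` (or `error`, if the diff contains refactors and the override flag is off) for the diverged case.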
\ No newline at end of file From d9138d6daf23e3f2408fe76e72f58a2621bf95fd Mon Sep 17 00:00:00 2001 From: Otavio Macedo <288203+otaviomacedo@users.noreply.github.com> Date: Thu, 20 Feb 2025 17:16:44 +0000 Subject: [PATCH 04/21] WIP --- text/0162-refactoring-support.md | 130 ++++++++++++++++--------------- 1 file changed, 67 insertions(+), 63 deletions(-) diff --git a/text/0162-refactoring-support.md b/text/0162-refactoring-support.md index 0bb6250ff..4d63724eb 100644 --- a/text/0162-refactoring-support.md +++ b/text/0162-refactoring-support.md @@ -16,16 +16,16 @@ CloudFormation will create a new resource with the new logical ID and possibly delete the old one. In any case, for stateful resources, this may cause interruption of service or data loss, or both. -Historically, the advice has been to avoid changing logical IDs of resources. -However, this is not always possible, or goes against good engineering -practices. For example, you may have duplicated code across different CDK -applications, and you want to consolidate it into a single reusable construct -(usually referred to as an L3 construct). The very introduction of a new node -for the L3 construct in the construct tree will lead to the renaming of the -logical IDs of the resources in that subtree. You may also need to move -resources around in the tree to make it more readable, or even between stacks to -better isolate concerns. Not to mention accidental renames, which have also -impacted customers in the past. +Historically, the advice for developers has been to avoid changing logical IDs +of resources. However, this is not always possible, or goes against good +engineering practices. For example, you may have duplicated code across +different CDK applications, and you want to consolidate it into a single +reusable construct (usually referred to as an L3 construct). 
The very +introduction of a new node for the L3 construct in the construct tree will lead +to the renaming of the logical IDs of the resources in that subtree. You may +also need to move resources around in the tree to make it more readable, or even +between stacks to better isolate concerns. Not to mention accidental renames, +which have also impacted customers in the past. To address all these problems, the CDK CLI now automatically detects these cases, and refactors the stack on your behalf, using the new CloudFormation @@ -41,7 +41,7 @@ then proceed with the deployment. For example, suppose your CDK application has a single stack, called `MyStack`, containing an S3 bucket, a CloudFront distribution and a Lambda function. The -construct tree (L1 resources omitted for brevity) looks like this: +construct tree (L1 constructs omitted for brevity) looks like this: App └ MyStack @@ -49,8 +49,8 @@ construct tree (L1 resources omitted for brevity) looks like this: ├ Distribution └ Function -Now suppose you want to make the following changes, after having deployed it to -your AWS account: +Now suppose you make the following changes, after having deployed it to your AWS +account: - Rename the bucket from `Bucket` to the more descriptive name `Origin`. - Create a new L3 construct called `Website` that groups the bucket and the @@ -176,15 +176,14 @@ cases automatically, and refactors the stack before the actual deployment. ### Why should I use this feature? -If you ever find yourself needing to do one of the following, you will benefit -from stack refactoring support: +If you ever find yourself doing one of the following, you will benefit from +stack refactoring support: -- Moving constructs between different stacks, either in the same or to a - different CDK application. -- Moving constructs within the same stack. This could be just for organization - purposes, or to create reusable components. -- Renaming a stack. -- Renaming a construct, intentionally or by mistake. 
+- Renaming constructs, either intentionally or by mistake. +- Moving constructs within the same stack. This could be just for better + organization, or to create reusable components. +- Moving constructs between different stacks. +- Renaming stacks. ### Can the CLI help me resolve ambiguity when refactoring resources? @@ -201,11 +200,11 @@ problem and a possible solution, check Appendix A. This feature was initially requested back in May 2020. It is one of the top 5 most up-voted RFCs, with 94 thumbs-up. Despite being almost 5 years old, it has -continued to be highly discussed and debated for almost 5 years by CDK customers -and the community. The solution we currently provide, the `renameLogicalId` -method, is perceived by customers as a workaround, adding unnecessary cognitive -load for developers. Code refactoring is a fundamental job-to-be-done that -developers expect to be supported natively by the tool. +continued to be highly discussed and debated by CDK customers and the community. +The solution we currently provide, the `renameLogicalId` method, is perceived by +customers as a workaround, adding unnecessary cognitive load for developers. +Code refactoring is a fundamental job-to-be-done that developers expect to be +supported natively by the tool. In addition to this, the recent launch of CloudFormation's stack refactoring API made it possible to support refactoring on the service side. We are building on @@ -213,8 +212,12 @@ top of that API to bring a seamless experience to CDK users. ### Why should we _not_ do this? -> Is there a way to address this use case with the current product? What are the -> downsides of implementing this feature? +The main attraction of this feature is also, in a way, its greatest deterrent: +that the refactoring happens automatically, and that the CLI makes decisions +that the user may not even be aware they need to make (accidental renames, for +example). 
This may cause anxiety for some users, who might not understand what +exactly happened after a successful deployment that had refactoring involved. We +can mitigate this risk with good documentation and careful interaction design. ### What is the technical solution (design) of this feature? @@ -255,8 +258,11 @@ No. ### What alternative solutions did you consider? -> Briefly describe alternative approaches that you considered. If there are -> hairy details, include them in an appendix. +The most straightforward alternative is to implement a wrapper around the +CloudFormation API, and have the user provide all the parameters: which +resources to move from which stack to which stack. But the CDK CLI can provide a +better experience by automatically detecting these cases, and interacting with +the user when necessary. ### What are the drawbacks of this solution? @@ -272,9 +278,9 @@ This improved deployment experience actually consists of two separate steps, behind the scenes: refactoring followed by deployment. And the whole workflow is controlled by the CLI. As a result, this is not an atomic operation: it is possible that the refactoring step succeeds, but before the CLI has a chance to -deploy, it gets interrupted, for whatever reason (computer crash, network -failures, etc.) In this case, the user will be left with a stack that is neither -in the original state nor in the desired state. +deploy, it gets interrupted (computer crash, network failures, etc.) In this +case, the user will be left with a stack that is neither in the original state +nor in the desired state. In particular, the logical ID won't match the CDK construct path, stored in the resource's metadata. This has consequences for the CloudFormation console, which @@ -295,7 +301,7 @@ edge case is unlikely to happen, we are going to address it later. ## Appendix -### A. Ambiguity +### A. 
Ambiguity resolution
 
 The only safe way to resolve ambiguity in cases such as renaming multiple
 identical resources, is to ask the developer what their intent is. But what if
@@ -315,8 +321,8 @@ must be met:
   wants as a result.
 
 I am using the abstract term "state" here, but how could such a state be
-instantiated in practice? Let's consider some options and see how they can fail
-to satisfy the conditions above.
+instantiated in practice? Let's consider some options and see how they fail to
+satisfy the conditions above.
 
 First, we need to establish a point when the mapping is created (and the
 developer is involved to resolve possible ambiguities). Let's call this the
@@ -324,25 +330,25 @@ developer is involved to resolve possible ambiguities). Let's call this the
 development cycle as the decision point. In this solution, the development
 account is the source state, and the cloud assembly to be deployed is the target
 state. If any ambiguities were resolved, they are saved in a mapping file, under
-version control. On every deployment, to other environments, the mapping file is
+version control. On every deployment to other environments, the mapping file is
 used to perform the refactoring.
 
-This sounds like it could work, but if the development environment is not in the
+It sounds like this could work, but if the development environment is not in the
 same state as the one where the mapping is applied, condition 1 is violated. And
 if the developer fails, for whatever reason, to run a deployment against their
-environment before commiting a change to the version control system (which I
-will henceforth assume is Git), condition 2 is violated.
+environment before committing an ambiguous change to the version control system
+(which I will henceforth assume is Git), condition 2 is violated.
 
-Since we are talking about Git, what about using each commit as a decision
-point? 
In this case, the source and target states would come from the -synthesized cloud assemblies in the previous and current commit, respectively. +Since we are talking about Git, what about using each commit operation as a +decision point? In this case, the source and target states would come from the +synthesized cloud assemblies in the previous and current revision, respectively. We still have a mapping file, containing ambiguity resolutions, which are added to the commit, using a Git hook. For this solution to work, we need an -additional constraint: that every revision produces a valid cloud assembly, -which can also be enforced with a Git hook. +additional constraint, which can also be enforced with a Git hook: that every +revision produces a valid cloud assembly. Let's evaluate this solution in terms of the two conditions above. Because the -developer doesn't have a choice anymore of which target state to use (or target +developer doesn't have a choice anymore of which target state to use (or source state, for that matter), condition 2 is satisfied. But remember that the scope of the mapping file is the difference between two consecutive revisions. If the developer's local branch is multiple commits ahead of the revision that was @@ -351,11 +357,11 @@ one in the mapping file, violating condition 1. #### Making history -An improvement we can make is to implement an event sourcing system. Instead of -storing a single mapping between two states, we store the whole history of the -stacks. A **history** is a chain of **events** in chronological order. An event -is a set of operations (create, update, delete, and refactor) on a set of -stacks. +An improvement we can make is to turn this into an event sourcing system. +Instead of storing a single mapping between two states, we store the whole +history of the stacks. A **history** is a chain of events in chronological +order. An **event** is a set of operations (create, update, delete, and +refactor) on a set of stacks. 
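One possible shape for such a history, sketched in TypeScript. All names here are invented for illustration; the RFC does not prescribe a concrete format for the history file.

```typescript
// Hypothetical data model for the history described above.
type ResourceLocation = string; // e.g. "Web/Origin" (stack name + logical ID)

interface Operation {
  type: 'create' | 'update' | 'delete' | 'refactor';
  at: ResourceLocation;
  to?: ResourceLocation; // destination location, only for 'refactor'
}

interface HistoryEvent {
  revision: string; // e.g. the Git commit that produced this event
  operations: Operation[];
}

// A history is a chain of events in chronological order. Replaying it from
// the beginning yields a snapshot: the set of resource locations that exist
// after the last event.
function replay(history: HistoryEvent[]): Set<ResourceLocation> {
  const snapshot = new Set<ResourceLocation>();
  for (const event of history) {
    for (const op of event.operations) {
      switch (op.type) {
        case 'create':
          snapshot.add(op.at);
          break;
        case 'delete':
          snapshot.delete(op.at);
          break;
        case 'refactor':
          snapshot.delete(op.at);
          snapshot.add(op.to!);
          break;
        // 'update' changes a resource in place; the set of locations is unchanged
      }
    }
  }
  return snapshot;
}
```

The `replay` function is what the key invariant refers to: for every revision, the cloud assembly synthesized from it should equal the snapshot obtained by replaying the history up to that revision.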
The decision point remains the same, but now we append a new event to a version controlled history file on every commit. This event includes all creates, @@ -391,9 +397,10 @@ on every deployment: Apply the mapping to the stacks in the environment; Deploy; else: - Compute the diff from the LCA of H(A) and H(E); + a = LCA of H(A) and H(E); + Compute the sub-chain of H(A) from a to the end; Extract the mapping from the diff; - if the mapping is empty or the override flag is on: + if the mapping is empty: Deploy; else: Error: source state doesn't match the mapping. @@ -414,10 +421,10 @@ Now suppose the histories involved are: H(E) = e1 ◄── e2 ◄── e3* H(A) = e1 ◄── e2 ◄── e4 ◄── e5 -In this case, `H(E)` is not a prefix of `H(A)`, but they share a common -ancestor. Their LCA is `e2`. Computing the diff from there we get `e4 ◄── e5`. -If there are no refactors to apply from this diff, we can go ahead and deploy -the application. Again, the new state results from the merge of `H(E)` and +In this case, `H(E)` is not a prefix of `H(A)`, but they have common ancestors. +Their LCA is `e2`. Computing the sub-chain from there we get `e4 ◄── e5`. If +there are no refactors to apply from this diff, we can go ahead and deploy the +application. Again, the new state results from the merge of `H(E)` and `H(A)`: H(E) = e1 ◄── e2 ◄── e3 @@ -425,12 +432,9 @@ the application. Again, the new state results from the merge of `H(E)` and │ └──── e4 ◄── e5* -By default, if there are refactors to be done, this is considered an error, -because we can't guarantee that the refactor makes sense (let alone that this -was the developer's intent). But the application developer can decide to accept -the risk of replacement beforehand, by setting an override flag, in which case -we go ahead and deploy, but skip the refactoring step. The resulting history of -the environment is the same as in the diagram above. 
+If there are refactors to be done, this is considered an error, because we can't +guarantee that the refactor makes sense (let alone that this was the developer's +intent). #### The future From 98b6c68a5d2bf4d6d458e82c8f24add0cd3cbae5 Mon Sep 17 00:00:00 2001 From: Otavio Macedo <288203+otaviomacedo@users.noreply.github.com> Date: Fri, 21 Feb 2025 09:55:22 +0000 Subject: [PATCH 05/21] WIP --- text/0162-refactoring-support.md | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/text/0162-refactoring-support.md b/text/0162-refactoring-support.md index 4d63724eb..6581e512e 100644 --- a/text/0162-refactoring-support.md +++ b/text/0162-refactoring-support.md @@ -301,7 +301,7 @@ edge case is unlikely to happen, we are going to address it later. ## Appendix -### A. Ambiguity resolution +### A. Ideas on ambiguity resolution The only safe way to resolve ambiguity in cases such as renaming multiple identical resources, is to ask the developer what their intent is. But what if @@ -438,13 +438,14 @@ intent). #### The future -Once we have a system like this in place, we can expand the scope to which the -automatic refactoring applies. Consider the case in which you want to rename a -certain resource and, at the same time, make some minor changes, such as -adding or updating a couple of properties. This is another ambiguous case, -because it's not clear what the intent is: update with rename, or replacement? -But with the history system, we can detect such cases, interact with the -developer, and store the decision in the history. +There is still some work to be done to prove this system works in practice. But +assuming it does, we could use it to expand the scope to which the automatic +refactoring applies. Consider the case in which you want to rename a certain +resource and, at the same time, make some minor changes, such as adding or +updating a couple of properties. 
This is another ambiguous case, because it's +not clear what the intent is: update with rename, or replacement? But with the +history system, we can detect such cases, interact with the developer, and store +the decision in the history. Since this historical model contains all the information about the state of the stacks in an environment, it could also be used for other purposes. For example, From c076e6107160ee02583329cb728879b03e53fa10 Mon Sep 17 00:00:00 2001 From: Otavio Macedo <288203+otaviomacedo@users.noreply.github.com> Date: Fri, 21 Feb 2025 13:22:21 +0000 Subject: [PATCH 06/21] WIP --- text/0162-refactoring-support.md | 49 ++++++++++++++++++++++++++++++-- 1 file changed, 47 insertions(+), 2 deletions(-) diff --git a/text/0162-refactoring-support.md b/text/0162-refactoring-support.md index 6581e512e..1de706712 100644 --- a/text/0162-refactoring-support.md +++ b/text/0162-refactoring-support.md @@ -154,6 +154,25 @@ configure this setting in the `cdk.json` file: } ``` +If you want to execute only the automatic refactoring, without deploying new +resources or removing existing ones, you can use the `cdk refactor` command. + +### Trigger + +This feature will also be available as a [CDK Trigger], which you can enable by +passing the flag `autoRefactor` to your application: + +```typescript +const app = new App({ + autoRefactor: true, + ... +}); +``` + +If this is enabled, the CLI will skip the refactoring step during deployment, +and a Lambda function will execute it instead, as part of the provisioning +process by CloudFormation. + --- Ticking the box below indicates that the public API of this RFC has been @@ -194,6 +213,18 @@ resolution for the interactive case, it wouldn't transfer well to the non-interactive case. If you are interested in an in-depth explanation of the problem and a possible solution, check Appendix A. +### What if I can't use the CDK CLI in my pipeline? 
+
+Some customers have their own mechanisms for deploying stacks to AWS that don't
+use the CDK CLI. If that is your case, there is still a way you can use this
+feature. The refactoring logic will also be released as a Node.js library and a
+standalone CLI tool (names TBD). If you can incorporate any of these into your
+deployment tooling, you will have the same functionality as the `refactor`
+command in the main CDK CLI. This command assumes a specific role, with a
+narrow set of permissions.
+
+Alternatively, you can use this feature via a [trigger](#trigger).
+
 ## Internal FAQ
 
 ### Why are we doing this?
@@ -264,13 +295,21 @@ resources to move from which stack to which stack. But the CDK CLI can provide a
 better experience by automatically detecting these cases, and interacting with
 the user when necessary.
 
+Another alternative is to use aliases, following [Pulumi's model]
+[pulumi-aliases], for example. This feature would be similar to the
+`renameLogicalId` function, but operating on a higher level of abstraction, by
+taking into account the construct tree and construct IDs. But, just like
+`renameLogicalId`, it could be perceived as a workaround, that adds scar tissue
+to the code every time refactoring is done. However, we could revisit this
+decision if enough customers indicate their preference for it in this RFC.
+
 ### What are the drawbacks of this solution?
 
 See the open issues section below.
 
 ### What is the high-level project plan?
 
-See https://github.com/orgs/aws/projects/272.
+See the [project board].
 
 ### Are there any open issues that need to be addressed later?
 
@@ -453,4 +492,10 @@ development tools could use the history to provide a "time machine" feature,
 that allows developers to see the state of the infrastructure at any point in
 time. CloudFormation itself could build on that, and provide a way to roll back
 or forward to another state. Potentially, this could also help with drift
-resolution (or prevention). 
\ No newline at end of file +resolution (or prevention). + +[pulumi-aliases]: https://www.pulumi.com/docs/iac/concepts/options/aliases/ + +[project board]: https://github.com/orgs/aws/projects/272 + +[CDK Trigger]: https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.triggers-readme.html \ No newline at end of file From c053d18f06161f502d25664198ed70de0e0aeea1 Mon Sep 17 00:00:00 2001 From: Otavio Macedo <288203+otaviomacedo@users.noreply.github.com> Date: Mon, 24 Feb 2025 12:38:09 +0000 Subject: [PATCH 07/21] WIP --- text/0162-refactoring-support.md | 184 +++++++++++++++++++------------ 1 file changed, 116 insertions(+), 68 deletions(-) diff --git a/text/0162-refactoring-support.md b/text/0162-refactoring-support.md index 1de706712..d5641177a 100644 --- a/text/0162-refactoring-support.md +++ b/text/0162-refactoring-support.md @@ -5,27 +5,27 @@ - **API Bar Raiser**: @{BAR_RAISER_USER} An improvement to the CDK CLI `deploy` command, to protect against replacement -of resources that had their logical IDs modified. The CLI now detects these -cases automatically, and refactors the stack before the actual deployment. +of resources when they change location. The CLI now detects this case, and +automatically refactors the stack before the actual deployment. ## Working Backwards AWS CloudFormation identifies resources by their logical ID. As a consequence, if you change the logical ID of a resource after it has been deployed, CloudFormation will create a new resource with the new logical ID and possibly -delete the old one. In any case, for stateful resources, this may cause -interruption of service or data loss, or both. - -Historically, the advice for developers has been to avoid changing logical IDs -of resources. However, this is not always possible, or goes against good -engineering practices. 
For example, you may have duplicated code across -different CDK applications, and you want to consolidate it into a single -reusable construct (usually referred to as an L3 construct). The very -introduction of a new node for the L3 construct in the construct tree will lead -to the renaming of the logical IDs of the resources in that subtree. You may -also need to move resources around in the tree to make it more readable, or even -between stacks to better isolate concerns. Not to mention accidental renames, -which have also impacted customers in the past. +delete the old one. For stateful resources, this may cause interruption of +service or data loss, or both. + +Historically, our advice for developers has been to avoid changing logical IDs +of resources. In practice, however, this is not always possible, or goes against +good engineering practices. For example, you may have duplicated code across +different CDK applications which you want to consolidate into a single reusable +construct (usually referred to as an L3 construct). The very introduction of a +new node for the L3 construct in the construct tree will lead to the renaming of +the logical IDs of the resources in that subtree. You may also need to move +resources around in the tree to make it more readable, or even between stacks to +better isolate concerns. Not to mention accidental renames, which have also +impacted customers in the past. To address all these problems, the CDK CLI now automatically detects these cases, and refactors the stack on your behalf, using the new CloudFormation @@ -55,12 +55,12 @@ account: - Rename the bucket from `Bucket` to the more descriptive name `Origin`. - Create a new L3 construct called `Website` that groups the bucket and the distribution, to make this pattern reusable in different applications. -- Move the `Website` construct to a new stack called `Web`, for better - separation of concerns. 
+- Move the web-related constructs (now under the `Website` L3 construct) to a
+  new stack called `Web`, for better separation of concerns.
 - Rename the original stack to `Service`, to better reflect its new specific
   role in the application.
 
-The construct tree now looks like this:
+The refactored construct tree looks like this:
 
     App
     ├ Web
@@ -76,14 +76,32 @@ CDK computes the logical IDs of the resources based on their path in the tree,
 all three resources will have different logical IDs in the synthesized template,
 compared to what is already deployed.
 
-If you run `cdk deploy` now, by default the CLI will detect this change and
-automatically refactor the stacks. You will see the following output:
+If you run `cdk deploy` now, by default the CLI will detect these changes and
+present you with a selection prompt:
+
+    The following resources were moved or renamed:
+
+    ┌───────────────────────────────┬───────────────────────────────┬──────────────────────────┐
+    │ Resource Type                 │ Old Logical ID                │ New Logical ID           │
+    ├───────────────────────────────┼───────────────────────────────┼──────────────────────────┤
+    │ AWS::S3::Bucket               │ MyStack.Bucket5766466B        │ Web.Bucket843D52FF       │
+    ├───────────────────────────────┼───────────────────────────────┼──────────────────────────┤
+    │ AWS::CloudFront::Distribution │ MyStack.DistributionE3BB089E  │ Web.Distribution7142E1F1 │
+    ├───────────────────────────────┼───────────────────────────────┼──────────────────────────┤
+    │ AWS::Lambda::Function         │ MyStack.FunctionA5EA2BD8      │ Service.Function8F0BB69B │
+    └───────────────────────────────┴───────────────────────────────┴──────────────────────────┘
+
+    ? What do you want to do? (Use arrow keys)
+    ❯ Execute the refactor and deploy
+      Deploy without refactoring (will cause resource replacement)
+      Quit
+
+If you choose to refactor and deploy, the CLI will show the progress as the
+refactor is executed:
 
     Refactoring...
     Creating stack refactor... 
- Prg | Time | Status | Resource Type | Old Logical ID | New Logical ID - ----|------------|----------------------|-------------------------------|------------------------------|----------------------------- 0/3 | 2:03:17 PM | REFACTOR_IN_PROGRESS | AWS::S3::Bucket | MyStack.Bucket5766466B | Web.Bucket843D52FF 0/3 | 2:03:17 PM | REFACTOR_IN_PROGRESS | AWS::CloudFront::Distribution | MyStack.DistributionE3BB089E | Web.Distribution7142E1F1 1/3 | 2:03:18 PM | REFACTOR_COMPLETE | AWS::S3::Bucket | MyStack.Bucket5766466B | Web.Bucket843D52FF @@ -93,22 +111,24 @@ automatically refactor the stacks. You will see the following output: ✅ Stack refactor complete -In case you do want to replace the resources, you can override this default -behavior and skip the refactoring, by passing the `--skip-refactoring` flag to -the CLI, or by configuring this setting in the `cdk.json` file: +You can configure the refactoring behavior by passing the `--refactoring` flag +to the CLI, or by configuring this setting in the `cdk.json` file: ```json { "app": "...", - "skipRefactoring": true + "refactoring": "EXECUTE_AND_DEPLOY" } ``` ### Ambiguity -In some cases, the CLI may not be able to automatically refactor the stack. To -understand this, consider the following example, where there are two _identical_ -resources, called `Queue1` and `Queue2`, in the same stack: +In the unlikely event that there are two or more _identical_ resources (that is, +resources with the same properties, except for the `Metadata` field), and you +rename or move two of them at the same time, the CLI will not be able to +automatically determine which resource should be replaced by which. For example, +suppose you have two identical queues, named `Queue1` and `Queue2`, in the same +stack: App └ Stack @@ -125,7 +145,7 @@ If they get renamed to, let's say, `Queue3` and `Queue4`, Then the CLI will not be able to establish a 1:1 mapping between the old and new names. 
In this case, it will ask you to confirm the changes: - Resource Name Changes + Ambiguous Resource Name Changes ┌───┬──────────────────────┐ │ │ Resource │ ├───┼──────────────────────┤ @@ -144,13 +164,14 @@ names. In this case, it will ask you to confirm the changes: Do you wish to deploy these changes (y/n)? -To skip this prompt, pass the `--ignore-ambiguous-renaming` flag to the CLI, or -configure this setting in the `cdk.json` file: +To skip this prompt and go straight to deployment, pass the +`--ignore-ambiguous-refactoring` flag to the CLI, or configure this setting in +the `cdk.json` file: ```json { "app": "...", - "ignoreAmbiguousRenaming": true + "ignoreAmbiguousRefactoring": true } ``` @@ -189,9 +210,10 @@ the RFC pull request): ### What are we launching today? -An improvement to the CDK CLI `deploy` command, to protect against replacement -of resources that had their logical IDs modified. The CLI now detects these -cases automatically, and refactors the stack before the actual deployment. +A new developer experience for CDK users, that allows them to change the +location of a construct (stack plus logical ID) without causing resource +replacement. This new experience is available in the CDK CLI `deploy` +and `refactor` commands, as well as a CDK trigger. ### Why should I use this feature? @@ -206,12 +228,12 @@ stack refactoring support: ### Can the CLI help me resolve ambiguity when refactoring resources? -Not at the moment. One of the tenets behind this feature is that it should work -in any environment, including CI/CD pipelines, where there is no user to answer -questions. Although we could easily extend this feature to include ambiguity -resolution for the interactive case, it wouldn't transfer well to the -non-interactive case. If you are interested in an in-depth explanation of the -problem and a possible solution, check Appendix A. +Not at the moment. 
One of the constraints we imposed on this feature is that it +should work in any environment, including CI/CD pipelines, where there is no +user to answer questions. Although we could easily extend this feature to +include ambiguity resolution for the interactive case, it wouldn't transfer well +to the non-interactive case. If you are interested in an in-depth explanation of +the problem and a possible solution, check Appendix A. ### What if I can't use the CDK CLI in my pipeline? @@ -225,12 +247,19 @@ narrow set of permissions. Alternatively, you can use this feature via a [trigger](#trigger). +### What if the deployment fails? + +After refactoring the stack, the CLI will proceed with the deployment +(assuming that is your choice). If the deployment fails, and CloudFormation +rolls it back, the CLI will execute a second refactor, in reverse, to bring the +resources back to their original locations. + ## Internal FAQ ### Why are we doing this? -This feature was initially requested back in May 2020. It is one of the top 5 -most up-voted RFCs, with 94 thumbs-up. Despite being almost 5 years old, it has +This feature was initially requested in May 2020. It is one of the top 5 most +up-voted RFCs, with 94 thumbs-up. Despite being almost 5 years old, it has continued to be highly discussed and debated by CDK customers and the community. The solution we currently provide, the `renameLogicalId` method, is perceived by customers as a workaround, adding unnecessary cognitive load for developers. @@ -247,31 +276,31 @@ The main attraction of this feature is also, in a way, its greatest deterrent: that the refactoring happens automatically, and that the CLI makes decisions that the user may not even be aware they need to make (accidental renames, for example). This may cause anxiety for some users, who might not understand what -exactly happened after a successful deployment that had refactoring involved. 
We
-can mitigate this risk with good documentation and careful interaction design.
+exactly is happening, and what the consequences are. We can mitigate this risk
+with good documentation and careful interaction design.
 
 ### What is the technical solution (design) of this feature?
 
 High level description of the algorithm:
 
-First, it lists all the stacks: both local and deployed. Then it builds an index
-of all resources from all stacks. This index maps the _content_ address
-(physical ID or digest) of each resource to all the _location_ addresses (stack
-name + logical ID) they can be found in. Resources that have different locations
-in new stacks compared to the old ones are considered to have been moved. For
-each of those, it creates a mapping from the old location to the new one.
+First, list all the stacks: both local and deployed. Then build an index of all
+resources from all stacks. This index maps the _content_ address (physical ID or
+digest) of each resource to all the _location_ addresses (stack name + logical
+ID) they can be found in. Resources that have different locations in new stacks
+compared to the old ones are considered to have been moved. For each of those,
+create a mapping from the old location to the new one.
 
 Since the CloudFormation API expects not only the mappings, but also the
 templates in their final states, we need to compute those as well. This is done
 by applying all the mappings locally, essentially emulating what CloudFormation
 will eventually do. For example, if a mapping says that a resource has moved
-from stack A with name Foo, to stack B with name Bar, we will remove Foo from
-the template for stack A, and add a new resource called Bar to the template for
-stack B.
+from stack `A` with name `Foo`, to stack `B` with name `Bar`, we will remove
+`Foo` from the template for stack `A`, and add a new resource called `Bar` to
+the template for stack `B`. 
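The local emulation described above could look like the following TypeScript sketch. It is illustrative only, not the CLI's actual implementation; the type and function names are invented for this example.

```typescript
// A template, reduced to the only part this sketch needs.
interface Template {
  Resources: Record<string, unknown>;
}

// One entry of the refactor mapping: old location -> new location.
interface Mapping {
  source: { stack: string; logicalId: string };
  destination: { stack: string; logicalId: string };
}

// Emulate what CloudFormation will eventually do: for each mapping, remove
// the resource from its old location and add it, unchanged, at the new one.
function applyMappings(
  templates: Record<string, Template>,
  mappings: Mapping[],
): void {
  for (const { source, destination } of mappings) {
    const resource = templates[source.stack].Resources[source.logicalId];
    delete templates[source.stack].Resources[source.logicalId];
    templates[destination.stack].Resources[destination.logicalId] = resource;
  }
}
```

For instance, applying the mapping `A/Foo -> B/Bar` removes `Foo` from stack `A`'s template and adds the same resource as `Bar` to stack `B`'s template.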
At this point, if there is any ambiguity (more than one source or more than one -destination for a given mapping), it stops and asks the user whether they want -to proceed. +destination for a given mapping), stop and ask the user whether they want to +proceed. Assuming there were no ambiguities, or the user wants to proceed anyway, it is ready to call the API to actually perform the refactor, using the mappings and @@ -289,19 +318,29 @@ No. ### What alternative solutions did you consider? -The most straightforward alternative is to implement a wrapper around the +The most straightforward alternative is to implement a simple wrapper around the CloudFormation API, and have the user provide all the parameters: which resources to move from which stack to which stack. But the CDK CLI can provide a better experience by automatically detecting these cases, and interacting with the user when necessary. -Another alternative is to use aliases, following [Pulumi's model] -[pulumi-aliases], for example. This feature would be similar to the +Another alternative is to use aliases, using [Pulumi's model] +[pulumi-aliases] as inspiration. This feature would be similar to the `renameLogicalId` function, but operating on a higher level of abstraction, by taking into account the construct tree and construct IDs. But, just like `renameLogicalId`, it could be perceived as a workaround, that adds scar tissue -to the code every time refactoring is done. However, we could revisit this -decision if enough customers indicate their preference for it in this RFC. +to the code at every refactor. However, we could revisit this decision if enough +customers indicate their preference for it in this RFC. + +A possible variation of the solution presented in this RFC is to do something +similar to resource lookup: for every environment where the application could be +deployed to, the CLI would have configured in a file what refactors have to be +done. 
The entries to this file could be automatically generated by some CLI +command, run individually in each environment. But this solution requires a lot +more work and coordination among the parties involved (developers, +administrators, security engineers, etc.), and is more error-prone: failure to +record a refactor in the file could lead to inconsistencies between the +environments, and even unintended resource replacements. ### What are the drawbacks of this solution? @@ -317,15 +356,15 @@ This improved deployment experience actually consists of two separate steps, behind the scenes: refactoring followed by deployment. And the whole workflow is controlled by the CLI. As a result, this is not an atomic operation: it is possible that the refactoring step succeeds, but before the CLI has a chance to -deploy, it gets interrupted (computer crash, network failures, etc.) In this -case, the user will be left with a stack that is neither in the original state -nor in the desired state. +deploy the changes, it gets interrupted (computer crash, network failures, etc.) +In this case, the user will be left with a stack that is neither in the original +state nor in the desired state. In particular, the logical ID won't match the CDK construct path, stored in the resource's metadata. This has consequences for the CloudFormation console, which will show a Tree view that is not consistent with the Flat view. -Some possible solutions to consider, from more specific to more general: +Possible solutions to consider, from more specific to more general: - CloudFormation to ignore changes in the `Metadata[aws:cdk:path]` resource attribute in refactor operations. @@ -491,8 +530,17 @@ stacks in an environment, it could also be used for other purposes. For example, development tools could use the history to provide a "time machine" feature, that allows developers to see the state of the infrastructure at any point in time. 
CloudFormation itself could build on that, and provide a way to roll back -or forward to another state. Potentially, this could also help with drift -resolution (or prevention). +or forward to another state. + +Another problem that could be solved with the historical model is the infamous +"deadly embrace", where a consumer stack depends on a producer stack via +CloudFormation Exports, and you want to remove the use from the consumer. At the +moment, customers have to use the `stack.exportValue(...)` method, and do two +deployments. The history would give the CLI all the information it needs to do +this without user intervention. + +Potentially, this could also help with drift resolution (or prevention), if +CloudFormation itself starts using the history internally. [pulumi-aliases]: https://www.pulumi.com/docs/iac/concepts/options/aliases/ From 96ca84de48f5677cd189dc4585ca786901e3de00 Mon Sep 17 00:00:00 2001 From: Otavio Macedo <288203+otaviomacedo@users.noreply.github.com> Date: Mon, 24 Feb 2025 15:40:02 +0000 Subject: [PATCH 08/21] bar raiser --- text/0162-refactoring-support.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0162-refactoring-support.md b/text/0162-refactoring-support.md index d5641177a..4e4594041 100644 --- a/text/0162-refactoring-support.md +++ b/text/0162-refactoring-support.md @@ -2,7 +2,7 @@ - **Original Author(s):**: @otaviomacedo - **Tracking Issue**: #162 -- **API Bar Raiser**: @{BAR_RAISER_USER} +- **API Bar Raiser**: @rix0rrr An improvement to the CDK CLI `deploy` command, to protect against replacement of resources when they change location. 
The CLI now detects this case, and @@ -202,7 +202,7 @@ the RFC pull request): ``` -[ ] Signed-off by API Bar Raiser @xxxxx +[ ] Signed-off by API Bar Raiser @rix0rrr ``` From 06be7ae5fae74be03277a666bec50ad56e748dab Mon Sep 17 00:00:00 2001 From: Otavio Macedo <288203+otaviomacedo@users.noreply.github.com> Date: Tue, 25 Feb 2025 11:23:52 +0000 Subject: [PATCH 09/21] Addressed feedback --- text/0162-refactoring-support.md | 87 ++++++++++++++++++-------------- 1 file changed, 48 insertions(+), 39 deletions(-) diff --git a/text/0162-refactoring-support.md b/text/0162-refactoring-support.md index 4e4594041..5dfa63d4b 100644 --- a/text/0162-refactoring-support.md +++ b/text/0162-refactoring-support.md @@ -4,9 +4,9 @@ - **Tracking Issue**: #162 - **API Bar Raiser**: @rix0rrr -An improvement to the CDK CLI `deploy` command, to protect against replacement -of resources when they change location. The CLI now detects this case, and -automatically refactors the stack before the actual deployment. +An improvement to the CDK CLI and toolkit library, to protect against +replacement of resources when they change location. The CLI now detects this +case, and automatically refactors the stack before the actual deployment. ## Working Backwards @@ -123,12 +123,11 @@ to the CLI, or by configuring this setting in the `cdk.json` file: ### Ambiguity -In the unlikely event that there are two or more _identical_ resources (that is, -resources with the same properties, except for the `Metadata` field), and you -rename or move two of them at the same time, the CLI will not be able to -automatically determine which resource should be replaced by which. 
For example, -suppose you have two identical queues, named `Queue1` and `Queue2`, in the same -stack: +In the unlikely event that there are two or more _equivalent_ resources +(see Appendix B) in the same template, and you rename or move them at the same +time, the CLI will not be able to automatically determine which resource should +be replaced by which. For example, suppose you have two identical queues, named +`Queue1` and `Queue2`, in the same stack: App └ Stack @@ -175,24 +174,11 @@ the `cdk.json` file: } ``` -If you want to execute only the automatic refactoring, without deploying new -resources or removing existing ones, you can use the `cdk refactor` command. - -### Trigger - -This feature will also be available as a [CDK Trigger], which you can enable by -passing the flag `autoRefactor` to your application: - -```typescript -const app = new App({ - autoRefactor: true, - ... -}); -``` - -If this is enabled, the CLI will skip the refactoring step during deployment, -and a Lambda function will execute it instead, as part of the provisioning -process by CloudFormation. +If you want to execute only the automatic refactoring, use the `cdk +refactor` command. The behavior is basically the same as with `cdk deploy`: it +will detect whether there are refactors to be made, ask for confirmation if +necessary (depending on the flag values), and refactor the stacks involved. But +it will stop there and not proceed with the deployment. --- @@ -213,7 +199,7 @@ the RFC pull request): A new developer experience for CDK users, that allows them to change the location of a construct (stack plus logical ID) without causing resource replacement. This new experience is available in the CDK CLI `deploy` -and `refactor` commands, as well as a CDK trigger. +and `refactor` commands. ### Why should I use this feature? @@ -239,13 +225,10 @@ the problem and a possible solution, check Appendix A. 
Some customers have their own mechanisms for deploying stacks to AWS, that don't use the CDK CLI. If that is your case, there is still a way you can use this -feature. The refactoring logic will also be released as a Node.js library and a -standalone CLI tool (names TBD). If you can incorporate any of these into your -deployment tooling, you will have the same functionality as the `refactor` -command in the main CDK CLI. This command uses assumes a specific role, with a -narrow set of permissions. - -Alternatively, you can use this feature via a [trigger](#trigger). +feature: the refactoring logic will also be released in the CDK toolkit library, +that can be used programmatically by your own tools. If you can incorporate it +into your deployment tooling, you will have the same functionality as the +`refactor` command in the CDK CLI. ### What if the deployment fails? @@ -328,9 +311,9 @@ Another alternative is to use aliases, using [Pulumi's model] [pulumi-aliases] as inspiration. This feature would be similar to the `renameLogicalId` function, but operating on a higher level of abstraction, by taking into account the construct tree and construct IDs. But, just like -`renameLogicalId`, it could be perceived as a workaround, that adds scar tissue -to the code at every refactor. However, we could revisit this decision if enough -customers indicate their preference for it in this RFC. +`renameLogicalId`, it could be perceived as a workaround. However, we are open +to revisiting this decision if enough customers indicate their preference for it +in this RFC. A possible variation of the solution presented in this RFC is to do something similar to resource lookup: for every environment where the application could be @@ -542,8 +525,34 @@ this without user intervention. Potentially, this could also help with drift resolution (or prevention), if CloudFormation itself starts using the history internally. +### B. 
Equivalence between resources + +To detect which resources should be refactored, we need to indentify which +resources have only changed their location, but have remained "the same", in +some sense. This can be made precise by defining an [equivalence relation] on +the set of resources. + +Before that, let's define a digest function, `d`: + + d(resource) = hash(type + physicalId) , if physicalId is defined + = hash(type + properties + dependencies.map(d)) , otherwise + +where `hash` is a cryptographic hash function. In other words, if a resource has +a physical ID, its type and physical ID uniquely identify that resource. So we +compute the hash from these two fields. Otherwise, the hash is computed from its +type, its own properties (that is, excluding properties that refer to other +resources), and the digests of each of its dependencies. + +The digest of a resource, defined recursively this way, remains stable even if +one or more of its dependencies gets renamed. Since the resources in a +CloudFormation template form an acyclic graph, this function is well-defined. + +The equivalence relation then follows directly: two resources `r1` and `r2` +are equivalent if `d(r1) = d(r2)`. 
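A minimal sketch of this digest function, assuming resources are plain objects of the shape below, with `properties` already stripped of references to other resources (illustrative only, not the actual implementation):

```typescript
import { createHash } from "crypto";

interface Resource {
  type: string;
  physicalId?: string;
  properties: Record<string, unknown>; // assumed free of references to other resources
  dependencies: string[];              // logical IDs of resources this one depends on
}

// d(resource) = hash(type + physicalId) if a physical ID is defined;
// otherwise hash(type + properties + digests of the dependencies).
function digest(logicalId: string, template: Record<string, Resource>): string {
  const resource = template[logicalId];
  const hash = createHash("sha256");
  hash.update(resource.type);
  if (resource.physicalId !== undefined) {
    hash.update(resource.physicalId);
  } else {
    hash.update(JSON.stringify(resource.properties));
    // Hash the dependency *digests* (sorted), not their names, so the result
    // stays stable when a dependency is renamed. The recursion terminates
    // because the resources form a directed acyclic graph.
    for (const d of resource.dependencies.map((dep) => digest(dep, template)).sort()) {
      hash.update(d);
    }
  }
  return hash.digest("hex");
}
```

Two resources are then equivalent exactly when their digests are equal: renaming a bucket changes neither its own digest nor the digest of a distribution that depends on it.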
+ + [pulumi-aliases]: https://www.pulumi.com/docs/iac/concepts/options/aliases/ [project board]: https://github.com/orgs/aws/projects/272 -[CDK Trigger]: https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.triggers-readme.html \ No newline at end of file +[equivalence relation]: https://en.wikipedia.org/wiki/Equivalence_relation \ No newline at end of file From 8495c69f9d5194affd9c618ab0c0074c6280ecef Mon Sep 17 00:00:00 2001 From: Otavio Macedo <288203+otaviomacedo@users.noreply.github.com> Date: Tue, 25 Feb 2025 12:23:28 +0000 Subject: [PATCH 10/21] Project plan --- text/0162-refactoring-support.md | 34 +++++++++++++++++++++++++++++--- 1 file changed, 31 insertions(+), 3 deletions(-) diff --git a/text/0162-refactoring-support.md b/text/0162-refactoring-support.md index 5dfa63d4b..374280503 100644 --- a/text/0162-refactoring-support.md +++ b/text/0162-refactoring-support.md @@ -331,7 +331,37 @@ See the open issues section below. ### What is the high-level project plan? -See the [project board]. +#### Phase 1 (dry-run) + +In this phase we are going to implement the detection of resource moves, and +show the user what changes are going to be made. The only new command +available at this phase is `cdk refactor --dry-run`. Execution of this +command without the `--dry-run` flag will result in an error. + +High-level tasks: + +- Implement resource equivalence without physical ID. +- Implement the computation of mappings. +- Implement the display of the mappings to the user. +- Implement ambiguity detection. +- Implement the display of the ambiguities to the user. +- Add physical ID to resource equivalence. + +#### Phase 2 (application) + +Once the detection of all cases is implemented in phase 1, we are ready to +implement the application of the changes. + +High-level tasks: + +- Add new permissions to the bootstrap stack. +- Implement the actual refactoring. +- Implement rollback. +- Implement the progress bar to display the refactoring progress. 
+- Implement feature flags. +- Handle cross-stack references. +- Add the refactor step to the `deploy` command. +- Write a blog post. ### Are there any open issues that need to be addressed later? @@ -553,6 +583,4 @@ are equivalent if `d(r1) = d(r2)`. [pulumi-aliases]: https://www.pulumi.com/docs/iac/concepts/options/aliases/ -[project board]: https://github.com/orgs/aws/projects/272 - [equivalence relation]: https://en.wikipedia.org/wiki/Equivalence_relation \ No newline at end of file From 914bac314b08f2e90cdabc2868c3a99c1f0fa98d Mon Sep 17 00:00:00 2001 From: Otavio Macedo <288203+otaviomacedo@users.noreply.github.com> Date: Thu, 27 Feb 2025 10:40:21 +0000 Subject: [PATCH 11/21] Addressed feedback --- text/0162-refactoring-support.md | 203 ++++++++++++++++--------------- 1 file changed, 106 insertions(+), 97 deletions(-) diff --git a/text/0162-refactoring-support.md b/text/0162-refactoring-support.md index 374280503..6142ea0d6 100644 --- a/text/0162-refactoring-support.md +++ b/text/0162-refactoring-support.md @@ -16,16 +16,15 @@ CloudFormation will create a new resource with the new logical ID and possibly delete the old one. For stateful resources, this may cause interruption of service or data loss, or both. -Historically, our advice for developers has been to avoid changing logical IDs -of resources. In practice, however, this is not always possible, or goes against -good engineering practices. For example, you may have duplicated code across -different CDK applications which you want to consolidate into a single reusable -construct (usually referred to as an L3 construct). The very introduction of a -new node for the L3 construct in the construct tree will lead to the renaming of -the logical IDs of the resources in that subtree. You may also need to move -resources around in the tree to make it more readable, or even between stacks to -better isolate concerns. Not to mention accidental renames, which have also -impacted customers in the past. 
+Historically, we have advised developers to avoid changing logical IDs. However,
+this is sometimes impractical or conflicts with good software engineering
+practices. For instance, you may want to consolidate duplicated code across
+different CDK applications into a single reusable construct (often called an L3
+construct). Introducing a new node for the L3 construct in the construct tree
+will rename the logical IDs of the resources in that subtree. Additionally, you
+might need to move resources within the tree for better readability or between
+stacks to isolate concerns. Accidental renames have also caused issues for
+customers in the past.
 
 To address all these problems, the CDK CLI now automatically detects these
 cases, and refactors the stack on your behalf, using the new CloudFormation
@@ -41,7 +40,7 @@ then proceed with the deployment.
 
 For example, suppose your CDK application has a single stack, called `MyStack`,
 containing an S3 bucket, a CloudFront distribution and a Lambda function. The
-construct tree (L1 constructs omitted for brevity) looks like this:
+construct tree looks like this (L1 constructs omitted for brevity):
 
     App
     └ MyStack
      └ Bucket
      └ Distribution
      └ Function
@@ -72,9 +71,8 @@ The refactored construct tree looks like this:
 
 Even though none of the resources have changed, their paths have
 (from `MyStack/Bucket/Resource` to `Web/Website/Origin/Resource` etc.) Since the
-CDK computes the logical IDs of the resources based on their path in the tree,
-all three resources will have different logical IDs in the synthesized template,
-compared to what is already deployed.
+CDK computes the logical IDs of the resources from their path in the tree, all
+three resources will have different logical IDs in the synthesized template.
If you run `cdk deploy` now, by default the CLI will detect these changes and present you with a selection prompt: @@ -124,10 +122,11 @@ to the CLI, or by configuring this setting in the `cdk.json` file: ### Ambiguity In the unlikely event that there are two or more _equivalent_ resources -(see Appendix B) in the same template, and you rename or move them at the same +(see Appendix A) in the same template, and you rename or move them at the same time, the CLI will not be able to automatically determine which resource should -be replaced by which. For example, suppose you have two identical queues, named -`Queue1` and `Queue2`, in the same stack: +be replaced by which. For example, suppose you have two queues in the same +stack, named `Queue1` and `Queue2`, with the same properties, and without a +hard-coded physical ID: App └ Stack @@ -174,11 +173,29 @@ the `cdk.json` file: } ``` -If you want to execute only the automatic refactoring, use the `cdk -refactor` command. The behavior is basically the same as with `cdk deploy`: it -will detect whether there are refactors to be made, ask for confirmation if -necessary (depending on the flag values), and refactor the stacks involved. But -it will stop there and not proceed with the deployment. +If you want to execute only the automatic refactoring, use the `cdk refactor` +command. The behavior is basically the same as with `cdk deploy`: it will detect +whether there are refactors to be made, ask for confirmation if necessary ( +depending on the flag values), and refactor the stacks involved. But it will +stop there and not proceed with the deployment. If you only want to see what +changes would be made, use the `--dry-run` flag. 
+
+### Programmatic access
+
+The same refactoring feature is also available in the CDK toolkit library:
+
+```typescript
+declare const toolkit: Toolkit;
+declare const cx: ICloudAssemblySource;
+
+// To execute possible refactors as part of the deploy operation:
+await toolkit.deploy(cx, {
+  refactoring: RefactoringMode.EXECUTE_AND_DEPLOY
+});
+
+// Or, if you just want to refactor the stacks:
+await toolkit.refactor(cx);
+```
 
 ---
 
@@ -199,7 +216,7 @@ the RFC pull request):
 
A new developer experience for CDK users, that allows them to change the
location of a construct (stack plus logical ID) without causing resource
replacement. This new experience is available in the CDK CLI `deploy`
-and `refactor` commands.
+and `refactor` commands, as well as in the toolkit library.
 
### Why should I use this feature?
 
@@ -207,7 +224,7 @@ If you ever find yourself doing one of the following, you will benefit from
stack refactoring support:
 
- Renaming constructs, either intentionally or by mistake.
-- Moving constructs within the same stack. This could be just for better
+- Moving constructs within the construct tree. This could be just for better
  organization, or to create reusable components.
- Moving constructs between different stacks.
- Renaming stacks.
@@ -219,16 +236,7 @@ should work in any environment, including CI/CD pipelines, where there is no
user to answer questions. Although we could easily extend this feature to
include ambiguity resolution for the interactive case, it wouldn't transfer well
to the non-interactive case. If you are interested in an in-depth explanation of
-the problem and a possible solution, check Appendix A.
-
-### What if I can't use the CDK CLI in my pipeline?
-
-Some customers have their own mechanisms for deploying stacks to AWS, that don't
-use the CDK CLI.
If that is your case, there is still a way you can use this
-feature: the refactoring logic will also be released in the CDK toolkit library,
-that can be used programmatically by your own tools. If you can incorporate it
-into your deployment tooling, you will have the same functionality as the
-`refactor` command in the CDK CLI.
+the problem and a possible solution, check Appendix B.
 
### What if the deployment fails?
 
@@ -269,9 +277,9 @@ High level description of the algorithm:
 
First, list all the stacks: both local and deployed. Then build an index of all
resources from all stacks. This index maps the _content_ address (physical ID or
digest) of each resource to all the _location_ addresses (stack name + logical
-ID) they can be found in. Resources that have different locations in new stacks
-compared to the old ones are considered to have been moved. For each of those,
-it creates a mapping from the old location to the new one.
+ID) they can be found in. Resources that have different locations before and
+after are considered to have been moved. For each of those, create a mapping
+from the old location to the new one.
 
Since the CloudFormation API expects not only the mappings, but also the
templates in their final states, we need to compute those as well. This is done
@@ -310,7 +318,7 @@ the user when necessary.
 
Another alternative is to use aliases, using [Pulumi's model]
[pulumi-aliases] as inspiration. This feature would be similar to the
`renameLogicalId` function, but operating on a higher level of abstraction, by
-taking into account the construct tree and construct IDs. But, just like
+taking into account the construct tree and construct IDs. And, just like
`renameLogicalId`, it could be perceived as a workaround. However, we are open
to revisiting this decision if enough customers indicate their preference for it
in this RFC.
 
@@ -333,10 +341,10 @@ See the open issues section below.
#### Phase 1 (dry-run) -In this phase we are going to implement the detection of resource moves, and -show the user what changes are going to be made. The only new command -available at this phase is `cdk refactor --dry-run`. Execution of this -command without the `--dry-run` flag will result in an error. +In this phase we are going to implement the detection of resource moves, and +show the user what changes are going to be made. The only new command available +at this phase is `cdk refactor --dry-run`. Execution of this command without the +`--dry-run` flag will result in an error. High-level tasks: @@ -349,7 +357,7 @@ High-level tasks: #### Phase 2 (application) -Once the detection of all cases is implemented in phase 1, we are ready to +Once the detection of all cases is implemented in phase 1, we are ready to implement the application of the changes. High-level tasks: @@ -371,7 +379,7 @@ controlled by the CLI. As a result, this is not an atomic operation: it is possible that the refactoring step succeeds, but before the CLI has a chance to deploy the changes, it gets interrupted (computer crash, network failures, etc.) In this case, the user will be left with a stack that is neither in the original -state nor in the desired state. +nor in the desired state. In particular, the logical ID won't match the CDK construct path, stored in the resource's metadata. This has consequences for the CloudFormation console, which @@ -387,19 +395,48 @@ Possible solutions to consider, from more specific to more general: then have a new command to execute both in a single atomic operation (let's say, a `executeChangeSetAndRefactor()`). -Since all these solutions depend on changes on the CloudFormation side, and this +Since all these options depend on changes on the CloudFormation side, and this edge case is unlikely to happen, we are going to address it later. ## Appendix -### A. Ideas on ambiguity resolution +### A. 
Equivalence between resources
+
+To detect which resources should be refactored, we need to identify which
+resources have only changed their location, but have remained "the same", in
+some sense. This can be made precise by defining an [equivalence relation] on
+the set of resources.
+
+Before that, let's define a digest function, `d`:
+
+    d(resource) = hash(type + physicalId)                       , if physicalId is defined
+                = hash(type + properties + dependencies.map(d)) , otherwise
+
+where `hash` is a cryptographic hash function. In other words, if a resource has
+a physical ID, we can use the physical ID plus its type to uniquely identify
+that resource. In this case, the digest can be computed from these two fields
+alone. A corollary is that such resources can be renamed and have their
+properties updated at the same time, and still be considered equivalent.
+
+Otherwise, the digest is computed from its type, its own properties (that is,
+excluding properties that refer to other resources), and the digests of each of
+its dependencies.
+
+The digest of a resource, defined recursively this way, remains stable even if
+one or more of its dependencies gets renamed. Since the resources in a
+CloudFormation template form a directed acyclic graph, this function is
+well-defined.
+
+The equivalence relation then follows directly: two resources `r1` and `r2`
+are equivalent if `d(r1) = d(r2)`.
+
+### B. Ideas on ambiguity resolution
 
-The only safe way to resolve ambiguity in cases such as renaming multiple
-identical resources, is to ask the developer what their intent is. But what if
-the developer is not present to answer questions (in a CI/CD pipeline, for
-instance)? A necessary condition in this case is that the developer's intent has
-been captured earlier, encoded as a mapping between resource locations, and
-stored somewhere.
+Let's start with a basic premise: the only safe way to resolve ambiguity is to
+ask the developer what their intent is.
But what if they are not present to +answer questions (in a CI/CD pipeline, for instance)? A necessary condition in +this case is that the developer's intent has been captured earlier, encoded as a +mapping between resource locations, and stored somewhere. But this is not sufficient. Note that every mapping is created from a pair of source and target states, out of which the ambiguities arose. To be able to @@ -411,18 +448,16 @@ must be met: 2. The target state used to create the mapping should indeed be what the user wants as a result. -I am using the abstract term "state" here, but how could such a state be -instantiated in practice? Let's consider some options and see how they fail to -satisfy the conditions above. +How are these "states" instantiated in practice? Let's consider some options. First, we need to establish a point when the mapping is created (and the developer is involved to resolve possible ambiguities). Let's call this the "decision point". As a first attempt, let's try to use every deployment in the -development cycle as the decision point. In this solution, the development -account is the source state, and the cloud assembly to be deployed is the target -state. If any ambiguities were resolved, they are saved in a mapping file, under -version control. On every deployment to other environments, the mapping file is -used to perform the refactoring. +development cycle as a decision point. In this solution, the development account +is the source state, and the cloud assembly to be deployed is the target state. +If any ambiguities are resolved, they are saved in a mapping file, under version +control. On every deployment to other environments, the mapping file is used to +perform the refactoring. It sounds like this could work, but if the development environment is not in the same state as the one where the mapping is applied, condition 1 is violated. 
And @@ -471,7 +506,7 @@ In other words, the current state should be consistent with the history that led up to that state. One final piece to add to the system: every environment should also have its own -history file, which should also maintain a similar invariant (through CFN hooks, +history file, which should also maintain the same invariant (through CFN hooks, for example). Having all this in place, we can execute the following algorithm on every deployment: @@ -529,21 +564,21 @@ intent). #### The future -There is still some work to be done to prove this system works in practice. But -assuming it does, we could use it to expand the scope to which the automatic -refactoring applies. Consider the case in which you want to rename a certain -resource and, at the same time, make some minor changes, such as adding or -updating a couple of properties. This is another ambiguous case, because it's -not clear what the intent is: update with rename, or replacement? But with the -history system, we can detect such cases, interact with the developer, and store -the decision in the history. +There is still some work to be done to prove this out. But assuming it does +work, we could use it to expand the scope to which the automatic refactoring +applies. Consider the case in which you want to rename a certain resource and, +at the same time, make some minor changes, such as adding or updating a couple +of properties. This is another ambiguous case, because it's not clear what the +intent is: update with rename, or replacement? But with the history system, we +can detect such cases, interact with the developer, and store the decision in +the history. Since this historical model contains all the information about the state of the stacks in an environment, it could also be used for other purposes. 
For example, development tools could use the history to provide a "time machine" feature, -that allows developers to see the state of the infrastructure at any point in +that allows developers to see the state of their infrastructure at any point in time. CloudFormation itself could build on that, and provide a way to roll back -or forward to another state. +or forward to an arbitrary state. Another problem that could be solved with the historical model is the infamous "deadly embrace", where a consumer stack depends on a producer stack via @@ -555,32 +590,6 @@ this without user intervention. Potentially, this could also help with drift resolution (or prevention), if CloudFormation itself starts using the history internally. -### B. Equivalence between resources - -To detect which resources should be refactored, we need to indentify which -resources have only changed their location, but have remained "the same", in -some sense. This can be made precise by defining an [equivalence relation] on -the set of resources. - -Before that, let's define a digest function, `d`: - - d(resource) = hash(type + physicalId) , if physicalId is defined - = hash(type + properties + dependencies.map(d)) , otherwise - -where `hash` is a cryptographic hash function. In other words, if a resource has -a physical ID, its type and physical ID uniquely identify that resource. So we -compute the hash from these two fields. Otherwise, the hash is computed from its -type, its own properties (that is, excluding properties that refer to other -resources), and the digests of each of its dependencies. - -The digest of a resource, defined recursively this way, remains stable even if -one or more of its dependencies gets renamed. Since the resources in a -CloudFormation template form an acyclic graph, this function is well-defined. - -The equivalence relation then follows directly: two resources `r1` and `r2` -are equivalent if `d(r1) = d(r2)`. 
- - [pulumi-aliases]: https://www.pulumi.com/docs/iac/concepts/options/aliases/ [equivalence relation]: https://en.wikipedia.org/wiki/Equivalence_relation \ No newline at end of file From 304c79c0721af3cfda5a6e71455a834f0dd62054 Mon Sep 17 00:00:00 2001 From: Otavio Macedo <288203+otaviomacedo@users.noreply.github.com> Date: Thu, 27 Feb 2025 14:38:07 +0000 Subject: [PATCH 12/21] Addressed feedback --- text/0162-refactoring-support.md | 83 ++++++++++++++++++-------------- 1 file changed, 46 insertions(+), 37 deletions(-) diff --git a/text/0162-refactoring-support.md b/text/0162-refactoring-support.md index 6142ea0d6..56abb7ca1 100644 --- a/text/0162-refactoring-support.md +++ b/text/0162-refactoring-support.md @@ -21,10 +21,14 @@ this is sometimes impractical or conflicts with good software engineering practices. For instance, you may want to consolidate duplicated code across different CDK applications into a single reusable construct (often called an L3 construct). Introducing a new node for the L3 construct in the construct tree -will rename the logical IDs of the resources in that subtree. Additionally, you -might need to move resources within the tree for better readability or between -stacks to isolate concerns. Accidental renames have also caused issues for -customers in the past. +will rename the logical IDs of the resources in that subtree. + +Additionally, you might need to move resources within the tree for better +readability or between stacks to isolate concerns. Accidental renames have also +caused issues for customers in the past. Perhaps even worse, if you depend on a +third-party construct library, you are not in control of the logical IDs of +those resources. If the library changes the logical IDs from one version to +another, you will be affected without any action on your part. 
To address all these problems, the CDK CLI now automatically detects these cases, and refactors the stack on your behalf, using the new CloudFormation @@ -82,7 +86,7 @@ present you with a selection prompt: ┌───────────────────────────────┬───────────────────────────────┬──────────────────────────┐ │ Resource Type │ Old Logical ID │ New Logical ID │ ├───────────────────────────────┼───────────────────────────────┼──────────────────────────┤ - │ AWS::S3::Bucket │ MyStack.Queue1A4198146 │ Web.Bucket843D52FF │ + │ AWS::S3::Bucket │ MyStack.Bucket5766466B │ Web.Bucket843D52FF │ ├───────────────────────────────┼───────────────────────────────┼──────────────────────────┤ │ AWS::CloudFront::Distribution │ MyStack.DistributionE3BB089E │ Web.Distribution7142E1F1 │ ├───────────────────────────────┼───────────────────────────────┼──────────────────────────┤ @@ -119,6 +123,18 @@ to the CLI, or by configuring this setting in the `cdk.json` file: } ``` +Please note that the same CDK application can have multiple stacks for different +environments. In that case, the CLI will group the stacks by environment and +perform the refactoring separately in each one. Trying to move resources between +stacks that belong in different environments will result in an error. + +### Rollbacks + +After refactoring the stack, the CLI will proceed with the deployment +(assuming that is your choice). If the deployment fails, and CloudFormation +rolls it back, the CLI will execute a second refactor, in reverse, to bring the +resources back to their original locations. + ### Ambiguity In the unlikely event that there are two or more _equivalent_ resources @@ -157,21 +173,7 @@ names. In this case, it will ask you to confirm the changes: If you want to take advantage of automatic resource refactoring, avoid renaming or moving multiple identical resources at the same time. - If these changes were intentional, and you want to proceed with the - resource replacements, please confirm below. 
- - Do you wish to deploy these changes (y/n)? - -To skip this prompt and go straight to deployment, pass the -`--ignore-ambiguous-refactoring` flag to the CLI, or configure this setting in -the `cdk.json` file: - -```json -{ - "app": "...", - "ignoreAmbiguousRefactoring": true -} -``` +### Refactor only If you want to execute only the automatic refactoring, use the `cdk refactor` command. The behavior is basically the same as with `cdk deploy`: it will detect @@ -228,6 +230,7 @@ stack refactoring support: organization, or to create reusable components. - Moving constructs between different stacks. - Renaming stacks. +- Upgrading dependencies on construct libraries. ### Can the CLI help me resolve ambiguity when refactoring resources? @@ -238,13 +241,6 @@ include ambiguity resolution for the interactive case, it wouldn't transfer well to the non-interactive case. If you are interested in an in-depth explanation of the problem and a possible solution, check Appendix B. -### What if the deployment fails? - -After refactoring the stack, the CLI will proceed with the deployment -(assuming that is your choice). If the deployment fails, and CloudFormation -rolls it back, the CLI will execute a second refactor, in reverse, to bring the -resources back to their original locations. - ## Internal FAQ ### Why are we doing this? @@ -277,9 +273,28 @@ High level description of the algorithm: First, list all the stacks: both local and deployed. Then build an index of all resources from all stacks. This index maps the _content_ address (physical ID or digest) of each resource to all the _location_ addresses (stack name + logical -ID) they can be found in. Resources that have different locations before and -after, are considered to have been moved. For each of those, create a mapping -from the old location to the new one. +ID) they can be found in. + +Resources that have different locations before and after, are considered to have +been moved. 
For each of those, create a mapping from the source (the currently
+deployed location) to the destination (the new location, in the local
+template). Example:
+
+    // Keys are the content address of the resources
+    // Values are the location addresses
+    {
+      "5e19886121239b7a": { // Moved and renamed -> goes to mapping
+        "before": ["stack1/logicalId1"],
+        "after": ["stack2/logicalId2"]
+      },
+      "24ad8195002086b6": { // Removed -> ignored
+        "before": ["stack1/logicalId3"]
+      },
+      "07266c4dd0146e8a": { // Unchanged -> ignored
+        "before": ["stack1/logicalId4"],
+        "after": ["stack1/logicalId4"]
+      }
+    }
 
 Since the CloudFormation API expects not only the mappings, but also the
 templates in their final states, we need to compute those as well. This is done
@@ -297,12 +312,6 @@
 Assuming there were no ambiguities, or the user wants to proceed anyway, it is
 ready to call the API to actually perform the refactor, using the mappings and
 templates computed previously.
 
-As every AWS API call, refactoring is restricted to a given environment
-(account and region). Given that one CDK app can have stacks for multiple
-environments, the CLI will group the stacks by environment and perform the
-refactoring separately in each one. Trying to move resources between stacks that
-belong in different environments will result in an error.
-
 ### Is this a breaking change?
 
 No.
@@ -409,7 +418,7 @@ the set of resources.
 
 Before that, let's define a digest function, `d`:
 
-    d(resource) = hash(type + physicalId)               , if physicalId is defined
+    d(resource) = hash(type + physicalId)               , if physicalId was defined by the user
     = hash(type + properties + dependencies.map(d)) , otherwise
 
 where `hash` is a cryptographic hash function.
In other words, if a resource has From 911878007abcbd0e2986994c540cbc50de69d661 Mon Sep 17 00:00:00 2001 From: Otavio Macedo <288203+otaviomacedo@users.noreply.github.com> Date: Thu, 27 Feb 2025 15:27:58 +0000 Subject: [PATCH 13/21] Ambiguity and equivalence --- text/0162-refactoring-support.md | 37 +++++++++++++++++++++++--------- 1 file changed, 27 insertions(+), 10 deletions(-) diff --git a/text/0162-refactoring-support.md b/text/0162-refactoring-support.md index 56abb7ca1..09feae3f4 100644 --- a/text/0162-refactoring-support.md +++ b/text/0162-refactoring-support.md @@ -137,12 +137,29 @@ resources back to their original locations. ### Ambiguity -In the unlikely event that there are two or more _equivalent_ resources -(see Appendix A) in the same template, and you rename or move them at the same -time, the CLI will not be able to automatically determine which resource should -be replaced by which. For example, suppose you have two queues in the same -stack, named `Queue1` and `Queue2`, with the same properties, and without a -hard-coded physical ID: +Imagine a person walking down the street, and someone takes two snapshots of +them, a few seconds apart. When you look at the snapshots, you can tell that +it's the _same person_, but at different locations, and not two different +people. What leads you this conclusion? The fact that the person's features +(face, clothes, height, etc.) are exactly the same in both photographs. But if +the photos are of two identical siblings wearing the same clothes, you won't be +able to tell which one is which, from one photo to the next. + +The same principle applies here: you can think of stack name + logical ID as a +place, and the resource properties as its "personal" features. If a resource has +different properties before and after a potential deployment, the best we can do +is assume they are different resources. 
If the properties are the same from one +point in time to the other, they very likely are the same resource (although the +developer still has the last word on this). But if there are two resources that +have the same properties in the same stack, they are like the twin siblings +case: there's no way to tell them which is which in case they both move to a +different place. + +Indistinguishable resources in this sense are said to be _equivalent_ (see +Appendix A for a more formal definition). In the unlikely event that two or more +equivalent resources move, the CLI won't be able to proceed. For example, +suppose you have two queues in the same stack, named `Queue1` and `Queue2`, with +the same properties, and without a hard-coded physical ID: App └ Stack @@ -156,8 +173,8 @@ If they get renamed to, let's say, `Queue3` and `Queue4`, ├ Queue3 └ Queue4 -Then the CLI will not be able to establish a 1:1 mapping between the old and new -names. In this case, it will ask you to confirm the changes: +then the CLI will not be able to establish a 1:1 mapping between the old and new +names. In this case, it will show you the ambiguity, and stop the deployment: Ambiguous Resource Name Changes ┌───┬──────────────────────┐ @@ -277,8 +294,8 @@ ID) they can be found in. Resources that have different locations before and after, are considered to have been moved. For each of those, create a mapping from the source (the currently -deployed location) to the destination (the new location, in the local -template). Example: +deployed location) to the destination (the new location, in the local template). 
+Example: // Keys are the content address of the resources // Values are the location addresses From fb7c47d1a8965c2568f630784d092bb9d3bca89f Mon Sep 17 00:00:00 2001 From: Otavio Macedo <288203+otaviomacedo@users.noreply.github.com> Date: Thu, 27 Feb 2025 15:35:17 +0000 Subject: [PATCH 14/21] Ambiguity and equivalence --- text/0162-refactoring-support.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/text/0162-refactoring-support.md b/text/0162-refactoring-support.md index 09feae3f4..fee9c821f 100644 --- a/text/0162-refactoring-support.md +++ b/text/0162-refactoring-support.md @@ -141,9 +141,10 @@ Imagine a person walking down the street, and someone takes two snapshots of them, a few seconds apart. When you look at the snapshots, you can tell that it's the _same person_, but at different locations, and not two different people. What leads you this conclusion? The fact that the person's features -(face, clothes, height, etc.) are exactly the same in both photographs. But if -the photos are of two identical siblings wearing the same clothes, you won't be -able to tell which one is which, from one photo to the next. +(face, clothes, height, etc.) are exactly the same in both photographs. But if, +instead of one person walking, you have two identical siblings, wearing the same +clothes, you won't be able to tell which one is which, from one photo to the +next. The same principle applies here: you can think of stack name + logical ID as a place, and the resource properties as its "personal" features. 
If a resource has From 6963761cfe84e033958eda24ddb1991876b9fb68 Mon Sep 17 00:00:00 2001 From: Otavio Macedo <288203+otaviomacedo@users.noreply.github.com> Date: Thu, 27 Feb 2025 15:49:26 +0000 Subject: [PATCH 15/21] Minor changes --- text/0162-refactoring-support.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/text/0162-refactoring-support.md b/text/0162-refactoring-support.md index fee9c821f..5018a9b04 100644 --- a/text/0162-refactoring-support.md +++ b/text/0162-refactoring-support.md @@ -23,12 +23,12 @@ different CDK applications into a single reusable construct (often called an L3 construct). Introducing a new node for the L3 construct in the construct tree will rename the logical IDs of the resources in that subtree. -Additionally, you might need to move resources within the tree for better -readability or between stacks to isolate concerns. Accidental renames have also -caused issues for customers in the past. Perhaps even worse, if you depend on a -third-party construct library, you are not in control of the logical IDs of -those resources. If the library changes the logical IDs from one version to -another, you will be affected without any action on your part. +Also, you might need to move resources within the tree for better readability or +between stacks to isolate concerns. Accidental renames have also caused issues +for customers in the past. Perhaps even worse, if you depend on a third-party +construct library, you are not in control of the logical IDs of those resources. +If the library changes the logical IDs from one version to another, you will be +affected without any action on your part. 
To address all these problems, the CDK CLI now automatically detects these cases, and refactors the stack on your behalf, using the new CloudFormation @@ -76,7 +76,7 @@ The refactored construct tree looks like this: Even though none of the resources have changed, their paths have (from `MyStack/Bucket/Resource` to `Web/Website/Origin/Resource` etc.) Since the CDK computes the logical IDs of the resources from their path in the tree, all -three resources will have different logical IDs changed. +three resources will have their logical IDs changed. If you run `cdk deploy` now, by default the CLI will detect these changes and present you with a selection prompt: @@ -343,7 +343,7 @@ better experience by automatically detecting these cases, and interacting with the user when necessary. Another alternative is to use aliases, using [Pulumi's model] -[pulumi-aliases] as inspiration. This feature would be similar to the +[pulumi-aliases] as inspiration. This feature would be similar to the `renameLogicalId` function, but operating on a higher level of abstraction, by taking into account the construct tree and construct IDs. And, just like `renameLogicalId`, it could be perceived as a workaround. However, we are open From a2cd014c298058689e9d85bbda0d61a5f7d81512 Mon Sep 17 00:00:00 2001 From: Otavio Macedo <288203+otaviomacedo@users.noreply.github.com> Date: Thu, 27 Feb 2025 16:24:22 +0000 Subject: [PATCH 16/21] Update text/0162-refactoring-support.md Co-authored-by: Momo Kornher --- text/0162-refactoring-support.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0162-refactoring-support.md b/text/0162-refactoring-support.md index 5018a9b04..7da7b8a81 100644 --- a/text/0162-refactoring-support.md +++ b/text/0162-refactoring-support.md @@ -5,7 +5,7 @@ - **API Bar Raiser**: @rix0rrr An improvement to the CDK CLI and toolkit library, to protect against -replacement of resources when they change location. 
The CLI now detects this +replacement of resources when they change location. The Toolkit now detects this case, and automatically refactors the stack before the actual deployment. ## Working Backwards From 5f0305bc9bdecefa904b70b2c3898cfd909d1674 Mon Sep 17 00:00:00 2001 From: Otavio Macedo <288203+otaviomacedo@users.noreply.github.com> Date: Thu, 27 Feb 2025 16:32:20 +0000 Subject: [PATCH 17/21] Update text/0162-refactoring-support.md Co-authored-by: Momo Kornher --- text/0162-refactoring-support.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0162-refactoring-support.md b/text/0162-refactoring-support.md index 7da7b8a81..b7fdb903b 100644 --- a/text/0162-refactoring-support.md +++ b/text/0162-refactoring-support.md @@ -58,7 +58,7 @@ account: - Rename the bucket from `Bucket` to the more descriptive name `Origin`. - Create a new L3 construct called `Website` that groups the bucket and the distribution, to make this pattern reusable in different applications. -- Move the web-related constructs (now under the `Website` L3) construct to a +- Move the web-related constructs (now under the `Website` L3) to a new stack called `Web`, for better separation of concerns. - Rename the original stack to `Service`, to better reflect its new specific role in the application. From fc7801e5f9a4221f52ed0e4964e26b244ff5c992 Mon Sep 17 00:00:00 2001 From: Otavio Macedo <288203+otaviomacedo@users.noreply.github.com> Date: Fri, 28 Feb 2025 12:34:18 +0000 Subject: [PATCH 18/21] More feedback --- text/0162-refactoring-support.md | 94 ++++++++++++++++++++------------ 1 file changed, 58 insertions(+), 36 deletions(-) diff --git a/text/0162-refactoring-support.md b/text/0162-refactoring-support.md index b7fdb903b..6aa2d48a6 100644 --- a/text/0162-refactoring-support.md +++ b/text/0162-refactoring-support.md @@ -47,10 +47,10 @@ containing an S3 bucket, a CloudFront distribution and a Lambda function. 
The construct tree looks like this (L1 constructs omitted for brevity): App - └ MyStack - ├ Bucket - ├ Distribution - └ Function + └─ MyStack + ├─ Bucket + ├─ Distribution + └─ Function Now suppose you make the following changes, after having deployed it to your AWS account: @@ -66,20 +66,22 @@ account: The refactored construct tree looks like this: App - ├ Web - │ └ Website - │ ├ Origin - │ └ Distribution - └ Service - └ Function + ├─ Web + │ └─ Website + │ ├─ Origin + │ └─ Distribution + └─ Service + └─ Function Even though none of the resources have changed, their paths have (from `MyStack/Bucket/Resource` to `Web/Website/Origin/Resource` etc.) Since the CDK computes the logical IDs of the resources from their path in the tree, all -three resources will have their logical IDs changed. +three resources will have their logical IDs changed. Without refactoring +support, all three resources would be replaced. -If you run `cdk deploy` now, by default the CLI will detect these changes and -present you with a selection prompt: +With refactoring support, if you run `cdk deploy` after making these changes, by +default the CLI will detect these changes and present you with a selection +prompt: The following resources were moved or renamed: @@ -125,8 +127,10 @@ to the CLI, or by configuring this setting in the `cdk.json` file: Please note that the same CDK application can have multiple stacks for different environments. In that case, the CLI will group the stacks by environment and -perform the refactoring separately in each one. Trying to move resources between -stacks that belong in different environments will result in an error. +perform the refactoring separately in each one. So, although you can move +resources between stacks, both stacks involved in the move must be in the same +environment. Trying to move resources across environments will result in an +error. 
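To make the path-to-logical-ID relationship above concrete, here is a simplified, illustrative sketch. The helper name and the exact hashing rules are assumptions for illustration only; the CDK's actual algorithm has additional rules (length limits, filtering of `Resource`/`Default` path components, character sanitization):

```typescript
import { createHash } from "crypto";

// Simplified sketch of path-based logical ID generation. It only shows why a
// resource that merely moves in the construct tree ends up with a different
// logical ID, even though the resource itself is unchanged.
function logicalIdFor(path: string[]): string {
  const readable = path.join("");
  // A short hash suffix disambiguates paths whose concatenations collide.
  const suffix = createHash("sha256")
    .update(path.join("/"))
    .digest("hex")
    .slice(0, 8)
    .toUpperCase();
  return readable + suffix;
}

// The same bucket, before and after the refactor described above:
const before = logicalIdFor(["MyStack", "Bucket"]);
const after = logicalIdFor(["Web", "Website", "Origin"]);
// Different paths produce different logical IDs, which is what triggers
// replacement in plain CloudFormation -- and refactoring support in the CDK.
```
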
### Rollbacks @@ -135,14 +139,18 @@ After refactoring the stack, the CLI will proceed with the deployment rolls it back, the CLI will execute a second refactor, in reverse, to bring the resources back to their original locations. +If you don't want the CLI to perform the rollback refactor, you can use the +`--no-rollback` flag, which also controls the rollback behavior of the +deployment. + ### Ambiguity Imagine a person walking down the street, and someone takes two snapshots of them, a few seconds apart. When you look at the snapshots, you can tell that -it's the _same person_, but at different locations, and not two different -people. What leads you this conclusion? The fact that the person's features -(face, clothes, height, etc.) are exactly the same in both photographs. But if, -instead of one person walking, you have two identical siblings, wearing the same +it's the _same person_ at different places, rather than two different people. +You are justified in this conclusion because the person's features (face, +clothes, height, etc.) are exactly the same in both photographs. But if, instead +of one person walking, you have a pair of identical twins, wearing the same clothes, you won't be able to tell which one is which, from one photo to the next. @@ -154,25 +162,25 @@ point in time to the other, they very likely are the same resource (although the developer still has the last word on this). But if there are two resources that have the same properties in the same stack, they are like the twin siblings case: there's no way to tell them which is which in case they both move to a -different place. +different stack or get renamed. Indistinguishable resources in this sense are said to be _equivalent_ (see Appendix A for a more formal definition). In the unlikely event that two or more equivalent resources move, the CLI won't be able to proceed. 
For example,
suppose you have two queues in the same stack, named `Queue1` and `Queue2`, with
-the same properties, and without a hard-coded physical ID:
+the same properties, and without a user-defined physical ID:
 
      App
-      └ Stack
-      ├ Queue1
-      └ Queue2
+      └─ Stack
+         ├─ Queue1
+         └─ Queue2
 
 If they get renamed to, let's say, `Queue3` and `Queue4`,
 
      App
-      └ Stack
-      ├ Queue3
-      └ Queue4
+      └─ Stack
+         ├─ Queue3
+         └─ Queue4
 
 then the CLI will not be able to establish a 1:1 mapping between the old and new
 names. In this case, it will show you the ambiguity, and stop the deployment:
@@ -214,11 +222,26 @@ await toolkit.deploy(cx, {
 });
 
 // Or, if you just want to refactor the stacks:
-await toolkit.refactor(cxSource);
+await toolkit.refactor(cx);
+```
+
+In case of ambiguity, the toolkit will throw an `AmbiguityError`:
+
+```typescript
+try {
+  await toolkit.refactor(cx);
+} catch (e) {
+  if (e instanceof AmbiguityError) {
+    // Handle ambiguity
+  } else {
+    throw e;
+  }
+}
 ```
 
 ---
+
 Ticking the box below indicates that the public API of this RFC has been
 signed-off by the API bar raiser (the `status/api-approved` label was applied to
 the RFC pull request):
@@ -342,14 +365,6 @@ resources to move from which stack to which stack. But the CDK CLI can provide a
 better experience by automatically detecting these cases, and interacting with
 the user when necessary.
 
-Another alternative is to use aliases, using [Pulumi's model]
-[pulumi-aliases] as inspiration. This feature would be similar to the
-`renameLogicalId` function, but operating on a higher level of abstraction, by
-taking into account the construct tree and construct IDs. And, just like
-`renameLogicalId`, it could be perceived as a workaround. However, we are open
-to revisiting this decision if enough customers indicate their preference for it
-in this RFC.
- A possible variation of the solution presented in this RFC is to do something similar to resource lookup: for every environment where the application could be deployed to, the CLI would have configured in a file what refactors have to be @@ -360,6 +375,13 @@ administrators, security engineers, etc.), and is more error-prone: failure to record a refactor in the file could lead to inconsistencies between the environments, and even unintended resource replacements. +Customers have also suggested aliases, using [Pulumi's model][pulumi-aliases] as +inspiration. This feature would be similar to the `renameLogicalId` function, +but operating on a higher level of abstraction, by taking into account the +construct tree and construct IDs. And, just like `renameLogicalId`, it could be +perceived as a workaround. However, we are open to implementing this as an +additional feature if enough customers indicate their preference for it. + ### What are the drawbacks of this solution? See the open issues section below. From d184c22539b9b2bf68d0dfa4e01c2e4b28b9b271 Mon Sep 17 00:00:00 2001 From: Otavio Macedo <288203+otaviomacedo@users.noreply.github.com> Date: Fri, 28 Feb 2025 16:02:30 +0000 Subject: [PATCH 19/21] Settings --- text/0162-refactoring-support.md | 85 ++++++++++++++++++++++++-------- 1 file changed, 64 insertions(+), 21 deletions(-) diff --git a/text/0162-refactoring-support.md b/text/0162-refactoring-support.md index 6aa2d48a6..bbd770735 100644 --- a/text/0162-refactoring-support.md +++ b/text/0162-refactoring-support.md @@ -58,8 +58,8 @@ account: - Rename the bucket from `Bucket` to the more descriptive name `Origin`. - Create a new L3 construct called `Website` that groups the bucket and the distribution, to make this pattern reusable in different applications. -- Move the web-related constructs (now under the `Website` L3) to a - new stack called `Web`, for better separation of concerns. 
+- Move the web-related constructs (now under the `Website` L3) to a new stack + called `Web`, for better separation of concerns. - Rename the original stack to `Service`, to better reflect its new specific role in the application. @@ -115,16 +115,6 @@ refactor is executed: ✅ Stack refactor complete -You can configure the refactoring behavior by passing the `--refactoring` flag -to the CLI, or by configuring this setting in the `cdk.json` file: - -```json -{ - "app": "...", - "refactoring": "EXECUTE_AND_DEPLOY" -} -``` - Please note that the same CDK application can have multiple stacks for different environments. In that case, the CLI will group the stacks by environment and perform the refactoring separately in each one. So, although you can move @@ -132,6 +122,68 @@ resources between stacks, both stacks involved in the move must be in the same environment. Trying to move resources across environments will result in an error. +If you want to execute only the automatic refactoring, use the `cdk refactor` +command. The behavior is basically the same as with `cdk deploy`: it will detect +whether there are refactors to be made, ask for confirmation if necessary ( +depending on the flag values), and refactor the stacks involved. But it will +stop there and not proceed with the deployment. If you only want to see what +changes would be made, use the `--dry-run` flag. + +### Settings + +By default, refactoring is disabled on deployments. To override this behavior, +you can either use a command line flag (`--refactor-action`) or set the +`refactorAction` field in the `cdk.json` file. The flag/setting can have one of +the following values: + +- `CONFIRM`: this enables the behavior described in the previous section, where + the CLI shows the changes and asks for confirmation. This only works in + interactive mode (TTY). +- `REFACTOR`: automatically detects resource moves, executes the refactors, and + deploys the stacks. 
+- `SKIP`: goes straight to deployment, skipping refactoring. This is the default + behavior. +- `QUIT`: process stops, returning a non-zero exit status. + +The last three values are the same options as the prompt. + +If there are ambiguities, the CLI will look for a mapping file to resolve them. +If the file does not exist or doesn't apply, the CLI will fail. Otherwise, it +will apply the mappings and proceed with the deployment. + +These options are summarized in the following flowchart: + +```mermaid +flowchart TD + map{Mapping file} + amb{Ambiguities} + conf[/"Refactor action\n (cli flag or config)"/] + ask[/"Ask user"/] + comm>"Same options as the 'Refactor action' flag (except CONFIRM). + Omitted to avoid cluttering the diagram."] + tty{is TTY} + DEPLOY([Deploy only]) + APPLY([Apply mapping and deploy]) + FAIL([Fail]) + AUTO([Auto refactor and deploy]) + + conf -->|null or SKIP| DEPLOY + conf -->|CONFIRM| tty + map -->|No| FAIL + map -->|Yes| APPLY + conf -->|QUIT| FAIL + conf -->|REFACTOR| amb + amb -->|No| AUTO + amb -->|YES| map + tty -->|Yes| ask + tty -->|No| FAIL + ask --> comm +``` + +Also note that this feature is still experimental, so you have to pass +`--unstable=refactor` flag to confirm you are aware of this to both `deploy` +and `refactor` commands. + ### Rollbacks After refactoring the stack, the CLI will proceed with the deployment @@ -199,15 +251,6 @@ names. In this case, it will show you the ambiguity, and stop the deployment: If you want to take advantage of automatic resource refactoring, avoid renaming or moving multiple identical resources at the same time. -### Refactor only - -If you want to execute only the automatic refactoring, use the `cdk refactor` -command. The behavior is basically the same as with `cdk deploy`: it will detect -whether there are refactors to be made, ask for confirmation if necessary ( -depending on the flag values), and refactor the stacks involved. But it will -stop there and not proceed with the deployment. 
If you only want to see what -changes would be made, use the `--dry-run` flag. - ### Programmatic access The same refactoring feature is also available in the CDK toolkit library: From 5c8461bd396680a9978dc308f497b728ba93abf1 Mon Sep 17 00:00:00 2001 From: Otavio Macedo <288203+otaviomacedo@users.noreply.github.com> Date: Mon, 3 Mar 2025 10:10:13 +0000 Subject: [PATCH 20/21] Explicit mapping and settings --- text/0162-refactoring-support.md | 370 ++++++++++++------------------- 1 file changed, 141 insertions(+), 229 deletions(-) diff --git a/text/0162-refactoring-support.md b/text/0162-refactoring-support.md index bbd770735..8415d4ae9 100644 --- a/text/0162-refactoring-support.md +++ b/text/0162-refactoring-support.md @@ -115,6 +115,17 @@ refactor is executed: ✅ Stack refactor complete +If you want to execute only the automatic refactoring, use the `cdk refactor` +command. The behavior is basically the same as with `cdk deploy`: it will detect +whether there are refactors to be made, ask for confirmation if necessary +(depending on the flag values), and refactor the stacks involved. But it will +stop there and not proceed with the deployment. If you only want to see what +changes would be made, use the `--dry-run` flag. + +To perform refactoring, the CLI needs new permissions in the bootstrap stack. +Before using this feature, run `cdk bootstrap` for every target environment, to +add these new permissions. + Please note that the same CDK application can have multiple stacks for different environments. In that case, the CLI will group the stacks by environment and perform the refactoring separately in each one. So, although you can move @@ -122,69 +133,78 @@ resources between stacks, both stacks involved in the move must be in the same environment. Trying to move resources across environments will result in an error. -If you want to execute only the automatic refactoring, use the `cdk refactor` -command. 
The behavior is basically the same as with `cdk deploy`: it will detect -whether there are refactors to be made, ask for confirmation if necessary ( -depending on the flag values), and refactor the stacks involved. But it will -stop there and not proceed with the deployment. If you only want to see what -changes would be made, use the `--dry-run` flag. - ### Settings -By default, refactoring is disabled on deployments. To override this behavior, -you can either use a command line flag (`--refactor-action`) or set the -`refactorAction` field in the `cdk.json` file. The flag/setting can have one of -the following values: - -- `CONFIRM`: this enables the behavior described in the previous section, where - the CLI shows the changes and asks for confirmation. This only works in - interactive mode (TTY). -- `REFACTOR`: automatically detects resource moves, executes the refactors, and - deploys the stacks. -- `SKIP`: goes straight to deployment, skipping refactoring. This is the default - behavior. -- `QUIT`: process stops, returning a non-zero exit status. - -The last three values are the same options as the prompt. - -If there are ambiguities, the CLI will look for a mapping file to resolve them. -If the file does not exist or doesn't apply, the CLI will fail. Otherwise, it -will apply the mappings and proceed with the deployment. - -These options are summarized in the following flowchart: - -```mermaid -flowchart TD - map{Mapping file} - amb{Ambiguities} - conf[/"Refactor action\n (cli flag or config)"/] - ask[/"Ask user"/] - comm>"Same options as the 'Refactor action' flag (except CONFIRM). 
- Omitted to avoid cluttering the diagram."]
-  tty{is TTY}
-  DEPLOY([Deploy only])
-  APPLY([Apply mapping and deploy])
-  FAIL([Fail])
-  AUTO([Auto refactor and deploy])
-
-  conf -->|null or SKIP| DEPLOY
-  conf -->|CONFIRM| tty
-  map -->|No| FAIL
-  map -->|Yes| APPLY
-  conf -->|QUIT| FAIL
-  conf -->|REFACTOR| amb
-  amb -->|No| AUTO
-  amb -->|YES| map
-  tty -->|Yes| ask
-  tty -->|No| FAIL
-  ask --> comm
+The behavior of the refactoring feature can be controlled by a few settings.
+
+For both `deploy` and `refactor`:
+
+- `--export-mapping=[FILE]`: writes the mapping to a file. The file can be
+  used later to apply the same refactors to other environments.
+- `--import-mapping=[FILE]`: uses the mapping from a file, instead of
+  computing it. The file can be generated using the `--export-mapping` option.
+- `--dry-run`: shows on stdout what would happen, but doesn't execute.
+- `--unstable=refactor`: enables the feature. If the flag is not set, the
+  command fails with an error message explaining that the feature is
+  experimental.
+
+For `deploy` only:
+
+- `--refactoring-action=[ACTION]`: the action to take in case there is a
+  refactor to be made. Possible values for `ACTION` are:
+  - `confirm`: ask the user what to do. This is the scenario described in the
+    **How it works** section.
+  - `refactor`: automatically refactor and deploy.
+  - `quit`: stop with a non-zero exit code.
+  - `skip`: deploy without refactoring. This is the default value.
+
+All these settings are also available in the `cdk.json` file:
+
+```json
+{
+  "app": "...",
+  "refactor": {
+    "refactoringAction": "confirm",
+    "exportMapping": "output.json",
+    "dryRun": true
+  }
+}
 ```
 
-Also note that this feature is still experimental, so you have to pass
-`--unstable=refactor` flag to confirm you are aware of this to both `deploy`
-and `refactor` commands.
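
For illustration, a mapping file — the artifact written by `--export-mapping`
and read by `--import-mapping` — might look like the sketch below. Note that
this RFC does not specify the file format; the shape shown here, mapping each
resource's source location (`stack.logicalId`) to its destination, is an
assumption only, and the logical IDs are placeholders:

```json
{
  "mappings": {
    "MyStack.BucketResource": "Web.WebsiteOriginResource"
  }
}
```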
-
-### Rollbacks
+### Explicit mapping
+
+Although the CLI will automatically refactor only what is unambiguous, you may
+still need to have more control over the refactoring process. Companies usually
+have stringent policies on how changes are made to the production environment,
+for example. One such policy is that every change must be explicitly declared in
+some sort of code, including refactors.
+
+In a situation like this, you can import and export mapping files. Here is how
+it works: suppose that, at development time, you made a change that the CLI
+detected as a refactor. Since you want that refactor to be propagated to other
+environments, you export the mapping file, with the command
+`cdk refactor --export-mapping=file.json`.
+
+There are at least two possible paths from here, depending on the company's
+policies:
+
+1. You send the exported file to an operations team, who will review it. If
+   approved, they manually run the command
+   `cdk refactor --import-mapping=file.json` on every protected environment in
+   advance (i.e., before your changes get deployed to those environments). When
+   you import a mapping, the CLI won't try to detect refactors.
+2. The `--import-mapping` option is also available for the `deploy` command. So
+   you can commit the mapping file to version control, and configure the CLI to
+   apply it on every deployment. This is a more convenient option, because it
+   requires less coordination between different roles, but the mapping file
+   must be removed from the repository once the refactor has been applied.
+   Otherwise, the next deployment will fail.
+
+In general, if the protected environment is not in the same state as the
+environment where the mapping was generated, the `refactor --import-mapping`
+command will fail.
+
+### Rollback
 
 After refactoring the stack, the CLI will proceed with the deployment
 (assuming that is your choice). If the deployment fails, and CloudFormation
@@ -251,6 +271,11 @@ names.
In this case, it will show you the ambiguity, and stop the deployment: If you want to take advantage of automatic resource refactoring, avoid renaming or moving multiple identical resources at the same time. + If you want to provide an explicit mapping, use the --import-mapping option. + +As the message suggests, you can use the `--import-mapping` option as a way to +resolve ambiguities. + ### Programmatic access The same refactoring feature is also available in the CDK toolkit library: @@ -281,7 +306,6 @@ try { } } ``` - --- @@ -316,15 +340,6 @@ stack refactoring support: - Renaming stacks. - Upgrading dependencies on construct libraries. -### Can the CLI help me resolve ambiguity when refactoring resources? - -Not at the moment. One of the constraints we imposed on this feature is that it -should work in any environment, including CI/CD pipelines, where there is no -user to answer questions. Although we could easily extend this feature to -include ambiguity resolution for the interactive case, it wouldn't transfer well -to the non-interactive case. If you are interested in an in-depth explanation of -the problem and a possible solution, check Appendix B. - ## Internal FAQ ### Why are we doing this? @@ -398,7 +413,11 @@ templates computed previously. ### Is this a breaking change? -No. +No. By default, the CLI will skip refactoring on deployment. The user must +explicitly enable it by passing a value other than `skip` to the +`--refactoring-action` (or `refactoringAction` in `cdk.json`) option. Also, this +feature will initially be launched as experimental, and the user must +acknowledge this by passing the `--unstable=refactor` flag. ### What alternative solutions did you consider? @@ -522,165 +541,58 @@ well-defined. The equivalence relation then follows directly: two resources `r1` and `r2` are equivalent if `d(r1) = d(r2)`. -### B. 
Ideas on ambiguity resolution - -Let's start with a basic premise: the only safe way to resolve ambiguity is to -ask the developer what their intent is. But what if they are not present to -answer questions (in a CI/CD pipeline, for instance)? A necessary condition in -this case is that the developer's intent has been captured earlier, encoded as a -mapping between resource locations, and stored somewhere. - -But this is not sufficient. Note that every mapping is created from a pair of -source and target states, out of which the ambiguities arose. To be able to -safely carry a mapping over to other environments, two additional conditions -must be met: - -1. The source state on which a mapping is applied must be the same as the source - state where the mapping was captured. -2. The target state used to create the mapping should indeed be what the user - wants as a result. - -How are these "states" instantiated in practice? Let's consider some options. - -First, we need to establish a point when the mapping is created (and the -developer is involved to resolve possible ambiguities). Let's call this the -"decision point". As a first attempt, let's try to use every deployment in the -development cycle as a decision point. In this solution, the development account -is the source state, and the cloud assembly to be deployed is the target state. -If any ambiguities are resolved, they are saved in a mapping file, under version -control. On every deployment to other environments, the mapping file is used to -perform the refactoring. - -It sounds like this could work, but if the development environment is not in the -same state as the one where the mapping is applied, condition 1 is violated. And -if the developer fails, for whatever reason, to run a deployment against their -environment before commiting an ambiguous change to the version control system -(which I will henceforth assume is Git), condition 2 is violated. 
- -Since we are talking about Git, what about using each commit operation as a -decision point? In this case, the source and target states would come from the -synthesized cloud assemblies in the previous and current revision, respectively. -We still have a mapping file, containing ambiguity resolutions, which are added -to the commit, using a Git hook. For this solution to work, we need an -additional constraint, which can also be enforced with a Git hook: that every -revision produces a valid cloud assembly. - -Let's evaluate this solution in terms of the two conditions above. Because the -developer doesn't have a choice anymore of which target state to use (or source -state, for that matter), condition 2 is satisfied. But remember that the scope -of the mapping file is the difference between two consecutive revisions. If the -developer's local branch is multiple commits ahead of the revision that was -deployed to production, the source state in production is not the same as the -one in the mapping file, violating condition 1. - -#### Making history - -An improvement we can make is to turn this into an event sourcing system. -Instead of storing a single mapping between two states, we store the whole -history of the stacks. A **history** is a chain of events in chronological -order. An **event** is a set of operations (create, update, delete, and -refactor) on a set of stacks. - -The decision point remains the same, but now we append a new event to a version -controlled history file on every commit. This event includes all creates, -updates and deletes, plus all refactors, whether they were automatically -detected or manually resolved. - -As with any event sourcing system, if we want to produce a snapshot of the -stacks at a given point in time, all we need to do is replay the events in -order, up to that point. 
We are now ready to state the key invariant of this -system: - -> **Invariant**: For every revision `r`, the cloud assembly synthesized from -> `r` is equal to the snapshot at `r`. - -In other words, the current state should be consistent with the history that led -up to that state. - -One final piece to add to the system: every environment should also have its own -history file, which should also maintain the same invariant (through CFN hooks, -for example). Having all this in place, we can execute the following algorithm -on every deployment: - - --------------------------------- - Key: - H(E): environment history - H(A): application history - LCA: lowest common ancestor - --------------------------------- - - if H(E) is a prefix of H(A): - Compute the diff between H(A) and H(E); - Extract the mapping from the diff; - Apply the mapping to the stacks in the environment; - Deploy; - else: - a = LCA of H(A) and H(E); - Compute the sub-chain of H(A) from a to the end; - Extract the mapping from the diff; - if the mapping is empty: - Deploy; - else: - Error: source state doesn't match the mapping. - -For example, suppose the histories at play are (`*` denotes the current state): - - H(E) = e1 ◄── e2* - H(A) = e1 ◄── e2 ◄── e3 ◄── e4 - -Then the diff between them is `e3 ◄── e4`. If these events contain any refactor, -we just apply them, and then deploy the application. The resulting environment -history is the merge of the two: - - H(E) = e1 ◄── e2 ◄── e3 ◄── e4* - -Now suppose the histories involved are: - - H(E) = e1 ◄── e2 ◄── e3* - H(A) = e1 ◄── e2 ◄── e4 ◄── e5 - -In this case, `H(E)` is not a prefix of `H(A)`, but they have common ancestors. -Their LCA is `e2`. Computing the sub-chain from there we get `e4 ◄── e5`. If -there are no refactors to apply from this diff, we can go ahead and deploy the -application. 
Again, the new state results from the merge of `H(E)` and -`H(A)`: - - H(E) = e1 ◄── e2 ◄── e3 - ▲ - │ - └──── e4 ◄── e5* - -If there are refactors to be done, this is considered an error, because we can't -guarantee that the refactor makes sense (let alone that this was the developer's -intent). - -#### The future - -There is still some work to be done to prove this out. But assuming it does -work, we could use it to expand the scope to which the automatic refactoring -applies. Consider the case in which you want to rename a certain resource and, -at the same time, make some minor changes, such as adding or updating a couple -of properties. This is another ambiguous case, because it's not clear what the -intent is: update with rename, or replacement? But with the history system, we -can detect such cases, interact with the developer, and store the decision in -the history. - -Since this historical model contains all the information about the state of the -stacks in an environment, it could also be used for other purposes. For example, -development tools could use the history to provide a "time machine" feature, -that allows developers to see the state of their infrastructure at any point in -time. CloudFormation itself could build on that, and provide a way to roll back -or forward to an arbitrary state. - -Another problem that could be solved with the historical model is the infamous -"deadly embrace", where a consumer stack depends on a producer stack via -CloudFormation Exports, and you want to remove the use from the consumer. At the -moment, customers have to use the `stack.exportValue(...)` method, and do two -deployments. The history would give the CLI all the information it needs to do -this without user intervention. - -Potentially, this could also help with drift resolution (or prevention), if -CloudFormation itself starts using the history internally. +### B. 
Handling of settings
+
+Pseudocode for `deploy`:
+
+    // either from CLI option or config file
+    switch (refactoring action):
+        case quit:
+            Stop with non-zero exit code;
+
+        case refactor:
+            m = getMapping();
+            if (not --dry-run):
+                Apply m;
+                Deploy;
+
+        case skip or null:
+            Deploy;
+
+        case confirm:
+            m = getMapping();
+            if (not --dry-run):
+                Ask user what to do;
+                switch (user's choice):
+                    case quit:
+                        Stop with non-zero exit code;
+                    case refactor:
+                        Apply m;
+                        Deploy;
+                    case skip:
+                        Deploy;
+
+    function getMapping():
+        if (not --unstable=refactor):
+            Fail with a specific error message;
+
+        if (--import-mapping):
+            m = mapping from the file;
+        else:
+            m = compute the mapping;
+
+        Render m to stdout;
+
+        if (--export-mapping):
+            Write m to the file;
+
+        return m;
+
+The pseudocode for the `refactor` command is simply:
+
+    m = getMapping();
+    if (not --dry-run):
+        Apply m;
 
 [pulumi-aliases]: https://www.pulumi.com/docs/iac/concepts/options/aliases/

From 95ba4f5e31b2523f75e71ff4cb9eeaacc7e87d48 Mon Sep 17 00:00:00 2001
From: Otavio Macedo <288203+otaviomacedo@users.noreply.github.com>
Date: Mon, 3 Mar 2025 10:42:48 +0000
Subject: [PATCH 21/21] Filters

---
 text/0162-refactoring-support.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/text/0162-refactoring-support.md b/text/0162-refactoring-support.md
index 8415d4ae9..29fc68f47 100644
--- a/text/0162-refactoring-support.md
+++ b/text/0162-refactoring-support.md
@@ -115,6 +115,12 @@ refactor is executed:
 
      ✅  Stack refactor complete
 
+If you pass any filters to the `deploy` command, the refactor will work on those
+stacks plus any other stacks it touches. For example, if you choose to only
+deploy stack A, and a resource was moved from stack A to stack B, the refactor
+will involve both stacks. But even if there was a rename in, say, stack C, it
+will not be refactored.
+
 If you want to execute only the automatic refactoring, use the `cdk refactor`
 command.
The behavior is basically the same as with `cdk deploy`: it will detect whether there are refactors to be made, ask for confirmation if necessary @@ -306,6 +312,7 @@ try { } } ``` + ---
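
The settings-handling pseudocode in Appendix B can also be expressed as a
small, self-contained TypeScript sketch. This is illustrative only: the
option, type, and function names are assumptions, not the actual CDK toolkit
API, and the side effects (deploying, writing files) are reduced to return
values:

```typescript
// Sketch of the `deploy` decision logic from Appendix B. Names are illustrative.
type Action = "quit" | "refactor" | "skip" | "confirm";

interface Options {
  refactoringAction?: Action;  // --refactoring-action, or cdk.json
  dryRun?: boolean;            // --dry-run
  unstableRefactor?: boolean;  // --unstable=refactor
  importMapping?: string;      // --import-mapping=[FILE]
  exportMapping?: string;      // --export-mapping=[FILE]
}

type Outcome = "quit" | "deployed" | "refactored and deployed" | "dry run";

// Fails unless the experimental feature is acknowledged; then either reads
// the mapping from a file or computes it from the templates.
function getMapping(opts: Options): string {
  if (!opts.unstableRefactor) {
    throw new Error("Refactoring is experimental; pass --unstable=refactor.");
  }
  // Rendering to stdout and honoring opts.exportMapping are omitted here.
  return opts.importMapping
    ? `mapping imported from ${opts.importMapping}`
    : "mapping computed from templates";
}

// `askUser` stands in for the interactive prompt of the `confirm` action.
function deploy(opts: Options, askUser: () => Action = () => "skip"): Outcome {
  const action: Action = opts.refactoringAction ?? "skip";
  switch (action) {
    case "quit":
      return "quit"; // stop with a non-zero exit code
    case "skip":
      return "deployed"; // deploy without refactoring (the default)
    case "refactor":
      getMapping(opts);
      return opts.dryRun ? "dry run" : "refactored and deployed";
    case "confirm": {
      getMapping(opts);
      if (opts.dryRun) {
        return "dry run";
      }
      const choice = askUser();
      if (choice === "quit") {
        return "quit";
      }
      return choice === "refactor" ? "refactored and deployed" : "deployed";
    }
  }
}
```

The `refactor` command would reuse the same `getMapping` function, then apply
the mapping unless `--dry-run` is set.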