Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage controller: API + CLI for migrating secondary locations #10284

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

jcsp
Copy link
Collaborator

@jcsp jcsp commented Jan 6, 2025

Problem

Currently, if we want to move a secondary there isn't a neat way to do that: we just have migration API for the attached location, and it is only clean to use that if you've manually created a secondary via pageserver API in the place you're going to move it to.

Secondary migration API enables:

  • Moving the secondary somewhere because we would like to later move the attached location there.
  • Move the secondary location because we just want to reclaim some disk space from its current location.

Summary of changes

  • Add /migrate_secondary API
  • Add tenant-shard-migrate-secondary CLI
  • Add tests for above

@jcsp jcsp added t/feature Issue type: feature, for new features or requests t/on_call_followup c/storage/controller Component: Storage Controller labels Jan 6, 2025
@jcsp jcsp requested a review from erikgrinaker January 6, 2025 12:35
@@ -690,7 +690,8 @@ async fn handle_node_list(req: Request<Body>) -> Result<Response<Body>, ApiError
};

let state = get_state(&req);
let nodes = state.service.node_list().await?;
let mut nodes = state.service.node_list().await?;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deterministic order to enable the storcon CLI test assertions to run reliably with multiple nodes

@@ -1742,42 +1760,88 @@ def storcon_cli(args):
storcon_cli(["node-configure", "--node-id", "1", "--availability", "offline"])
assert "Offline" in storcon_cli(["nodes"])[3]

# Restore node, verify status changes in CLI output
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is necessary because we now have two nodes, so making one unavailable kicks off migrations

env.start()

tenant_id = TenantId.generate()
env.create_tenant(tenant_id, placement_policy='{"Attached":1}', shard_count=shard_count)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch to explicit tenant creation because the default tenant in NeonEnv doesn't have secondaries.

@@ -935,7 +935,7 @@ def test_storage_controller_debug_apis(neon_env_builder: NeonEnvBuilder):
that just hits the endpoints to check that they don't bitrot.
"""

neon_env_builder.num_pageservers = 2
neon_env_builder.num_pageservers = 3
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 pageservers because we need some node that contains neither an attached nor secondary location, in order to migrate a secondary there

Copy link

github-actions bot commented Jan 6, 2025

7227 tests run: 6875 passed, 0 failed, 352 skipped (full report)


Flaky tests (2)

Postgres 15

  • test_physical_replication_config_mismatch_max_locks_per_transaction: release-x86-64
  • test_physical_replication_config_mismatch_too_many_known_xids: release-arm64

Code coverage* (full report)

  • functions: 31.2% (8410 of 26967 functions)
  • lines: 47.9% (66772 of 139289 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
39a98f7 at 2025-01-06T13:33:57.566Z :recycle:

@jcsp jcsp marked this pull request as ready for review January 8, 2025 17:54
@jcsp jcsp requested a review from a team as a code owner January 8, 2025 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/storage/controller Component: Storage Controller t/feature Issue type: feature, for new features or requests t/on_call_followup
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant