Improve cluster stability #93

Open
jeffpierce opened this issue Nov 4, 2015 · 9 comments

@jeffpierce
Owner

Right now, if we lose a node, we lose the stats for the buckets that node handles until it comes back up. Let's make the cluster more robust so that we can lose some of its nodes and still retain stats.

@mredivo
Collaborator

mredivo commented Nov 9, 2015

Design Brief

  • Introduce the concept of "my paths" vs "guest paths" in the internal data representation
  • Note when forwarding a path to its home server fails
  • On failure to forward a path to its home server, notify all remaining servers of the failure for synchronization purposes
  • Recalculate the hash index based on the number of remaining servers, to determine which server will guest host each path until the dead server comes back up
  • When the dead server comes up, broadcast that information to the set
  • On receipt of the broadcast, all servers with guest rollups will transfer them to the restored server

That's the rough sketch; details will be worked out when implementing.
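
The steps in the brief can be illustrated with a minimal in-memory sketch. It is not the project's implementation: the `Cluster` class, its method names, the use of CRC32 as the hash, and the modelling of a rollup as a simple running total are all assumptions made for illustration. Forwarding failures and broadcasts are stood in for by direct calls to `mark_dead` and `mark_alive`.

```python
import zlib


def hash_index(path: str, n: int) -> int:
    """Map a path to an index in [0, n). CRC32 is an assumed, stable hash."""
    return zlib.crc32(path.encode("utf-8")) % n


class Cluster:
    """Hypothetical model of the brief: each server keeps "my" and "guest" paths."""

    def __init__(self, servers):
        self.servers = list(servers)   # full, ordered server list
        self.alive = set(servers)      # servers currently reachable
        # server -> {"my": {path: total}, "guest": {path: total}}
        self.store = {s: {"my": {}, "guest": {}} for s in self.servers}

    def home(self, path):
        """Home server: hash index over the full server list."""
        return self.servers[hash_index(path, len(self.servers))]

    def guest_host(self, path):
        """Guest host: hash index recalculated over the remaining servers."""
        remaining = [s for s in self.servers if s in self.alive]
        if not remaining:
            raise RuntimeError("no servers left to host the path")
        return remaining[hash_index(path, len(remaining))]

    def record(self, path, value):
        """Add a value to the path's total on its home, or on a guest host."""
        home = self.home(path)
        if home in self.alive:
            target, kind = home, "my"
        else:
            target, kind = self.guest_host(path), "guest"
        bucket = self.store[target][kind]
        bucket[path] = bucket.get(path, 0) + value
        return target

    def mark_dead(self, server):
        """Stands in for 'notify all remaining servers of the failure'."""
        self.alive.discard(server)

    def mark_alive(self, server):
        """Stands in for the return broadcast: guests hand their totals back."""
        self.alive.add(server)
        mine = self.store[server]["my"]
        for other in self.servers:
            if other == server:
                continue
            guests = self.store[other]["guest"]
            for path in [p for p in guests if self.home(p) == server]:
                mine[path] = mine.get(path, 0) + guests.pop(path)
```

For example, recording 5 for a path, marking its home server dead, recording 3 more (which lands in a guest bucket on another server), and then marking the home server alive again leaves the home server holding a total of 8 for that path and the guest bucket empty. The sketch does not cover a guest host itself failing while it holds guest data; that case would need the details the brief defers to implementation.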

@jeffpierce
Owner Author

@mredivo #100 addresses all of this, and this can be closed, right?

@mredivo
Collaborator

mredivo commented Nov 25, 2015

The preliminary work is complete with #100; now the implementation can begin.

@jeffpierce
Owner Author

Gotcha.

@jeffpierce
Owner Author

@mredivo: What's left as far as implementation is concerned? Will you be able to get to this over the break, or should I take a stab at it?

@mredivo
Collaborator

mredivo commented Dec 17, 2015

Go ahead and give it a shot; feel free to ask me for details. It will be a few weeks before my plate isn't completely full again.

The design brief a few comments up covers the general idea.

@jeffpierce
Owner Author

@mredivo Question...

What if we used a gossip implementation instead? Seems like that could solve quite a few problems.

@jeffpierce
Owner Author

jeffpierce commented Apr 22, 2016

There's Serf as well, which seems to be a more robust implementation of memberlist. It's a standalone agent, though.

@mredivo
Collaborator

mredivo commented Apr 22, 2016

Looks worth studying, will take a detailed look later.
