Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Since 23.10.0, nimbus-eth2 won't send registrations for validators attached to a validator client anymore #5599

Closed
stephanep opened this issue Nov 14, 2023 · 7 comments

Comments

@stephanep
Copy link
Contributor

Describe the bug
Potential regression introduced in 23.10.0 (likely by #5431).
When using validators on a remote nimbus_validator_client, and with no validator directly attached to nimbus_beacon_node, registration for those validators are not sent by nimbus_beacon_node to mev-boost anymore, after upgrading from 23.9.1 to 23.10.0.

To Reproduce
I have a solo staking setup comprised of an nimbus_validator_client with a few validators, attached to two nimbus_beacon_node.
Each nimbus_beacon_node connects to a local mevboost instance (latest v1.16 from flashbots).
I've configured MEV as described in https://nimbus.guide/external-block-builder.html.

In particular:

  • i DO NOT have any validator directly attached to the beacon node;
  • and i'm NOT using multiple per-validator builder preferences.

Said otherwise, i run my nimbus_validator_client with --payload-builder=true, my nimbus_beacon_node with --payload-builder=true --payload-builder-url=http://127.0.0.1:18550 and that's about it.

Everything was working fine when i was using 23.9.1 and earlier releases, the registrations for all validators were successfully sent to mevboost and taken into account by the configured relays.

I then upgraded the beacon nodes and validator client to 23.10.0, and following this, added a couple of new validators to nimbus_validator_client.

After investigating why i failed to use MEV for a block proposed by one of the new validators, i noticed that those had never been registered successfully, contrary to the the older ones (added when i was still running 23.9.1 or earlier).

I am certain (from debug logs and traffic analysis) that nimbus_validator_client correctly builds and sends the registration messages to the /eth/v1/validator/register_validator endpoint of the beacon node API, that the POST request is correctly received by the beacon node and replied with 200 OK.

DBG 2023-11-14 02:58:35.033+01:00 Opened connection to remote server         remote=X.X.X.X:XXXX request=/eth/v1/validator/register_validator http_method=POST
DBG 2023-11-14 02:58:35.042+01:00 REST request body has been sent            remote=X.X.X.X:XXXX request=/eth/v1/validator/register_validator size=XXXX http_method=POST
DBG 2023-11-14 02:58:35.104+01:00 Got REST response headers from remote server remote=X.X.X.X:XXXX request=/eth/v1/validator/register_validator status=200 http_method=POST
DBG 2023-11-14 02:58:35.108+01:00 Received REST response body from remote server remote=X.X.X.X:XXXX request=/eth/v1/validator/register_validator contentType=text/plain size=0


POST /eth/v1/validator/register_validator HTTP/1.1
Accept: application/json
Content-Length: XXXX
Content-Type: application/json
Host: X.X.X.X
Connection: close
User-Agent: nim-presto/0.0.3 (amd64/linux)

[{"message":{"fee_recipient":"0xXX...","gas_limit":"30000000","timestamp":"1699927392","pubkey":"0xXX..."},"signature":"0xXX..."},{"message":...

HTTP/1.1 200 OK
Server: nim-presto/0.0.3 (amd64/linux)
Content-Length: 0
Content-Type: text/plain
Date: Tue, 14 Nov 2023 02:03:13 GMT
Connection: close

I also obtain the same response when sending the captured request (with one correct registration message for each of my validators) directly to the beacon node using curl.

I'm also certain (proven by mevboost logs and loopback traffic capture) that nimbus_beacon_node 23.10.0 DOES NOT send any registration message to mevboost.

Workaround

I downgraded one of my beacon nodes from 23.10.0 to 23.9.1 (without changing anything in the config). As soon as it finished restarting and that nimbus_validator_client had connected to it, it sent the registration messages. With version 23.9.1 of nimbus_beacon_node, those messages were successfully passed to mevboost and all my validators were successfully registered with all relays.

After confirming successful registration, i put back 23.10.0.

Additional context

The effect is not visible immediately because validators that were previously registered with the relays can remains so for a long time.
The issue is also present in 23.10.1.

After a brief review of the changes to the code, i'm suspecting pull request #5431 to be related with this.
More specifically:

  • at rest_validator_api.nim#L972 the signed registration messages sent by the validator client are correctly received and stored into node.externalBuilderRegistrations;
  • a search over the repo shows that the only place where those stored registration messages are processed is /beacon_validators.nim#L1612, inside registerValidatorsPerBuilder;
  • registerValidatorsPerBuilder is itself called from registerValidators at beacon_validators.nim#L1708, but if node.attachedValidators is empty, builderKeys remains empty as well and registerValidatorsPerBuilder is never called.
@stephanep stephanep changed the title Since 23.10.0, nimbus-eth2 won't send registration for validators attached to a validator client anymore Since 23.10.0, nimbus-eth2 won't send registrations for validators attached to a validator client anymore Nov 14, 2023
@stephanep
Copy link
Contributor Author

Had a closer look at the code today. I was able to fix the issue in my use case with the attached patch. It simply adds the pubkeys stored in externalBuilderRegistrations with the configured payload-builder-url to the builderKeys table (in addition to the locally attached validators) before the loop that calls registerValidatorsPerBuilder.

I could verify that with this in place, registration is working again, with mevboost receiving the expected messages once per epoch.

Nov 14 22:10:58 gvasrveth.intra.lan mev-boost[1624255]: time="2023-11-14T22:10:58.027+01:00" level=info msg="http: GET /eth/v1/builder/status 200" duration=0.000085 method=GET path=/eth/v1/builder/status status=200 version=v1.6
Nov 14 22:10:58 gvasrveth.intra.lan mev-boost[1624255]: time="2023-11-14T22:10:58.298+01:00" level=info msg="http: POST /eth/v1/builder/validators 200" duration=0.243075 method=POST path=/eth/v1/builder/validators status=200 version=v1.6
Nov 14 22:17:22 gvasrveth.intra.lan mev-boost[1624255]: time="2023-11-14T22:17:22.025+01:00" level=info msg="http: GET /eth/v1/builder/status 200" duration=0.000102 method=GET path=/eth/v1/builder/status status=200 version=v1.6
Nov 14 22:17:22 gvasrveth.intra.lan mev-boost[1624255]: time="2023-11-14T22:17:22.296+01:00" level=info msg="http: POST /eth/v1/builder/validators 200" duration=0.239043 method=POST path=/eth/v1/builder/validators status=200 version=v1.6
Nov 14 22:23:46 gvasrveth.intra.lan mev-boost[1624255]: time="2023-11-14T22:23:46.02+01:00" level=info msg="http: GET /eth/v1/builder/status 200" duration=0.000080 method=GET path=/eth/v1/builder/status status=200 version=v1.6
Nov 14 22:23:46 gvasrveth.intra.lan mev-boost[1624255]: time="2023-11-14T22:23:46.253+01:00" level=info msg="http: POST /eth/v1/builder/validators 200" duration=0.207774 method=POST path=/eth/v1/builder/validators status=200 version=v1.6
...

If you can please have a look in case it breaks something else or if there is a better way to address this. Please let me know what you think of it, and if you need me to fill a PR.

nimbus-fix-register-vc-validators.patch

@tersec
Copy link
Contributor

tersec commented Nov 15, 2023

You're correct, indeed, about the diagnosis of this issue. The refactoring linked missed VC-based validators, and your patch is a good way to add them back. Feel free to make a PR from it. Good to have split out the if foo in builderKeys PRs should target the unstable branch into the template, as you did.

Also, the attachedValidatorPubkeys parameter of

proc registerValidatorsPerBuilder(
    node: BeaconNode, payloadBuilderAddress: string, epoch: Epoch,
    attachedValidatorPubkeys: seq[ValidatorPubKey]) {.async.} =

is inaccurate/too specific/narrow after your patch; suggested name is something like just validatorPubkeys, because it's not only attached validator pubkeys anymore (and, then, registerValidatorsPerBuilder figures out for each one whether it is, in fact, attached.

@stephanep
Copy link
Contributor Author

Thanks, will do later today.

@stephanep
Copy link
Contributor Author

I found my previous patch still had certain cases uncovered and would introduce new issues, so i figured out a better way to address this:

  • If we have both VC and attached validators, we do not want the VC to be registered to all builders, but only to the one specified by payload-builder-url, so we should consider VC validators only when handling that builder's URL.
  • Adding VC pubkeys to the builderKeys table as proposed in my patch would cause erroneous registerValidators: same validator registered by beacon node and validator client warnings.
  • So eventually, i would just ensure that if a payload-builder-url is specified, it gets added to the table with an empty pubkey sequence before iterating over the locally attached validators, by doing so we make sure that registerValidatorsPerBuilder is called at least once with that builder's URL.

Another case that is not covered would be when we want to specify per VC validator builder URLs (instead of using the value of --payload-builder-url), which would be done at the VC keymanager level, but that would require adding builder URLs to the message sent to the /eth/v1/validator/register_validator route.

Pull request: #5603

@tersec
Copy link
Contributor

tersec commented Nov 16, 2023

I found my previous patch still had certain cases uncovered and would introduce new issues, so i figured out a better way to address this:

  • If we have both VC and attached validators, we do not want the VC to be registered to all builders, but only to the one specified by payload-builder-url, so we should consider VC validators only when handling that builder's URL.
  • Adding VC pubkeys to the builderKeys table as proposed in my patch would cause erroneous registerValidators: same validator registered by beacon node and validator client warnings.
  • So eventually, i would just ensure that if a payload-builder-url is specified, it gets added to the table with an empty pubkey sequence before iterating over the locally attached validators, by doing so we make sure that registerValidatorsPerBuilder is called at least once with that builder's URL.

Another case that is not covered would be when we want to specify per VC validator builder URLs (instead of using the value of --payload-builder-url), which would be done at the VC keymanager level, but that would require adding builder URLs to the message sent to the /eth/v1/validator/register_validator route.

Pull request: #5603

Specifying per-VC validator builder URLs is probably useful, but the rest of the infrastructure for that isn't there, so it'd be unused and untested code. ValidatorRegistration and that REST route, yeah, do not specify the builder API anywhere, so it's kind of a strangely fragmented set of configuration but it's what's currently specified.

There's also not really a mechanism currently to specify that the BN doesn't know/own the key pair of some VC/external/REST API validator, but does want to specify the route, which is another approach (still fragmented, but, would at least allow this to work).

So:

If we have both VC and attached validators, we do not want the VC to be registered to all builders, but only to the one specified by payload-builder-url, so we should consider VC validators only when handling that builder's URL.

isn't really true in a platonic ideal case, but practically, the most reasonable thing is to register all VC validators under the BN-default external payload builder.

It'd probably be a useful spec change, but for the moment, going by https://github.com/ethereum/beacon-APIs/blob/master/types/registration.yaml and https://github.com/ethereum/beacon-APIs/blob/master/apis/validator/register_validator.yaml is what should happen.

Shorter term, it might make sense to clarify, to the extent the VC pretends to process the --payload-builder-url, that it has no way of communicating this to the BN.

@stephanep
Copy link
Contributor Author

stephanep commented Nov 16, 2023

Yes this completely makes sense, i was thinking what could be needed in the future but that's seems the best thing to do for now. I don't think nimbus_validator_client even accepts a --payload-builder-url cmdline argument, but it does have a keymanager API that may let the user specify per-validator builders (though it would presently have no effect, as you explained).

One practical use case i can think of: let's say i'm sharing a nimbus_beacon_node with a friend. Each of us runs their own nimbus_validator_client because we want to retain full control on our respective validators' keys. But let's also say my friend absolutly wants to use only OFAC compliant builders, while i absolutly want to only use the non-censoring ones. Having a way to override builder sets at the VC level would be useful in that case.

Might be nice to have, but definitely beyond the scope of this 'fix' issue and related PR.

So i will add a comment pointing to this discussion to the proposed change in the PR as you suggested.

tersec pushed a commit that referenced this issue Nov 17, 2023
…validator registration (issue #5599). (#5603)

The proposed change ensures that VC validators are registered with the builder specified by the `--payload-builder-url` argument even if the beacon node has no attached validators. It also prevent such validators from being unintentionally registered with builders configured for specific attached validators by the keymanager api.
Per the current specs, the VC have no way to specify which builders the BN should use on a per-node basis, so for the time being we have to resort to using the BN fallback default builder URL for VC validators.
@tersec
Copy link
Contributor

tersec commented Nov 28, 2023

#5603

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants