10.2.1 mgmtd fails if you have RPKI configured #17794

Closed
2 tasks done
f0o opened this issue Jan 8, 2025 · 8 comments
Assignees: ton31337
Labels: bgp, triage (Needs further investigation)

Comments

f0o commented Jan 8, 2025

Description

FRR crash loops with:

Jan 08 09:28:03 rt1.sto1.se.as203038.net frrinit.sh[1831151]: [1831151|mgmtd] Configuration file[/etc/frr/frr.conf] processing failure: 2
Jan 08 09:28:03 rt1.sto1.se.as203038.net frrinit.sh[1831158]: line 1167: % Unknown command[87]:  rpki cache 127.0.0.1 3323 preference 1
Jan 08 09:28:03 rt1.sto1.se.as203038.net frrinit.sh[1831158]: line 1168: % Unknown command[87]:  rpki cache 10.x.x.x 3323 preference 255
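The log excerpt above already names the rejected config lines. As a minimal sketch, assuming the log format is exactly as shown in this report, the line numbers and commands could be pulled out of journal output like this. The function name `find_unknown_commands` and the sample hostname are hypothetical, introduced only for illustration.

```python
import re

# Matches log fragments such as:
#   line 1167: % Unknown command[87]:  rpki cache 127.0.0.1 3323 preference 1
# The format is taken from the log lines in this issue; other FRR versions
# may print it differently (assumption).
UNKNOWN_CMD = re.compile(r"line (\d+): % Unknown command\[\d+\]:\s*(.*\S)")


def find_unknown_commands(log_text):
    """Return (config_line_number, command) pairs for each rejected command."""
    results = []
    for line in log_text.splitlines():
        match = UNKNOWN_CMD.search(line)
        if match:
            results.append((int(match.group(1)), match.group(2)))
    return results


if __name__ == "__main__":
    sample = (
        "Jan 08 09:28:03 host frrinit.sh[1831158]: line 1167: "
        "% Unknown command[87]:  rpki cache 127.0.0.1 3323 preference 1\n"
    )
    for lineno, command in find_unknown_commands(sample):
        print(f"frr.conf line {lineno}: {command}")
```

Feeding it the output of the init script or the journal gives a quick list of which lines in /etc/frr/frr.conf need attention.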

Version

FRRouting 10.2.1 (rt1.sto1.se.as203038.net) on Linux(5.15.0-127-generic).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--sbindir=/usr/lib/frr' '--with-vtysh-pager=/usr/bin/pager' '--libdir=/usr/lib/x86_64-linux-gnu/frr' '--with-moduledir=/usr/lib/x86_64-linux-gnu/frr/modules' '--disable-dependency-tracking' '--enable-rpki' '--disable-scripting' '--enable-pim6d' '--disable-grpc' '--with-libpam' '--enable-doc' '--enable-doc-html' '--enable-snmp' '--enable-fpm' '--disable-protobuf' '--disable-zeromq' '--enable-ospfapi' '--enable-bgp-vnc' '--enable-multipath=256' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' 'build_alias=x86_64-linux-gnu' 'PYTHON=python3'

How to reproduce

Simply configure RPKI the way it has been working since 9.x.

Expected behavior

Not to crashloop

Actual behavior

Crashloop

Additional context

No response

Checklist

  • I have searched the open issues for this bug.
  • I have not included sensitive information in this report.
f0o added the triage (Needs further investigation) label Jan 8, 2025
f0o (Author) commented Jan 8, 2025

A workaround seems to be downgrading to 10.1.1.

Edit: 10.1.2 also works.

ton31337 self-assigned this Jan 8, 2025
ton31337 added the bgp label Jan 8, 2025
f0o (Author) commented Jan 8, 2025

Hrm, I'm getting a lot of the following on 10.1.2:

vtysh: error reading from bgpd: Connection reset by peer (104)
Warning: closing connection to bgpd because of an I/O error!

Not sure if this is related. I have 10.1.1 on rt2, which is stable, but it seems that updating rt1 from 10.0 to 10.2 and then downgrading to 10.1 messed things up.

ton31337 (Member) commented Jan 8, 2025

Can we get the full log (debug) and configurations?

f0o (Author) commented Jan 8, 2025

Unrelated or perhaps related: did 10.1.2 change VRFs? bgpd claims vrf-id is -1 on all VRFs. That shouldn't be the case...

I'm not using NETNS based VRFs.

f0o (Author) commented Jan 8, 2025

> Can we get the full log (debug) and configurations?

Yes, once we find a reliable way to get this back into a running state, we can break it again and Slack you the logs/configs/anything :)

f0o (Author) commented Jan 8, 2025

@ton31337 I noticed that 10.1.2 changed the RPKI config lines to be:

 rpki cache tcp 127.0.0.1 3323 preference 1

Note the addition of tcp, which wasn't there before. Could that have caused the mgmtd issues?

ton31337 (Member) commented Jan 8, 2025

Yes, the syntax has changed.

f0o (Author) commented Jan 8, 2025

OK, after a few further tests it seems that the crashloop is caused by the missing tcp in the rpki cache config line.

So the jump from 10.0 to 10.2 broke it; going from 10.0.x to 10.1.x and then to 10.2.x, running do wr mem after each step, fixes/rewrites the config.
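The stepwise upgrade with do wr mem is the fix confirmed in this thread. For cases where that path is not practical, a rough sketch of rewriting the config text directly might look like the following. It is an untested assumption, not something verified against FRR: it only touches lines of the exact form seen in this report (`rpki cache <host> <port> preference <n>`) and leaves everything else, including any SSH-based or already-migrated cache lines, unchanged. The function name `add_tcp_keyword` is hypothetical.

```python
import re

# Legacy form from this issue: "rpki cache <host> <port> preference <n>".
# Group 1 keeps the original indentation; the lookahead skips lines that
# already carry a transport keyword.
LEGACY_TCP_CACHE = re.compile(
    r"^(\s*rpki cache )(?!tcp\s|ssh\s)(\S+\s+\d+\s+preference\s+\d+\s*)$"
)


def add_tcp_keyword(config_text):
    """Insert 'tcp' into legacy TCP cache lines; leave other lines as-is."""
    rewritten = [
        LEGACY_TCP_CACHE.sub(r"\1tcp \2", line)
        for line in config_text.splitlines()
    ]
    trailing_newline = "\n" if config_text.endswith("\n") else ""
    return "\n".join(rewritten) + trailing_newline


if __name__ == "__main__":
    old_config = (
        "rpki\n"
        " rpki cache 127.0.0.1 3323 preference 1\n"
        "exit\n"
    )
    print(add_tcp_keyword(old_config), end="")
```

Anyone trying this would want to run it on a copy of frr.conf and diff the result before putting it in place. Running it twice is harmless, since lines that already contain tcp no longer match.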

f0o closed this as completed Jan 8, 2025