WAN interface Losing IPv6 connectivity after 30-60 seconds since 25.1.1 upgrade (stateless NDP?) #242

Open
funtowne opened this issue Feb 14, 2025 · 8 comments

@funtowne

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

Describe the bug

Zyxel 5G modem, IP Passthru mode (for v4); SLAAC for IPv6. Both to Opnsense WAN

On 24.7.x, I was able to have a stable IPv6 connection from the WAN interface of my Opnsense VM, which was pulling the V6 address from the Zyxel 5G modem via SLAAC. I would then NAT this connection via NAT66 to my LAN interfaces, each of which is assigned a static ULA /64 range. I know NAT66 is naughty, but hey it works.

Since upgrading to 25.1 (and 25.1.1), I have been unable to have a stable v6 connection on the WAN side for more than a few pings. Running the command ndp -nc on the opnsense VM restores IPv6 via the WAN interface briefly, but v6 pings fail again, usually about 5-8 seconds after running the command.

Tested versions:

24.7.12_2-amd64 -- WAN_SLAAC is only configured with a monitoring IP, no other configuration; IPv6 tunables are defaults. WAN IPv6 works as expected on opnsense.

25.1 and 25.1.1-amd64 -- the same configuration would not respond to neighbor discoveries on the WAN interface: a first discovery would work, but subsequent ones appeared to fail, if I am understanding the packet dump correctly. Configuring WAN_SLAAC with the gateway by hand, setting net.inet6.icmp6.nd6_onlink_ns_rfc4861 to 1, and rebooting fixed IPv6; the connection stays active.
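For anyone wanting to try the sysctl half of that workaround from a shell, here is a minimal sketch. The OID name is taken from the report; the function name and the DRY_RUN switch are made up for illustration. It assumes a root shell on the firewall, and it does not cover the manually configured gateway or making the value persistent across reboots.

```shell
#!/bin/sh
# Sketch only: sets the sysctl named in the report at runtime.
# Assumption: run as root on the OPNsense (FreeBSD) shell.
# A value set this way is not persistent; storing it as a tunable is not shown.
OID="net.inet6.icmp6.nd6_onlink_ns_rfc4861"

# set_nd6_onlink VALUE
# With DRY_RUN=1 (the default) it only prints the command it would run.
set_nd6_onlink() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "sysctl ${OID}=$1"
    else
        sysctl "${OID}=$1"
    fi
}

# Example (commented out so that sourcing this file changes nothing):
# set_nd6_onlink 1             # print the command
# DRY_RUN=0 set_nd6_onlink 1   # actually apply it
```

Printing the command first is a deliberate choice, since the same helper can be used to go back to 0 when testing a fixed kernel.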

Per a Reddit thread with Franco, it looks like there's a need for a "Stateless ICMP ND" patch to prevent pf from interfering with this particular setup. I'm opening this bug report on his request (THANK YOU!)

To Reproduce

Steps to reproduce the behavior:

1.) Configure WAN to SLAAC
2.) Attempt to use IPv6 in any way; connectivity to the public internet will always time out. The default gateway can still be pinged by hand, however.
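To put a number on how long the connection survives, a small probe loop along these lines could be used. This is a sketch, not something from the report: watch_v6, PROBE and INTERVAL are hypothetical names, and the default probe simply mirrors the ping6 test mentioned later in the thread.

```shell
#!/bin/sh
# Hypothetical helper: probe a target repeatedly and report when IPv6 drops.
# PROBE is the command used for a single probe (assumption: ping6 is available).
PROBE="${PROBE:-ping6 -c 1}"

# watch_v6 TARGET [MAX_PROBES]
# Prints "ok <n>" for each successful probe and stops at the first failure,
# reporting the seconds elapsed since the loop started.
watch_v6() {
    target="$1"
    max="${2:-60}"
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        if $PROBE "$target" >/dev/null 2>&1; then
            echo "ok $i"
        else
            echo "lost after $(( $(date +%s) - start ))s (probe $i)"
            return 1
        fi
        sleep "${INTERVAL:-1}"
    done
    echo "still up after $max probes"
}

# Example (not executed here):
# watch_v6 google.com 120
```

Run once right after boot and once right after ndp -nc, this would give comparable timings for the two cases described above.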

Expected behavior

As with 24.7.12, configuring WAN to SLAAC should keep a stable IPv6 connection without any additional manual intervention like setting a static gateway or other sysctls.

Describe alternatives you considered

N/A, I went a bit bananas getting this far!

Screenshots

Screenshots.zip

Environment

Software version used and hardware type if relevant, e.g.:

OPNsense 24.7.12_2 and 25.1 and 25.1.1
Intel J3455-based Mini PC (Compulab Fitlet 2)
Proxmox Hypervisor at latest patchset, Opnsense as a VM
2x VirtIO network interfaces, 4 VLANs LAN-side
Zyxel 5G modem; O2 Germany SIM Card

@fichtner fichtner self-assigned this Feb 14, 2025
@fichtner fichtner added the upstream Third party issue label Feb 14, 2025
@fichtner
Member

fichtner commented Feb 14, 2025

As discussed I chased the test patch from a while back to verify this is a kernel issue. It's ee7b012c54ae04 and I just need to build a kernel on top of the stable/25.1 branch to provide a matching test kernel. Not sure if that will happen today, but sharing the plan seems like a good thing to do. :)

@fichtner
Member

fichtner commented Feb 14, 2025

Ok here is the test kernel:

# opnsense-update -zkr 25.1.1-nd

(needs a reboot to activate, if you want to go back just do opnsense-update -k)

Cheers,
Franco

@funtowne
Author

# opnsense-patch -zkr 25.1.1-nd

I think you meant opnsense-update -zkr 25.1.1-nd ;) - I got it sorted! Thank you for the absurdly fast turnaround of what is likely not the most important issue.

I tested as follows:

1.) Reset net.inet6.icmp6.nd6_onlink_ns_rfc4861 to 0
2.) Deleted the manually-configured IPv6 Gateway (a Link-Local IP) on gateway configuration WAN_SLAAC (saved settings)
3.) Installed the Kernel patch above after a quick mirror swap
4.) Rebooted

Outcome:

IPv6 on a SLAAC WAN interface is behaving as expected: the IPv6 connection is stable after reboot and is not timing out after 30-odd seconds (or less). Tested with a simple ping6 google.com, which was enough to trigger the failure state after the 25.1 and 25.1.1 updates.

It therefore appears that NDP is working as expected to maintain the connection without the manual fixes in place that I mentioned in my report. Clients configured with a ULA utilizing NAT66 on the firewall to reach the public IPv6 space can do so without issue; the firewall itself has stable access to the public IPv6 internet via the Zyxel 5G modem. I.e., behavior is as it was in Opnsense 24.7.12_2.

@fichtner
Member

@funtowne long day, sorry... thanks for that. Let me think about how to proceed. The easiest steps are either making the ICMPv6 requirements rules stateless or adding this patch as an adjustable sysctl. The better way forward would be debugging the state tracking but that will take some time so an interim solution would be nice.

Cheers,
Franco
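To illustrate what the first interim option (stateless ICMPv6 rules) could look like, here is a rough sketch. The rule is hypothetical and is not the rule OPNsense generates; it only shows pf's "no state" option applied to neighbor solicitation and advertisement. The script writes the rule to a scratch file and, where pfctl is present, parses it without loading anything.

```shell
#!/bin/sh
# Sketch only: write a candidate stateless rule to a scratch file.
# Nothing is loaded into the running ruleset.
RULES_FILE="$(mktemp /tmp/nd-stateless.XXXXXX)"

cat > "$RULES_FILE" <<'EOF'
# Hypothetical rule for illustration, not taken from OPNsense:
# pass neighbor solicitation (135) and advertisement (136) without state.
pass quick inet6 proto ipv6-icmp from any to any icmp6-type { neighbrsol, neighbradv } no state
EOF

# Syntax check only: -n parses the file without loading it.
# Skipped on systems that have no pfctl.
if command -v pfctl >/dev/null 2>&1; then
    pfctl -n -f "$RULES_FILE" || echo "pfctl reported a problem with $RULES_FILE"
fi

echo "candidate rule written to $RULES_FILE"
```

Whether a rule like this is enough on its own, or whether it interacts badly with the existing generated rules, is exactly the kind of thing that would need testing.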

@funtowne
Author

No need to apologize. I wasn’t expecting a patch so fast!

It is an upstream change that you’d be fighting potentially indefinitely, no? SLAAC on WAN is probably pretty uncommon, given the intent of IPv6 addressing. Wouldn't some documentation suffice, or a flag in the code for something like:

`if WAN IPv6 = SLAAC then workaround()`?

@funtowne
Author

Crap, fat fingered the GitHub UI on mobile. I didn’t mean to close this!

@fichtner fichtner reopened this Feb 14, 2025
@meyergru

meyergru commented Feb 17, 2025

Just asking because I have been there and done that: Did you use "bridge-mcsnoop 0" on the WAN's bridge interface or did "echo -n 0 > /sys/class/net//bridge/multicast_snooping"?

I found out when I thought I had found a sure-fire way to make these ND problems reproducible via "ndisc6 -m -n -r 1 fe80::xxxxx eth0" from a Linux client, see: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281395 and https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281397

If you have not set that specific parameter, your findings are probably worthless because of a well-known bug in Linux:

https://forum.proxmox.com/threads/ipv6-neighbor-solicitation-not-forwarded-to-vm.96758/

That one shows up as follows: shortly after booting the VM, the neighbor discoveries will pass, but later on, they will get suppressed.

@funtowne
Author

Hi @meyergru -- I went quite deep down that rabbit hole. Good to call it out, though!

Here's my /etc/network/interfaces for the relevant bridge that connects to "WAN":

iface vmbr1 inet manual
    bridge-ports enp2s0
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
    bridge-mcsnoop 0

In short, only either the fix released by Franco --or-- my workaround would work regardless of the multicast snooping or other similar settings.
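For others checking the same thing on their Proxmox host, the effective snooping state of a bridge can be read back from sysfs. The sketch below assumes the bridge is called vmbr1 as in the config above; snooping_state and SYSFS_ROOT are hypothetical names, and SYSFS_ROOT exists only so the function can be tried without a real bridge.

```shell
#!/bin/sh
# Sketch: report whether multicast snooping is active on a Linux bridge.
# SYSFS_ROOT is overridable purely for testing; the real path is /sys/class/net.
SYSFS_ROOT="${SYSFS_ROOT:-/sys/class/net}"

# snooping_state BRIDGE
# Prints "disabled", "enabled", or "unknown" (file missing or unexpected value).
snooping_state() {
    f="$SYSFS_ROOT/$1/bridge/multicast_snooping"
    if [ ! -r "$f" ]; then
        echo "unknown"
        return 1
    fi
    case "$(cat "$f")" in
        0) echo "disabled" ;;
        1) echo "enabled" ;;
        *) echo "unknown" ;;
    esac
}

# Example (bridge name taken from the config above):
# snooping_state vmbr1
```

With bridge-mcsnoop 0 in place the expected answer is "disabled"; anything else would suggest the setting did not take effect on the running bridge.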


3 participants