How to hash IPv6 addresses in ::ffff:0:0/96 #33

Open
jachris opened this issue Aug 14, 2024 · 3 comments
Labels: enhancement (New feature or request), spec-ambiguity (Something isn't spelled out in excruciating detail in the spec)

Comments

jachris commented Aug 14, 2024

The ::ffff:0:0/96 subnet is used to map IPv4 addresses into IPv6 (RFC 4291). For example, ::ffff:1.2.3.4 maps to 1.2.3.4. However, hashing the 16-byte IPv6 address and hashing its equivalent 4-byte IPv4 address do not give the same result. This raises the question: should the IP address ::ffff:1.2.3.4 be hashed as an IPv6 address or as an IPv4 address?


As the spec does not really go into this question, I looked at the implementations and discovered that there is some divergence between them. For example, the Python package (corelight/pycommunityid) says:

$ community-id tcp 1.2.3.4 10.0.0.2 10 20  
1:4eJ9wKNxkQ6vRNvW/B18p7xc090=

$ community-id tcp ::ffff:1.2.3.4 10.0.0.2 10 20 
1:5yNGUOovdeQh3HL08QI6NFXMq6Q=

Meanwhile, the Zeek plugin (corelight/zeek-community-id) derives the IP address type just like Zeek itself, which in turn checks for the ::ffff:0:0/96 prefix and thus treats ::ffff:1.2.3.4 as 1.2.3.4. Therefore, the two implementations give different results for ::ffff:1.2.3.4.
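
To make the two options concrete, here is a minimal Python sketch of the v1 computation for port-based protocols, following a plain reading of the spec: SHA-1 over the seed, the ordered addresses, the protocol, one padding byte, and the ordered ports. The function name `community_id_v1` and the `unmap_v4` flag are hypothetical and exist only for this illustration; they are not the API of pycommunityid or any other implementation, and the endpoint ordering is an assumption about how the spec's comparison is applied.

```python
import base64
import hashlib
import ipaddress
import struct


def community_id_v1(proto, saddr, daddr, sport, dport, seed=0, unmap_v4=False):
    """Sketch of Community ID v1 for TCP/UDP-style flows (no ICMP handling)."""

    def pack(addr):
        ip = ipaddress.ip_address(addr)
        # Hypothetical switch: collapse ::ffff:a.b.c.d to the 4-byte a.b.c.d
        # ("semantic" treatment). Without it, the 16 wire bytes are hashed.
        if unmap_v4 and ip.version == 6 and ip.ipv4_mapped is not None:
            ip = ip.ipv4_mapped
        return ip.packed

    src, dst = pack(saddr), pack(daddr)
    # Order the endpoints so both directions of a flow hash identically
    # (assumption: compare address bytes first, then ports).
    if (src, sport) > (dst, dport):
        src, dst, sport, dport = dst, src, dport, sport

    h = hashlib.sha1()
    h.update(struct.pack("!H", seed))        # 2-byte seed, network order
    h.update(src)
    h.update(dst)
    h.update(struct.pack("!BB", proto, 0))   # protocol plus one padding byte
    h.update(struct.pack("!HH", sport, dport))
    return "1:" + base64.b64encode(h.digest()).decode("ascii")


TCP = 6
as_v4 = community_id_v1(TCP, "1.2.3.4", "10.0.0.2", 10, 20)
as_v6 = community_id_v1(TCP, "::ffff:1.2.3.4", "::ffff:10.0.0.2", 10, 20)
unmapped = community_id_v1(TCP, "::ffff:1.2.3.4", "::ffff:10.0.0.2", 10, 20,
                           unmap_v4=True)
print(as_v4, as_v6, unmapped)
```

With `unmap_v4=False` the mapped addresses contribute 16 bytes each to the hash input, so the result differs from the plain IPv4 flow; with `unmap_v4=True` the input bytes are identical to the IPv4 case and the IDs coincide.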

ckreibich self-assigned this on Nov 22, 2024
ckreibich added the enhancement and spec-ambiguity labels on Nov 22, 2024
@ckreibich
Member

This is far too good a question to have been sitting here for this long; apologies, @jachris. The short answer is that, according to the spec, Community ID should follow the headers, not the semantics. That is, since 10.0.0.1 comes out of a v4 header and ::ffff:10.0.0.1 out of a v6 one, the hashes should differ. That was by design, to keep v1 as bare-bones as possible, and it's what I'm seeing in all of the implementations I just tried, other than Zeek.

Zeek is pretty unique in that it just handles all IPv4 as IPv4-mapped IPv6 (i.e., it says is_v4_addr([::ffff:10.0.0.1]) is true). We can fix the Community ID result in Zeek to align with the rest, with some caveats. We'll look into that.
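
The "semantic" check described above can be approximated in a few lines of Python. This is only an analogue of the idea, not Zeek's code: `is_v4_mapped` and `V4_MAPPED` are hypothetical names, and the sketch relies solely on the standard `ipaddress` module.

```python
import ipaddress

# The IPv4-mapped prefix from RFC 4291.
V4_MAPPED = ipaddress.ip_network("::ffff:0:0/96")


def is_v4_mapped(addr: str) -> bool:
    """True if addr is an IPv6 address inside ::ffff:0:0/96."""
    ip = ipaddress.ip_address(addr)
    return ip.version == 6 and ip in V4_MAPPED


for candidate in ("::ffff:10.0.0.1", "10.0.0.1", "2001:db8::1"):
    ip = ipaddress.ip_address(candidate)
    # len(ip.packed) is what a header-driven hash would see: 16 or 4 bytes.
    print(candidate, is_v4_mapped(candidate), len(ip.packed))
```

A header-driven implementation hashes `ip.packed` as is, whereas a semantics-driven one would first reduce any address for which `is_v4_mapped` is true to its last four bytes.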

Then there's the question of what's conceptually the right thing to do here. Thoughts on this are very welcome. My hunch is that this could be another configurable setting for a v2, because it seems to me that the desired behavior might be site-dependent.

I'm attaching a pcap here that has a single TCP flow over IPv4-mapped v6 addresses, in case it comes in handy for anyone. If you see 1:jFyJvuCrooZ0eMuU1Yi8G2npZiU=, it's Zeek-style treatment as v4; if you see 1:vWzFozHlLRWcyrVJGhtxPY2C7GQ=, it's driven by the values on the wire. If anyone comes across other implementations that exhibit Zeek's behavior, I'd be very curious to hear about them.
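
For anyone scripting this check against several tools, a small helper along these lines could label a result. The two IDs are the ones quoted above for the attached pcap; the function name `classify_mapped_v4_behavior` and the returned labels are invented for this sketch.

```python
# Community ID values quoted above for the single flow in the attached pcap.
SEMANTIC_V4_ID = "1:jFyJvuCrooZ0eMuU1Yi8G2npZiU="   # Zeek-style: treated as v4
WIRE_V6_ID = "1:vWzFozHlLRWcyrVJGhtxPY2C7GQ="       # hashed as seen on the wire


def classify_mapped_v4_behavior(community_id: str) -> str:
    """Label how an implementation handled the IPv4-mapped flow."""
    cid = community_id.strip()
    if cid == SEMANTIC_V4_ID:
        return "semantic (IPv4-mapped treated as IPv4)"
    if cid == WIRE_V6_ID:
        return "header (hashed as IPv6)"
    return "unknown (different seed, flow, or implementation bug)"


print(classify_mapped_v4_behavior("1:jFyJvuCrooZ0eMuU1Yi8G2npZiU="))
```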

ipv4-mapped-ipv6.pcap.zip

@mavam
Contributor

mavam commented Nov 23, 2024

> If anyone comes across other implementations that exhibit Zeek's behavior, I'd be very curious to hear about them.

Our implementation at Tenzir works like Zeek's: it looks at whether the address is semantically an IPv4 address, not at how it is represented. This is where the choice must be made: do you honor IPv4-mapped IPv6 addresses or not?

As you said, you can argue both ways: treating such an address as IPv6 because it comes in 128 bits, or treating it as an IPv4 address because, well, it's tagged as such.

We think the latter approach is more closely aligned with user expectations, and so does Zeek. But that doesn't matter if the rest of the world disagrees. (I haven't checked Wireshark, Arkime, and others.)

One way to control this explicitly is with a v2. But I think leaving it ambiguous for v1 is dangerous. I would make a call and amend the spec.

Here's a concrete example:

❯ cat ipv4-mapped-ipv6.pcap |
  tenzir --tql2 'read_pcap | this = decapsulate(this) | select ip, community_id | head 1'
{
  "ip": {
    "src": "10.0.0.1",
    "dst": "10.0.0.2",
    "type": 6
  },
  "community_id": "1:jFyJvuCrooZ0eMuU1Yi8G2npZiU="
}

@ckreibich
Member

At least pycommunityid, the JavaScript implementation, Wireshark, and Suricata follow the header, not the semantics, i.e., a v4 address is never the same as a v6 address. Adding a clarifying statement to the spec that this is the intended behavior for that version seems fine. There should also be a data point in the reference baseline so one can verify implementations in this regard. But hoping that they'll all end up agreeing seems optimistic.

I think there's enough meat at this point for a v2, and making this behavior explicit in it is the right way forward. I've long been nervous about even calling the doc laying out v1 a "spec", because it was never formally rigid and has other ambiguities, e.g., around endianness.
