How to hash IPv6 addresses in `::ffff:0:0/96` #33

jachris · 2024-08-14T09:36:35Z

The ::ffff:0:0/96 subnet is used to map IPv4 addresses (RFC4291). For example, ::ffff:1.2.3.4 maps to 1.2.3.4. However, hashing the 16-byte IPv6 address and its equivalent 4-byte IPv4 address does not give the same result. This raises the question: Should the IP address ::ffff:1.2.3.4 be hashed as an IPv6 address, or rather as an IPv4 address?
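To make the byte-level difference concrete, here is a small sketch using Python's standard `ipaddress` module. It is only an illustration, not code from any Community ID implementation. Both values denote the same host, but the bytes a hasher would consume differ in both length and content, so any digest computed over them differs too.

```python
import ipaddress

v4 = ipaddress.IPv4Address("1.2.3.4")
mapped = ipaddress.IPv6Address("::ffff:1.2.3.4")

# The IPv6 address carries the IPv4 address in its low 32 bits.
print(mapped.ipv4_mapped)        # 1.2.3.4
print(mapped.ipv4_mapped == v4)  # True

# ...but the raw bytes that would be fed to a hash are not the same.
print(len(v4.packed), v4.packed.hex())          # 4 01020304
print(len(mapped.packed), mapped.packed.hex())  # 16 00000000000000000000ffff01020304
```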

As the spec does not really go into this question, I looked at the implementations and discovered that there is some divergence between them. For example, the Python package (corelight/pycommunityid) says:

$ community-id tcp 1.2.3.4 10.0.0.2 10 20  
1:4eJ9wKNxkQ6vRNvW/B18p7xc090=

$ community-id tcp ::ffff:1.2.3.4 10.0.0.2 10 20 
1:5yNGUOovdeQh3HL08QI6NFXMq6Q=

Meanwhile, the Zeek plugin (corelight/zeek-community-id) derives the IP address type just like Zeek itself, which in turn looks for the ::ffff:0:0/96, and thus treats ::ffff:1.2.3.4 as 1.2.3.4. Therefore, the two implementations give different results for ::ffff:1.2.3.4.


ckreibich · 2024-11-22T20:09:46Z

This is far too good a question to have been sitting here for this long, apologies @jachris. The short answer is that according to the spec Community ID should follow the headers, not semantics. That is, since 10.0.0.1 comes out of a v4 header and ::ffff:10.0.0.1 out of a v6 one, the hashes should differ. That was by design, to keep v1 as bare-bones as possible, and it's what I'm seeing in all of the implementations I just tried, other than Zeek.

Zeek is pretty unique in that it just handles all IPv4 as IPv4-mapped IPv6 (i.e., it says is_v4_addr([::ffff:10.0.0.1]) is true). We can fix the Community ID result in Zeek to align with the rest, with some caveats. We'll look into that.
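A rough Python analog of the two classifications, for illustration only: the helper names `is_v4_semantic` and `is_v4_on_the_wire` are hypothetical, and the sketch reproduces only the outcome described above (an address counts as v4 if it is native IPv4 or falls inside `::ffff:0:0/96`), not Zeek's internal representation. The string form of an address stands in for which IP header it came from.

```python
import ipaddress

# RFC 4291 IPv4-mapped IPv6 prefix.
V4_MAPPED = ipaddress.IPv6Network("::ffff:0:0/96")

def is_v4_semantic(addr: str) -> bool:
    """Hypothetical helper: Zeek-style view, True for IPv4 and IPv4-mapped IPv6."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return True
    return ip in V4_MAPPED

def is_v4_on_the_wire(addr: str) -> bool:
    """Hypothetical helper: header-driven view, only native IPv4 counts as v4."""
    return ipaddress.ip_address(addr).version == 4

for a in ("10.0.0.1", "::ffff:10.0.0.1", "2001:db8::1"):
    print(a, is_v4_semantic(a), is_v4_on_the_wire(a))
# 10.0.0.1 True True
# ::ffff:10.0.0.1 True False
# 2001:db8::1 False False
```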

Then there's the question of what's conceptually the right thing to do here. Thoughts on this are very welcome. My hunch is that this could be another configurable setting for a v2, because the desired behavior might well be site-dependent.
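To illustrate what such a setting could look like, here is a minimal sketch of the v1 hash for port-based flows (TCP/UDP), assuming the v1 input layout of a 2-byte seed, source and destination address bytes, a 1-byte protocol, a zero pad byte, and 2-byte ports, all in network byte order, with endpoints ordered so both directions yield the same ID. It is not taken from pycommunityid or any other implementation and skips ICMP and other details. The `mapped_policy` parameter is hypothetical: `"wire"` hashes addresses as they appear in the header (the v1 behavior described above), while `"semantic"` first folds `::ffff:0:0/96` addresses down to IPv4, as Zeek and Tenzir do.

```python
import base64
import hashlib
import ipaddress
import struct

def community_id_v1(saddr, daddr, sport, dport, proto=6, seed=0,
                    mapped_policy="wire"):
    """Sketch of Community ID v1 for port-based flows.

    mapped_policy is a hypothetical knob, not part of the v1 spec:
      "wire"     - hash addresses exactly as given (header-driven)
      "semantic" - fold IPv4-mapped IPv6 addresses to IPv4 first
    """
    def to_bytes(addr):
        ip = ipaddress.ip_address(addr)
        if mapped_policy == "semantic" and ip.version == 6 \
                and ip.ipv4_mapped is not None:
            ip = ip.ipv4_mapped
        return ip.packed

    src, dst = to_bytes(saddr), to_bytes(daddr)
    # Order endpoints (address first, then port) so that both
    # directions of a flow produce the same ID.
    if (src, sport) > (dst, dport):
        src, dst, sport, dport = dst, src, dport, sport

    data = (struct.pack("!H", seed) + src + dst
            + struct.pack("!BBHH", proto, 0, sport, dport))
    digest = hashlib.sha1(data).digest()
    return "1:" + base64.b64encode(digest).decode("ascii")
```

Usage with the addresses from the original report:

```python
a = community_id_v1("1.2.3.4", "10.0.0.2", 10, 20)
b = community_id_v1("::ffff:1.2.3.4", "10.0.0.2", 10, 20)
c = community_id_v1("::ffff:1.2.3.4", "10.0.0.2", 10, 20,
                    mapped_policy="semantic")
print(a != b, a == c)  # True True
```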

I'm attaching a pcap here that has a single TCP flow over IPv4-mapped v6 addresses, in case it comes in handy for anyone. If you see 1:jFyJvuCrooZ0eMuU1Yi8G2npZiU=, it's Zeek-style treatment as v4; if you see 1:vWzFozHlLRWcyrVJGhtxPY2C7GQ=, it's driven by values on the wire. If anyone comes across other implementations that exhibit Zeek's behavior, I'd be very curious to hear about them.

ipv4-mapped-ipv6.pcap.zip

mavam · 2024-11-23T08:35:46Z

> If anyone comes across other implementations that exhibit Zeek's behavior, I'd be very curious to hear about them.

Our implementation at Tenzir works like Zeek's: it looks at whether the address is semantically IPv4, not at how it is represented. This is where the choice must be made: do you honor IPv4-mapped IPv6 addresses or not?

As you said, you can argue both ways: treating such an address as IPv6 because it comes in 128 bits, or treating it as an IPv4 address because, well, it's tagged as such.

We think the latter approach is more closely aligned with user expectations. Zeek does too. But that doesn't matter if the rest of the world doesn't. (I haven't checked Wireshark, Arkime, and others.)

One way to control this explicitly is with a v2. But I think leaving it ambiguous for v1 is dangerous. I would make a call and amend the spec.

Here's a concrete example:

❯ cat ipv4-mapped-ipv6.pcap |
  tenzir --tql2 'read_pcap | this = decapsulate(this) | select ip, community_id | head 1'
{
  "ip": {
    "src": "10.0.0.1",
    "dst": "10.0.0.2",
    "type": 6
  },
  "community_id": "1:jFyJvuCrooZ0eMuU1Yi8G2npZiU="
}

ckreibich · 2024-11-27T07:48:10Z

At least pycommunityid, the JavaScript implementation, Wireshark, and Suricata follow the header, not semantics, i.e. a v4 address is never the same as a v6 address. Adding a clarifying statement to the spec that this is the intended behavior for that version seems fine. There should also be a data point in the reference baseline so one can verify implementations in this regard. But hoping that they'll all end up agreeing seems optimistic.

I think there's enough meat at this point for a v2, and making this behavior explicit in it is the right way forward. I've long been nervous about even calling the doc laying out v1 a "spec", because it was never formally rigid and has other ambiguities, e.g. around endianness.

ckreibich self-assigned this Nov 22, 2024

ckreibich added the enhancement (New feature or request) and spec-ambiguity (Something isn't spelled out in excruciating detail in the spec) labels Nov 22, 2024
