How to hash IPv6 addresses in ::ffff:0:0/96
#33
Comments
This is far too good a question to have been sitting here for this long, apologies @jachris.

The short answer is that according to the spec Community ID should follow the headers, not semantics. That is, since 10.0.0.1 comes out of a v4 header and ::ffff:10.0.0.1 out of a v6 one, the hashes should differ. That was by design, to keep v1 as bare-bones as possible, and it's what I'm seeing in all of the implementations I just tried, other than Zeek. Zeek is pretty unique in that it just handles all IPv4 as IPv4-mapped IPv6 (i.e., it says

Then there's the question of what's conceptually the right thing to do here. Thoughts on this are very welcome. My hunch is that this could be another configurable setting for a v2, because it seems to me that the desired behavior might be site-dependent.

I'm attaching a pcap here that has a single TCP flow over IPv4-mapped v6 addresses, in case it comes in handy for anyone. If you see
Our implementation at Tenzir works like Zeek's: look at whether the address is semantically an IPv4 address, not representationally. This is where the choice must be made. Do you honor IPv4-mapped IPv6 addresses or not? As you said, you can argue both ways: treating such an address as IPv6 because it comes in 128 bits, or treating it as an IPv4 address because, well, it's tagged as such. We think the latter approach is more closely aligned with user expectations. Zeek does too. But it doesn't matter if the rest of the world doesn't. (I haven't checked Wireshark, Arkime, and others.)

One way to control this explicitly is with a v2. But I think leaving it ambiguous for v1 is dangerous. I would make a call and amend the spec. Here's a concrete example:
At least pycommunityid, the JavaScript implementation, Wireshark, and Suricata follow the header, not semantics, i.e. a v4 address is never the same as a v6 address. Adding a clarifying statement to the spec that this is the intended behavior for that version seems fine. There should also be a data point in the reference baseline so one can verify implementations in this regard. But hoping that they'll all end up agreeing seems optimistic.

I think there's enough meat at this point for a v2, and making this behavior explicit in it is the right way forward. I've long been nervous about even calling the doc laying out v1 a "spec", because it was never formally rigid and has other ambiguities, e.g. around endianness.
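To make the "data point to verify implementations" idea concrete, here is a minimal sketch of a probe that classifies an implementation by comparing its output for a plain IPv4 flow against the same flow written with IPv4-mapped IPv6 addresses. The function name `mapped_address_policy`, the `calc(saddr, daddr, sport, dport)` callable signature, and the sample addresses and ports are all assumptions made for illustration; they are not part of the spec or of any existing implementation.

```python
from typing import Callable

# Hypothetical signature: calc(saddr, daddr, sport, dport) -> Community ID string.
CalcFn = Callable[[str, str, int, int], str]


def mapped_address_policy(calc: CalcFn) -> str:
    """Classify how `calc` treats addresses in ::ffff:0:0/96.

    Returns "semantic" if an IPv4-mapped IPv6 flow hashes like its IPv4
    equivalent, and "header" if the two hashes differ.
    """
    # Illustrative flow tuple; any IPv4 pair and its mapped form would do.
    plain = calc("10.0.0.1", "10.0.0.2", 1234, 80)
    mapped = calc("::ffff:10.0.0.1", "::ffff:10.0.0.2", 1234, 80)
    return "semantic" if plain == mapped else "header"
```

Wrapping a real implementation in a small adapter with this signature would be enough to run the probe against it.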
The ::ffff:0:0/96 subnet is used to map IPv4 addresses (RFC4291). For example, ::ffff:1.2.3.4 maps to 1.2.3.4. However, hashing the 16-byte IPv6 address and its equivalent 4-byte IPv4 address does not give the same result. This raises the question: Should the IP address ::ffff:1.2.3.4 be hashed as an IPv6 address, or rather as an IPv4 address?

As the spec does not really go into this question, I looked at the implementations and discovered that there is some divergence between them. For example, the Python package (corelight/pycommunityid) says:

Meanwhile, the Zeek plugin (corelight/zeek-community-id) derives the IP address type just like Zeek itself, which in turn looks for the ::ffff:0:0/96 prefix, and thus treats ::ffff:1.2.3.4 as 1.2.3.4. Therefore, the two implementations give different results for ::ffff:1.2.3.4.
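The divergence can be reproduced with a few lines of Python. The following is a rough sketch of a v1-style Community ID for TCP/UDP flows as I understand the layout (seed, ordered addresses, protocol, one padding byte, ordered ports, SHA-1, base64); it is not the reference implementation. The function names and the `unmap_v4` switch are hypothetical and exist only to contrast the two behaviors: with `unmap_v4=False` the hash follows the header (16 bytes stay 16 bytes), with `unmap_v4=True` addresses in ::ffff:0:0/96 are reduced to their 4-byte IPv4 form first.

```python
import base64
import hashlib
import ipaddress
import struct


def _addr_bytes(addr: str, unmap_v4: bool) -> bytes:
    """Return the packed address, optionally unmapping ::ffff:0:0/96 to IPv4."""
    ip = ipaddress.ip_address(addr)
    if unmap_v4 and isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # semantic view: 4 bytes instead of 16
    return ip.packed


def community_id_v1(saddr, daddr, sport, dport, proto=6, seed=0, unmap_v4=False) -> str:
    """Sketch of a v1-style Community ID for a TCP/UDP flow tuple."""
    src, dst = _addr_bytes(saddr, unmap_v4), _addr_bytes(daddr, unmap_v4)
    # Order the endpoints so both directions of a flow hash identically.
    if (dst, dport) < (src, sport):
        src, dst, sport, dport = dst, src, dport, sport
    data = (
        struct.pack("!H", seed)
        + src
        + dst
        + struct.pack("!BBHH", proto, 0, sport, dport)  # 0 is the padding byte
    )
    return "1:" + base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")


plain = community_id_v1("1.2.3.4", "5.6.7.8", 1234, 80)
by_header = community_id_v1("::ffff:1.2.3.4", "::ffff:5.6.7.8", 1234, 80)
by_semantics = community_id_v1("::ffff:1.2.3.4", "::ffff:5.6.7.8", 1234, 80, unmap_v4=True)

print(plain == by_header)     # False: 16-byte and 4-byte inputs hash differently
print(plain == by_semantics)  # True: unmapped addresses match the IPv4 flow
```

The peer address and ports here are made up for the example. In this sketch each endpoint is unmapped independently; a flow that mixes a mapped and a native IPv6 address is a further case that the sketch does not try to settle.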