
Unknown process connections due to missing /proc/pid/task parsing #343

Closed
AlexFromChaos opened this issue Feb 9, 2021 · 17 comments

@AlexFromChaos

Currently, in some situations (like multi-process Firefox/Chrome instances) OpenSnitch fails to determine the process PID from the inode, because the connection originates from a child process whose PID entry is at /proc/<parent_PID>/task/<PID>, not /proc/<PID>.

Because of this, per-process rules cannot be applied to many connections from apps like Firefox/Chrome.

Is there some legitimate reason why we can't do /proc/pid/task parsing?

The comment in code says:

// lookupPidInProc searches for an inode in /proc.
// First it gets the running PIDs and obtains the opened sockets.
// TODO: If the inode is not found, search again in the task/threads
// of every PID (costly).

The question is: does it really slow things down that badly? If so, perhaps allowing the user to enable it manually via settings would be a good solution.
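For illustration, roughly what the extra task walk could look like in Go (a rough sketch with made-up names, not OpenSnitch's actual code):

package procwalk

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// lookupInodeInTasks walks /proc/<pid>/task/<tid>/fd/ and returns the tid
// whose fd table contains a socket with the given inode, or -1 if not found.
// (Hypothetical helper, not OpenSnitch's actual code.)
func lookupInodeInTasks(pid int, inode string) int {
	taskDir := fmt.Sprintf("/proc/%d/task", pid)
	tids, err := os.ReadDir(taskDir)
	if err != nil {
		return -1
	}
	target := fmt.Sprintf("socket:[%s]", inode)
	for _, tid := range tids {
		fdDir := filepath.Join(taskDir, tid.Name(), "fd")
		fds, err := os.ReadDir(fdDir)
		if err != nil {
			continue
		}
		for _, fd := range fds {
			link, err := os.Readlink(filepath.Join(fdDir, fd.Name()))
			if err == nil && link == target {
				t, _ := strconv.Atoi(tid.Name())
				return t
			}
		}
	}
	return -1
}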

@themighty1
Contributor

@AlexFromChaos, there is no compelling reason not to do it. We will probably not lose more than 5 milliseconds by parsing those.

@themighty1
Contributor

oops, I misread, it says "of every PID", meaning every PID on the machine. Yeah, that can run into the hundreds of milliseconds.

@themighty1
Contributor

Granted, we could optimize it to look up the most-recently-used processes' /proc entries first, though.
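Something along those lines, as a rough sketch (the type and method names are made up, not actual OpenSnitch code) - recently matched PIDs get checked before falling back to the full scan:

package procwalk

// mruPids keeps the most-recently matched PIDs so they can be checked before
// a full /proc scan. A rough sketch of the optimization, not OpenSnitch code.
type mruPids struct {
	pids []int
	max  int
}

// touch moves pid to the front of the list (inserting it if needed) and
// evicts the oldest entry once the list is full.
func (m *mruPids) touch(pid int) {
	out := []int{pid}
	for _, p := range m.pids {
		if p != pid && len(out) < m.max {
			out = append(out, p)
		}
	}
	m.pids = out
}

// lookup tries the recent PIDs first; match is a caller-supplied check such
// as "does /proc/<pid>/fd (or one of its tasks) contain this inode".
func (m *mruPids) lookup(match func(pid int) bool) int {
	for _, pid := range m.pids {
		if match(pid) {
			m.touch(pid)
			return pid
		}
	}
	return -1
}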

@gustavo-iniguez-goya
Collaborator

I don't know now because we've optimized many aspects of that part of the code, but when I wrote that comment the parsing time was prohibitive.

@AlexFromChaos
Author

AlexFromChaos commented Feb 10, 2021

hmm, maybe it won't be so bad considering the PID cache.
I don't know Go, but I'll try to do a quick patch for /task parsing, measure average/worst timing for the inode->PID lookup, and collect some stats. If it's bearable, I think it can be optionally enabled in settings.

@themighty1
Contributor

@AlexFromChaos, yes, thanks, give it a shot.

@gustavo-iniguez-goya
Collaborator

gustavo-iniguez-goya commented Feb 11, 2021

it can be optionally enabled in settings.

it would be interesting to have some way of enabling experimental features, like intercepting incoming connections (#283).

@AlexFromChaos
Author

Looks like it's something more than threads with their own (unshared) fd tables - I still see "unknown process" connections even with the /task lookup.

@sak96

sak96 commented Mar 4, 2021

@AlexFromChaos

assumptions:
As far as my understanding goes, the task file descriptors should be present in the thread's file descriptors.[1]
Also, my understanding is that these tasks are threads[2], so you should not find any extra descriptors.

conclusion:
Using this understanding, I would presume the process which made the connection/packet has closed the file descriptor (or might have died). This can be useful if you want to send some data but don't care whether the data is actually received by someone. Dropping these packets might be wise.

ask:
I am curious to know what ports are used and whether the IP is the gateway, your machine, or something else.

doubt:
If the process closed the connection, shouldn't it be deleted from /proc/net/<protocol>? That is to say, the inode itself should not be found.

ps:
I might be completely wrong.

@AlexFromChaos
Author

AlexFromChaos commented Mar 6, 2021

As far as my understanding goes, the task file descriptors should be present in the thread's file descriptors.[1]
Also, my understanding is that these tasks are threads[2], so you should not find any extra descriptors.

Threads can have their own descriptor table, either upon creation or later during their lifetime via the unshare syscall with the CLONE_FILES flag. The assumption was that it might be used as part of browser sandboxing (the issue mostly affects browsers). According to tests, that doesn't seem to be the case.
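For reference, a toy sketch of how a thread can end up with an unshared fd table (purely illustrative, not how browsers implement their sandboxing):

package main

import (
	"fmt"
	"runtime"

	"golang.org/x/sys/unix"
)

// An OS thread unshares its fd table, so sockets it creates are only visible
// under /proc/<pid>/task/<tid>/fd, not under /proc/<pid>/fd.
func main() {
	done := make(chan struct{})
	go func() {
		defer close(done)
		runtime.LockOSThread() // pin this goroutine to one OS thread
		if err := unix.Unshare(unix.CLONE_FILES); err != nil {
			fmt.Println("unshare:", err)
			return
		}
		fd, err := unix.Socket(unix.AF_INET, unix.SOCK_STREAM, 0)
		if err != nil {
			fmt.Println("socket:", err)
			return
		}
		defer unix.Close(fd)
		fmt.Printf("fd %d lives in the unshared table of tid %d\n", fd, unix.Gettid())
	}()
	<-done
}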

conclusion:
Using this understanding, I would presume the process which made the connection/packet has closed the file descriptor (or might have died). This can be useful if you want to send some data but don't care whether the data is actually received by someone. Dropping these packets might be wise.

I did some brief stracing when it happens, and for now I can tell that the process which creates such connections doesn't quit - the PID remains the same and other network activity continues after the connection. It looks like it happens with async sockets only, so a quick socket close is an obvious root-cause candidate. I'll try to debug this issue more to understand the exact reason.

ask:
I am curious to know what ports are used and whether the IP is the gateway, your machine, or something else.

It happens randomly while browsing heavy sites (e.g. YouTube). The IP addresses of those unknown connections are normally subdomains of the visited sites, and the port is typically 443 (https).

@gustavo-iniguez-goya
Collaborator

I'm debugging this issue, and besides the problem of not intercepting connections of a task, I'm quite sure that there's a bug in the code or some network oddity.

It's easily reproducible using gnome-maps, and zooming in and out of the map continuously.

In this case, all the connections are opened by pid 8849, but sometimes the inode is present neither under /proc/8849/fd/ nor under /proc/8849/task/<tid>/fd/.

This was the missed connection:

[2021-03-26 22:38:24]  DBG  new connection tcp => 44132:192.168.1.109 -> 13.224.113.73:443 uid: 1000
[2021-03-26 22:38:24]  DBG  [0/1] outgoing connection: 44132:192.168.1.109 -> 13.224.113.73:443 || netlink response: 44132:192.168.1.109 -> 13.224.113.73:443 inode: 152776842 - loopback: false multicast: false unspecified: false linklocalunicast: false ifaceLocalMulticast: false GlobalUni: true , state: syn_sent
[2021-03-26 22:38:24]  DBG  new pid lookup took (-1): 202.825572ms

right after getting the inode from the kernel, I refreshed and dumped the file descriptors/sockets from /proc/8849/fd/, and the inode 152776842 was not present there, nor under any tid of 8849.

enabling the kprobe inet_sock_set_state revealed the pid that originated the connection (srcport == 44132), which is the one that appears thousands of times (echo 1 > /sys/kernel/debug/tracing/events/sock/inet_sock_set_state/enable ; cat /sys/kernel/debug/tracing/trace_pipe)

org.gnome.Maps-8849    [002] .... 3155764.381892: inet_sock_set_state: family=AF_INET protocol=IPPROTO_TCP sport=44132 dport=443 saddr=192.168.1.109 daddr=13.224.113.73 saddrv6=::ffff:192.168.1.109 daddrv6=::ffff:13.224.113.73 oldstate=TCP_SYN_SENT newstate=TCP_CLOSE

so there's a mystery yet to be revealed.

@gustavo-iniguez-goya
Collaborator

oldstate=TCP_SYN_SENT newstate=TCP_CLOSE

well, the usual order of states is: new syn -> oldstate=TCP_CLOSE newstate=TCP_SYN_SENT, close -> oldstate=TCP_SYN_SENT newstate=TCP_CLOSE

however, when we queried the kernel for the connection it dumped the entry with the state syn_sent, so maybe, as the kprobe log shows, the file descriptor/inode had already been closed by the time we listed the process's fds.

@gustavo-iniguez-goya
Collaborator

Many of the "unknown connections" that I used to have are now gone with the new eBPF monitor method.
There are still some connections (src port + src ip + dst ip + dst port) that are not found in the kernel, so we need to investigate that. But all in all, it works much better.

@AlexFromChaos
Author

AlexFromChaos commented Apr 5, 2021

Probably the most common reason why an inode is not present in a PID's fd table is the combination of a non-blocking socket + immediately closing the socket without waiting for connect to finish. Here is a strace log for Firefox (PID 1719, other PIDs are filtered):

1719  socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 183
1719  fcntl(183, F_GETFL)               = 0x2 (flags O_RDWR)
1719  fcntl(183, F_SETFL, O_RDWR|O_NONBLOCK) = 0
1719  connect(183, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("95.216.3.34")}, 16 <unfinished ...>
1719  <... connect resumed> )           = -1 EINPROGRESS (Operation now in progress)
1719  poll([{fd=21, events=POLLIN|POLLPRI}, {fd=82, events=POLLIN|POLLPRI}, {fd=128, events=POLLIN|POLLPRI}, {fd=134, events=POLLIN|POLLPRI}, {fd=164, events=POLLIN|POLLPRI}, {fd=143, events=POLLIN|POLLPRI}, {fd=142, events=POLLIN|POLLPRI}, {fd=39, events=POLLIN|POLLPRI}, {fd=108, events=POLLIN|POLLPRI}, {fd=137, events=POLLPRI|POLLOUT}, {fd=183, events=POLLPRI|POLLOUT}], 11, 0) = 1 ([{fd=21, revents=POLLIN}])
1719  read(21,  <unfinished ...>
1719  <... read resumed> "M", 2048)     = 1
1719  close(183 <unfinished ...>
1719  <... close resumed> )             = 0

So it created an O_NONBLOCK socket (fd 183), started a connect attempt to 95.216.3.34:443, and then issued a poll for multiple sockets, including that fd 183.

From this list of polled fds, poll returned fd 21. Firefox then started to read data from fd 21 and immediately closed fd 183, without waiting for its connect to complete. So at least in this case close was called before the connect result was known, causing the socket's inode to be removed from pid 1719's fd table.
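A minimal reproducer sketch of that pattern (the destination address is just an example):

package main

import (
	"fmt"
	"time"

	"golang.org/x/sys/unix"
)

// A non-blocking connect() followed by an immediate close(), so the socket's
// inode vanishes from /proc/<pid>/fd before anyone can look it up.
func main() {
	fd, err := unix.Socket(unix.AF_INET, unix.SOCK_STREAM|unix.SOCK_NONBLOCK, 0)
	if err != nil {
		panic(err)
	}
	addr := &unix.SockaddrInet4{Port: 443, Addr: [4]byte{95, 216, 3, 34}}
	// connect() on a non-blocking socket returns EINPROGRESS right away;
	// the kernel still sends the SYN in the background.
	if err := unix.Connect(fd, addr); err != nil && err != unix.EINPROGRESS {
		panic(err)
	}
	unix.Close(fd) // close before the handshake has a chance to finish
	fmt.Println("socket closed while connect() was still in progress")
	time.Sleep(time.Second) // keep the process alive so it can be observed
}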

@gustavo-iniguez-goya
Collaborator

good analysis alex!

That's what we saw a while back: gustavo-iniguez-goya#84 (comment) gustavo-iniguez-goya#10 (comment)

I created a tester for this case, and now with kprobes we intercept it. However, I still see some random missing connections from Chrome (although a lot fewer). I don't know if it's related to it being jailed/containerized, besides what you said.

I'll keep an eye on all of this to keep improving it.

@AlexFromChaos
Author

hmm, I think in theory it should be possible to use the ftrace infrastructure to monitor the socket and close syscalls as well.

This would allow keeping a consistent mapping between PIDs and socket fds (likely even getting the inode in the first place), removing dead entries on close calls. So whenever GetPIDFromINode tries to get a PID from an inode, it will always have up-to-date data (and probably a fast inode -> pid lookup).
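Roughly the bookkeeping such an approach would feed, as a sketch (all names here are made up; the actual ftrace/kprobe wiring and the GetPIDFromINode integration are not shown):

package monitor

import "sync"

// sockTable maps socket inodes to PIDs: socket events add an entry, close
// events remove it, and the inode->PID lookup becomes a map read.
type sockTable struct {
	mu      sync.RWMutex
	byInode map[uint64]int // inode -> pid
}

func newSockTable() *sockTable {
	return &sockTable{byInode: make(map[uint64]int)}
}

// onSocket records a newly created socket for a pid.
func (t *sockTable) onSocket(inode uint64, pid int) {
	t.mu.Lock()
	t.byInode[inode] = pid
	t.mu.Unlock()
}

// onClose drops the entry when the fd is closed.
func (t *sockTable) onClose(inode uint64) {
	t.mu.Lock()
	delete(t.byInode, inode)
	t.mu.Unlock()
}

// pidFromInode is the fast lookup path; it returns -1 when unknown.
func (t *sockTable) pidFromInode(inode uint64) int {
	t.mu.RLock()
	defer t.mu.RUnlock()
	if pid, ok := t.byInode[inode]; ok {
		return pid
	}
	return -1
}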

@gustavo-iniguez-goya
Collaborator

This problem should be fixed with version v1.4.0 and eBPF as the process monitor method.
