question about number of slots #26

Open
iddq opened this issue Oct 12, 2024 · 20 comments

Comments

iddq commented Oct 12, 2024

Can I use the -F option on a live system? How many slots do we have? I can't find any definition of the number of slots. Thanks.

Author

iddq commented Oct 12, 2024

My current assumption is that the number of slots is not limited but depends on the number of running processes or threads. Furthermore, I assume that in the case of -F, only one thread is working, which is why an error message appeared in the log stating that no available slots were free. That's why the above questions arose.

Author

iddq commented Oct 12, 2024

I got the following error message:
Dropping client because there are no available slots.
It is possible that this issue is completely unrelated to the "-F" option.

Owner

cottsay commented Oct 12, 2024

Hi there, thanks for your interest.

The -F option only controls the behavior of the process, and doesn't affect the behavior of the proxy at all. It simply controls whether the process is forked to the background or remains active in the foreground. The same number of threads and slots is created in either case. Foreground mode is mainly used for troubleshooting, testing, and temporarily running the proxy, but if you intend the proxy to run long-term, you should use your platform's native daemon management system (e.g. systemd, Windows services, System V init, etc.).

Slots should be allocated for each interface on the system that has a distinct route to a public IP address. This proxy does not act like an EchoLink relay; it is only a proxy. You still need a free public IP address with properly forwarded ports for each slot. Unless specifically configured to do so with the AdditionalExternalBindAddresses option, there will be only a single slot on the proxy.

Configuring more than one slot on a Linux system is not trivial. It involves creating new routing tables for each publicly routable IP address and listing the local IP for each of those interfaces in AdditionalExternalBindAddresses. You shouldn't need to mess with the routing tables to do the same thing on Windows, but I haven't tried for many years.

Author

iddq commented Oct 12, 2024

But what could be the reason that I get the following message right after connecting to port 8100, e.g. with telnet 127.0.0.1 8100?

Oct 12 21:22:33 : Starting a processing run...
Oct 12 21:22:33 : Waiting for a client...
Oct 12 21:24:16 : Incoming connection from [::ffff:127.0.0.1]:11905.
Oct 12 21:24:16 : Dropping client because there are no available slots.
Oct 12 21:24:16 : Starting a processing run...
Oct 12 21:24:16 : Waiting for a client...

Owner

cottsay commented Oct 12, 2024

This is the message I'd expect to see if the proxy was already in use and there is a separate connection to 8100 active. This should be the same behavior as the official Java proxy if you tried to connect to it while it's already in use, though I don't think it prints any messages to the log like OpenELP does.

If you're seeing this immediately after startup without having connected to the proxy at all, then I'd like to hear more about your system and configuration before making a recommendation.

Author

iddq commented Oct 12, 2024

So should I take this to mean that the proxy can't handle multiple connections at once?

Owner

cottsay commented Oct 12, 2024

Unless your system has multiple public IP addresses and you configure the proxy to use them, it's not possible to create an EchoLink proxy that can handle more than one connection.

There is a concept known as a "relay" that takes multiple clients on a single public IP, but the trade-off is that those clients can't accept connections from other users: a relay only supports outgoing connections from proxy clients.

Author

iddq commented Oct 12, 2024

Thank you for the explanation, I think I'm starting to understand. So if I understand correctly, the EchoLink protocol does not send any session ID within the message, and the port is always the same, so there is no way to tell sessions apart. As a result, when an incoming connection arrives, the proxy cannot know which client should receive it. Do I understand correctly?

Owner

cottsay commented Oct 12, 2024

That's a pretty good summary, yes.

Author

iddq commented Oct 12, 2024

What do you think: instead of immediately rejecting a new connection, would it be possible to let it proceed through the authorization phase? If the connecting client has the same callsign as the one currently connected, the proxy could accept it and let the new connection take the place of the old one. That way, a client that reconnects (for example after a restart) wouldn't keep being rejected until the proxy-side timeout expires.

Owner

cottsay commented Oct 12, 2024

Interesting idea.

Here are a couple of things to consider:

  1. I'm not sure that the EchoLink client would act appropriately if the authentication phase were started and then dropped. It might tell the user that they're not authorized, or it might consider it to be a dropped connection and just attempt to reconnect. We'd have to give it a try.

  2. At a high level, the initial authentication phase for the proxy needs two things from the client:
    a. The user's callsign
    b. A password

    There is room for innovation here, because all of the EchoLink proxy implementations I'm aware of (including OpenELP) check these two values independently, so there can be a restriction on what callsigns can connect and a restriction on what password is accepted, but not callsign/password pairs.
    Also worth noting is that the proxy doesn't authorize the callsign against the EchoLink servers, so it's not very useful for determining with any level of certainty who an incoming connection is from. For example, though the client won't let you do this, it would theoretically be possible to connect to the proxy with one callsign and register with the EchoLink servers with a different one.

There is probably room for a configuration flag to indicate that a proxy is effectively intended for a single user, so we should always attempt to authorize incoming connections, and any successful authorization would immediately grab a slot. My initial thought, though, is that this behavior would probably have too many adverse side effects to be the default, especially for a public proxy.
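
To make the callsign/password pairing idea from point 2 concrete, here is a minimal sketch in C. Everything in it is hypothetical: `struct credential` and `credential_pair_allowed` do not exist in OpenELP, and the plain string comparison only stands in for however the proxy actually verifies the password during its handshake.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical credential entry; OpenELP has no such structure today. */
struct credential {
    const char *callsign;
    const char *password;
};

/*
 * Returns true only when the callsign and the password match the SAME
 * table entry, instead of each value being checked against its own
 * independent list.
 */
bool credential_pair_allowed(const struct credential *table, size_t count,
                             const char *callsign, const char *password)
{
    assert(table != NULL || count == 0);

    if (callsign == NULL || password == NULL)
        return false;

    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].callsign, callsign) == 0 &&
            strcmp(table[i].password, password) == 0)
            return true;
    }
    return false;
}
```

With independent checks, a known callsign combined with any accepted password would pass; with the paired check, a valid password belonging to a different callsign is rejected.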

Owner

cottsay commented Oct 12, 2024

> ...so [the callsign is] not very useful for determining with any level of certainty who an incoming connection is from...

An additional note on this. OpenELP does in fact use the callsign on an incoming connection to check if there was another recent connection from that same callsign on a free slot, and assigns the same slot to that user again. The public IP of the slot gets registered with the EchoLink servers with that callsign and if a proxy client reconnects, it would be better if the address that was already registered and distributed to other EchoLink users was still correct. For this reason, slots are allocated based on how long they've been idle so that there is more time for a user to reconnect and claim the same slot, or there is more time for other EchoLink users to refresh their catalog and drop the record for the callsign that has disconnected before someone new takes the slot (so you could say that the free slots are in a First In First Out queue).
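
The slot selection policy described above (prefer a free slot last used by the same callsign, otherwise take the free slot that has been idle longest) can be sketched roughly as follows. This is an illustration only: `struct slot`, its fields, and `pick_slot` are assumed names, not OpenELP's actual data structures, and the "recent" time limit is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define CALLSIGN_MAX 16

/* Hypothetical slot record; field names are illustrative, not OpenELP's. */
struct slot {
    int in_use;                        /* nonzero while a client holds the slot */
    char last_callsign[CALLSIGN_MAX];  /* callsign of the most recent client */
    time_t idle_since;                 /* when the slot last became free */
};

/*
 * Pick a slot for `callsign`:
 *   1. prefer a free slot whose previous client had the same callsign;
 *   2. otherwise take the free slot that has been idle the longest.
 * Returns the slot index, or -1 if every slot is in use.
 */
int pick_slot(const struct slot *slots, size_t count, const char *callsign)
{
    int oldest = -1;

    assert(slots != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (slots[i].in_use)
            continue;
        if (callsign != NULL && strcmp(slots[i].last_callsign, callsign) == 0)
            return (int)i;  /* the returning user reclaims the old slot */
        if (oldest < 0 || slots[i].idle_since < slots[oldest].idle_since)
            oldest = (int)i;
    }
    return oldest;
}
```

Handing out the longest-idle slot first is what gives the free slots their first-in, first-out behavior: a slot that was just vacated is the last one to be reassigned to a different callsign.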

Author

iddq commented Oct 13, 2024

Currently, I am experiencing the issue, at least on Linux, that anyone can simply open a TCP connection to port 8100, and even if they don't send any data, this alone prevents others from connecting to the proxy. This is because after the connection, they are immediately rejected with a message indicating that there are no free slots.

Author

iddq commented Oct 16, 2024

A bigger problem is that the Android client can also cause this situation. For some reason, it initiates multiple connections, and if the first one isn’t closed before the second one reaches its destination, it can’t connect to the proxy. I must note that the entire network handling of the Android client is terrible.

Owner

cottsay commented Jan 6, 2025

A good portion of this sounds pretty inherent to the protocol. How much of this behavior can you reproduce with the official (Java) proxy?

> anyone can simply open a TCP connection to port 8100, and even if they don't send any data, this alone prevents others from connecting to the proxy

I thought I remembered adding a timeout to the initial handshake, but I'm not seeing anything of the sort. It might be a good feature to have. Please feel free to open a new issue specifically requesting that feature, or open a PR to add it yourself.

A somewhat orthogonal way that this scenario is mitigated in multi-slot deployments is that OpenELP performs the authorization in the allocated slot, so a single inactive connection can't prevent other available slots from processing new clients.

Author

iddq commented Jan 6, 2025

Excuse me, but I don't understand what a multi-slot deployment is. Do I need to modify the configuration?

Author

iddq commented Jan 7, 2025

I would like to clarify the bug report a bit. If there is a TCP connection to the proxy and no traffic has occurred from the client side (thus the client hasn't been authenticated), this prevents another client from connecting to the proxy. More precisely, it's not the TCP connection itself that's blocked (because it succeeds), but no communication is initiated: the proxy doesn't send the initial hash to the client. I hope this is clear. I believe the problem occurs here, where priv->idle_workers_head is NULL:

Jan 07 00:56:59 : Incoming connection from [::ffff:127.0.0.1]:51394.
977             mutex_lock_shared(&priv->usable_clients_mutex);
(gdb) n
978             mutex_lock(&priv->idle_workers_mutex);
(gdb) n
979             if (priv->usable_clients > 0 && priv->idle_workers_head != NULL) {
(gdb) p priv->usable_clients
$1 = 1
(gdb) n
983             mutex_unlock(&priv->idle_workers_mutex);
(gdb) p priv->idle_workers_head
$2 = (struct proxy_worker *) 0x0
(gdb) l 979
974             proxy_log(ph, LOG_LEVEL_DEBUG, "Incoming connection from %s.\n",
975                       remote_addr);
976

Author

iddq commented Jan 7, 2025

Additionally, it is also interesting that although the message 'Dropping client because there are no available slots.' appears, this second TCP connection is not dropped but just waits and goes into CLOSE_WAIT state.

It could also be a problem that this single 'empty' TCP connection can block the client's termination processes, and the proxy will not exit until the client disconnects.

^CJan 07 01:23:50 : Caught signal
Jan 07 01:23:50 : Proxy shutdown requested.
Jan 07 01:23:50 : Sending update to registrar (1/0)
Jan 07 01:23:50 : Shutting down...
Jan 07 01:23:50 : Proxy shutdown requested.
Jan 07 01:23:50 : Sending update to registrar (1/0)
Jan 07 01:23:50 : Dropping all clients...
Jan 07 01:23:50 : Closing client connections...

<---- it waits until client disconnects

Jan 07 01:25:51 : Connection to client was lost before authorization could complete
Jan 07 01:25:51 : Closing listening connection... 
Jan 07 01:25:51 : Proxy is down - closing log.

Owner

cottsay commented Jan 7, 2025

> I don't understand what a multi-slot deployment is.

This is all about how many unique public IP addresses OpenELP is configured to use, and really only applies to multi-homed servers. Unless specifically configured with additional publicly-routable addresses to bind to, OpenELP behaves the same as the official Java proxy and uses only the system's default route, meaning that it has only a single slot to which clients can be assigned.

> If there is a TCP connection to the proxy and no traffic has occurred from the client side (thus the client hasn't been authenticated), this prevents another client from connecting to the proxy.

At the moment, this is by design.

Your proxy is configured with a single slot, meaning that it can only handle a connection to a single client. It is theoretically possible to allow multiple authorization attempts to begin and reject the "extra" clients (via a closed connection) only once one of the clients completes the authorization and starts using the slot. My main concern is that this is not how the official Java proxy behaves, and that the connecting EchoLink client won't behave how we want it to if the proxy connection were to suddenly close during the authorization process. I'm not sure what it would tell the user (we could conduct an experiment and find out), but it likely wouldn't indicate that the proxy is in use.

All that said, we could probably find a way to prevent a connection from intentionally or unintentionally tying up the proxy indefinitely. One possible path forward is to timebox the authorization process and close a client connection if it isn't able to make progress in some reasonable amount of time, maybe several seconds or so.
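
As a rough idea of what timeboxing the authorization could look like, here is a POSIX-only sketch that reads a fixed number of bytes under an overall deadline using `poll()`. The function name `read_with_deadline` and the choice of a single total deadline are assumptions for illustration; this is not existing OpenELP code, and a Windows build would need a different mechanism.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds elapsed since `start` on the monotonic clock. */
static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long)(now.tv_sec - start->tv_sec) * 1000L +
           (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/*
 * Read exactly `len` bytes from `fd`, giving up once `timeout_ms` has
 * elapsed in total. Returns 0 on success, or -1 on timeout, error, or
 * if the peer closes the connection first.
 */
int read_with_deadline(int fd, void *buf, size_t len, int timeout_ms)
{
    struct timespec start;
    size_t got = 0;

    assert(buf != NULL && timeout_ms >= 0);
    clock_gettime(CLOCK_MONOTONIC, &start);

    while (got < len) {
        long remaining = timeout_ms - elapsed_ms(&start);
        if (remaining <= 0)
            return -1;                  /* deadline passed */

        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ready = poll(&pfd, 1, (int)remaining);
        if (ready < 0 && errno == EINTR)
            continue;                   /* interrupted: re-check the deadline */
        if (ready <= 0)
            return -1;                  /* timed out, or poll() failed */

        ssize_t n = read(fd, (char *)buf + got, len - got);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;                  /* read error, or peer closed */
        got += (size_t)n;
    }
    return 0;
}
```

A worker could call something like this for each message it expects during authorization and close the client socket on -1, so that a connection which never sends anything frees the slot after a few seconds instead of holding it indefinitely.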

> ...this second TCP connection is not dropped but just waits and goes into CLOSE_WAIT state.

That is pretty interesting and may be indicative of a bug somewhere. Out of curiosity, what platform are you running OpenELP on?

> It could also be a problem that this single 'empty' TCP connection can block the client's termination processes, and the proxy will not exit until the client disconnects.

Oooh, that's a bug, or more specifically a regression. Addressed by #27 - please give that a try if you're able.

Author

iddq commented Jan 7, 2025

Yes, #27 solves the issue at shutdown.

In response to your question, I'm using it on Linux. I'm not sure how the original Java proxy would react, because I don't have it installed and I didn't want to flood an open proxy. Because of the timeout, the only way to test it would be to continuously send connection attempts to it. What I can say for sure is that I investigated this issue because the EchoLink client on Android often fails to connect to the proxy: it either initiates two connections in parallel or starts the next one so quickly that the previous one hasn't yet been closed on the proxy, which results in there being no free slots.


2 participants