The task is to maximize throughput over a proxy network. I am given 3 command-line arguments pointing to 3 unique files.
- Proxy file path - This file contains a proxy URL on every line. I can issue up to 30 concurrent requests to each of these proxies.
- Code file path - There is a code on every line. To solve a code I must send a request carrying the code to any of the proxies.
- Solution file path - This file should contain a code-solution pair on every line. (The order need not be the same as in the code file.)
Considerations
- On success, a proxy responds with status code 200 and a body containing the solution to the code.
- Issuing more than 30 concurrent requests to a given proxy risks getting blocked. Being blocked is equivalent to the proxy only ever responding with status code 503.
- Even without being blocked, a proxy may sometimes fail to respond correctly; in that case the status code is also 503.
- If any of the proxies responds with anything other than a 503 (failure) or a 200 (success), the whole application must shut down immediately.
- The code should be written so as to minimize data loss.
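The three response rules above can be sketched as a small classifier. This is a minimal illustration, not the project's actual code: `classify`, `ProxyResult`, and the raw `statusCode`/`body` parameters are assumed names standing in for whatever HTTP client the implementation uses.

```kotlin
import kotlin.system.exitProcess

// Hypothetical sketch of the response-handling rules; names are illustrative.
sealed class ProxyResult {
    data class Solved(val code: String, val solution: String) : ProxyResult()
    data class Retry(val code: String) : ProxyResult()
}

fun classify(code: String, statusCode: Int, body: String): ProxyResult =
    when (statusCode) {
        200 -> ProxyResult.Solved(code, body) // success: the body is the solution
        503 -> ProxyResult.Retry(code)        // blocked or transient failure: retry later
        else -> {
            // Any other status means the whole application must shut down.
            System.err.println("Unexpected status $statusCode, shutting down")
            exitProcess(1)
        }
    }
```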
Below is an implementation of the above constraints.
chmod +x build.sh
./build.sh
cd untitled-1.0-SNAPSHOT/bin
./untitled input1 input2 input3
- Put all the input on a single channel.
- Spawn a parallel master for every available proxy.
- Each master spawns up to 30 parallel slaves (at a time) to call the proxy and handle the response.
- Each slave is responsible for issuing a request to the proxy and writing the response to an output channel.
- The output channel is consumed by a single process that writes the output to a file. The reason for having a single consumer is that handling concurrent writes to a file was left out of the scope of this solution.
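The fan-out described above can be sketched with `kotlinx.coroutines` channels and a per-proxy `Semaphore(30)` to cap in-flight slaves. This is a simplified assumption-laden sketch, not the project's `AsyncRunner`: `runPipeline` and `solve` are hypothetical names, and the single drain loop stands in for the file-writing consumer.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.sync.Semaphore

// Sketch of the master/slave pipeline; `solve` stands in for the HTTP call.
suspend fun runPipeline(
    proxies: List<String>,
    codes: List<String>,
    solve: suspend (proxy: String, code: String) -> String,
): List<Pair<String, String>> = coroutineScope {
    val input = Channel<String>(Channel.UNLIMITED)
    val output = Channel<Pair<String, String>>(Channel.UNLIMITED)

    // Put all the input on the single input channel up front.
    codes.forEach { input.send(it) }
    input.close()

    // One master per proxy; each allows at most 30 in-flight slaves.
    val masters = proxies.map { proxy ->
        launch {
            val permits = Semaphore(30)
            for (code in input) {      // masters share the input channel
                permits.acquire()
                launch {                // slave: one request per code
                    try {
                        output.send(code to solve(proxy, code))
                    } finally {
                        permits.release()
                    }
                }
            }
        }
    }

    // Close the output channel once every master (and its slaves) is done,
    // then drain it in a single consumer, mirroring the single file writer.
    launch {
        masters.joinAll()
        output.close()
    }
    val results = mutableListOf<Pair<String, String>>()
    for (pair in output) results += pair
    results
}
```

Closing `output` only after `masters.joinAll()` works because structured concurrency makes each master's `join` wait for its slave children as well.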
- I had to call `coroutineContext.cancelChildren()` in the `run` of `AsyncRunner`, as I didn't have an elegant way to close the channels. If I had not done this, the function would never have returned, because the code reading from the input channel would block forever. Additionally, I only reach this part of the code after the process consuming the output channel has finished reading the number of codes I instantiate `AsyncRunner` with. I feel this could be improved with a more elegant solution, but due to time constraints I chose to leave it as is.
- The channels themselves have an `UNLIMITED` buffer size. This can easily become a problem with large input sets. The challenge was that with a bounded buffer I would have needed a smarter solution that puts items on the input channel while also consuming it. This can easily give rise to race conditions, so due to the time constraint I chose to leave it out of scope.
- I have a single process writing the output to the file. Could I have improved the amount of information retained when an error is thrown by using multiple processes?
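One alternative to `cancelChildren()` is to close the channels explicitly once the producers finish: a closed channel terminates a `for (x in channel)` loop cleanly instead of blocking forever. A minimal sketch of that idea, using a simplified producer/consumer shape rather than the actual `AsyncRunner` (all names here are illustrative):

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel

// Closing `ch` makes the consumer's for-loop finish on its own, so the
// enclosing coroutineScope returns without needing cancelChildren().
suspend fun produceAndConsume(items: List<Int>): List<Int> = coroutineScope {
    val ch = Channel<Int>(Channel.UNLIMITED)
    launch {
        items.forEach { ch.send(it) }
        ch.close() // signal completion to the consumer
    }
    val seen = mutableListOf<Int>()
    for (x in ch) seen += x // terminates once the channel is closed and drained
    seen
}
```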
- The tests I use to check that no more than 30 concurrent requests exist are based on the assumption that a 1-second delay is enough for at least one more request to come through. On slow systems this assumption breaks down. I would have liked to write a more elegant test, given more time.
- The tests checking that the input-output pairing is correct only verify the number of lines. This could easily have been done on the actual values.
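The line-count check could be tightened to compare actual values, for example by verifying that every input code appears in the solution file. This sketch assumes a hypothetical `"code solution"` space-separated line format, which may not match the project's real output format:

```kotlin
import java.io.File

// Hypothetical value-based check: every code in the input file must appear
// exactly as the set of codes in the solution file ("code solution" lines assumed).
fun solutionsMatch(codeFile: File, solutionFile: File): Boolean {
    val codes = codeFile.readLines().filter { it.isNotBlank() }.toSet()
    val solvedCodes = solutionFile.readLines()
        .filter { it.isNotBlank() }
        .map { it.substringBefore(' ') } // assumed separator between code and solution
        .toSet()
    return codes == solvedCodes
}
```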
- I had to write 2 instances of the mock service used in the tests, namely `MockInputService` and `MockInputServiceRecovers`. This is a bit hard to read and a bit unintuitive.
- The project is called `untitled`.