ERROR: AbstractMultithreadedModule$ExceptionHandler:210 #314
Same issue here in at least two cities, so it is not only you @Mayoookh. |
Could you please provide more information? Minimally MATSim version you are using, and full log file. And did you follow the recommendations further up on the logfile (no links of length zero, no links with speed infinity, ...)? Evidently, it would be good if we caught the problem earlier in the execution. I opened an issue: matsim-org/matsim-libs#963 . The fact that it worked last year unfortunately is not an argument; given stochasticity, the routers may go in different ways through the network, and thus encounter a problematic link with one version of MATSim but not with another. :-(. (For the same reason, sometimes switching to landmarks helps.) There were also some problems with multithreading in some particular configuration: #258 . This had to do with a particular use of the vehicles source; as long as you are not using that, I would not assume that this is your problem (although I evidently don't know). |
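The network recommendations mentioned above (no links of length zero, no links with infinite speed) can be checked before a run. The following is only a minimal sketch under stated assumptions: `LinkData` is a hypothetical stand-in holding an id, a length and a free speed, not the MATSim `Link` interface, and the example values are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NetworkSanityCheck {
    /** Hypothetical stand-in for a network link; NOT the MATSim Link interface. */
    static final class LinkData {
        final String id;
        final double lengthMeters;
        final double freespeedMetersPerSecond;

        LinkData(String id, double lengthMeters, double freespeedMetersPerSecond) {
            this.id = id;
            this.lengthMeters = lengthMeters;
            this.freespeedMetersPerSecond = freespeedMetersPerSecond;
        }
    }

    /** Returns the ids of links whose length is not strictly positive (zero, negative or NaN). */
    static List<String> zeroLengthLinks(List<LinkData> links) {
        List<String> bad = new ArrayList<>();
        for (LinkData link : links) {
            if (!(link.lengthMeters > 0.0)) {
                bad.add(link.id);
            }
        }
        return bad;
    }

    /** Returns the ids of links whose free speed is infinite. */
    static List<String> infiniteSpeedLinks(List<LinkData> links) {
        List<String> bad = new ArrayList<>();
        for (LinkData link : links) {
            if (Double.isInfinite(link.freespeedMetersPerSecond)) {
                bad.add(link.id);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Made-up example data, for illustration only.
        List<LinkData> links = Arrays.asList(
                new LinkData("a", 100.0, 13.9),
                new LinkData("b", 0.0, 13.9),
                new LinkData("c", 50.0, Double.POSITIVE_INFINITY));
        System.out.println("zero length: " + zeroLengthLinks(links));
        System.out.println("infinite speed: " + infiniteSpeedLinks(links));
    }
}
```

In a real scenario the same two checks would be applied to the links of the loaded network; how those are obtained depends on the MATSim version in use.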
I have access to some of the troublesome networks and am currently performing some kind of stress test by trying to route from each node in the network to every other node. Naturally, due to the size of the networks and the quadratic number of combinations I likely won't be able to run it completely. But still, it already managed to calculate several millions of routes without the error happening, and to at least route from one node to every other node of the network, which should pass more or less every 2nd link of the network (routing in one direction, assuming there is also always a reverse direction for each link). Reading through all of #258, was the problem actually fixed? At least from the comments I was not able to see a connection between the vehicle source and the exceptions. Trying to analyze the problem:
I can exclude the networks. In cases from both Mayookh and Josie there are no links with length = 0. In Mayookh's network there are links with freespeed infinity, but their Ids are all prefixed with My guess currently goes towards the travel time and travel cost calculators, and maybe a concurrency issue there (either when querying, or when collecting the data). But without additional information this seems difficult to debug. As a first step, I plan to include additional information in the log-message which currently states Any other ideas? |
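The stress test described in this comment (routing from each node to every other node, within a budget, since the number of pairs grows quadratically) might look roughly like the sketch below. It is not the code that was actually run: it uses a plain Dijkstra on a hypothetical adjacency-list graph instead of the MATSim routers, and it simply counts origin/destination pairs whose total cost comes out as zero.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

final class RoutingStressTest {
    /** Directed edge with a non-negative cost; hypothetical stand-in for a network link. */
    static final class Edge {
        final int to;
        final double cost;

        Edge(int to, double cost) {
            this.to = to;
            this.cost = cost;
        }
    }

    /** Creates an adjacency list for the given number of nodes, with no edges yet. */
    static List<List<Edge>> emptyGraph(int nodes) {
        List<List<Edge>> graph = new ArrayList<>();
        for (int i = 0; i < nodes; i++) {
            graph.add(new ArrayList<>());
        }
        return graph;
    }

    /** Plain Dijkstra from one source; unreachable nodes keep a cost of infinity. */
    static double[] leastCosts(List<List<Edge>> graph, int source) {
        double[] dist = new double[graph.size()];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        dist[source] = 0.0;
        // Queue entries are {cost, node}, ordered by cost.
        PriorityQueue<double[]> queue = new PriorityQueue<>((a, b) -> Double.compare(a[0], b[0]));
        queue.add(new double[] {0.0, source});
        while (!queue.isEmpty()) {
            double[] head = queue.poll();
            int node = (int) head[1];
            if (head[0] > dist[node]) {
                continue; // stale entry, a cheaper path was already found
            }
            for (Edge edge : graph.get(node)) {
                double candidate = dist[node] + edge.cost;
                if (candidate < dist[edge.to]) {
                    dist[edge.to] = candidate;
                    queue.add(new double[] {candidate, edge.to});
                }
            }
        }
        return dist;
    }

    /** Routes from at most maxSources nodes to all others and counts distinct pairs with total cost 0. */
    static long countZeroCostPairs(List<List<Edge>> graph, int maxSources) {
        long suspicious = 0;
        int sources = Math.min(maxSources, graph.size());
        for (int s = 0; s < sources; s++) {
            double[] dist = leastCosts(graph, s);
            for (int t = 0; t < dist.length; t++) {
                if (t != s && dist[t] == 0.0) {
                    suspicious++;
                }
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        // Made-up three-node cycle; the edge 1 -> 2 has cost 0 on purpose.
        List<List<Edge>> graph = emptyGraph(3);
        graph.get(0).add(new Edge(1, 5.0));
        graph.get(1).add(new Edge(2, 0.0));
        graph.get(2).add(new Edge(0, 3.0));
        System.out.println("zero-cost pairs: " + countZeroCostPairs(graph, 3));
    }
}
```

The `maxSources` parameter is the budget: with a large network only a subset of origins can realistically be tested.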
@kainagel I don't believe we are using vehicles source. Skimming through #258 makes me think such. @mrieser I think you have all you need to rerun Atlanta and Miami with the newer version (assuming we can pull that in pretty easily being up to date on MATSim version now). The other thing I could do is rerun Atlanta and Miami without any changes and see if the problem occurs in the same iteration again. I'm not familiar enough yet with what happens when in AStar router to know if that will tell us anything. @mrieser let me know. |
I ran with 2 versions. Firstly, I was running with 12.0-2019w27-SNAPSHOT. I sent the log file of that to @mrieser . Yesterday I ran with version 11 (I also ran with 11 last year). The simulation just freezes at certain point without warning and error and doesn't go forward from there. You would notice that in the log file here : https://www.dropbox.com/sh/m0aun0gtdjypu67/AABtasX-k1Sj__6yzd-MDNaMa?dl=0 We do use modified vehicle file with emission related information and the file is uploaded too. |
I pushed a commit which includes additional logging output around the area where the router currently fails. I also attach a jar-file with the MATSim-build from that commit, so everybody could try it for themselves. Please be aware that this is based on the most recent MATSim code, so if you are running e.g. with 12.0-2019w27-SNAPSHOT, there might be some incompatibilities due to changes in the code during the last few months. I will continue my testing with this code and see if I figure something out. |
Just an observation: In one case, |
I have certainly been able to remove such errors by switching to landmarks. However, I _think_ that this also works in the opposite direction (switch from landmarks to astar if problems with landmarks). And also, the situations where it worked for me may have been these other types of errors ((loop) links of length zero, and/or speed infinity).
|
@Mayoookh is the code of your main class that you use to start the simulation available somewhere in public, e.g. on GitHub? Considering how many people use MATSim, and how few have reported this problem, I suspect something very scenario specific, which is why being able to look at the code would be helpful. |
Hi @mrieser , I have uploaded few files here, which I think were necessary : https://www.dropbox.com/home/MATSim If you need more information, I will push my code, for the above, on GitHub soon. I will also change the routing algorithm from astar to landmarks and keep you in the loop for results. |
Some Update: it's still work in progress. In one run with additional output, I received the following log statements while running
So there is a link with Length > 0, freespeed < Infinity, but the TravelTime is 0.0, although it should be at least 11 seconds. After some digging (and based on other, later messages), I've found a problem in the preparation of the network (in this case, the network is converted from some source data and directly used in the simulation, without it first being written to a
So it's actually the TravelTimeCalculator that already returns the bad travel times. The problem being in TravelTimeCalculator explains why both But I have not yet found the reason why the TravelTimeCalculator returns the bad data. An additional run with even more debugging output for this was started... |
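The plausibility argument used above (a link with length > 0 and finite freespeed cannot have a travel time of 0.0) can be written as a small check. This is a sketch, not part of MATSim: the class and method names are hypothetical, and the link values in `main` are made up so that the free-speed travel time comes out a bit above 11 seconds.

```java
final class TravelTimeCheck {
    /** Free-speed travel time in seconds: length divided by free speed. */
    static double freeSpeedTravelTime(double lengthMeters, double freespeedMetersPerSecond) {
        return lengthMeters / freespeedMetersPerSecond;
    }

    /**
     * Returns true if a reported travel time is NaN or lower than the free-speed
     * travel time of the link (minus a small tolerance for rounding).
     */
    static boolean isImplausible(double reportedSeconds, double lengthMeters,
                                 double freespeedMetersPerSecond) {
        double lowerBound = freeSpeedTravelTime(lengthMeters, freespeedMetersPerSecond);
        return Double.isNaN(reportedSeconds) || reportedSeconds < lowerBound - 1e-9;
    }

    public static void main(String[] args) {
        // Hypothetical link: 150 m at 13 m/s, i.e. about 11.5 s at free speed.
        double length = 150.0;
        double freespeed = 13.0;
        System.out.println("free-speed travel time: " + freeSpeedTravelTime(length, freespeed));
        System.out.println("0.0 s implausible? " + isImplausible(0.0, length, freespeed));
        System.out.println("12.0 s implausible? " + isImplausible(12.0, length, freespeed));
    }
}
```

Note that for a link with infinite freespeed the lower bound is 0, so a reported travel time of 0.0 would not be flagged there.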
In case this is useful information, over the weekend I re-ran a city that got caught in this "infinite loop" error to see if it happened in the same spot---it did not. The run has since been terminated due to cost, so I'm not certain if it would have run into the error again. |
@mrieser Thank you for a continuous update. I have pushed my code to start the simulation here : https://github.com/Mayoookh/Matsim-Scenarios/tree/master/src/main/java/org/matsim/project I have not pushed the input files though. |
@kainagel I tried running the model with 11.0, the same version I used last year. I am still having the same problem. So maybe it's also because of the stochasticity of the routers within the same version(?) (even though I ran 16 scenarios last time and didn't run into this problem). |
I'm more and more confident that it's a multi-threading issue, and that there is some strange race-condition I don't yet understand. It's at least the only explanation why it sometimes works, sometimes fails early and sometimes fails late, although it is always the same input. There might be also other influences:
Did any of these things change in your case, Mayookh? In the case of Josie and Grant, we switched from Java 8 to Java 11.0.6, and run it on AWS inside a Docker container (we also upgraded the docker image to use the newer Java version). So there have been some changes, and there are parts we cannot control (e.g. the hardware AWS uses for the instances). |
@mrieser this actually helped. I was using JDK 12 this time. I switched back to 8 and the simulation worked. It ran all the iterations without running into this problem. I would go forward with it for now and dig into the problem later |
Update, still work in progress.
But:
The zeros also appearing on links that have no current measurement seems to rule out the problem of a concurrency issue with collecting the data (i.e. in the Events handler during the mobsim phase, when all the values are coming in). This pair-wise appearance is very strange, and I still have not yet figured out, how the zeros end up in those spaces... I've now basically added a logging statement to every place where the value is set, trying to figure out where these values come from. |
We have some new information. @gvermillion ran a smaller city (Flagstaff, AZ) yesterday for an experiment using a jar from Java 11. He kicked off 18 simulations at the same time using different scoring parameters for each. Of the 18, 4 got stuck in the infinite loop. Flagstaff has a network of ~150k links (all streets-- large rural area) and only ~140k agents (100% of the population) and usually takes about 3h to run. We originally thought this problem was limited to larger cities, but it turns out that assumption is not true. |
I checked my network using KNIME . There were no links with zero length (as the warning said). Only the PT links had infinite speed with modes "bus, artificial". Moreover, I tried both Astar landmarks and Dijkstra but the problem persists. |
In one case, we temporarily solved it by switching from Java 11 back to Java 8: In Java 11, 4 out of 18 runs produced the error, back on Java 8 the same scenario ran around 60 times without a problem. I was now able to reproduce the problem with a smaller scenario on a local server (no cloud, no Docker, but bare-metal). I'm currently running more tests with additional logging output, but as soon as I include too many conditional logging statements, the error no longer happens... which makes the debugging hard (my interpretation: with the additional logging statements, the methods e.g. become too large for inlining or other JVM-internal effects which then no longer trigger the bug). So I'm currently experimenting on how I can reproduce the bug but still include enough debugging to dig deeper. It's work in progress, and usually it runs for 2 days or so until I can analyze and restart it, so it takes some time. At the moment, the safest really seems to be to use Java 8. |
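To put the numbers from this comment into perspective: if Java 8 had the same per-run failure rate as observed on Java 11 (4 out of 18), how likely would around 60 clean runs in a row be? The sketch below computes that under the simplifying assumption that runs are independent and share one failure rate, which is an assumption of this example and not something established in the thread.

```java
final class CleanRunProbability {
    /**
     * Probability of observing cleanRuns runs without failure, assuming each run
     * fails independently with the rate observedFailures / observedRuns.
     */
    static double probabilityAllClean(int observedFailures, int observedRuns, int cleanRuns) {
        double failureRate = (double) observedFailures / observedRuns;
        return Math.pow(1.0 - failureRate, cleanRuns);
    }

    public static void main(String[] args) {
        // 4 of 18 runs failed on Java 11; about 60 runs succeeded on Java 8.
        double p = probabilityAllClean(4, 18, 60);
        System.out.println("probability of 60 clean runs at the Java 11 failure rate: " + p);
    }
}
```

Under these assumptions the probability is well below one in a million, so the difference between the two Java versions is very unlikely to be chance.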
If you can somewhat deterministically reproduce the error: Have you tried replacing the |
this was an |
I too switched from 12 to 8 and then it worked fine for me
Just for information, I am running various scenarios. In one of the scenarios, I just have 3 modes, i.e. walk, car and bike (without PT), and it's running into the same problem again, even on Java 8. It doesn't throw an error, it just freezes after the warning "Astar totalcost = 0.0". |
Just wanted to add a comment to this conversation. I'm facing the same problem now. I was moving a simulation from one server to another and suddenly I kept getting the |
I have now cleaned up my debugging work and created a pull-request. While doing so, I realized that I was not able to reproduce the issue anymore after I did some minor performance-optimizations to the code while debugging. So there is hope that these modifications actually prevent the issue from triggering, although I still do not yet understand how the unexpected values could have crept in. If you are affected by this bug:
If the problem occurs with Java 11 and a newer MATSim version, please leave a comment with details (at least MATSim version, Java version, helpful would also be information about the OS being used etc.) |
Hello everyone,
I have been facing an issue with PlanRouter . It appears after the simulation has already run some iterations. Here is a screenshot from the error:
(This has been happening for a while now but I upload pic of my recent simulation- after debugging it for certain iterations )
The warning says totalCost for ASTAR is 0 (which might be related to infinite free speed links on the network). I have run the same simulation last year with the same network and there was no error. What could be the possible reason? Do I need to analyze my network again?