TODO
1) Check for negative times. At the moment only get_times.py in OverallTimeCache and LoadTimeNoCache have been partially fixed; the check must be added to every get_times.py and to every date2 - date1 computation (see the first sketch at the end of this file).
2) Add more tests for m5_2xlarge
3) Try increasing the number of executors; better values should be obtained, for example with 2 VMs running 4 executors each, both with and without cache.
(Tried 2 workers with 4 executors and 4 cores each for CarAccidentsCache, but it did not improve much over 2 workers with 2 executors and 8 cores each: 2.5 min in the first case versus 2.3 min in the second.)
4) From plot_overall.py in test_m5_2xlarge it can be seen that the time difference between cache and no cache decreases as the number of cores and workers increases; try with even more cores and workers. It will probably keep improving, although it is not clear why; this behaviour was not expected.
5) Fix the negative values produced by python3 ./plot_stacked_chart.py 6 NoCache 2M on m4_xlarge. The problem is probably related to a long load time: since the differences are only a few seconds, more tests or a larger dataset would be needed. With 2 cores the load time is about 2.4 s, the total job time 10 s, and the output time 2.9 s, so computing = 10 - 2.4*3 - 2.9, which is negative. A possible solution is to treat stage0, stage2, and stage4 as load stages, but that is not ideal either, because the Spark Web UI and REST API do not report the time of each single operation, since everything is handled by the optimizer. The problem appears in almost all cases where the dataset is smaller than 2M (see the second sketch at the end of this file).
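
Sketch for item 1: a minimal, hypothetical helper showing how every date2 - date1 computation in the get_times.py scripts could flag negative elapsed times instead of silently propagating them. The function name and the timestamp format are assumptions, not the project's actual code.

from datetime import datetime

def elapsed_seconds(date1: str, date2: str, fmt: str = "%Y-%m-%dT%H:%M:%S.%f") -> float:
    """Return date2 - date1 in seconds, failing loudly if the result is negative."""
    t1 = datetime.strptime(date1, fmt)
    t2 = datetime.strptime(date2, fmt)
    delta = (t2 - t1).total_seconds()
    if delta < 0:
        # A negative duration usually means the two timestamps were swapped or
        # taken from the wrong fields; raise instead of plotting the bad value.
        raise ValueError(f"negative elapsed time: {date2} - {date1} = {delta}s")
    return delta

# Example with made-up timestamps:
# elapsed_seconds("2021-05-01T10:00:00.000", "2021-05-01T10:02:30.500")  -> 150.5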
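
Sketch for item 5: the breakdown that goes negative, written out as a small function. The names (job_time, load_time, output_time, n_loads) are assumptions, not the actual variables of plot_stacked_chart.py; this only illustrates clamping and reporting the negative residual.

def compute_breakdown(job_time: float, load_time: float, output_time: float,
                      n_loads: int = 3) -> dict:
    """Split the total job time into load / computing / output components."""
    computing = job_time - n_loads * load_time - output_time
    if computing < 0:
        # With datasets below 2M the per-load estimate is too coarse and the
        # residual goes negative, e.g. 10 - 3*2.4 - 2.9 = -0.1 s.
        # Clamp to zero and report it so the stacked bars stay non-negative.
        print(f"warning: negative computing time ({computing:.2f} s), clamping to 0")
        computing = 0.0
    return {"load": n_loads * load_time, "computing": computing, "output": output_time}

# Numbers from the 2-core m4_xlarge NoCache 2M case in item 5:
# compute_breakdown(10.0, 2.4, 2.9) -> computing clamped from -0.1 to 0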