Principal graph computation gets stuck at node 43 when using multiple cores (n_cores>1) #5
Comments
Thanks for the report! Using the example tree data, which has 492 points, I find no problem after uncommenting import multiprocessing as mp in BaseElpi.py and grammar_operations.py. You seem to have used a different dataset with 199 points, and I can't reproduce your issue. I disabled the multi-CPU option for now because it caused some issues when used on clusters. If you wish to try it yourself, you need a dataset with at least ~10**4 points to see any speedup. A better way to speed up your code is to try the GPU version (import elpigraphgpu), using your own GPU or a free one on Google Colaboratory.
Thanks for the prompt response. I tried again after uncommenting import multiprocessing as mp in BaseElpi.py and grammar_operations.py. If you can give me some tips on how to debug this, I can provide more information about the issue. I am comparing the multi-core approach between the Python and elpigraph implementations. Could the multi-core issue be caused by something in my environment setup?
Yes, if you provide more info I am happy to help. However, I am quite puzzled about what could be causing this: on my laptop with Ubuntu 18.04.4 LTS I have no issue. In any case, the multi-core version is still a work in progress and I strongly recommend that you do not use it and try the GPU version instead. For 1 million+ data points it is the best solution. It requires the cupy package; after that, using it is as easy as "import elpigraphgpu" instead of "import elpigraph".
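For anyone who wants to switch between the two packages without editing their script, here is a minimal sketch of choosing the import by availability. The helper names is_installed and pick_backend are hypothetical and not part of elpigraph; the only facts taken from this thread are the module names elpigraphgpu and elpigraph and the cupy requirement of the GPU version.

```python
import importlib.util


def is_installed(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


def pick_backend(installed=is_installed):
    """Pick the module name to import (hypothetical helper).

    Prefers the GPU package, which per this thread also needs cupy,
    and falls back to the CPU package. Returns None if neither is found.
    """
    if installed("cupy") and installed("elpigraphgpu"):
        return "elpigraphgpu"
    if installed("elpigraph"):
        return "elpigraph"
    return None


if __name__ == "__main__":
    backend = pick_backend()
    print("Selected backend:", backend)
```

The returned name can then be passed to importlib.import_module. Note that this only checks that the packages are installed, not that a usable GPU is present.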
I tried elpigraph on Ubuntu 18.04 LTS and, surprisingly, it does not get stuck at node 43 during the principal graph computation, just as you said. I tried again on Windows and the principal graph computation got stuck at node 43. I used the tree data for these checks.
I found this link, which might be useful.
I found that the Python implementation using multiple cores works for small datasets without any issue, but with a medium-sized dataset (50k data points) it hangs during computation on Ubuntu 18.04 LTS. I used the Supermarket data for testing, with the following columns, some of which are label encoded: Order.ID, Order.Date, Ship.Date, Ship.Mode, Customer.ID, Customer.Name, Segment, City, State, Country, Market, Region, Product.ID, Category, Sub.Category, Product.Name, Sales, Quantity, Discount, Profit, Shipping.Cost, Order.Priority. I have forked your R implementation of elpigraph and it works fine with the same data in both the single-core and multi-core approach.
I hope to set up a new version for multi-core Python soon and will look into this if I get access to a Windows machine.
I used the tree data to check the multi-core issue and hit some interesting bugs: at node 43 the whole script is run again.
import warnings
C:\Users\tonystark\AppData\Local\conda\conda\envs\myenv\python.exe "C:/Users/tonystark/Documents/folder/elpigraph/testing/mutliprocessing check.py"
Computing Elastic Principal Curve for file- C:\Users\tonystark\Documents\folder\elpigraph\testing\data\tree_data.csv
Generating the initial configuration
Computing Elastic Principal Curve for file- C:\Users\tonystark\Documents\folder\elpigraph\testing\data\tree_data.csv
Generating the initial configuration
Computing Elastic Principal Curve for file- C:\Users\tonystark\Documents\folder\elpigraph\testing\data\tree_data.csv
Generating the initial configuration
Computing Elastic Principal Curve for file- C:\Users\tonystark\Documents\folder\elpigraph\testing\data\tree_data.csv
Generating the initial configuration
Computing Elastic Principal Curve for file- C:\Users\tonystark\Documents\folder\elpigraph\testing\data\tree_data.csv
Generating the initial configuration
File "C:\Users\tonystark\AppData\Local\conda\conda\envs\myenv\lib\site-packages\elpigraph\src\BaseElPi.py", line 184, in ElPrincGraph
File "C:\Users\tonystark\AppData\Local\conda\conda\envs\myenv\lib\multiprocessing\spawn.py", line 114, in _main
90% of the points have been used as initial conditions. Resetting.
Generating ElPiGraph output for file - C:\Users\tonystark\Documents\folder\elpigraph\testing\data\tree_data.csv
Computing Elastic Principal Curve for file- C:\Users\tonystark\Documents\folder\elpigraph\testing\data\tree_data.csv
Generating the initial configuration
Computing Elastic Principal Curve for file- C:\Users\tonystark\Documents\folder\elpigraph\testing\data\tree_data.csv
Generating the initial configuration
Computing Elastic Principal Curve for file- C:\Users\tonystark\Documents\folder\elpigraph\testing\data\tree_data.csv
Generating the initial configuration
Computing Elastic Principal Curve for file- C:\Users\tonystark\Documents\folder\elpigraph\testing\data\tree_data.csv
Generating the initial configuration
File "C:\Users\tonystark\AppData\Local\conda\conda\envs\myenv\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
90% of the points have been used as initial conditions. Resetting.
90% of the points have been used as initial conditions. Resetting.
I think this link sheds some light on the current issue.
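The repeated "Computing Elastic Principal Curve" lines and the spawn.py frames in the log above are consistent with how Python's multiprocessing behaves on Windows: worker processes are started with the "spawn" method, which re-imports the main script in every worker, so any top-level code that is not protected by an if __name__ == "__main__": guard runs again. Whether this is the actual cause of the hang at node 43 has not been confirmed in this thread. The sketch below uses only the standard library; square and run are hypothetical stand-ins for the real computation, and the explicit "spawn" context is there to reproduce the Windows behaviour on any OS.

```python
import multiprocessing as mp


def square(x):
    """Stand-in for the real per-task work (hypothetical)."""
    return x * x


def run(n_workers=2):
    """Map square over a few inputs using a pool of spawned workers."""
    # "spawn" is the default start method on Windows; requesting it
    # explicitly gives the same behaviour on Linux and macOS.
    ctx = mp.get_context("spawn")
    with ctx.Pool(n_workers) as pool:
        return pool.map(square, range(5))


if __name__ == "__main__":
    # Without this guard, each spawned worker would re-run the
    # top-level code of this script when it re-imports the module.
    print(run())  # prints [0, 1, 4, 9, 16]
```

Applied to the test script in this report, that would mean moving the data loading and the principal graph call into a function and calling it only under the guard.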
Hi,
While using the Python implementation of elpigraph with n_cores>1, the computeprincipalgraph calculation gets stuck at node 43. The output is attached below for reference.
Computing EPG with 60 nodes on 199 points and 3 dimensions
Creating a Pool with 2 processes
Nodes = 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
I used the tree data from this repository for sanity checks.
In the library, I uncommented the multiprocessing import in order to use multiple cores.