[BUG] OSError: exception: access violation reading #266

Closed
TadeuNP opened this issue Feb 17, 2023 · 4 comments
Labels
bug Something isn't working priority: high

Comments

TadeuNP commented Feb 17, 2023

PySR throws an "OSError: exception: access violation reading" error. It seems to occur often when fitting a model many times (tried with the exact same settings and data). Occurs both in Jupyter and when running from a Python file.

Visual Studio Code outputs the following:

Traceback (most recent call last):
  File "c:\Users\Tadeu\Desktop\pysr-access-violation.py", line 44, in <module>
    model.fit(X_train , dx , variable_names=["x", "y", "z"])
  File "C:\Users\Tadeu\anaconda3\envs\thesis\lib\site-packages\pysr\sr.py", line 1792, in fit
    self._run(X, y, mutated_params, weights=weights, seed=seed)
  File "C:\Users\Tadeu\anaconda3\envs\thesis\lib\site-packages\pysr\sr.py", line 1493, in _run
    Main = init_julia(self.julia_project, julia_kwargs=julia_kwargs)
  File "C:\Users\Tadeu\anaconda3\envs\thesis\lib\site-packages\pysr\julia_helpers.py", line 180, in init_julia
    Julia(**julia_kwargs)
  File "C:\Users\Tadeu\anaconda3\envs\thesis\lib\site-packages\julia\core.py", line 519, in __init__
    self._call("const PyCall = Base.require({0})".format(PYCALL_PKGID))
  File "C:\Users\Tadeu\anaconda3\envs\thesis\lib\site-packages\julia\core.py", line 554, in _call
    ans = self.api.jl_eval_string(src.encode('utf-8'))

OSError: exception: access violation reading 0x0000025A5A9D1000
Exception ignored in atexit callback: <_FuncPtr object at 0x0000025A5A83AF60>
OSError: exception: access violation reading 0x0000025A5A9D1000

  • Windows 11
  • Julia 1.8.3
  • Python 3.10.9
  • Installed with pip
  • PySR 0.11.14 (just updated from .11 in an attempt to fix this)

PySR settings and a minimal example:

import numpy as np
from scipy.integrate import odeint
from pysr import PySRRegressor

model = PySRRegressor(
    model_selection="best", 
    niterations=30,
    population_size=90,
    binary_operators=["+", "*", "/",  "-"],
    loss="loss(x, y) = (x - y)^2",
    warm_start=True
)
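# Goodwin oscillator right-hand side, in the (state, time) form that odeint expects.
def goodwin(x, t, a1=5, a2=5, a3=5, c1=0.5, c2=0.5, c3=0.5, n=10, K=1):
    return [
        a1 * K**n / (K**n + x[2]**n) - c1 * x[0],
        a2 * x[0] - c2 * x[1],
        a3 * x[1] - c3 * x[2],
    ]

# NOTE: `t` was not defined in the original post; this time grid is an assumed placeholder.
t = np.linspace(0, 10, 100)
initial_cond = np.random.uniform(0, 5, 3)
sol = odeint(goodwin, initial_cond, t)
x, y, z = sol[:, 0], sol[:, 1], sol[:, 2]
X_train = np.column_stack((x, y, z))
dx = goodwin((x, y, z), 0)[0]  # first component of the derivative, used as the fit target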

model.fit(X_train, dx , variable_names=["x", "y", "z"])

As far as I can tell, this is enough to reproduce the error. It occurs often enough that I usually have to restart Jupyter after fitting twice (this was not the case a month ago, for some reason).

Note: I thought this could be caused by the different datasets being fed to the model, but keeping the dataset fixed after the first run still leads to the bug.
I also removed the division and multiplication operators, and the model then managed to fit ~7 times before crashing, far more than the maximum of 2 that I was seeing with the larger pool of binary_operators.
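
One way to put a number on "how many fits before it crashes" is a small counting harness. This is only a sketch: `fits_until_failure` and `make_flaky_fit` are hypothetical helper names, not part of PySR, and the stand-in below simulates the failure instead of calling `model.fit`.

```python
from typing import Callable


def fits_until_failure(fit_once: Callable[[], None], max_attempts: int = 20) -> int:
    """Call fit_once repeatedly; return how many calls succeeded before an OSError."""
    successes = 0
    for _ in range(max_attempts):
        try:
            fit_once()
        except OSError:
            # The access violation in the traceback above surfaces as an OSError.
            break
        successes += 1
    return successes


def make_flaky_fit(fail_on_call: int) -> Callable[[], None]:
    """Hypothetical stand-in for model.fit that raises OSError on a chosen call."""
    calls = {"n": 0}

    def fit_once() -> None:
        calls["n"] += 1
        if calls["n"] == fail_on_call:
            raise OSError("exception: access violation reading (simulated)")

    return fit_once


if __name__ == "__main__":
    print(fits_until_failure(make_flaky_fit(fail_on_call=3)))
```

In a real run, `fit_once` would wrap the `model.fit(...)` call from the example above, so the same count can be compared across operator sets.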

Let me know if there is something else I need to provide. I will also try to run this same example on a different computer to see if I get similar behaviour.

TadeuNP added the bug label Feb 17, 2023
Owner

MilesCranmer commented Feb 17, 2023

Related to the third error I am seeing in the Windows tests: #238 (also posted here: JuliaLang/julia#47957). I unfortunately don't have a Windows machine where I can replicate this so it's a bit difficult for me to debug, but I can offer some questions which will help me track it down:

  • Does your machine have low RAM by any chance?
  • Are you running it inside a VM?
  • Does this error still occur in multiprocessing mode, i.e. with multithreading=False, procs=procs (where procs is the number of processors you have)?
  • Does this error depend on how many procs you set? It could be a data race.
  • Does this error still occur in serial mode (multithreading=False, procs=0)?
  • If you run the pure-Julia example here: https://github.com/MilesCranmer/SymbolicRegression.jl/#quickstart, does the error still occur? If not, then it might be a PyJulia problem.
  • Does this error still occur if you do not import scipy and run odeint, but rather pre-compute that integral, and load it from a file? (sometimes Python libraries with C bindings can interfere).
  • Does the error frequency change if you pass julia_kwargs={"optimize": 0} to PySRRegressor?
  • Does the issue go away if you try PySR 0.10? I think this is where I noticed the access error in the tests.
  • You might also try Julia 1.8.5, though I'm not sure it will fix things.
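
The run modes asked about in this list can be written down as keyword overrides. The sketch below only builds the dictionaries; the keyword names (`multithreading`, `procs`, `julia_kwargs`) are the ones quoted in this thread, while `diagnostic_configs` is a hypothetical helper name, and using `os.cpu_count()` for the processor count is an assumption.

```python
import os


def diagnostic_configs(n_procs=None):
    """Return keyword overrides for the run modes mentioned in the list above."""
    if n_procs is None:
        # Assumption: use the number of logical CPUs reported by the OS.
        n_procs = os.cpu_count() or 1
    return {
        # Multiprocessing instead of multithreading.
        "multiprocessing": {"multithreading": False, "procs": n_procs},
        # Fully serial search.
        "serial": {"multithreading": False, "procs": 0},
        # Julia started with optimization level 0.
        "no_julia_optimization": {"julia_kwargs": {"optimize": 0}},
    }


if __name__ == "__main__":
    for name, overrides in diagnostic_configs().items():
        print(name, overrides)
```

Each set of overrides would be merged into the settings from the original report before constructing `PySRRegressor`, one mode per run.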

The more information the better – these questions will help me figure out where the problem could be lurking. Thanks!
Miles

Author

TadeuNP commented Feb 18, 2023

  • Memory usage is at around 11 out of 16 GB available.
  • Not using a VM.
  • The pure Julia example works fine.
  • Skipping Scipy and loading from a file did not fix it.
  • julia_kwargs={"optimize": 0} did not fix it. As far as I can tell, it showed a similar error frequency as before.
  • Error seems to disappear when setting multithreading=False
    • It seems to work independently of the number of procs set. Using 0, 1 or 12 procs all worked.
  • I have not yet tried updating Julia or PySR 0.10.

After running a bunch of successful tests with multithreading disabled, I decided to turn it on. To my surprise, it worked perfectly. Then I noticed a new warning message:

C:\Users\Tadeu\anaconda3\envs\thesis2\lib\site-packages\pysr\julia_helpers.py:217: UserWarning: Julia has already started. The new Julia options {'threads': 12} will be ignored.

I restarted the Jupyter kernel and attempted to fit a model twice, with multithreading enabled both times, and it failed.
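
Because the warning says new Julia options are ignored once Julia has started, comparing settings inside one kernel can be misleading. A minimal sketch of one way around that, assuming each test is run in a fresh Python interpreter; `run_in_fresh_interpreter` is a hypothetical helper, and the snippet passed to it here is a placeholder, not a PySR fit.

```python
import subprocess
import sys


def run_in_fresh_interpreter(source: str, timeout: float = 600.0) -> subprocess.CompletedProcess:
    """Run Python source in a new interpreter so no earlier Julia session is reused."""
    return subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,  # collect stdout/stderr for later inspection
        text=True,
        timeout=timeout,
    )


if __name__ == "__main__":
    # Placeholder snippet; a real test would build and fit the model here.
    result = run_in_fresh_interpreter("print('fresh process')")
    print(result.returncode, result.stdout.strip())
```

A nonzero `returncode` then marks a failed run, and `result.stderr` holds the traceback for that configuration.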

Thanks! Let me know if there are more tests I can run.

@MilesCranmer
Owner

Awesome, thanks for answering those. So using multiprocessing instead of multithreading (via multithreading=False) seems like a good workaround for now. It’s actually just as fast as multithreading, if not faster; it just takes a bit longer to start each search.

This is also very helpful for finding the bug, thanks. Because it only occurs with multithreading and not with multiprocessing, I think it is a data race: multiprocessing copies data between processes, whereas threads can access the same resources. It’s interesting that it only seems to occur in a PyJulia context though…
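
The shared-versus-copied distinction can be illustrated with a toy counter. This is a generic Python sketch, not the Julia code path involved in the bug: the first function has threads update one shared value (safe only because of the lock), and the second gives each worker private state that is merged at the end, which is closer to how separate processes behave. Both function names are made up for the illustration.

```python
import threading


def count_with_shared_state(n_threads: int, n_increments: int) -> int:
    """Threads update one shared counter; the lock prevents lost updates."""
    total = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal total
        for _ in range(n_increments):
            with lock:  # without the lock, this read-modify-write can race
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def count_with_private_state(n_threads: int, n_increments: int) -> int:
    """Each worker counts privately (like a process with its own copy); merge at the end."""
    partial = [0] * n_threads

    def worker(slot: int) -> None:
        local = 0
        for _ in range(n_increments):
            local += 1
        partial[slot] = local  # single write to a slot no other worker touches

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


if __name__ == "__main__":
    print(count_with_shared_state(4, 1000), count_with_private_state(4, 1000))
```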

@MilesCranmer
Owner

This should be fixed in the most recent version. Ping me if not!
