Optimizer always segfaults #425

Open
Robertleoj opened this issue Feb 23, 2025 · 13 comments · May be fixed by #426
Labels
bug Something isn't working

Comments

@Robertleoj

Robertleoj commented Feb 23, 2025

Describe the bug
optimizer.optimize always segfaults in my environment.

To Reproduce
Steps to reproduce the behavior, e.g.:

  1. Create file
import symforce
symforce.set_epsilon_to_symbol()


import symforce.symbolic as sf
from symforce.values import Values
from symforce.opt.optimizer import Optimizer
from symforce.opt.factor import Factor


initial_values = Values(
    vec=sf.V2(0.0, 0.0),
)


def null_factor(v: sf.V2) -> sf.V2:
    return sf.V2(v.x - 0.0, v.y - 0.0)

optimizer = Optimizer(
    factors=[Factor(residual=null_factor, keys=['vec'])],
    optimized_keys=["vec"],
)

optimizer.optimize(initial_values)

  2. Run the code
  3. See error
❯ python3 symforce_test.py 
[1]    434097 segmentation fault (core dumped)  python3 symforce_test.py

Expected behavior
No segfault, successful optimization.


Environment (please complete the following information):

  • OS and version: Pop_OS 22.04 LTS x86_64
  • Python version: 3.11.0
  • SymForce Version: 0.9.0

Additional context
I put in some breakpoints and saw that the segfault happens in the call

stats = self._cc_optimizer.optimize(cc_values, **kwargs)
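
One way to confirm from the Python side where the crash originates is the standard-library faulthandler module, which prints the Python traceback when the process receives a fatal signal such as SIGSEGV. The following is a minimal sketch, assuming it is placed at the top of the reproduction script; the helper name enable_crash_traceback is illustrative and not part of SymForce.

```python
import faulthandler
import sys


def enable_crash_traceback(file=sys.stderr) -> bool:
    """Ask the interpreter to dump Python tracebacks of all threads on fatal signals.

    Returns True once the fault handler is active.
    """
    if not faulthandler.is_enabled():
        # The file must stay open for as long as the handler is enabled.
        faulthandler.enable(file=file, all_threads=True)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    enable_crash_traceback()
    # The rest of the reproduction script (imports, Optimizer setup,
    # optimizer.optimize(initial_values)) would follow here.
```

The same handler can be enabled without editing the script by running python3 -X faulthandler symforce_test.py. It only reports the Python frames, so it complements the gdb backtrace rather than replacing it.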
@Robertleoj Robertleoj added the bug Something isn't working label Feb 23, 2025
@Robertleoj
Author

If it's helpful, here is the stack trace of the core dump:

❯ gdb python3 core.python3.459455 
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04.2) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3...
[New LWP 459455]
[New LWP 459470]
[New LWP 459463]
[New LWP 459462]
[New LWP 459459]
[New LWP 459468]
[New LWP 459464]
[New LWP 459456]
[New LWP 459457]
[New LWP 459465]
[New LWP 459466]
[New LWP 459458]
[New LWP 459467]
[New LWP 459460]
[New LWP 459469]
[New LWP 459461]

warning: Section `.reg-xstate/459455' in core file too small.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `python3 notebooks/symforce_test.py'.
Program terminated with signal SIGSEGV, Segmentation fault.

warning: Section `.reg-xstate/459455' in core file too small.
#0  0x0000000000000000 in ?? ()
[Current thread is 1 (Thread 0x7d4f00df4b80 (LWP 459455))]
warning: File "/home/robert/.pyenv/versions/3.11.0/bin/python3.11-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
	add-auto-load-safe-path /home/robert/.pyenv/versions/3.11.0/bin/python3.11-gdb.py
line to your configuration file "/home/robert/.config/gdb/gdbinit".
To completely disable this security protection add
	set auto-load safe-path /
line to your configuration file "/home/robert/.config/gdb/gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
	info "(gdb)Auto-loading safe path"
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007d4ed109949f in ?? ()
   from /home/robert/learning/slambook/.venv/lib/python3.11/site-packages/cc_sym.cpython-311-x86_64-linux-gnu.so
#2  0x00007d4ed1099643 in ?? ()
   from /home/robert/learning/slambook/.venv/lib/python3.11/site-packages/cc_sym.cpython-311-x86_64-linux-gnu.so
#3  0x00007d4ed102b491 in ?? ()
   from /home/robert/learning/slambook/.venv/lib/python3.11/site-packages/cc_sym.cpython-311-x86_64-linux-gnu.so
#4  0x00007d4f009b3d43 in cfunction_call (func=0x7d4ed593b600, args=<optimized out>, 
    kwargs=<optimized out>) at Objects/methodobject.c:542
#5  0x00007d4f009628d7 in _PyObject_MakeTpCall (tstate=0x7d4f00d89ab8 <_PyRuntime+166328>, 
    callable=0x7d4ed593b600, args=<optimized out>, nargs=3, keywords=0x0)
    at Objects/call.c:214
#6  0x00007d4f009026db in _PyEval_EvalFrameDefault (tstate=<optimized out>, 
    frame=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:4772
#7  0x00007d4f00a5cc8f in _PyEval_EvalFrame (throwflag=0, frame=0x7d4f00ef2020, 
    tstate=0x7d4f00d89ab8 <_PyRuntime+166328>) at ./Include/internal/pycore_ceval.h:73
#8  _PyEval_Vector (args=0x0, argcount=0, kwnames=0x0, locals=0x7d4f007f2cc0, 
    func=0x7d4f007d1f80, tstate=0x7d4f00d89ab8 <_PyRuntime+166328>) at Python/ceval.c:6428
#9  PyEval_EvalCode (co=co@entry=0x7d4eff338eb0, globals=globals@entry=0x7d4f007f2cc0, 
    locals=locals@entry=0x7d4f007f2cc0) at Python/ceval.c:1154
#10 0x00007d4f00aa7dfd in run_eval_code_obj (locals=0x7d4f007f2cc0, globals=0x7d4f007f2cc0, 
    co=0x7d4eff338eb0, tstate=0x7d4f00d89ab8 <_PyRuntime+166328>) at Python/pythonrun.c:1714
#11 run_mod (mod=<optimized out>, filename=filename@entry=0x7d4f0072fb40, 
    globals=globals@entry=0x7d4f007f2cc0, locals=locals@entry=0x7d4f007f2cc0, 
    flags=flags@entry=0x7ffc706562f8, arena=arena@entry=0x7d4f0071b7b0)
    at Python/pythonrun.c:1735
#12 0x00007d4f00aa96b6 in pyrun_file (flags=0x7ffc706562f8, closeit=<optimized out>, 
    locals=0x7d4f007f2cc0, globals=0x7d4f007f2cc0, start=257, filename=0x7d4f0072fb40, 
    fp=0x578e63ab32d0) at Python/pythonrun.c:1630
#13 _PyRun_SimpleFileObject (fp=fp@entry=0x578e63ab32d0, 
    filename=filename@entry=0x7d4f0072fb40, closeit=closeit@entry=1, 
    flags=flags@entry=0x7ffc706562f8) at Python/pythonrun.c:440
#14 0x00007d4f00aa9c4f in _PyRun_AnyFileObject (fp=0x578e63ab32d0, 
    filename=filename@entry=0x7d4f0072fb40, closeit=closeit@entry=1, 
    flags=flags@entry=0x7ffc706562f8) at Python/pythonrun.c:79
#15 0x00007d4f00ac9d30 in pymain_run_file_obj (skip_source_first_line=0, 
    filename=0x7d4f0072fb40, program_name=0x7d4f007f2eb0) at Modules/main.c:360
#16 pymain_run_file (config=0x7d4f00d6fb00 <_PyRuntime+59904>) at Modules/main.c:379
#17 pymain_run_python (exitcode=0x7ffc706562f0) at Modules/main.c:601
#18 Py_RunMain () at Modules/main.c:680
#19 0x00007d4f00aca2be in pymain_main (args=0x7ffc70656410) at Modules/main.c:710
#20 Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:734
#21 0x00007d4f00429d90 in __libc_start_call_main (main=main@entry=0x578e56939060 <main>, 
    argc=argc@entry=2, argv=argv@entry=0x7ffc70656598)
    at ../sysdeps/nptl/libc_start_call_main.h:58
#22 0x00007d4f00429e40 in __libc_start_main_impl (main=0x578e56939060 <main>, argc=2, 
    argv=0x7ffc70656598, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7ffc70656588) at ../csu/libc-start.c:392
#23 0x0000578e56939095 in _start ()
(gdb) 

@Robertleoj
Author

I also tried a simple copy-paste of the example in the README. It still segfaults.

@Robertleoj Robertleoj changed the title Optimizer always segfaults: Optimizer always segfaults Feb 23, 2025
aaron-skydio added a commit that referenced this issue Feb 24, 2025
Fixes #425

Our previous version of pybind didn't support py3.11.  pybind 2.13.6
supports all versions of python that we support.  There isn't much
downside to downloading an additional copy of pybind if you have an
older one installed that supports your version of python.  If we wanted
to we could write some table of minimum required versions based on
SYMFORCE_PYTHON's version

Topic: sf-pybind
@aaron-skydio aaron-skydio linked a pull request Feb 24, 2025 that will close this issue
@aaron-skydio
Member

Will be fixed by #426, see comment there

@Robertleoj
Author

Robertleoj commented Feb 24, 2025

@aaron-skydio

I installed the repo by linking to your PR branch, and ran the test again (identical to the example in the README). I get the same results, still segfaults.

Here is the stack trace:

lambook on  main [!⇡] is 📦 v0.1.0 via △ v3.22.1 via 🐍 v3.11.0 (.venv) 
❯ python3 notebooks/symforce_test.py
sys:1: FutureWarning: debug_stats argument is deprecated, use params.debug_stats
[1]    12938 segmentation fault (core dumped)  python3 notebooks/symforce_test.py

slambook on  main [!?⇡] is 📦 v0.1.0 via △ v3.22.1 via 🐍 v3.11.0 (.venv) 
❯ gdb python3 core.python3.12938                            
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04.2) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3...
[New LWP 12938]
[New LWP 12942]
[New LWP 12946]
[New LWP 12952]
[New LWP 12943]
[New LWP 12948]
[New LWP 12947]
[New LWP 12939]
[New LWP 12940]
[New LWP 12949]
[New LWP 12950]
[New LWP 12941]
[New LWP 12953]
[New LWP 12944]
[New LWP 12951]
[New LWP 12945]

warning: Section `.reg-xstate/12938' in core file too small.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `python3 notebooks/symforce_test.py'.
Program terminated with signal SIGSEGV, Segmentation fault.

warning: Section `.reg-xstate/12938' in core file too small.
#0  0x0000000000000000 in ?? ()
[Current thread is 1 (Thread 0x7fc9651feb80 (LWP 12938))]
warning: File "/home/robert/.pyenv/versions/3.11.0/bin/python3.11-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
	add-auto-load-safe-path /home/robert/.pyenv/versions/3.11.0/bin/python3.11-gdb.py
line to your configuration file "/home/robert/.config/gdb/gdbinit".
To completely disable this security protection add
	set auto-load safe-path /
line to your configuration file "/home/robert/.config/gdb/gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
	info "(gdb)Auto-loading safe path"
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007fc935d8390e in ?? ()
   from /home/robert/learning/slambook/.venv/lib/python3.11/site-packages/cc_sym.cpython-311-x86_64-linux-gnu.so
#2  0x00007fc935d14dc7 in ?? ()
   from /home/robert/learning/slambook/.venv/lib/python3.11/site-packages/cc_sym.cpython-311-x86_64-linux-gnu.so
#3  0x00007fc9653b3d43 in cfunction_call (func=0x7fc93b94a340, args=<optimized out>, kwargs=<optimized out>) at Objects/methodobject.c:542
#4  0x00007fc9653628d7 in _PyObject_MakeTpCall (tstate=0x7fc965789ab8 <_PyRuntime+166328>, callable=0x7fc93b94a340, args=<optimized out>, 
    nargs=3, keywords=0x0) at Objects/call.c:214
#5  0x00007fc9653026db in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>)
    at Python/ceval.c:4772
#6  0x00007fc96545cc8f in _PyEval_EvalFrame (throwflag=0, frame=0x7fc9658ca020, tstate=0x7fc965789ab8 <_PyRuntime+166328>)
    at ./Include/internal/pycore_ceval.h:73
#7  _PyEval_Vector (args=0x0, argcount=0, kwnames=0x0, locals=0x7fc9651f2cc0, func=0x7fc9651d1f80, 
    tstate=0x7fc965789ab8 <_PyRuntime+166328>) at Python/ceval.c:6428
#8  PyEval_EvalCode (co=co@entry=0x650744bdb630, globals=globals@entry=0x7fc9651f2cc0, locals=locals@entry=0x7fc9651f2cc0)
    at Python/ceval.c:1154
#9  0x00007fc9654a7dfd in run_eval_code_obj (locals=0x7fc9651f2cc0, globals=0x7fc9651f2cc0, co=0x650744bdb630, 
    tstate=0x7fc965789ab8 <_PyRuntime+166328>) at Python/pythonrun.c:1714
#10 run_mod (mod=<optimized out>, filename=filename@entry=0x7fc96512f2f0, globals=globals@entry=0x7fc9651f2cc0, 
    locals=locals@entry=0x7fc9651f2cc0, flags=flags@entry=0x7fffff411248, arena=arena@entry=0x7fc96511b7b0) at Python/pythonrun.c:1735
#11 0x00007fc9654a96b6 in pyrun_file (flags=0x7fffff411248, closeit=<optimized out>, locals=0x7fc9651f2cc0, globals=0x7fc9651f2cc0, 
    start=257, filename=0x7fc96512f2f0, fp=0x650744b7f300) at Python/pythonrun.c:1630
#12 _PyRun_SimpleFileObject (fp=fp@entry=0x650744b7f300, filename=filename@entry=0x7fc96512f2f0, closeit=closeit@entry=1, 
    flags=flags@entry=0x7fffff411248) at Python/pythonrun.c:440
#13 0x00007fc9654a9c4f in _PyRun_AnyFileObject (fp=0x650744b7f300, filename=filename@entry=0x7fc96512f2f0, closeit=closeit@entry=1, 
    flags=flags@entry=0x7fffff411248) at Python/pythonrun.c:79
#14 0x00007fc9654c9d30 in pymain_run_file_obj (skip_source_first_line=0, filename=0x7fc96512f2f0, program_name=0x7fc9651f2eb0)
    at Modules/main.c:360
#15 pymain_run_file (config=0x7fc96576fb00 <_PyRuntime+59904>) at Modules/main.c:379
#16 pymain_run_python (exitcode=0x7fffff411240) at Modules/main.c:601
#17 Py_RunMain () at Modules/main.c:680
#18 0x00007fc9654ca2be in pymain_main (args=0x7fffff411360) at Modules/main.c:710
#19 Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:734
#20 0x00007fc964e29d90 in __libc_start_call_main (main=main@entry=0x65070e590060 <main>, argc=argc@entry=2, argv=argv@entry=0x7fffff4114e8)
    at ../sysdeps/nptl/libc_start_call_main.h:58
#21 0x00007fc964e29e40 in __libc_start_main_impl (main=0x65070e590060 <main>, argc=2, argv=0x7fffff4114e8, init=<optimized out>, 
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffff4114d8) at ../csu/libc-start.c:392
#22 0x000065070e590095 in _start ()
(gdb) 

@Robertleoj
Author

Robertleoj commented Feb 24, 2025

Interestingly, I tried building with cmake from source on master, and then I don't encounter any issues. However, if I install with pip install . on master, I encounter the segfault again.
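
When a CMake build and a pip-installed wheel coexist on one machine, it can help to check which copy of the compiled extension the interpreter would actually load. The sketch below uses importlib from the standard library to look up the file without importing it, so it cannot trigger the crash itself. The module name cc_sym is taken from the backtrace above; the helper name module_origin is illustrative.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, without importing it.

    Returns None when the module cannot be found on the current sys.path.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # A path under site-packages suggests the wheel; a path under a CMake
    # build directory suggests the from-source build.
    print("cc_sym ->", module_origin("cc_sym"))
```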

@aaron-skydio
Member

Yeah, this seems to happen only with the wheel build; I haven't figured out why.

How did you install from my branch? Did you pip install git+https://github.com/symforce-org/symforce@aaron/revup/main/sf-pybind? Can you post the log from the build? I'm curious if it found a copy of pybind on your system, or an old copy you had downloaded

@Robertleoj
Author

I added "symforce@git+https://github.com/symforce-org/symforce.git#egg=refs/heads/aaron/revup/main" to my dependencies. However, I'll make a clean venv and install only your branch. I'll share the build log when it finishes :)

@Robertleoj
Author

Here is the full log from

uv pip install -vvv "symforce@git+https://github.com/symforce-org/symforce.git#egg=refs/heads/aaron/revup/main" &> build_log.txt

run in a fresh virtual environment: https://gist.github.com/Robertleoj/dc01610d3f5bdddfa35f14d1cab6264d

(log was too long to paste here)

@aaron-skydio
Member

This is installing the old version of pybind:

https://gist.github.com/Robertleoj/dc01610d3f5bdddfa35f14d1cab6264d#file-build_log-txt-L1629-L1630

              2.769727s   1s  DEBUG uv_build_frontend -- pybind11 not found, adding with FetchContent
              2.910354s   2s  DEBUG uv_build_frontend -- pybind11 v2.9.2 

It looks like you're installing from aaron/revup/main, which isn't a branch that exists, so I'm not sure what that's doing. It should be aaron/revup/main/sf-pybind.
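
The check described above can be automated for long build logs. This is a minimal sketch, assuming the log contains CMake status lines of the form "-- pybind11 v2.9.2" as in the excerpt; the function names are illustrative, and the default minimum of 2.13.6 is the version named in the commit message for #426.

```python
import re
from typing import List, Tuple

Version = Tuple[int, int, int]

# Matches CMake status lines such as "-- pybind11 v2.9.2".
PYBIND_LINE = re.compile(r"--\s+pybind11 v(\d+)\.(\d+)\.(\d+)")


def pybind_versions(log_text: str) -> List[Version]:
    """Collect every pybind11 version reported in a build log, in order."""
    return [
        (int(m.group(1)), int(m.group(2)), int(m.group(3)))
        for m in PYBIND_LINE.finditer(log_text)
    ]


def older_than(versions: List[Version], minimum: Version = (2, 13, 6)) -> List[Version]:
    """Return the versions that are below the given minimum."""
    return [v for v in versions if v < minimum]


if __name__ == "__main__":
    with open("build_log.txt", encoding="utf-8") as fh:
        found = pybind_versions(fh.read())
    print("pybind11 versions seen:", found)
    print("below minimum:", older_than(found))
```

Versions are compared as integer tuples rather than strings, because a string comparison would rank "2.9.2" above "2.13.6".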

@Robertleoj
Author

oops - I corrected the link:

uv pip install -vvv "symforce@git+https://github.com/symforce-org/symforce.git#egg=refs/heads/aaron/revup/main/sf-pybind" 

but still

❯ cat build_log.txt | grep pybind11
    0.721259s DEBUG uv_installer::plan Unnecessary package: pybind11==2.13.6
              2.881908s   1s  DEBUG uv_build_frontend -- pybind11 not found, adding with FetchContent
              3.021895s   1s  DEBUG uv_build_frontend -- pybind11 v2.9.2 
             81.904338s   1m  DEBUG uv_build_frontend -- pybind11 not found, adding with FetchContent
             81.948580s   1m  DEBUG uv_build_frontend -- pybind11 v2.9.2 

@aaron-skydio
Member

I don't think uv supports #egg refs: astral-sh/uv#2602

Can you just uv pip install git+https://github.com/symforce-org/symforce@aaron/revup/main/sf-pybind?
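
For dependency lists that still use the "#egg=refs/heads/<branch>" form, the rewrite suggested above can be expressed as a small string transformation. This is only a sketch for the exact form that appears in this thread, not a general requirement parser; the function name egg_ref_to_at_ref is illustrative.

```python
def egg_ref_to_at_ref(requirement: str) -> str:
    """Rewrite 'name@git+URL.git#egg=refs/heads/BRANCH' as 'git+URL@BRANCH'.

    Strings that do not contain the '#egg=refs/heads/' marker are returned unchanged.
    """
    marker = "#egg=refs/heads/"
    url, found, branch = requirement.partition(marker)
    if not found:
        return requirement
    # Drop a leading 'name@' prefix, keeping everything from 'git+' onward.
    if "git+" in url:
        url = url[url.index("git+"):]
    # Drop the optional '.git' suffix so the result matches the suggested form.
    if url.endswith(".git"):
        url = url[: -len(".git")]
    return f"{url}@{branch}"


if __name__ == "__main__":
    old = (
        "symforce@git+https://github.com/symforce-org/symforce.git"
        "#egg=refs/heads/aaron/revup/main/sf-pybind"
    )
    print(egg_ref_to_at_ref(old))
```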

@Robertleoj
Author

That works! Thank you very much for the support.

@aaron-skydio
Member

Excellent! Will get that PR merged shortly

aaron-skydio added a commit that referenced this issue Feb 24, 2025
Fixes #425

Our previous version of pybind didn't support py3.11.  pybind 2.13.6
supports all versions of python that we support.  There isn't much
downside to downloading an additional copy of pybind if you have an
older one installed that supports your version of python.  If we wanted
to we could write some table of minimum required versions based on
SYMFORCE_PYTHON's version

Topic: sf-pybind
Relative: fix-wheels