Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addressing Incorrect Handling of Python GIL that Leads to a SEGFAULT #361

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Scusemua
Copy link

@Scusemua Scusemua commented Jul 20, 2024

I found what appears to be a bug in bind/symbols.go.

Consider the following Go function that was generated by gopy:

func gopackage_GoObject_ExampleFunc(_handle CGoHandle, val CGoHandle, pythonCallback *C.PyObject, callbackArgument *C.char, goRun C.char) {
	_fun_arg := pythonCallback
	_saved_thread := C.PyEval_SaveThread() // Release GIL
	defer C.PyEval_RestoreThread(_saved_thread) // Acquire GIL (deferred) 
	vifc, __err := gopyh.VarFromHandleTry((gopyh.CGoHandle)(_handle), "*gopackage.GoObject")
	if __err != nil {
		return
	}
	if boolPyToGo(goRun) {
		go gopyh.Embed(vifc, reflect.TypeOf(gopackage.GoObject{})).(*gopackage.GoObject).ExampleFunc(*ptrFromHandle_gopackage_Bytes(val), func(arg_0 interface{}, arg_1 string) {
			if C.PyCallable_Check(_fun_arg) == 0 {
				return
			}
			_gstate := C.PyGILState_Ensure() // Acquire GIL
			_fcargs := C.PyTuple_New(2)
			C.PyTuple_SetItem(_fcargs, 0, C.gopy_build_string(C.CString(fmt.Sprintf("%s", (arg_0)))))
			C.PyTuple_SetItem(_fcargs, 1, C.gopy_build_string(C.CString(arg_1)))
			C.PyObject_CallObject(_fun_arg, _fcargs)
			C.gopy_decref(_fcargs)
			C.gopy_err_handle()
			C.PyGILState_Release(_gstate) // Release GIL
		}, C.GoString(callbackArgument))
	} else {
		gopyh.Embed(vifc, reflect.TypeOf(gopackage.GoObject{})).(*gopackage.GoObject).ExampleFunc(*ptrFromHandle_gopackage_Bytes(val), func(arg_0 interface{}, arg_1 string) {
			if C.PyCallable_Check(_fun_arg) == 0 {
				return
			}
			_gstate := C.PyGILState_Ensure() // Acquire GIL
			_fcargs := C.PyTuple_New(2)
			C.PyTuple_SetItem(_fcargs, 0, C.gopy_build_string(C.CString(fmt.Sprintf("%s", (arg_0)))))
			C.PyTuple_SetItem(_fcargs, 1, C.gopy_build_string(C.CString(arg_1)))
			C.PyObject_CallObject(_fun_arg, _fcargs)
			C.gopy_decref(_fcargs)
			C.gopy_err_handle()
			C.PyGILState_Release(_gstate) // Release GIL
		}, C.GoString(callbackArgument))
	}
}

This is generated for a Go method ExampleFunction of a GoObject struct. The intent is for this method to be called from Python.

The bug is that we access the C/Python API function C.PyCallable_Check before acquiring the GIL, leading to a SEGFAULT. The problem lies in the following excerpt of the above function:

			if C.PyCallable_Check(_fun_arg) == 0 {
				return
			}
			_gstate := C.PyGILState_Ensure() // Acquire GIL
			_fcargs := C.PyTuple_New(2)

Notice that we only acquire the GIL via PyGILState_Ensure after the call to C.PyCallable_Check(_fun_arg), which leads to a SEGFAULT.

The changes in this PR will result in the above Go function being generated differently as:

func gopackage_GoObject_ExampleFunc(_handle CGoHandle, val CGoHandle, pythonCallback *C.PyObject, callbackArgument *C.char, goRun C.char) {
	_fun_arg := pythonCallback
	_saved_thread := C.PyEval_SaveThread()  // Release GIL
	defer C.PyEval_RestoreThread(_saved_thread) // Acquire GIL (deferred) 
	vifc, __err := gopyh.VarFromHandleTry((gopyh.CGoHandle)(_handle), "*gopackage.GoObject")
	if __err != nil {
		return
	}
	if boolPyToGo(goRun) {
		go gopyh.Embed(vifc, reflect.TypeOf(gopackage.GoObject{})).(*gopackage.GoObject).ExampleFunc(*ptrFromHandle_gopackage_Bytes(val), func(arg_0 interface{}, arg_1 string) {
			_gstate := C.PyGILState_Ensure() // Acquire GIL
			if C.PyCallable_Check(_fun_arg) == 0 {
				C.PyGILState_Release(_gstate) // Release GIL
				return
			}
			_fcargs := C.PyTuple_New(2)
			C.PyTuple_SetItem(_fcargs, 0, C.gopy_build_string(C.CString(fmt.Sprintf("%s", (arg_0)))))
			C.PyTuple_SetItem(_fcargs, 1, C.gopy_build_string(C.CString(arg_1)))
			C.PyObject_CallObject(_fun_arg, _fcargs)
			C.gopy_decref(_fcargs)
			C.gopy_err_handle()
			C.PyGILState_Release(_gstate) // Release GIL
		}, C.GoString(callbackArgument))
	} else {
		gopyh.Embed(vifc, reflect.TypeOf(gopackage.GoObject{})).(*gopackage.GoObject).ExampleFunc(*ptrFromHandle_gopackage_Bytes(val), func(arg_0 interface{}, arg_1 string) {
			_gstate := C.PyGILState_Ensure() // Acquire GIL
			if C.PyCallable_Check(_fun_arg) == 0 {
				C.PyGILState_Release(_gstate) // Release GIL
				return
			}
			_fcargs := C.PyTuple_New(2)
			C.PyTuple_SetItem(_fcargs, 0, C.gopy_build_string(C.CString(fmt.Sprintf("%s", (arg_0)))))
			C.PyTuple_SetItem(_fcargs, 1, C.gopy_build_string(C.CString(arg_1)))
			C.PyObject_CallObject(_fun_arg, _fcargs)
			C.gopy_decref(_fcargs)
			C.gopy_err_handle()
			C.PyGILState_Release(_gstate) // Release GIL
		}, C.GoString(callbackArgument))
	}
}

Notice that we now call C.PyGILState_Ensure() before the call to C.PyCallable_Check(_fun_arg). Likewise, we call C.PyGILState_Release(_gstate) before returning within the body of the associated if-statement to ensure that there is a matching call (to match the _gstate := C.PyGILState_Ensure()).

In terms of testing, I've just been using the modified gopy in an application I'm developing. I'm in the process of doing more testing in the meantime; however, I am confident that this is a bug, as you must hold the Python GIL when interfacing/invoking any of the C/Python API.

From the "Thread State and the Global Interpreter Lock" subsection of the "Initialization, Finalization, and Threads" Python C-API documentation:

"Therefore, the rule exists that only the thread that has acquired the GIL may operate on Python objects or call Python/C API functions."

On a separate (but related) note -- since we defer C.PyEval_RestoreThread(_saved_thread) at the beginning of the method's execution, it's a little silly to release the GIL via C.PyGILState_Release(_gstate) immediately before returning (even in the case where we don't call into Go code); however, we need to have matching calls between C.PyEval_SaveThread() and C.PyEval_RestoreThread() as well as PyGILState_Ensure() and PyGILState_Release(), so I think it ultimately makes sense to do things this way for now. In theory, we could provide more complex logic to handle the GIL more efficiently in the future.

@Scusemua Scusemua changed the title Addressing Incorrect Handling of GIL Addressing Incorrect Handling of GIL that Leads to a SEGFAULT Jul 20, 2024
@Scusemua Scusemua changed the title Addressing Incorrect Handling of GIL that Leads to a SEGFAULT Addressing Incorrect Handling of Python GIL that Leads to a SEGFAULT Jul 20, 2024
@nickeygit
Copy link

Hi @Scusemua !

Thanks for your patch! You saved me several hours of life! :))
And now my turn.

PyGILState locking is needed for sure, but Go is insidious too :)
Goroutines can run on different OS threads
and switch them anytime...

even between PyGILState_Ensure/PyGILState_Release calls!

And sure, PyGILState_Release will panic that current thread has no python thread info and SEGFAULT etc.

My simple two clients-loaded Web server was crashing on this even with your patch.

To fix - we need to lock current goroutine to OS thread during the PyGILState_Ensure/PyGILState_Release section.

Simply add runtime.LockOSThread() call before PyGILState_Ensure (1 place in bind/symbols.go)

and

runtime.UnlockOSThread() after every PyGILState_Release call (3 places in bind/symbols.go)

And now my test server is running fine, calling Python callback in the web handler :)

Feel free to add this to your branch and be brave to propose your PR for merge!

Have nice days guys!

@Scusemua
Copy link
Author

Scusemua commented Dec 9, 2024

Hi @nickeygit,

Yeah, I also found that I needed to pin the goroutines to OS threads, or else I would get really strange errors -- just like you're describing. I just think I forgot to add an update about this to my original post. 😅

With these two fixes, everything has been working fine on my end with no more issues for months! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants