Report of Python driver crash with libev reactor

Description

Python driver 3.25.0. A user reports that Python applications using the driver crash with a stack trace like the following:

 

(gdb) thread apply all bt

Thread 2 (Thread 0x7f29cade7740 (LWP 520044)):
#0  0x00007f29c9344230 in ?? () from /usr/lib/python3/dist-packages/cassandra/cython_utils.cpython-311-x86_64-linux-gnu.so
#1  0x00007f29cb10312a in ?? () from /lib64/ld-linux-x86-64.so.2
#2  0x00007f29cb10681e in ?? () from /lib64/ld-linux-x86-64.so.2
#3  0x00007f29cae2a55d in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007f29cae2a69a in exit () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007f29cae13251 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007f29cae13305 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x000000000062a3e1 in _start ()

Thread 1 (Thread 0x7f29a37fe6c0 (LWP 3706514)):
#0  0x000000000062f530 in ?? ()
#1  0x000000000062f4f6 in PyThreadState_New ()
#2  0x000000000046125f in ?? ()
#3  0x00007f29c8fc06bf in ?? () from /usr/lib/python3/dist-packages/cassandra/io/libevwrapper.cpython-311-x86_64-linux-gnu.so
#4  0x00007f29c8fac633 in ev_invoke_pending () from /lib/x86_64-linux-gnu/libev.so.4
#5  0x00007f29c8fafe71 in ev_run () from /lib/x86_64-linux-gnu/libev.so.4
#6  0x00007f29c8fc05fc in ?? () from /usr/lib/python3/dist-packages/cassandra/io/libevwrapper.cpython-311-x86_64-linux-gnu.so
#7  0x000000000055da30 in ?? ()
#8  0x000000000053b94c in PyObject_Vectorcall ()
#9  0x000000000052c6a0 in _PyEval_EvalFrameDefault ()
#10 0x00000000005860d4 in ?? ()
#11 0x0000000000585118 in ?? ()
#12 0x00000000005306f7 in _PyEval_EvalFrameDefault ()
#13 0x00000000005860d4 in ?? ()
#14 0x0000000000585118 in ?? ()
#15 0x000000000067bf0c in ?? ()
#16 0x0000000000656cb4 in ?? ()
#17 0x00007f29cae75134 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#18 0x00007f29caef57dc in ?? () from /lib/x86_64-linux-gnu/libc.so.6

 

Best guess is it’s a libev issue manifesting itself in odd ways.

Environment

None

Pull Requests

None

Activity


Bret McGuire 
December 13, 2024 at 9:42 PM

After some investigation, the root cause was identified as a bug in the Python interpreter rather than in the driver. See issue #127893 for more information.

Steve Lacerda 
December 13, 2024 at 6:36 PM

It’s a shortcut, because they have SSL and other secure info that they don’t want exposed. I have this test running in the lab. I’ll let you know if it reproduces.

Bret McGuire 
December 11, 2024 at 11:38 PM

“cql” in the bash script below is… straight cqlsh? Or is there some other intermediary there?

Steve Lacerda 
December 11, 2024 at 4:27 PM

It can be recreated with the following:

#!/bin/bash

# Infinite loop
while true; do
    # Run the first CQL command
    cql -e "SELECT * from storagegrid.object_by_uuid WHERE uuid=5A9EDA45-0C45-46D6-ACA9-8343FA9093A9"

    # Run the second CQL command and discard its output
    cql -e "SELECT * FROM system.local" > /dev/null 2>&1
done
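Because the bash loop never stops, a crash can scroll past unnoticed. A minimal sketch of a Python harness that runs a command repeatedly and stops at the first run killed by a signal might look like the following. The names run_until_crash and signal_from_returncode are hypothetical helpers, not part of the driver or cqlsh, and treating exit statuses above 128 as shell-reported signal deaths is a common convention assumed here, not a guarantee.

```python
import subprocess


def signal_from_returncode(returncode):
    """Return the signal number that killed the process, or None.

    On POSIX, subprocess reports death by signal N as returncode -N.
    Assumption: if the command is a shell wrapper, the shell reports the
    same event as exit status 128 + N, so that case is decoded too.
    """
    if returncode < 0:
        return -returncode
    if returncode > 128:
        return returncode - 128
    return None


def run_until_crash(cmd, max_iterations=1000):
    """Run cmd up to max_iterations times, discarding its output.

    Returns (iteration, signal_number) for the first run that died from a
    signal, or None if every run exited normally.
    """
    for iteration in range(1, max_iterations + 1):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        signal_number = signal_from_returncode(result.returncode)
        if signal_number is not None:
            return iteration, signal_number
    return None
```

For the case above, the call would be something like run_until_crash(["cql", "-e", "SELECT * FROM system.local"]), where "cql" is the reporter's own shortcut and would need to be replaced with whatever command is available locally.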

Bret McGuire 
November 1, 2024 at 10:54 PM

Expanding just a bit now that we have the correct Python version.

 

First stack trace below fails this assert in is_tstate_valid() while trying to grab the GIL. The “interp” field in PyThreadState is an instance of PyInterpreterState. It’s not immediately clear where this PyThreadState came from but obviously it’s missing at least one of its fields.

 

There are a number of reports similar to the second stack trace, most notably here. In that case client code using the Python API was failing to clean up a thread key.

Won't Fix

Details

Assignee

Reporter

Priority

Created September 10, 2024 at 5:01 PM
Updated December 13, 2024 at 9:43 PM
Resolved December 13, 2024 at 9:43 PM
