Extend driver vector support to arbitrary subtypes and fix handling of variable length types (OSS C* 5.0)

Description

Python version of . Copying relevant text from that ticket so this one can stand on its own.

 

In conversation around a different issue it was mentioned that OSS Cassandra 5.0 will support vectors with arbitrary subtypes. More info is available in CASSANDRA-18960.

 

We also need to add support for varint types as subtypes. There has been some initial discussion of this on CASSANDRA-18504 but it didn’t really go anywhere.

 

Unlike the Java driver, the Python driver has at least some built-in assumptions about subtype size. These were put in place to get vector support for Astra out quickly, but they need to be generalized to support arbitrary subtypes.
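To make the size assumption concrete, here is a minimal sketch (not the driver's code) contrasting a vector codec that relies on a fixed element size with one that can carry variable-length elements such as varint. All function names are hypothetical, and the 4-byte length prefix is an assumption chosen purely for illustration; see CASSANDRA-18960 for the actual encoding.

```python
import struct

INT_SIZE = 4  # a CQL int is always 4 bytes


def encode_fixed_vector(values):
    """Concatenate 4-byte big-endian ints; boundaries are implied by the size."""
    return b"".join(struct.pack(">i", v) for v in values)


def decode_fixed_vector(buf, dimension):
    """Only valid when every element occupies exactly INT_SIZE bytes."""
    if len(buf) != INT_SIZE * dimension:
        raise ValueError("buffer does not match dimension * element size")
    return [struct.unpack_from(">i", buf, i * INT_SIZE)[0] for i in range(dimension)]


def encode_varint(value):
    """Big-endian two's complement with no fixed width."""
    length = value.bit_length() // 8 + 1
    return value.to_bytes(length, "big", signed=True)


def encode_varlen_vector(values):
    """Prefix each element with its length (ASSUMED 4-byte prefix, illustration only)."""
    out = bytearray()
    for v in values:
        raw = encode_varint(v)
        out += struct.pack(">i", len(raw)) + raw
    return bytes(out)


def decode_varlen_vector(buf, dimension):
    """Walk the buffer element by element using the length prefixes."""
    values, offset = [], 0
    for _ in range(dimension):
        (length,) = struct.unpack_from(">i", buf, offset)
        offset += 4
        values.append(int.from_bytes(buf[offset:offset + length], "big", signed=True))
        offset += length
    return values
```

The fixed-size decoder can compute every element boundary from the dimension alone, which is exactly the shortcut that stops working once element sizes vary.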

Environment

None

Pull Requests

None

Activity

Bret McGuire August 29, 2024 at 5:15 PM

Note: the tuple encoding issue referenced in my comment below has been moved into its own ticket

Bret McGuire August 26, 2024 at 9:16 PM

While testing this work I came upon a limitation of the current Python driver. When using positional parameters in simple (i.e. non-prepared) statements to populate vectors of tuples, one comes across the following error:

 

 

 

This is a server-side error triggered when handling the array literal in a query string. In the context of this code the receiver is the type that will be receiving the input value. Clearly something is off here as the code is interpreting the passed query string as an attempt to set the value of a vector using a tuple as the param… and that’s definitely not what we’re intending.

 

I observe no problem when executing these commands via cqlsh so the issue pretty clearly appears to be somewhere in what the Python driver is doing to generate these strings. Logging the raw outgoing query made it clear the driver is doing something odd when we encode those params:

 

 

which led directly to:

 

 

which in turn led to this bit of code indicating that tuples are currently encoded as lists. Changing this could be quite disruptive (presumably that’s why the plan was to change at the next major) so for now we just note a constraint that positional params of this type don’t work for vectors of tuples.
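As a rough illustration of why this breaks vectors of tuples, the following toy encoder (a simplified stand-in, not the driver's encoder; `encode_literal` and its `tuples_as_lists` flag are hypothetical) renders Python values as CQL-style literals. With tuples rendered like lists, the tuple structure disappears from the query string.

```python
def encode_literal(value, tuples_as_lists=True):
    """Render nested Python lists/tuples as a CQL-style literal string."""
    if isinstance(value, tuple):
        inner = ", ".join(encode_literal(v, tuples_as_lists) for v in value)
        # Mimics the behavior described above: tuples come out with list brackets.
        return "[%s]" % inner if tuples_as_lists else "(%s)" % inner
    if isinstance(value, list):
        return "[%s]" % ", ".join(encode_literal(v, tuples_as_lists) for v in value)
    return str(value)


vector_of_tuples = [(1, 2), (3, 4)]
print(encode_literal(vector_of_tuples))                         # [[1, 2], [3, 4]]
print(encode_literal(vector_of_tuples, tuples_as_lists=False))  # [(1, 2), (3, 4)]
```

In the first form the server sees a list of lists where a list of tuples was intended, which is consistent with the server-side complaint about the receiver type.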

Andres de la Peña December 6, 2023 at 4:28 PM

Until we get full support for those vector data types, we might want to throw an error if the vector type isn’t supported. Otherwise, wrong deserialization can either fail or silently provide wrong results. For example:

I don’t know if we want to do that on a separate ticket.

I think the problematic vector subtypes are those having variable size, like varint, text, etc.

It also seems that nested vectors such as vector<vector<int,2>,2> fail deserialization, even if they have fixed size.
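A small sketch of the failure mode and of the kind of guard suggested above. This is not taken from the driver: `naive_split`, `checked_split`, `UnsupportedVectorSubtype`, and the `FIXED_SIZES` table are all hypothetical names, and the naive even split is only an assumption about how a size-blind decoder might behave.

```python
# Serialized sizes in bytes for a few fixed-width CQL types.
FIXED_SIZES = {"int": 4, "bigint": 8, "float": 4, "double": 8}


class UnsupportedVectorSubtype(ValueError):
    """Raised instead of guessing at element boundaries."""


def naive_split(buf, dimension):
    """Size-blind decoder: divide the buffer evenly across the elements."""
    size = len(buf) // dimension
    return [buf[i * size:(i + 1) * size] for i in range(dimension)]


def checked_split(buf, dimension, subtype):
    """Refuse subtypes without a known fixed size rather than mis-decode them."""
    if subtype not in FIXED_SIZES:
        raise UnsupportedVectorSubtype("vector subtype %r is not supported" % subtype)
    size = FIXED_SIZES[subtype]
    if len(buf) != size * dimension:
        raise ValueError("buffer length does not match %d x %s" % (dimension, subtype))
    return [buf[i * size:(i + 1) * size] for i in range(dimension)]


# Two varints, 1 and 65536, written back to back with no length information.
payload = (1).to_bytes(1, "big", signed=True) + (65536).to_bytes(3, "big", signed=True)
decoded = [int.from_bytes(chunk, "big", signed=True) for chunk in naive_split(payload, 2)]
print(decoded)  # [257, 0]
```

The even split returns plausible-looking but wrong values without raising, whereas `checked_split(payload, 2, "varint")` fails loudly. A flat table like this would also reject nested vectors unless their sizes were computed recursively.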

Fixed

Details

Assignee

Reporter

Fix versions

Priority

Created November 6, 2023 at 7:40 PM
Updated January 17, 2025 at 12:39 AM
Resolved September 4, 2024 at 4:27 PM