Add metadata.schema.ignored-keyspaces option, and ignore all system keyspaces by default
Description
Activity
Olivier Michallat September 1, 2020 at 11:20 PM
For refreshed_keyspaces I'd like to keep the server-side filtering in order to avoid fetching too much data
Hrmm I don't like the asymmetry of having patterns on one side and not the other... OK, I'll allow it, but I'll add a recommendation in the docs to prefer name inclusions if possible.
Olivier Michallat August 28, 2020 at 9:47 PM
No. One risk is that the pattern could be too eager, but with something like system, system_.* and dse_.* it should be pretty safe.
For refreshed_keyspaces I'd like to keep the server-side filtering in order to avoid fetching too much data.
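To illustrate the "too eager" concern, here is a minimal sketch that assumes patterns are applied as full-name matches using java.util.regex. The class IgnoredKeyspacePatterns and the sample keyspace name systematic_data are hypothetical and only serve the illustration; they are not part of the driver.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the driver's API.
final class IgnoredKeyspacePatterns {
  private final List<Pattern> patterns = new ArrayList<>();

  IgnoredKeyspacePatterns(List<String> regexes) {
    for (String regex : regexes) {
      patterns.add(Pattern.compile(regex));
    }
  }

  // Assumption: a keyspace is ignored when its whole name matches any pattern.
  boolean isIgnored(String keyspace) {
    for (Pattern pattern : patterns) {
      // matches() requires the entire name to match, not just a prefix.
      if (pattern.matcher(keyspace).matches()) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    IgnoredKeyspacePatterns narrow =
        new IgnoredKeyspacePatterns(Arrays.asList("system", "system_.*", "dse_.*"));
    IgnoredKeyspacePatterns broad =
        new IgnoredKeyspacePatterns(Arrays.asList("system.*"));

    System.out.println(narrow.isIgnored("system_schema"));   // true
    System.out.println(narrow.isIgnored("dse_perf"));        // true
    System.out.println(narrow.isIgnored("systematic_data")); // false
    System.out.println(broad.isIgnored("systematic_data"));  // true: too eager
  }
}
```

Under these assumptions, the narrower set still covers any future keyspace named system_ or dse_ followed by something, while a user keyspace that merely starts with "system" stays visible.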
Alex Dutra August 28, 2020 at 8:29 AM
That wouldn’t future-proof the token map against more system keyspaces that could be added in the future, would it?
Olivier Michallat August 27, 2020 at 7:17 PM
It's an exclusion, we can add all the names that have ever existed.
Alex Dutra August 26, 2020 at 10:19 AM
By default it should be set to every system keyspace in Cassandra and DSE.
The exact contents and names of system keyspaces evolved across C* versions (e.g. system_schema appeared in C* 3.0).
I think it would be safer and more future-proof to introduce the ability to filter by regular expressions, that is, ignored-keyspaces = [ "system.*" ].
I’d also add that ability to refreshed-keyspaces for consistency. However in this case I guess all the filtering will have to be done client-side.
The token map is often a source of memory issues with large clusters. The refreshed-keyspaces configuration option can help, but it's opt-in: the driver loads everything by default. It would be better to take a more proactive approach.
It's probably safe to assume that applications don't usually need metadata about system keyspaces. So one thing we could do is ignore them by default. We need a new option that is the opposite of refreshed-keyspaces, proposed name: advanced.metadata.schema.ignored-keyspaces. By default it should be set to every system keyspace in Cassandra and DSE.
If a keyspace is present in both refreshed-keyspaces and ignored-keyspaces, we should include it, but log a warning.
From an implementation perspective, I don't think we can handle excludes in the WHERE clause like we do for includes. But we can filter on the client side, possibly in CassandraSchemaRows.Builder.
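The precedence rule and the client-side filtering described above could be sketched roughly as follows. This is not the driver's implementation: KeyspaceFilterSketch and its methods are hypothetical names, it uses java.util.logging only to stay self-contained, it works on exact names rather than patterns, and it assumes that an empty refreshed-keyspaces list means "refresh everything".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical sketch; class and method names are illustrative, not driver API.
final class KeyspaceFilterSketch {
  private static final Logger LOG = Logger.getLogger(KeyspaceFilterSketch.class.getName());

  private final Set<String> refreshed; // assumption: empty means "refresh everything"
  private final Set<String> ignored;

  KeyspaceFilterSketch(Collection<String> refreshed, Collection<String> ignored) {
    this.refreshed = new HashSet<>(refreshed);
    this.ignored = new HashSet<>(ignored);
  }

  boolean includes(String keyspace) {
    boolean isRefreshed = refreshed.contains(keyspace);
    boolean isIgnored = ignored.contains(keyspace);
    if (isRefreshed && isIgnored) {
      // Conflict: the explicit inclusion wins, but the user is warned.
      LOG.warning("Keyspace " + keyspace + " is both refreshed and ignored; including it");
      return true;
    }
    if (isIgnored) {
      return false;
    }
    return refreshed.isEmpty() || isRefreshed;
  }

  // Filters keyspace names on the client side, after the rows were fetched.
  List<String> filter(Collection<String> keyspaceNames) {
    List<String> kept = new ArrayList<>();
    for (String name : keyspaceNames) {
      if (includes(name)) {
        kept.add(name);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    KeyspaceFilterSketch filter = new KeyspaceFilterSketch(
        Arrays.asList("app", "system_auth"),
        Arrays.asList("system", "system_auth", "system_schema"));
    // Prints [system_auth, app] and logs one warning for system_auth.
    System.out.println(
        filter.filter(Arrays.asList("system", "system_auth", "system_schema", "app", "other")));
  }
}
```

A real implementation would probably log the conflict once at configuration time rather than on every lookup; the sketch warns inline only to keep the rule visible in one place.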