Add metadata.schema.ignored-keyspaces option, and ignore all system keyspaces by default

Description

The token map is often a source of memory issues with large clusters. The refreshed-keyspaces configuration option can help, but it's opt-in: the driver loads everything by default. It would be better to take a more proactive approach.

It's probably safe to assume that applications don't usually need metadata about system keyspaces, so one thing we could do is ignore them by default. We need a new option that is the opposite of refreshed-keyspaces; proposed name: advanced.metadata.schema.ignored-keyspaces. By default it should be set to every system keyspace in Cassandra and DSE.
If a keyspace is present in both refreshed-keyspaces and ignored-keyspaces, we should include it, but log a warning.

From an implementation perspective, I don't think we can handle excludes in the WHERE clause like we do for includes. But we can filter on the client side, possibly in CassandraSchemaRows.Builder.
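A minimal sketch of the client-side rule described above, assuming that an empty refreshed-keyspaces list means "refresh everything". The class `KeyspaceFilterSketch` and its members are hypothetical names for illustration, not driver API; the real filtering might live in CassandraSchemaRows.Builder as suggested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the driver: decides on the client side
// whether the schema rows of a given keyspace should be kept.
final class KeyspaceFilterSketch {
  private final Set<String> refreshed; // assumption: empty means "no explicit includes"
  private final Set<String> ignored;
  // Collected instead of logged so the sketch stays dependency-free.
  final List<String> warnings = new ArrayList<>();

  KeyspaceFilterSketch(Set<String> refreshed, Set<String> ignored) {
    this.refreshed = refreshed;
    this.ignored = ignored;
    // A keyspace listed on both sides is included, but flagged once up front.
    for (String name : refreshed) {
      if (ignored.contains(name)) {
        warnings.add("Keyspace '" + name + "' is both refreshed and ignored; including it");
      }
    }
  }

  boolean includes(String keyspace) {
    if (refreshed.contains(keyspace)) {
      return true; // explicit includes win over excludes
    }
    if (ignored.contains(keyspace)) {
      return false;
    }
    // Not mentioned anywhere: kept only when there is no explicit include list.
    return refreshed.isEmpty();
  }
}
```

With no includes and `system` in the ignored set, `includes("system")` is false and any application keyspace passes; listing `system` on both sides makes it pass again and records one warning.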

Environment

None

Pull Requests

None

Activity

Olivier Michallat
September 1, 2020, 11:20 PM

"For refreshed_keyspaces I'd like to keep the server-side filtering in order to avoid fetching too much data."

Hrmm I don't like the asymmetry of having patterns on one side and not the other... OK, I'll allow it, but I'll add a recommendation in the docs to prefer name inclusions if possible.

Olivier Michallat
August 28, 2020, 9:47 PM

No. One risk is that the pattern could be too eager, but with something like system, system_.* and dse_.* it should be pretty safe.

For refreshed_keyspaces I'd like to keep the server-side filtering in order to avoid fetching too much data.
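To make the "too eager" risk concrete, here is a small sketch that applies the three patterns mentioned above as full-name matches. The class `IgnoredPatternSketch` is a hypothetical illustration, and the keyspace names in the usage note are made up; how the driver would actually compile and apply the patterns is an assumption.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration, not driver code: tests keyspace names against
// the exclusion patterns proposed in this discussion.
final class IgnoredPatternSketch {
  static final List<Pattern> IGNORED =
      List.of(
          Pattern.compile("system"),
          Pattern.compile("system_.*"),
          Pattern.compile("dse_.*"));

  static boolean isIgnored(String keyspace) {
    for (Pattern pattern : IGNORED) {
      // matches() requires the whole name to match, not just a prefix.
      if (pattern.matcher(keyspace).matches()) {
        return true;
      }
    }
    return false;
  }
}
```

Under these assumptions `system`, `system_schema` and `dse_security` are ignored, while `systems` and `my_system` are not. A user keyspace that happens to be named `system_of_record` would also be ignored, which is the eagerness risk in question.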

Alex Dutra
August 28, 2020, 8:29 AM

That wouldn’t future-proof the token map against more system keyspaces that could be added in the future, would it?

Olivier Michallat
August 27, 2020, 7:17 PM

It's an exclusion, we can add all the names that have ever existed.

Alex Dutra
August 26, 2020, 10:19 AM

"By default it should be set to every system keyspace in Cassandra and DSE."

The exact contents and names of system keyspaces evolved across C* versions (e.g. system_schema appeared in C* 3.0).

I think it would be safer and more future-proof to introduce the ability to filter by regular expressions, that is, ignored-keyspaces = [ "system.*" ].

I’d also add that ability to refreshed-keyspaces for consistency. However, in this case I guess all the filtering will have to be done client-side.

Fixed

Assignee

Olivier Michallat

Reporter

Olivier Michallat

Fix versions