Add metadata.schema.ignored-keyspaces option, and ignore all system keyspaces by default

Description

The token map is often a source of memory issues with large clusters. The refreshed-keyspaces configuration option can help, but it's opt-in: the driver loads everything by default. It would be better to take a more proactive approach.

It's probably safe to assume that applications don't usually need metadata about system keyspaces, so one thing we could do is ignore them by default. We need a new option that is the opposite of refreshed-keyspaces; proposed name: advanced.metadata.schema.ignored-keyspaces. By default it should be set to every system keyspace in Cassandra and DSE.
If a keyspace is present in both refreshed-keyspaces and ignored-keyspaces, we should include it, but log a warning.

From an implementation perspective, I don't think we can handle excludes in the WHERE clause like we do for includes. But we can filter on the client side, possibly in CassandraSchemaRows.Builder.
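As a rough illustration of the client-side filtering described above, here is a minimal sketch in plain Java. It is not the driver's code: the class name KeyspaceFilterSketch and its methods are hypothetical, and it assumes that an empty refreshed-keyspaces list means "load every keyspace". An explicit inclusion wins over an exclusion, and the overlap is reported with a warning.

```java
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical helper, not part of the driver: decides on the client side
// whether schema rows for a given keyspace should be kept.
final class KeyspaceFilterSketch {
  private static final Logger LOG = Logger.getLogger("KeyspaceFilterSketch");

  private final Set<String> refreshed; // assumed: empty means "all keyspaces"
  private final Set<String> ignored;

  KeyspaceFilterSketch(Set<String> refreshed, Set<String> ignored) {
    this.refreshed = refreshed;
    this.ignored = ignored;
    // A keyspace listed on both sides is included, but we warn once up front.
    for (String keyspace : refreshed) {
      if (ignored.contains(keyspace)) {
        LOG.warning("Keyspace " + keyspace
            + " is both refreshed and ignored; it will be included");
      }
    }
  }

  boolean includes(String keyspace) {
    if (refreshed.contains(keyspace)) {
      return true; // explicit inclusion takes precedence over exclusion
    }
    if (!refreshed.isEmpty()) {
      return false; // an inclusion list exists and this keyspace is not on it
    }
    return !ignored.contains(keyspace);
  }
}
```

With an empty refreshed set and an ignored set containing system and system_schema, includes("system") returns false while an application keyspace passes through. If system appears in both sets, it is kept and the warning is logged at construction time.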

Environment

None

Pull Requests

None

Activity

Olivier Michallat September 1, 2020 at 11:20 PM

"For refreshed_keyspaces I'd like to keep the server-side filtering in order to avoid fetching too much data."

Hrmm I don't like the asymmetry of having patterns on one side and not the other... OK, I'll allow it, but I'll add a recommendation in the docs to prefer name inclusions if possible.

Olivier Michallat August 28, 2020 at 9:47 PM

No. One risk is that the pattern could be too eager, but with something like system, system_.* and dse_.* it should be pretty safe.

For refreshed_keyspaces I'd like to keep the server-side filtering in order to avoid fetching too much data.
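To make the pattern-based exclusion concrete, here is a small sketch using java.util.regex. The class IgnoredKeyspacePatternsSketch is hypothetical and only illustrates the idea; it assumes each entry must match the whole keyspace name, which is what keeps a plain "system" entry from also matching unrelated names.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the driver: tests keyspace names against
// a list of regular expressions applied on the client side.
final class IgnoredKeyspacePatternsSketch {
  private final List<Pattern> patterns;

  IgnoredKeyspacePatternsSketch(List<String> regexes) {
    // Compile once; the patterns are reused for every keyspace name.
    this.patterns = regexes.stream().map(Pattern::compile).collect(Collectors.toList());
  }

  boolean isIgnored(String keyspace) {
    for (Pattern pattern : patterns) {
      // matches() requires the entire name to match, not just a prefix.
      if (pattern.matcher(keyspace).matches()) {
        return true;
      }
    }
    return false;
  }
}
```

Built from system, system_.* and dse_.*, this ignores system, system_schema and dse_perf, but keeps a user keyspace named systems_inventory. A single system.* entry would ignore that last one too, which is the "too eager" risk.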

Alex Dutra August 28, 2020 at 8:29 AM

That wouldn’t future-proof the token map against more system keyspaces that could be added in the future, would it?

Olivier Michallat August 27, 2020 at 7:17 PM

It's an exclusion, we can add all the names that have ever existed.

Alex Dutra August 26, 2020 at 10:19 AM

"By default it should be set to every system keyspace in Cassandra and DSE."

The exact contents and names of system keyspaces evolved across C* versions (e.g. system_schema appeared in C* 3.0).

I think it would be safer and more future-proof to introduce the ability to filter by regular expressions, for example ignored-keyspaces = [ "system.*" ].

I’d also add that ability to refreshed-keyspaces for consistency. However, in that case I guess all the filtering will have to be done client-side.

Fixed

Details


Created August 25, 2020 at 8:03 PM
Updated December 18, 2020 at 4:45 PM
Resolved December 18, 2020 at 4:45 PM