Unregister old metrics when a node gets removed or changes RPC address

Description

When a node gets removed (or gets a new broadcastRpcAddress, see https://datastax-oss.atlassian.net/browse/JAVA-2323#icft=JAVA-2323), its metrics stay there with the last known value.

We might want to unregister them. This is a bit tricky other components might still update them if they didn't get the node change yet, and the registry will recreate a metric if it does not exist.

Environment

None

Pull Requests

None

Activity

Show:
Tomasz Lelek
January 23, 2020 at 2:27 PM

I was trying the approach with unregistering on this PR:

https://github.com/riptano/java-driver/pull/29

but approach looks a lot better and cleaner.

The one thing that maybe useful from my work is MetricsSimulacronIT:

https://github.com/riptano/java-driver/pull/29/files#diff-94fc9461f0077238bb1db878f7439fa4R38

Alex Dutra
January 7, 2020 at 1:45 PM
(edited)

Wouldn’t it be possible to wrap the registry in a Guava Cache with eviction policy based on access?

Tentative implementation:

 

@ThreadSafe public abstract class DropwizardMetricUpdater<MetricT> implements MetricUpdater<MetricT> { private static final Logger LOG = LoggerFactory.getLogger(DropwizardMetricUpdater.class); protected final Set<MetricT> enabledMetrics; protected final MetricRegistry registry; protected final Cache<String, Metric> metricsCache; protected DropwizardMetricUpdater(Set<MetricT> enabledMetrics, MetricRegistry registry) { this(enabledMetrics, registry, Duration.ofHours(1)); } protected DropwizardMetricUpdater( Set<MetricT> enabledMetrics, MetricRegistry registry, Duration expiresAfter) { this.enabledMetrics = enabledMetrics; this.registry = registry; metricsCache = CacheBuilder.newBuilder() .expireAfterAccess(expiresAfter) .removalListener( (RemovalNotification<String, Metric> notif) -> { LOG.debug("Removing {} from cache after {}", notif.getKey(), expiresAfter); registry.remove(notif.getKey()); }) .build(); } protected abstract String buildFullName(@NonNull MetricT metric, @Nullable String profileName); @Override public void incrementCounter(MetricT metric, String profileName, long amount) { if (isEnabled(metric, profileName)) { getOrCreateMetric(metric, profileName, registry::counter).inc(); } } @Override public void updateHistogram(MetricT metric, String profileName, long value) { if (isEnabled(metric, profileName)) { getOrCreateMetric(metric, profileName, registry::histogram).update(value); } } @Override public void markMeter(MetricT metric, String profileName, long amount) { if (isEnabled(metric, profileName)) { getOrCreateMetric(metric, profileName, registry::meter).mark(amount); } } @Override public void updateTimer(MetricT metric, String profileName, long duration, TimeUnit unit) { if (isEnabled(metric, profileName)) { getOrCreateMetric(metric, profileName, registry::timer).update(duration, unit); } } @SuppressWarnings({"unchecked", "TypeParameterUnusedInFormals"}) @Nullable public <T extends Metric> T getMetric(@NonNull MetricT metric, @Nullable String profileName) { String name = buildFullName(metric, profileName); return (T) metricsCache.getIfPresent(name); } @SuppressWarnings("unchecked") @NonNull protected <T extends Metric> T getOrCreateMetric( @NonNull MetricT metric, @Nullable String profileName, @NonNull Function<String, T> metricBuilder) { String name = buildFullName(metric, profileName); try { return (T) metricsCache.get(name, () -> metricBuilder.apply(name)); } catch (ExecutionException ignored) { // should never happen throw new AssertionError(); } catch (UncheckedExecutionException e) { throw (RuntimeException) e.getCause(); } } @Override public boolean isEnabled(@NonNull MetricT metric, @Nullable String profileName) { return enabledMetrics.contains(metric); } protected void initializeDefaultCounter(@NonNull MetricT metric, @Nullable String profileName) { if (isEnabled(metric, profileName)) { // Just initialize eagerly so that the metric appears even when it has no data yet getOrCreateMetric(metric, profileName, registry::counter); } } protected void initializeHdrTimer( MetricT metric, DriverExecutionProfile config, DriverOption highestLatencyOption, DriverOption significantDigitsOption, DriverOption intervalOption) { String profileName = config.getName(); if (isEnabled(metric, profileName)) { // Initialize eagerly to use the custom implementation getOrCreateMetric( metric, profileName, metricName -> { Duration highestLatency = config.getDuration(highestLatencyOption); final int significantDigits; int d = config.getInt(significantDigitsOption); if (d >= 0 && d <= 5) { significantDigits = d; } else { LOG.warn( "[{}] Configuration option {} is out of range (expected between 0 and 5, found {}); " + "using 3 instead.", metricName, significantDigitsOption, d); significantDigits = 3; } Duration refreshInterval = config.getDuration(intervalOption); HdrReservoir reservoir = new HdrReservoir(highestLatency, significantDigits, refreshInterval, metricName); return registry.register(metricName, new Timer(reservoir)); }); } } }

 

Tomasz Lelek
January 7, 2020 at 11:10 AM

After discussion during the standup, we concluded that keeping metrics for nodes that were DELETED or have a new IP address may be beneficial for the clients.

In the case of debugging issues with a discarded node, having historical metrics exposed via MetricRegistry may be beneficial for users.

let me know if you agree with that - in such a case I will close this ticket.

 

Fixed

Details

Assignee

Reporter

Fix versions

Reviewer

Sprint

Priority

Created July 11, 2019 at 1:58 AM
Updated March 17, 2021 at 12:35 PM
Resolved July 24, 2020 at 8:22 AM