Unregister old metrics when a node gets removed or changes RPC address
Description
Environment
Pull Requests
Activity
I was trying the approach with unregistering on this PR:
https://github.com/riptano/java-driver/pull/29
but @Alex Dutra approach looks a lot better and cleaner.
The one thing that maybe useful from my work is MetricsSimulacronIT:
https://github.com/riptano/java-driver/pull/29/files#diff-94fc9461f0077238bb1db878f7439fa4R38
Wouldn’t it be possible to wrap the registry in a Guava Cache with eviction policy based on access?
Tentative implementation:
@ThreadSafe
public abstract class DropwizardMetricUpdater<MetricT> implements MetricUpdater<MetricT> {
private static final Logger LOG = LoggerFactory.getLogger(DropwizardMetricUpdater.class);
protected final Set<MetricT> enabledMetrics;
protected final MetricRegistry registry;
protected final Cache<String, Metric> metricsCache;
protected DropwizardMetricUpdater(Set<MetricT> enabledMetrics, MetricRegistry registry) {
this(enabledMetrics, registry, Duration.ofHours(1));
}
protected DropwizardMetricUpdater(
Set<MetricT> enabledMetrics, MetricRegistry registry, Duration expiresAfter) {
this.enabledMetrics = enabledMetrics;
this.registry = registry;
metricsCache =
CacheBuilder.newBuilder()
.expireAfterAccess(expiresAfter)
.removalListener(
(RemovalNotification<String, Metric> notif) -> {
LOG.debug("Removing {} from cache after {}", notif.getKey(), expiresAfter);
registry.remove(notif.getKey());
})
.build();
}
protected abstract String buildFullName(@NonNull MetricT metric, @Nullable String profileName);
@Override
public void incrementCounter(MetricT metric, String profileName, long amount) {
if (isEnabled(metric, profileName)) {
getOrCreateMetric(metric, profileName, registry::counter).inc();
}
}
@Override
public void updateHistogram(MetricT metric, String profileName, long value) {
if (isEnabled(metric, profileName)) {
getOrCreateMetric(metric, profileName, registry::histogram).update(value);
}
}
@Override
public void markMeter(MetricT metric, String profileName, long amount) {
if (isEnabled(metric, profileName)) {
getOrCreateMetric(metric, profileName, registry::meter).mark(amount);
}
}
@Override
public void updateTimer(MetricT metric, String profileName, long duration, TimeUnit unit) {
if (isEnabled(metric, profileName)) {
getOrCreateMetric(metric, profileName, registry::timer).update(duration, unit);
}
}
@SuppressWarnings({"unchecked", "TypeParameterUnusedInFormals"})
@Nullable
public <T extends Metric> T getMetric(@NonNull MetricT metric, @Nullable String profileName) {
String name = buildFullName(metric, profileName);
return (T) metricsCache.getIfPresent(name);
}
@SuppressWarnings("unchecked")
@NonNull
protected <T extends Metric> T getOrCreateMetric(
@NonNull MetricT metric,
@Nullable String profileName,
@NonNull Function<String, T> metricBuilder) {
String name = buildFullName(metric, profileName);
try {
return (T) metricsCache.get(name, () -> metricBuilder.apply(name));
} catch (ExecutionException ignored) {
// should never happen
throw new AssertionError();
} catch (UncheckedExecutionException e) {
throw (RuntimeException) e.getCause();
}
}
@Override
public boolean isEnabled(@NonNull MetricT metric, @Nullable String profileName) {
return enabledMetrics.contains(metric);
}
protected void initializeDefaultCounter(@NonNull MetricT metric, @Nullable String profileName) {
if (isEnabled(metric, profileName)) {
// Just initialize eagerly so that the metric appears even when it has no data yet
getOrCreateMetric(metric, profileName, registry::counter);
}
}
protected void initializeHdrTimer(
MetricT metric,
DriverExecutionProfile config,
DriverOption highestLatencyOption,
DriverOption significantDigitsOption,
DriverOption intervalOption) {
String profileName = config.getName();
if (isEnabled(metric, profileName)) {
// Initialize eagerly to use the custom implementation
getOrCreateMetric(
metric,
profileName,
metricName -> {
Duration highestLatency = config.getDuration(highestLatencyOption);
final int significantDigits;
int d = config.getInt(significantDigitsOption);
if (d >= 0 && d <= 5) {
significantDigits = d;
} else {
LOG.warn(
"[{}] Configuration option {} is out of range (expected between 0 and 5, found {}); "
+ "using 3 instead.",
metricName,
significantDigitsOption,
d);
significantDigits = 3;
}
Duration refreshInterval = config.getDuration(intervalOption);
HdrReservoir reservoir =
new HdrReservoir(highestLatency, significantDigits, refreshInterval, metricName);
return registry.register(metricName, new Timer(reservoir));
});
}
}
}
After discussion during the standup, we concluded that keeping metrics for nodes that were DELETED or have a new IP address may be beneficial for the clients.
In the case of debugging issues with a discarded node, having historical metrics exposed via MetricRegistry
may be beneficial for users.
@Olivier Michallat @Alex Dutra let me know if you agree with that - in such a case I will close this ticket.
When a node gets removed (or gets a new broadcastRpcAddress, see https://datastax-oss.atlassian.net/browse/JAVA-2323#icft=JAVA-2323), its metrics stay there with the last known value.
We might want to unregister them. This is a bit tricky other components might still update them if they didn't get the node change yet, and the registry will recreate a metric if it does not exist.