Periodic GraphiteReporter Exception: Unable to report to Graphite Errors while running Circus Train #120

@abhimanyugupta07

When I run Circus Train with Graphite configured for a table that takes a while to sync, I see the following error in the run logs roughly every 4 minutes:

19/02/19 14:51:57 INFO s3s3copier.S3S3Copier: Replicating...': 30% complete
19/02/19 14:51:57 INFO s3s3copier.S3S3Copier: Replicating...': 30% complete

19/02/19 14:53:51 WARN graphite.GraphiteReporter: Unable to report to Graphite
java.net.SocketException: Broken pipe (Write failed)
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
	at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
	at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
	at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
	at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
	at java.io.BufferedWriter.flush(BufferedWriter.java:254)
	at com.hotels.shaded.com.codahale.metrics.graphite.Graphite.flush(Graphite.java:151)
	at com.hotels.shaded.com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:190)
	at com.hotels.shaded.com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162)
	at com.hotels.shaded.com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

19/02/19 14:54:04 INFO s3s3copier.S3S3Copier: Replicating...': 30% complete
19/02/19 14:54:04 INFO s3s3copier.S3S3Copier: Replicating...': 31% complete

Possible reason:
There are two GraphiteReporter beans instantiated within the Circus Train Spring application: one that we create and configure for Graphite, and a default GraphiteReporter bean created by Spring that points to localhost. The default bean keeps retrying the connection to flush its metrics and fails because there is no connection.
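
One quick way to test part of this hypothesis is to check whether anything is listening on the local Graphite port from the machine running Circus Train. The sketch below is a minimal, standalone probe, not part of Circus Train. The class name `GraphiteProbe` and its method are hypothetical, and the default of port 2003 (Graphite's usual plaintext port) is an assumption about where a default reporter would point.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for diagnosis only; not part of Circus Train.
final class GraphiteProbe {

  private GraphiteProbe() {}

  /** Returns true if a TCP connection to host:port can be opened within timeoutMillis. */
  static boolean canConnect(String host, int port, int timeoutMillis) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), timeoutMillis);
      return true;
    } catch (IOException e) {
      // Covers connection refused, timeouts, and unresolvable hosts.
      return false;
    }
  }

  public static void main(String[] args) {
    // Assumed defaults: localhost and Graphite's usual plaintext port.
    String host = args.length > 0 ? args[0] : "localhost";
    int port = args.length > 1 ? Integer.parseInt(args[1]) : 2003;
    System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 2000));
  }
}
```

Running it once against localhost and once against the configured Graphite host shows which endpoint is actually reachable. It does not show which bean is writing to which endpoint, so it only narrows the problem down.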

In Waggle-Dance, this issue is handled by disabling the default GraphiteReporter:

https://github.com/HotelsDotCom/waggle-dance/blob/master/waggle-dance-core/src/main/java/com/hotels/bdp/waggledance/metrics/MonitoringConfiguration.java#L37

Further investigation is needed to confirm this, as Circus Train has not been migrated to Micrometer.
