Conversation
Signed-off-by: Matt Plachter <matthew.plachter@zapier.com>
bbe7325 to bdbab36
rootCmd.PersistentFlags().StringSliceVar(&cfg.AuthUsers, "AuthUsers", []string{}, "List of allowed auth users and their passwords comma separated\n Example: \"user1=pass1,user2=pass2\"")
rootCmd.PersistentFlags().StringVar(&cfg.ApiListen, "apiListen", ":80", "Listen for API requests on this host/port.")
rootCmd.PersistentFlags().IntVar(&cfg.MetricBatchInterval, "metricBatchInterval", 5, "The number of seconds to batch the metrics collected")
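Not part of the diff, but for context: a minimal sketch of how the `--AuthUsers` value ("user1=pass1,user2=pass2") could be parsed into a map. The `parseAuthUsers` helper is hypothetical, not the PR's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// parseAuthUsers splits a "user1=pass1,user2=pass2" string into a
// username -> password map. Illustrative only; the helper name and
// error handling are assumptions, not the PR's implementation.
func parseAuthUsers(s string) (map[string]string, error) {
	users := make(map[string]string)
	if s == "" {
		return users, nil
	}
	for _, pair := range strings.Split(s, ",") {
		name, pass, ok := strings.Cut(pair, "=")
		if !ok {
			return nil, fmt.Errorf("malformed auth entry %q", pair)
		}
		users[name] = pass
	}
	return users, nil
}

func main() {
	users, err := parseAuthUsers("user1=pass1,user2=pass2")
	if err != nil {
		panic(err)
	}
	fmt.Println(users["user1"], users["user2"])
}
```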
Resetting on a timer requires us to sync this up with Prometheus' scraping schedule, right? The default scrape interval seems to be 60 seconds, which means we would (by default) be averaging the last 5 seconds' worth of gauge stats and ignoring the 55 seconds before that.
Wouldn't resetting the stats whenever the metrics get scraped avoid this issue?
Technically, yes, but then if someone or something just hits the endpoint for testing purposes, all the metrics get thrown off.
What we could do is check whether the endpoint has been scraped during the interval: if it hasn't been scraped yet, we merge in the gauges; if it has, we reset them.
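A minimal sketch of the wipe-on-scrape idea being discussed: render the aggregated values, then reset them, so each scrape only sees data collected since the previous scrape. The `gaugeStore` type and its `Render` method are hypothetical names for illustration, not the PR's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// gaugeStore is an assumed aggregated-gauge store guarded by a mutex,
// since scrapes and metric ingestion happen concurrently.
type gaugeStore struct {
	mu     sync.Mutex
	gauges map[string]float64
}

// Render returns the current gauge values in a simple exposition format,
// then wipes them (wipe-on-scrape) so the next scrape starts fresh.
func (s *gaugeStore) Render() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	var out string
	for name, v := range s.gauges {
		out += fmt.Sprintf("%s %g\n", name, v)
	}
	s.gauges = make(map[string]float64) // reset after rendering
	return out
}

func main() {
	store := &gaugeStore{gauges: map[string]float64{"up": 1}}
	fmt.Print(store.Render()) // first scrape sees the value
	fmt.Print(store.Render()) // second scrape sees nothing: already wiped
}
```

The caveat raised above still applies: anything that hits the endpoint triggers the wipe, which is why gating it behind auth (or tracking whether a real scrape happened) comes up next in the thread.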
Signed-off-by: Matt Plachter <matthew.plachter@zapier.com>
1c4f10a to 2be4980
Signed-off-by: Matt Plachter <matthew.plachter@zapier.com>
Hey, I see people are trying to solve the same problem I wandered over here from. The solution I had was also "callback on GET /metrics that wipes after a scrape." If you're aggregating and squashing labels, that's the only way to prevent double counting. As far as "what if a user touches the endpoint," that's easily solved with HTTP Basic Auth. I'm sure everybody's in the same boat with "they don't build computers with the amount of memory we need, so let's aggregate some stuff." Wipe-on-scrape has the added benefit of being homeostatic. Specifically, as the fleet that's being aggregated horizontally scales, memory usage increases at the aggregator level. You could tune memory usage by scraping more often, causing the aggregators to wipe more often. But this would increase memory requirements at the central server. But then you just aggregate again through another layer of aggregators. From there, it's aggregators all the way down and the SREs are happy!
@faangbait, thanks for your insight here. I agree, I don't think duplicating all the metrics is valuable. We do have a few issues with gauges, though: we don't want to add them up, so we need to calculate a floating average for a given probe period. We also can't just keep averaging them out between probe intervals, as that's dependent on the endpoint being scraped, which could lead to an average over different time durations.
TODO: