Conversation
Signed-off-by: Matt Plachter <matthew.plachter@zapier.com>
bbe7325 to bdbab36
rootCmd.PersistentFlags().StringSliceVar(&cfg.AuthUsers, "AuthUsers", []string{}, "List of allowed auth users and their passwords comma separated\n Example: \"user1=pass1,user2=pass2\"")
rootCmd.PersistentFlags().StringVar(&cfg.ApiListen, "apiListen", ":80", "Listen for API requests on this host/port.")
rootCmd.PersistentFlags().IntVar(&cfg.MetricBatchInterval, "metricBatchInterval", 5, "The number of seconds to batch the metrics collected")
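Not part of the diff, but for context: a minimal sketch of how the `--AuthUsers` value ("user1=pass1,user2=pass2") could be parsed into a map. The `parseAuthUsers` helper is hypothetical, not the PR's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// parseAuthUsers splits a "user1=pass1,user2=pass2" string into a
// username -> password map. Illustrative only; the helper name and
// error handling are assumptions, not the PR's implementation.
func parseAuthUsers(s string) (map[string]string, error) {
	users := make(map[string]string)
	if s == "" {
		return users, nil
	}
	for _, pair := range strings.Split(s, ",") {
		name, pass, ok := strings.Cut(pair, "=")
		if !ok {
			return nil, fmt.Errorf("malformed auth entry %q", pair)
		}
		users[name] = pass
	}
	return users, nil
}

func main() {
	users, err := parseAuthUsers("user1=pass1,user2=pass2")
	if err != nil {
		panic(err)
	}
	fmt.Println(users["user1"], users["user2"])
}
```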
Resetting on a timer requires us to sync this up with Prometheus' scraping schedule, right? The default scrape interval seems to be 60 seconds, which means we would (by default) be averaging the last 5 seconds' worth of gauge stats and ignoring the 55 seconds before that.
Wouldn't resetting the stats whenever the metrics get scraped avoid this issue?
Technically, yes, but then if someone or something just hits the endpoint for testing purposes, all the metrics get thrown off.
What we could do is check whether the endpoint has been scraped during the interval: if it hasn't been scraped yet, we merge in the gauges; if it has, we reset them.
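A minimal sketch of the wipe-on-scrape idea being discussed: render the aggregated values, then reset them, so each scrape only sees data collected since the previous scrape. The `gaugeStore` type and its `Render` method are hypothetical names for illustration, not the PR's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// gaugeStore is an assumed aggregated-gauge store guarded by a mutex,
// since scrapes and metric ingestion happen concurrently.
type gaugeStore struct {
	mu     sync.Mutex
	gauges map[string]float64
}

// Render returns the current gauge values in a simple exposition format,
// then wipes them (wipe-on-scrape) so the next scrape starts fresh.
func (s *gaugeStore) Render() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	var out string
	for name, v := range s.gauges {
		out += fmt.Sprintf("%s %g\n", name, v)
	}
	s.gauges = make(map[string]float64) // reset after rendering
	return out
}

func main() {
	store := &gaugeStore{gauges: map[string]float64{"up": 1}}
	fmt.Print(store.Render()) // first scrape sees the value
	fmt.Print(store.Render()) // second scrape sees nothing: already wiped
}
```

The caveat raised above still applies: anything that hits the endpoint triggers the wipe, which is why gating it behind auth (or tracking whether a real scrape happened) comes up next in the thread.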
Signed-off-by: Matt Plachter <matthew.plachter@zapier.com>
1c4f10a to 2be4980
Signed-off-by: Matt Plachter <matthew.plachter@zapier.com>
Hey, I see people are trying to solve the same problem I wandered over here from. The solution I had was also "callback on GET /metrics that wipes after a scrape." If you're aggregating and squashing labels, that's the only way to prevent double counting. As far as "what if a user touches the endpoint," that's easily solved with HTTP Basic Auth. I'm sure everybody's in the same boat with "they don't build computers with the amount of memory we need, so let's aggregate some stuff." Wipe-on-scrape has the added benefit of being homeostatic. Specifically, as the fleet that's being aggregated horizontally scales, memory usage increases at the aggregator level. You could tune memory usage by scraping more often, causing the aggregators to wipe more often. But this would increase memory requirements at the central server. But then you just aggregate again through another layer of aggregators. From there, it's aggregators all the way down and the SREs are happy!
@faangbait, thanks for your insight here. I agree, I don't think duplicating all the metrics is valuable. We do have a few issues with gauges, though: we don't want to add them up, so we need to calculate a floating average for a given probe period. We also can't just keep averaging them out between probe intervals, as that's dependent on the endpoint being scraped, which could lead to an average over different time durations.
TODO: