Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Have the async sink itself write an error log event once it's buffer capacity is at, say, 10%? #77

Open
chtenb opened this issue Sep 17, 2021 · 12 comments

Comments

@chtenb
Copy link

chtenb commented Sep 17, 2021

Is it possible to have the async sink itself write an error log event once it's buffer capacity is at 10%?

@bartelink
Copy link
Member

bartelink commented Sep 17, 2021

There's no feature to directly do exactly what you asked, but the monitoring mechanism is intended to provide you the ability to do anything you desire. (I recommend reading the source of the Async Sink - its extremely small, and you'll get a clear picture of all it does and how in minimum time)

@chtenb
Copy link
Author

chtenb commented Sep 18, 2021

Yeah I read the sample code, but I have no clue as to what that HealthMonitor thing is. We have no such thing in our app.

@bartelink
Copy link
Member

That's it - you need to decide what to plug in - when you configure the WriteTo.Async, you supply the callback and can do the logging etc (note of course you need to be careful not to fill the log etc - i.e. you need to make sure not to write too frequently). You could for instance take the number of waiting items and emit that as a metric using Prometheus or similar. IIRC the tests might show you some example wiring, but really it's up to you what you want to do.

@chtenb
Copy link
Author

chtenb commented Sep 18, 2021

but really it's up to you what you want to do.

I want an error log event to be inserted whenever a threshold is surpassed. The log stream generated by serilog is under scrutiny, so if any warnings related to buffer capacity would go in there it would immediately be seen. We don't have any other error reporting mechanisms beside our serilog stream.

@bartelink
Copy link
Member

I get that - I was giving you suggestions for code for you to write though - you know your circumstances best. All you have to do is write the code (one reason that's not done generically is for example you would not want lots of noise from the backlog oscillating up and down around the 10% mark etc). For avoidance of doubt: there is no built in behaviour that does exactly what you are asking (and it would not make sense to add one unless a proper spec can be formulated for something which is sufficiently generic that pretty much anyone would switch it on).

@chtenb
Copy link
Author

chtenb commented Sep 18, 2021

I understand your point. The trouble however of writing some external health check mechanism manually is not small though and comes with extra concerns

  • How often do you check for health?
  • Do you have the health checker running on a dedicated thread?
  • How do we know the health check thread is (still) running, should we implement a heart beat mechanism?
  • etc

and it would not make sense to add one unless a proper spec can be formulated for something which is sufficiently generic that pretty much anyone would switch it on

I'll give this some thought

@chtenb
Copy link
Author

chtenb commented Sep 20, 2021

Another issue with periodic checking is that you may easily miss spikes that were resolved too quick for the monitor to notice.

(one reason that's not done generically is for example you would not want lots of noise from the backlog oscillating up and down around the 10% mark etc)

What about if we provide two configuration options, int capacityPercentageWarningThreshold = 50, int minSecondsBetweenWarningEvents = 60.
Then in the Emit method of the backgroundworkersink have a simple and cheap check that writes to SelfLog if it fails, consistent with other failure scenarios.

@bartelink
Copy link
Member

You're right about the fact that the detection is only going to be as good as the frequency of your polling - the mechanism does not offer a push / blocking based callback mechanism. For me, that's a good thing - the possibility of a callback hanging/blocking/delaying my logging is one thing I don't want to constantly be considering.

In my usage, I wrote warning messages regarding slowness writing to my Async-over-File logging to a separate healthchecks-only Console sink without buffering / Async behaviors in play - that way the diagnostic messages don't get stuck at the end of the backlog. (some might do that for SelfLog too)

I'd suggest experimenting with this locally in your environment first - if you arrive at something that works well, by all means propose it after you've satisfied yourself and see if others have identical requirements. However, I think a universally applicable rule is going to be hard to define given how many different app and hosting models there are (as mentioned, some people may wish to expose the buffer usage as a metric, others may hook it into healthchecks, bt the bulk simply won't care)

@lucacestola
Copy link

I followed more or less the same path as @chtenb and implemented some monitoring mechanisms that allow a minimum time between alerts to avoid flooding the log with warning messages. I also use the asynchronous logger in non-blocking mode and therefore accept that I may miss some messages.

In the monitoring class, I have therefore delegated the task of issuing the warning to Serilog.Log.Warning() so that I can keep track of the event within other logs.

This allows me to have evidence in other logs of messages lost on the asynchronous sink. However, I would prefer to be able to queue the Warning message in the underlying wrapped sinks as well, so that I can track the lost log events in the same log, but this is not possible because the wrapped sinks are not visible and the BackgroundWorkerSink class is sealed.

Do you think there might be a workaround for this?

@chtenb
Copy link
Author

chtenb commented Jan 12, 2022

What I did in the end is implement a file sink for the self log and have the warning messages I proposed write to that self log.

@lucacestola
Copy link

It is a good workaround in the short term even if I find it limiting

On the other hand, it should be noted that what I'm trying to do isn't always applicable and we need to distinguish between two cases where the queue fills up:

  • The wrapped sink is flooded relative to the input rate, but is still operational
  • The wrapped sink is stuck and will not be available anytime soon

In the second case, trying to queue logs to the sink below is useless but it's still the first case.

@lonix1
Copy link

lonix1 commented Nov 26, 2024

See my sink health monitoring and reporting implementation here.

It should be generic enough for general use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants