-
Notifications
You must be signed in to change notification settings - Fork 246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use runner pkg in loki source file #2428
base: main
Are you sure you want to change the base?
Conversation
func (c *Component) reportSize(path, labels string) { | ||
// Ask the reader to update the size if a reader exists, this keeps | ||
// position and size metrics in sync. | ||
if reader, ok := c.readers[positions.Entry{Path: path, Labels: labels}]; ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this logic was wrong because the readers were reset at the start of every update and because the readers were dedup before the size report so I just got rid of it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would ideally like to see it running for some time in a dev cluster so we can verify there are no obvious issues with it. Do you think we could set something up? 🤔
Co-authored-by: Piotr <[email protected]>
return | ||
} | ||
|
||
if fi.Size() < pos { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a file is rotated but the new rotated file is larger than the previous, do we miss the first pos
lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes we do. That would be a problem for Windows because on Unix the file rotation does not exit the readlines function. But it also a problem in general: if you stop Alloy and replace the file with another file of the same name but bigger, it will also only start reading at the saved pos.
I suggest to add a comment there and see whether this becomes a problem or not. If it does, saving and comparing the file creation timestamps might be a good approach to solve this
level.Error(t.logger).Log("msg", "error marking file position when stopping tailer", "path", t.path, "error", err) | ||
} | ||
// Save the current position before shutting down tailer | ||
err := t.markPositionAndSize() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we mark after we wait for readLines
to complete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it for the decompressor but for the tailer we can't do it directly because you cannot get the tail size after you stopped the tail
Co-authored-by: Sam DeHaan <[email protected]>
Co-authored-by: Sam DeHaan <[email protected]>
@thampiotr @dehaansa while going over the review from Sam, I noticed that there were some race conditions between the Stop and the Run functions. To address it:
I hope that the logic is still ok to follow. The added complexity is not ideal but all the different go routines that this component is spawning are not making the job easy. LMK what you think I ran it with 4000 files and did not see any problem + there was still a 50% CPU improvement |
PR Description
Why is a rework needed?
The current implementation of the loki.source.file has a bug on Windows: if you rotate a file by deleting it and recreating it with the same name, the file won't be tailed anymore unless a discovery component had time to update that the file was every missing (which does not usually happen because the sync period is usually much higher that the time it takes to rotate the file).
An attempt to fix this bug was done here: #2282, but the fix was not satisfying:
From there it was decided to rework the component to add a proper retry mechanism.
What are the changes?
Previously, all readers were stopped and deleted on every update. The tailing was also started in the Update function.
Now the readers are created unstarted in the Update function and the runner pkg that we use in other components is responsible for running them in the run function.
What are the gains from the rework?
What about the performances?
I ran locally in parallel two Alloy (latest main vs rework). They tailed 2000 files each containing a few lines of logs. Every 3 seconds 10 of these files were deleted and 10 new ones were created with different names.
The number of goroutines and the amount of memory are still the same:
But in terms of CPU, there seems to be a significant improvement (50% according to the flamegraph):
Before
After
Which issue(s) this PR fixes
Fixes #2345
PR Checklist