checking in changes in verify-loader script for log loss simulation #17

Open · wants to merge 5 commits into base: main
Changes from 2 commits
16 changes: 14 additions & 2 deletions verify-loader
@@ -122,8 +122,10 @@ def verify(input_gen, report_interval=REPORT_INTERVAL):

ignored_bytes = 0
ignored_count = 0
loss_count = 0

report_bytes = 0
# report stats after every <report_interval> MB
report_bytes_target = report_interval * MB
report_ignored_bytes = 0
report_ignored_count = 0
@@ -135,16 +137,21 @@ def verify(input_gen, report_interval=REPORT_INTERVAL):
try:
for line in input_gen:
line_len = len(line)
if not line.startswith("loader seq - "):
# find the log header: "loader seq - "
# container logs do not start with the header - a runtime timestamp precedes it; if the header is absent anywhere in the line, ignore the line
if "loader seq - " not in line:
Member commented:
If the log lines read by verify-loader have a fixed prefix added to them, then something else should process that header away before sending the log lines to the verify-loader.

I don't think we should try to endow verify-loader with an understanding of how to pull out the log lines.

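The reviewer's suggestion — strip the prefix in a separate tool so verify-loader can keep its plain `startswith()` check — could look roughly like the following sketch. The header string matches the one in the diff; the function name and the sample timestamp are illustrative, not part of the PR.

```python
# Minimal sketch of a separate pre-filter, per the reviewer's suggestion:
# strip any fixed prefix (e.g. a container-runtime timestamp) ahead of the
# "loader seq - " header before the data reaches verify-loader.
HEADER = "loader seq - "

def strip_prefix(line):
    """Return the line from the loader header onward, or None if no header."""
    indx = line.find(HEADER)
    return line[indx:] if indx != -1 else None

# Example: a container log line carrying a timestamp prefix
sample = "2021-06-01T12:00:00Z loader seq - abc123 - 7 - payload"
print(strip_prefix(sample))  # prefix gone, header now leads the line
```

With such a filter in the pipeline, the original `line.startswith("loader seq - ")` test in verify-loader would not need to change.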
report_ignored_bytes += line_len
report_ignored_count += 1
else:
#check that line read has constituent parts after header (<header> - <uuid> - <seq_num> - <payload>)
try:
_, invocid, seqval, payload = line.split('-', 4)
indx = line.find('loader seq -')
Member commented:
Same as above, we don't want to add support for prefixed data of the log lines generated by the loader. If there is a prefix, let's have another tool strip it before sending the data to verify-loader.

_, invocid, seqval, payload = line[indx:].split('-', 4)
except Exception:
report_ignored_bytes += line_len
report_ignored_count += 1
else:
#check if seq_num is valid
try:
seq = int(seqval)
except Exception:
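For reference, the line format named in the diff's comment (`<header> - <uuid> - <seq_num> - <payload>`) parses as follows. The sample values are made up, and the invocation id is deliberately hyphen-free: an id containing hyphens would produce extra fields from the split and land in the `except` branch above.

```python
# Sketch of the parse the diff performs, on an illustrative line.
# split('-', 4) splits on at most four hyphens; with the three hyphens
# below it yields exactly the four fields being unpacked.
line = "loader seq - abc123 - 42 - some payload text"
_, invocid, seqval, payload = line.split('-', 4)
print(invocid.strip(), int(seqval))
```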
@@ -155,9 +162,11 @@ def verify(input_gen, report_interval=REPORT_INTERVAL):
invocid = invocid.strip()
ctx = contexts[invocid]
prev = ctx.msg(seq, line_len)
# check for out of order lines - this implies LOG LOSS
if prev is not None:
# Bad record encountered, flag it
print("%s: %d %d <-" % (invocid, seq, prev))
loss_count += (seq-prev)
Member commented:
So if seq is less than prev this will be a negative value. I don't think we want to account for loss that way.

Instead we might want to consider two conditions: seq > prev + 1 and seq <= prev.

The add the distance between seq and prev makes sense on the first condition. But for the second condition we'll want to keep track of contiguous ranges, expanding the range as the sequence grows, creating a new sequence when a gap is encountered, and looking for duplicates by seeing if the new seq is in any known ranges.
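The two-condition scheme the reviewer describes could be sketched roughly as below. The class and attribute names are illustrative, not from the PR; a fuller version would also merge adjacent ranges when a late arrival fills a gap.

```python
# Rough sketch of the reviewer's proposal: count loss only for forward
# gaps (seq > prev + 1), and treat seq <= prev as a possible duplicate
# by checking it against the contiguous ranges seen so far.
class SeqTracker:
    def __init__(self):
        self.ranges = []       # [lo, hi] pairs of contiguous sequences seen
        self.loss_count = 0
        self.dupe_count = 0

    def record(self, seq):
        if self.ranges and seq == self.ranges[-1][1] + 1:
            self.ranges[-1][1] = seq           # extends the latest range
        elif any(lo <= seq <= hi for lo, hi in self.ranges):
            self.dupe_count += 1               # already seen: duplicate
        else:
            if self.ranges and seq > self.ranges[-1][1] + 1:
                # forward gap: add its size, never a negative value
                self.loss_count += seq - self.ranges[-1][1] - 1
            self.ranges.append([seq, seq])     # start a new range
```

For example, the sequence 1, 2, 3, 5, 6, 3 would register one lost line (4) and one duplicate (the second 3), where the PR's `seq - prev` arithmetic would have gone negative on the duplicate.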

sys.stdout.flush()
if payload.startswith(" (stats:"):
print_stats(invocid, ctx, payload)
@@ -192,6 +201,7 @@ def verify(input_gen, report_interval=REPORT_INTERVAL):
(total_count / (now - start)),
(ignored_bytes / MB) / (now - start),
(ignored_count / (now - start))))
print("interval stats:: total bytes: %d, total lines: %d, ignored: %d" % (report_bytes, report_count, report_ignored_count))
Member commented:
Why not add this data to the previous print statement?
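Folding the interval totals into the existing periodic report, as the reviewer asks, could look like this sketch. The counter names come from the surrounding diff; the sample values and exact message layout are illustrative.

```python
# Sketch: one combined periodic report line instead of a separate print.
# Assumed interval counters, as in the surrounding diff, with sample values.
MB = 1024 * 1024
report_bytes = 5 * MB
report_count = 1000
report_ignored_count = 7
elapsed = 2.0   # seconds since the last report

report_line = ("verify-loader: %.3f MB/sec (%d lines/sec); "
               "interval: %d bytes, %d lines, %d ignored"
               % ((report_bytes / MB) / elapsed,
                  report_count / elapsed,
                  report_bytes, report_count, report_ignored_count))
print(report_line)
```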

print("--- verify-loader\n")
sys.stdout.flush()

@@ -226,6 +236,8 @@ def verify(input_gen, report_interval=REPORT_INTERVAL):
(total_count / (now - start)),
(ignored_bytes / MB) / (now - start),
(ignored_count / (now - start))))
print("total bytes: %d, total lines: %d, ignored lines: %d, lost (out-of-seq) lines: %d" % (total_bytes, total_count, ignored_count, loss_count))
print("overall loss percentage = %.3f" % (loss_count * 100.0 / total_count))
Member commented:
Why not add this data to the previous print statement?

print("--- verify-loader\n")
if tot_skips + tot_dupes > 0:
ret_val = 1