Svnloadsub skipping load of revisions #17
base: master
Conversation
Sorry about the late review.
svndumpsub.py (Outdated)
```diff
     logging.info('Dumping and uploading rev: %s from repo: %s' % (self.rev, self.repo))
     self.dump_zip_upload(self._get_svn_dump_args(self.rev, self.rev), self.rev)

-    def dump_zip_upload(self, dump_args, rev):
+    def dump_zip_upload(self, dump_args, rev) -> bool:
```
Why are we modifying the dump_zip_upload code? I really like that it can succeed even if the disk is full...
I've been trying to use the common and improved execute() function instead of the subprocess calls. If you remember our discussion, communicate() buffers the entire output in memory, whereas execute() now reads from and writes to disk directly, or at least that's the goal.
Now, regarding the disk being full: that can easily be handled with an except clause. What do you think?
> I've been trying to use the common and improved execute() function instead of the subprocess calls. If you remember our discussion, communicate() buffers the entire output in memory, whereas execute() now reads from and writes to disk directly, or at least that's the goal.
Are you saying that the previous version of dump_zip_upload placed the dump file in RAM before uploading to S3? So, committing a 1GB file would never be able to dump to S3 on our current production VMs (since all our VMs are quite low on RAM)?
> Now, regarding the disk being full: that can easily be handled with an except clause. What do you think?
That would still fail, right? I would like it to stream to S3 and get the dump backed up.
Consider the scenario:
- A VM is low on disk space and we have failed to notice the alarms.
- User commits successfully but consumes almost all the remaining disk space.
- svndumpsub should not use disk and successfully stream the commit to S3
- VM fails unrecoverably
- We can restore to a new VM including the last commit.
> Are you saying that the previous version of dump_zip_upload placed the dump file in RAM before uploading to S3? So, committing a 1GB file would never be able to dump to S3 on our current production VMs (since all our VMs are quite low on RAM)?
Yes, that's my conclusion from reading the documentation and similar reports. Now, as to why it still works in production, couldn't it be due to the OS swapping the overhead to disk? Also, it is the compressed gzip archive that is buffered.
> That would still fail, right? I would like it to stream to S3 and get the dump backed up.
> Consider the scenario:
> * A VM is low on disk space and we have failed to notice the alarms.
> * User commits successfully but consumes almost all the remaining disk space.
> * svndumpsub should not use disk and successfully stream the commit to S3
> * VM fails unrecoverably
> * We can restore to a new VM including the last commit.
In this scenario the previous approach is unpredictable at best. Perhaps it would be safest to avoid using the disk altogether and dump, compress, and upload the data in memory, chunk by chunk?
OK, I've thought some more about this and I think what happens here is that s3client.upload_fileobj() reads, and thus removes, all the returned data from the stdout buffer, so p2.communicate()[0] will not have returned any (or much) data. I will revert this function to its original form with some improvements.
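For reference, a minimal sketch of the fully streaming approach discussed above. The function name and the bucket/key parameters are hypothetical, and the PR's actual execute() helper may look different; the point is that boto3's upload_fileobj consumes the gzip process's stdout directly:

```python
import subprocess
import boto3

def stream_dump_to_s3(repo_path, rev, bucket, key):
    # svnadmin dump writes the revision to stdout ...
    p1 = subprocess.Popen(
        ['svnadmin', 'dump', repo_path, '-r', '%d:%d' % (rev, rev),
         '--incremental', '--deltas'],
        stdout=subprocess.PIPE)
    # ... gzip compresses it on the fly ...
    p2 = subprocess.Popen(['gzip', '-c'],
                          stdin=p1.stdout, stdout=subprocess.PIPE)
    p1.stdout.close()  # let p1 receive SIGPIPE if gzip exits early
    # ... and upload_fileobj reads p2.stdout chunk by chunk (multipart
    # upload under the hood), so neither RAM nor local disk ever has to
    # hold the whole dump.
    boto3.client('s3').upload_fileobj(p2.stdout, bucket, key)
    if p2.wait() != 0 or p1.wait() != 0:
        raise RuntimeError('dump/gzip pipeline failed for rev %d' % rev)
```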
svndumpsub.py (Outdated)
```python
logging.error('No shards were loaded')
return start, end
youngest = self._get_head(self.repo)
if youngest != end:
```
I think this validation should simply compare youngest with to_rev.
Makes sense.
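A minimal sketch of the suggested comparison (hypothetical helper name, assuming youngest comes back from svnlook as a string):

```python
def validate_head_after_load(youngest, to_rev):
    """After loading, the repository head should sit exactly at the last
    revision requested (to_rev); anything else means revisions were
    skipped or a load failed."""
    if int(youngest) != int(to_rev):
        raise RuntimeError('expected youngest %s after load, got %s'
                           % (to_rev, youngest))
```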
svndumpsub.py (Outdated)
```python
change = str(match.group(2).rstrip().rstrip('/'))
changed_paths_after.add(change)
logging.debug(change)
if changed_paths_before != changed_paths_after:
```
Looks like this validates the paths that were just loaded. Correct?
The ticket specified:
"Verify previous changed paths against previous shard paths before loading a shard0 dump"
It is quite important to do the validation before loading, because it must block any additional loading. Otherwise it will keep loading one revision at a time, with the risk that the issue goes unnoticed.
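A sketch of what such a pre-load validation could look like; the helper and parameter names are hypothetical, and the PR's actual regex-based parsing of the dump may differ:

```python
import subprocess

def changed_paths_match(repo_path, rev, paths_from_shard):
    """Compare the paths changed in `rev` according to the live repository
    (svnlook changes) with the changed paths recorded for that revision in
    the previously loaded shard; a mismatch should block further loading."""
    out = subprocess.check_output(
        ['svnlook', 'changes', '-r', str(rev), repo_path], text=True)
    repo_paths = set()
    for line in out.splitlines():
        if line.strip():
            # each line is an action code, whitespace, then the path
            repo_paths.add(line.split(None, 1)[1].rstrip().rstrip('/'))
    return repo_paths == set(paths_from_shard)
```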
> Verify previous changed paths against previous shard paths before loading a shard0 dump [if the revision does not end with 001 because then we might not have a shard0 dump].

- This is somewhat confusing to me. So if we're loading shard `91005`, I need to also retrieve shard `91004` and compare its changed paths against `svnlook changes -r 91004`?
- Also, `91005` does not end with `001`, but it is a `shard0`. Could you clarify what you mean?
> Verify previous changed paths against previous shard paths before loading a shard0 dump [if the revision does not end with 001 because then we might not have a shard0 dump].

> - This is somewhat confusing to me. So if we're loading shard `91005`, I need to also retrieve shard `91004` and compare its changed paths against `svnlook changes -r 91004`?

Correct. This is the only way I can see whereby we can block further (incorrect) loading.

> - Also, `91005` does not end with `001`, but it is a `shard0`. Could you clarify what you mean?

Yes, so we should validate 91004.
But when you load 91000, revision 90999 might not exist yet, so we skip the validation. (I got it wrong with 001 in the ticket.)
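In other words, something along these lines (a sketch with a hypothetical helper name, assuming a shard size of 1000):

```python
def should_validate_previous(rev, shard_div=1000):
    """Before loading revision `rev` as a shard0 dump, validate the changed
    paths of rev - 1 against the repository, except when rev is a multiple
    of shard_div: its predecessor (e.g. 90999 for 91000) may only exist
    inside a shard3 dump, so there is nothing to validate against."""
    return rev % shard_div != 0

# loading 91005 -> validate 91004; loading 91000 -> skip the validation
```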
Done?
```python
gz_args = [gz, '-c']

from_rev = shard
to_rev = ((int(shard / self.shard_div) + 1) * self.shard_div) - 1
```
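A worked example of the to_rev arithmetic above, assuming shard_div is 1000:

```python
shard, shard_div = 91000, 1000
from_rev = shard
to_rev = ((int(shard / shard_div) + 1) * shard_div) - 1
print(from_rev, to_rev)  # 91000 91999: first and last revisions of the shard
```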
Are we ever validating that youngest is NNNN000 (cleanly divisible by 1000) before loading a shard3?
That would be a great validation.
Now we are :)
There seems to be a logical issue with this check. Isn't it sufficient to check whether `youngest == from_rev - 1`? Although this has one exception: when youngest is 0.
```
2024-05-17 14:20:12 [INFO] Repository: documentation youngest: 999
2024-05-17 14:20:12 [INFO] Loaded revs: (0-999) from shard: 0 to repo: documentation
2024-05-17 14:20:12 [DEBUG] Shard key exists: v1/travelonium/documentation/shard3/0000001000/documentation-0000001000.svndump.gz
2024-05-17 14:20:12 [INFO] Shard exists, will load shard: 1000
2024-05-17 14:20:12 [INFO] Loading shard: 1000 to repo: documentation
2024-05-17 14:20:12 [DEBUG] Running: /usr/bin/svnlook youngest /srv/cms/svn/documentation
2024-05-17 14:20:12 [INFO] Repository: documentation youngest: 999
2024-05-17 14:20:12 [ERROR] Unable to load shard3 as the youngest revision: 999 is not a multiple of 1000
```
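A sketch of the check being suggested here (hypothetical helper name), covering the fresh-repository exception mentioned above:

```python
def can_load_shard(youngest, from_rev):
    """The repository head must be exactly the revision before the shard's
    first revision. The exception is a fresh repository, where youngest is
    0 and the first shard itself starts at revision 0."""
    if from_rev == 0:
        return youngest == 0
    return youngest == from_rev - 1

# youngest 999, from_rev 1000 -> True; youngest 999, from_rev 2000 -> False
```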
How does it look?
Yes, we should probably do that. I was under the incorrect assumption that cron did not start multiple instances.
- …stored from the dump.
- …te large Even the previous approach using the communicate() function could potentially cause memory issues as the function buffers the entire stdout before returning.
- …ad of entirely loading it to the memory.
- …emory to keep the data.
- …d0 revision matches those in the dump file.
- …m reading the entire stdout/stderr.
- …uldn't crash the daemon.
Force-pushed from 6c03fd2 to c2d04fe
- …ween the previous shard and the last rev.
Done.
Fixes: https://pds-svn-gbg.pdsvision.net/trac/SubversionCMS/ticket/1783
Changes: