Svnloadsub skipping load of revisions #17

Open
omanikhi wants to merge 19 commits into master from svnloadsub-rev-skipping-fix

Conversation

@omanikhi omanikhi (Contributor) commented Mar 7, 2024

Fixes: https://pds-svn-gbg.pdsvision.net/trac/SubversionCMS/ticket/1783

Changes:

  • Verify youngest after loading a shard.
  • Verify previous changed paths against previous shard paths before loading a shard0 dump.
  • Clean up corresponding shard0 after successfully dumping a shard3.

@omanikhi omanikhi requested a review from takesson March 26, 2024 10:40
@omanikhi omanikhi (Contributor Author) commented Mar 26, 2024

The "maximum number of shards processed" limit was unreachable, presumably due to an indentation error (if not intentionally). I have revived it now, but I'm not sure how it's supposed to behave in this case:

[Screenshot: 2024-03-25 at 21:49:06]

@omanikhi omanikhi (Contributor Author) commented Mar 26, 2024

Regarding the lock on svnloadsub, the way to do it is to use a PID file, which is already partially implemented, but the --pidfile parameter passed to the script is blank, so it is not being used. With a PID file in place, any further instance of the script will exit with an error if an instance is already running. If this is the desired behavior, I can make it happen.
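
For illustration, a minimal sketch of the PID-file behavior (assuming a POSIX system; acquire_pidfile is a hypothetical name, not the partial implementation already in the script):

```python
# Illustrative sketch of PID-file locking, not the script's actual code.
import os
import sys

def acquire_pidfile(path):
    if os.path.exists(path):
        with open(path) as f:
            pid = int(f.read().strip())
        try:
            os.kill(pid, 0)  # signal 0 only probes whether the process exists
            sys.exit('svnloadsub is already running with pid %d' % pid)
        except OSError:
            pass  # stale PID file left by a dead instance; safe to continue
    with open(path, 'w') as f:
        f.write(str(os.getpid()))
```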

@takesson takesson left a comment

Sorry about the late review.

svndumpsub.py Outdated

 logging.info('Dumping and uploading rev: %s from repo: %s' % (self.rev, self.repo))
 self.dump_zip_upload(self._get_svn_dump_args(self.rev, self.rev), self.rev)

-def dump_zip_upload(self, dump_args, rev):
+def dump_zip_upload(self, dump_args, rev) -> bool:
takesson (Member)

Why are we modifying the dump_zip_upload code? I really like that it can succeed even if the disk is full...

omanikhi (Contributor Author)

I've been trying to use the common and improved execute() function instead of the subprocess calls. If you remember our discussion, communicate() buffers the entire output in memory, whereas execute() now reads from and writes to disk directly, or at least that's the goal.

As for the disk being full, that can easily be handled with an except clause. What do you think?
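
To illustrate the difference (a sketch assuming Python's subprocess module; dump_to_file is a hypothetical name): communicate() accumulates the whole stdout in memory, while copying the pipe in chunks streams it straight to disk.

```python
# Illustrative sketch: stream a subprocess's stdout to disk in chunks
# instead of buffering all of it in memory via communicate().
import shutil
import subprocess

def dump_to_file(dump_args, out_path):
    with open(out_path, 'wb') as out:
        p = subprocess.Popen(dump_args, stdout=subprocess.PIPE)
        shutil.copyfileobj(p.stdout, out, 1024 * 1024)  # 1 MiB chunks
        p.stdout.close()
        if p.wait() != 0:
            raise RuntimeError('dump exited with code %d' % p.returncode)
```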

takesson (Member)

I've been trying to use the common and improved execute() function instead of the subprocess calls. If you remember our discussion, communicate() buffers the entire output in memory, whereas execute() now reads from and writes to disk directly, or at least that's the goal.

Are you saying that the previous version of dump_zip_upload placed the dump file in RAM before uploading to S3? So, committing a 1GB file would never be able to dump to S3 on our current production VMs (since all our VMs are quite low on RAM)?

As for the disk being full, that can easily be handled with an except clause. What do you think?

That would still fail, right? I would like it to stream to S3 and get the dump backed up.

Consider the scenario:

  • A VM is low on disk space and we have failed to notice the alarms.
  • A user commits successfully but consumes almost all the remaining disk space.
  • svndumpsub should not use the disk and should successfully stream the commit to S3.
  • The VM fails unrecoverably.
  • We can restore to a new VM, including the last commit.

omanikhi (Contributor Author)

Are you saying that the previous version of dump_zip_upload placed the dump file in RAM before uploading to S3? So, committing a 1GB file would never be able to dump to S3 on our current production VMs (since all our VMs are quite low on RAM)?

Yes, that's my conclusion from reading the documentation and similar reports. As to why it still works in production, couldn't it be the OS swapping the excess to disk? Note also that it is the compressed gzip archive that is buffered.

That would still fail, right? I would like it to stream to S3 and get the dump backed up.

Consider the scenario:

* A VM is low on disk space and we have failed to notice the alarms.
* A user commits successfully but consumes almost all the remaining disk space.
* svndumpsub should not use the disk and should successfully stream the commit to S3.
* The VM fails unrecoverably.
* We can restore to a new VM, including the last commit.

In this scenario the previous approach is unpredictable at best. Perhaps it would be safest to avoid the disk altogether and dump, compress, and upload the data in memory, chunk by chunk?

omanikhi (Contributor Author)

OK, I've thought some more about this, and I think what happens here is that s3client.upload_fileobj() reads, and thus drains, all the data from the stdout buffer, so p2.communicate()[0] will not return any (or much) data. I will revert this function to its original form with some improvements.
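
For context, a hedged sketch of the pipeline as discussed (assuming boto3; the repository path, revision range, bucket, and key are illustrative, and p1/p2 follow the naming in the comment above):

```python
# Illustrative sketch only, not the actual svndumpsub.py code.
import subprocess
import boto3

s3 = boto3.client('s3')

# svnadmin dump writes the dump to stdout; gzip compresses it in-stream.
p1 = subprocess.Popen(['svnadmin', 'dump', '/srv/cms/svn/documentation',
                       '-r', '100:100', '--incremental'],
                      stdout=subprocess.PIPE)
p2 = subprocess.Popen(['gzip', '-c'], stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()  # allow p1 to receive SIGPIPE if p2 exits early

# upload_fileobj() reads p2's stdout in chunks and streams it to S3, so a
# later p2.communicate()[0] would find the pipe already drained.
s3.upload_fileobj(p2.stdout, 'example-bucket',
                  'v1/example/documentation/shard0/documentation-0000000100.svndump.gz')
p2.stdout.close()
p1.wait()
p2.wait()
```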

svndumpsub.py Outdated
logging.error('No shards were loaded')
return start, end
youngest = self._get_head(self.repo)
if youngest != end:
takesson (Member)

I think this validation should simply compare youngest with to_rev.

omanikhi (Contributor Author)

Makes sense.

svndumpsub.py Outdated
change = str(match.group(2).rstrip().rstrip('/'))
changed_paths_after.add(change)
logging.debug(change)
if changed_paths_before != changed_paths_after:
takesson (Member)

Looks like this validates the paths that were just loaded. Correct?

The ticket specified:
"Verify previous changed paths against previous shard paths before loading a shard0 dump"

It is quite important to do the validation before loading, because it must block any additional loading. Otherwise it will keep loading one revision at a time, with the risk that the issue goes unnoticed.

omanikhi (Contributor Author)

Verify previous changed paths against previous shard paths before loading a shard0 dump [if the revision does not end with 001 because then we might not have a shard0 dump].

  1. This is somewhat confusing to me. So if we're loading shard 91005 I need to also retrieve shard 91004 and compare its changed paths against svnlook changes -r 91004?
  2. Also, 91005 does not end with 001 but it is a shard0. Could you clarify what you mean?

takesson (Member)

Verify previous changed paths against previous shard paths before loading a shard0 dump [if the revision does not end with 001 because then we might not have a shard0 dump].

  1. This is somewhat confusing to me. So if we're loading shard 91005 I need to also retrieve shard 91004 and compare its changed paths against svnlook changes -r 91004?

Correct. This is the only way I can see to block further (incorrect) loading.

  2. Also, 91005 does not end with 001 but it is a shard0. Could you clarify what you mean?

Yes, so we should validate 91004.

But when loading 91000, the 90999 dump might not exist, so we skip validation. (I got it wrong with 001 in the ticket.)
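
A hedged sketch of the agreed validation (the helpers get_shard0_changed_paths and svnlook_changed_paths are hypothetical stand-ins, not the actual svndumpsub.py functions):

```python
import logging

def validate_previous_rev(repo, rev, shard_div=1000):
    """Before loading the shard0 dump for `rev`, verify that the changed
    paths recorded in the shard0 dump of rev - 1 match what `svnlook
    changes -r <rev-1>` reports for the local repository."""
    prev = rev - 1
    if prev % shard_div == shard_div - 1:
        # e.g. loading 91000: a 90999 shard0 dump may not exist, so skip.
        return True
    dumped = get_shard0_changed_paths(repo, prev)  # hypothetical helper
    local = svnlook_changed_paths(repo, prev)      # hypothetical helper
    if dumped != local:
        logging.error('Changed paths mismatch at rev %d; blocking further loading', prev)
        return False
    return True
```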

omanikhi (Contributor Author)

Done?

gz_args = [gz, '-c']

from_rev = shard
to_rev = ((int(shard / self.shard_div) + 1) * self.shard_div) - 1
takesson (Member)

Are we ever validating that youngest is NNNN000 (cleanly divisible by 1000) before loading a shard3?

That would be a great validation.

omanikhi (Contributor Author)

Now we are :)

omanikhi (Contributor Author)

There seems to be a logical issue with this check. Isn't it sufficient to check whether youngest == from_rev - 1? Although this has one exception: when youngest is 0.

2024-05-17 14:20:12 [INFO] Repository: documentation youngest: 999
2024-05-17 14:20:12 [INFO] Loaded revs: (0-999) from shard: 0 to repo: documentation
2024-05-17 14:20:12 [DEBUG] Shard key exists: v1/travelonium/documentation/shard3/0000001000/documentation-0000001000.svndump.gz
2024-05-17 14:20:12 [INFO] Shard exists, will load shard: 1000
2024-05-17 14:20:12 [INFO] Loading shard: 1000 to repo: documentation
2024-05-17 14:20:12 [DEBUG] Running: /usr/bin/svnlook youngest /srv/cms/svn/documentation
2024-05-17 14:20:12 [INFO] Repository: documentation youngest: 999
2024-05-17 14:20:12 [ERROR] Unable to load shard3 as the youngest revision: 999 is not a multiple of 1000
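
A minimal sketch of the condition being proposed (names are illustrative):

```python
def can_load_shard(youngest, from_rev):
    """Loading a shard covering from_rev..to_rev is only valid when the
    repository head is exactly the revision preceding the shard. The one
    exception is a fresh repository, where youngest is 0 and the first
    shard starts at revision 0."""
    if from_rev == 0:
        return youngest == 0  # rev 0 exists even in an empty repository
    return youngest == from_rev - 1
```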

omanikhi (Contributor Author)

How does it look?

@takesson takesson (Member) commented May 6, 2024

Regarding the lock on svnloadsub, the way to do it is to use a PID file, which is already partially implemented, but the --pidfile parameter passed to the script is blank, so it is not being used. With a PID file in place, any further instance of the script will exit with an error if an instance is already running. If this is the desired behavior, I can make it happen.

Yes, we should probably do that. I was under the incorrect assumption that cron did not start multiple instances.

omanikhi added 16 commits May 7, 2024 11:06
…te large

Even the previous approach, using the communicate() function, could potentially cause memory issues, as the function buffers the entire stdout before returning.
@omanikhi omanikhi force-pushed the svnloadsub-rev-skipping-fix branch from 6c03fd2 to c2d04fe on May 8, 2024 12:08
@omanikhi omanikhi (Contributor Author) commented:

Regarding the lock on svnloadsub, the way to do it is to use a PID file, which is already partially implemented, but the --pidfile parameter passed to the script is blank, so it is not being used. With a PID file in place, any further instance of the script will exit with an error if an instance is already running. If this is the desired behavior, I can make it happen.

Yes, we should probably do that. I was under the incorrect assumption that cron did not start multiple instances.

Done.
