get duplicate files new output format #40

Closed
Cedric-Boucher opened this issue Jul 27, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@Cedric-Boucher
Owner

Returns all the files that are duplicated between path1 and path2 as a tuple of tuples,
where each inner tuple holds the file paths of one set of matching files.
For example: ( (match1, match1, match1), (match2, match2) )
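
To make the shape concrete, here is a small illustration of a return value in that format. The file paths and the `DuplicateGroups` alias are made up for the example and are not taken from the project.

```python
# Hypothetical type alias for the described format: a tuple of groups,
# where each group is a tuple of file paths with identical content.
DuplicateGroups = tuple[tuple[str, ...], ...]

# Made-up paths: three copies of one file, two copies of another.
example: DuplicateGroups = (
    ("path1/a.txt", "path2/a_copy.txt", "path2/a_backup.txt"),
    ("path1/b.png", "path2/b.png"),
)

for group in example:
    print(len(group), "matching files:", ", ".join(group))
```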
@Cedric-Boucher Cedric-Boucher added the enhancement New feature or request label Jul 27, 2023
@Cedric-Boucher
Owner Author

step 1: group files by matching size in bytes
step 2: group files by matching size in bytes AND matching hash of first 1MB
step 3: group files by matching size in bytes AND matching hash of first 1MB AND matching hash of whole file

For steps 2 and 3, only calculate hashes if there is more than one file in the group, i.e. if the file could potentially match other files.
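
The three steps above could be sketched roughly as follows. This is only an illustration, not the project's actual code: the helper names (`_hash_file`, `_refine`, `_list_files`), the choice of SHA-256, and the recursive directory walk are all assumptions. It also assumes that files from both directories are pooled together, so a group is not required to span both path1 and path2.

```python
import hashlib
import os
from collections import defaultdict

PREFIX_BYTES = 1024 * 1024  # the "first 1MB" used in step 2


def _hash_file(path, limit=None):
    """Hash the whole file, or only its first `limit` bytes (hypothetical helper)."""
    digest = hashlib.sha256()  # hash algorithm is an assumption
    remaining = limit
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            size = 65536 if remaining is None else min(65536, remaining)
            chunk = f.read(size)
            if not chunk:
                break
            digest.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return digest.hexdigest()


def _refine(groups, key):
    """Split each group by key(path) and drop groups left with a single file."""
    refined = []
    for group in groups:
        buckets = defaultdict(list)
        for path in group:
            buckets[key(path)].append(path)
        refined.extend(b for b in buckets.values() if len(b) > 1)
    return refined


def _list_files(root):
    """Collect every file under root, recursively (an assumption)."""
    return [os.path.join(d, name) for d, _, names in os.walk(root) for name in names]


def get_duplicate_files(path1, path2):
    # Pool both directories; the set removes identical path strings.
    files = sorted(set(_list_files(path1) + _list_files(path2)))
    groups = _refine([files], os.path.getsize)                       # step 1
    groups = _refine(groups, lambda p: _hash_file(p, PREFIX_BYTES))  # step 2
    groups = _refine(groups, _hash_file)                             # step 3
    return tuple(tuple(sorted(g)) for g in groups)
```

Because `_refine` discards single-file groups after every step, a file is only hashed while it still shares a group with at least one other candidate, which is the condition described for steps 2 and 3.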

@Cedric-Boucher
Owner Author

I updated the string in the actual function header as the output format was incorrect lol, but this is now correct
