Can I see somewhere, if there is a new version? #29

Open
jobifis opened this issue Mar 14, 2023 · 16 comments

@jobifis

jobifis commented Mar 14, 2023

Hi, is there somewhere I can check whether a new version of the password database is available before I download 30 GB or more?

@jobifis changed the title from "Can I see everywhere, if there is a new version?" to "Can I see somewhere, if there is a new version?" on Mar 14, 2023
@cnseubert

I would also like to be able to detect that new passwords were added before re-downloading and re-importing the entire 30 GB blob of data. I suppose it's likely that new data is added frequently; however, if we could access this corpus with something like rsync, which could pull down only the new data instead of the whole shebang, that would certainly save time.

@cnseubert

You have closed this one without answering it. Issue #31 is also closed and unanswered, even though you've marked it as a duplicate of this one. So is there no clear answer to this issue?

@FreifunkerEZ

This issue seems open to me.

@jobifis
Author

jobifis commented Mar 18, 2023 via email

@stebet
Contributor

stebet commented Mar 19, 2023

We don't currently have incremental updates. Hash ranges do return ETags but we haven't added support for that to the downloader yet.

@eizedev

eizedev commented Mar 29, 2023

We are in the middle of migrating our download tasks from the old single-version (7z) files on Cloudflare to the API using this downloader. Thanks for that, great work!!
As @FreifunkerEZ already mentioned in #31, we currently use the Last-Modified timestamp from the response header when requesting a specific range.

We would also be very happy to receive incremental updates.

@stebet or @troyhunt My question is: if a hash is added anywhere, will the Last-Modified timestamp also be updated on all other ranges?

For example, with PowerShell:

# Specific hash range to check
$Url = 'https://api.pwnedpasswords.com/range/9A674'
# Example: the local copy was last updated 30 days ago
$DBLastModifiedDate = (Get-Date).AddDays(-30)
$Request = Invoke-WebRequest $Url
# Get the Last-Modified timestamp from the response header as a PowerShell DateTime
$RemoteLastModifiedDate = [datetime][string]$Request.Headers.'Last-Modified'

# If the Last-Modified timestamp is newer than the specified date, start a new download
if ($RemoteLastModifiedDate -gt $DBLastModifiedDate) {
    ....
}

Thanks for your help
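
For anyone not on PowerShell, the same Last-Modified comparison can be sketched in Python. This is only an illustration of the check described above: the function name `is_remote_newer` and the sample header value are made up for the example, and no request is sent.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_remote_newer(last_modified_header: str, local_timestamp: datetime) -> bool:
    """Return True if the Last-Modified header is later than local_timestamp.

    last_modified_header is an HTTP date string; local_timestamp must be a
    timezone-aware datetime recording when the local copy was last refreshed.
    """
    remote = parsedate_to_datetime(last_modified_header)
    if remote.tzinfo is None:
        # Treat a header without usable zone information as UTC (assumption).
        remote = remote.replace(tzinfo=timezone.utc)
    return remote > local_timestamp


if __name__ == "__main__":
    # Made-up header value for illustration; a real one would come from the
    # Last-Modified response header of a range request.
    header = "Wed, 29 Mar 2023 10:00:00 GMT"
    local_copy_date = datetime(2023, 2, 27, tzinfo=timezone.utc)
    print(is_remote_newer(header, local_copy_date))
```

As in the PowerShell version, a `True` result would be the trigger to start a new download for that range.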

@stebet
Contributor

stebet commented Mar 30, 2023

It should only update on the affected range. We also have ETags that you can use to detect changes.

@eizedev

eizedev commented Mar 31, 2023

@stebet Thanks!
So I could also use the HTTP ETag to detect changes. Works like a charm: HTTP status code 304 is returned if the ETag has not changed.
But, just to be sure, this etag is for current range only, right? So it is currently not possible to detect changes across all ranges/hashes?

Then, for my purpose, the only option for now is to trigger the update on a regular schedule, e.g. every 30 days, because I have no way to detect general changes in the database, right?
(Of course, this is not an issue with the downloader.)
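
The ETag check described above can be sketched as a conditional GET, shown here in Python with only the standard library. This is a minimal illustration, not how the official downloader works: the helper names `build_request` and `fetch_if_changed` are hypothetical, and the only API behavior assumed is what this thread reports (range responses carry an ETag, and a matching ETag yields a 304).

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple

RANGE_URL = "https://api.pwnedpasswords.com/range/{prefix}"


def build_request(prefix: str, etag: Optional[str]) -> urllib.request.Request:
    """Build a GET request for one hash range, conditional if an ETag is known."""
    request = urllib.request.Request(RANGE_URL.format(prefix=prefix))
    if etag:
        # Ask the server to answer 304 Not Modified if the range is unchanged.
        request.add_header("If-None-Match", etag)
    return request


def fetch_if_changed(prefix: str, etag: Optional[str]) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (body, new_etag), or (None, etag) when the server answers 304."""
    try:
        with urllib.request.urlopen(build_request(prefix, etag), timeout=30) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # urllib reports 304 as an HTTPError; here it means "nothing new".
            return None, etag
        raise
```

A caller would store the returned ETag next to each downloaded range and pass it back on the next run, for example `body, etag = fetch_if_changed("9A674", stored_etag)`.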


@stebet
Contributor

stebet commented Mar 31, 2023

But, just to be sure, this etag is for current range only, right?

Correct
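
Since each ETag covers only its own range, detecting changes across the whole corpus means checking every 5-character prefix, which is 16^5 = 1,048,576 ranges. The Python sketch below shows one way that sweep could be organised. The names `all_prefixes`, `find_changed_ranges` and `has_changed` are hypothetical; `has_changed` stands for whatever per-range check is used (for example a conditional request), and the demo replaces it with an offline stand-in.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional


def all_prefixes() -> Iterator[str]:
    """Yield every 5-character uppercase hex prefix, 00000 through FFFFF."""
    for value in range(16 ** 5):
        yield f"{value:05X}"


def find_changed_ranges(
    prefixes: Iterable[str],
    stored_etags: Dict[str, str],
    has_changed: Callable[[str, Optional[str]], bool],
) -> List[str]:
    """Return the prefixes whose range differs from the locally stored ETag.

    has_changed(prefix, etag) is supplied by the caller; etag is None for
    ranges that were never downloaded.
    """
    changed = []
    for prefix in prefixes:
        if has_changed(prefix, stored_etags.get(prefix)):
            changed.append(prefix)
    return changed


if __name__ == "__main__":
    # Offline demonstration: made-up ETags instead of real requests.
    current = {"00000": '"a"', "00001": '"b2"', "00002": '"c"'}
    stored = {"00000": '"a"', "00001": '"b1"'}

    def fake_check(prefix: str, etag: Optional[str]) -> bool:
        return current[prefix] != etag

    print(find_changed_ranges(current, stored, fake_check))  # ['00001', '00002']
```

This still costs one request per range, so it saves bandwidth rather than round trips.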

@flexxxxer

Same problem for me. Blindly downloading 30 GB without knowing if updates are available is unacceptable for me.

@ezekielnewren

ezekielnewren commented Dec 24, 2023

I think a new API endpoint is needed, e.g.:

curl https://api.pwnedpasswords.com/count
00000:3000:51324
00001:5214:95743
00002:2045:13254
...
FFFFE:9534:62335
FFFFF:2945:98564

Here the first number is how many hashes have that prefix, and the second number is the sum of the occurrences of those hashes.
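
To illustrate how a client might use such a listing, here is a small Python sketch. Note that the `/count` endpoint is only a proposal in this thread and does not exist as far as this discussion shows; the `prefix:hash_count:occurrences` line format is taken from the example above, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# prefix -> (number of hashes with that prefix, sum of their occurrences)
Counts = Dict[str, Tuple[int, int]]


def parse_counts(body: str) -> Counts:
    """Parse lines of the proposed 'PREFIX:HASHES:OCCURRENCES' format."""
    counts: Counts = {}
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        prefix, hash_count, occurrences = line.split(":")
        counts[prefix.upper()] = (int(hash_count), int(occurrences))
    return counts


def changed_prefixes(old: Counts, new: Counts) -> List[str]:
    """Return prefixes that are new or whose counts differ from the old snapshot."""
    return sorted(prefix for prefix, value in new.items() if old.get(prefix) != value)


if __name__ == "__main__":
    # Made-up snapshots in the proposed format; no request is sent.
    previous = parse_counts("00000:3000:51324\n00001:5214:95743\n")
    latest = parse_counts("00000:3000:51324\n00001:5215:95750\n00002:2045:13254\n")
    print(changed_prefixes(previous, latest))  # ['00001', '00002']
```

With a listing like this, a client would only need to re-download the ranges returned by `changed_prefixes`.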

@oyeaussie

Hello All,

I have created a tool that does the same job using PHP:
https://github.com/oyeaussie/PHPPwnedPasswordsDownloader

I hope someone finds it useful.

Thanks.

@troyhunt
Contributor

I have created a tool that does the same job using PHP: https://github.com/oyeaussie/PHPPwnedPasswordsDownloader

Nice one, gave you a shout-out here: https://twitter.com/troyhunt/status/1803413986785870323

@oyeaussie

I have updated my downloader tool with a lot of options. You can now download, update, sort, cache, and index hash files with the tool. I have also added a CLI password lookup tool, which you can integrate into your PHP code.
See the wiki page for more information:
https://github.com/oyeaussie/PHPPwnedPasswordsDownloader/wiki/1.-Description

Also, regarding this issue, read this wiki page:
https://github.com/oyeaussie/PHPPwnedPasswordsDownloader/wiki/9.-Update

I hope the tool helps someone. Cheers!

@rwiesbach

rwiesbach commented Aug 29, 2024

It seems that hibp-downloader (docs: https://hibp-downloader.readthedocs.io/en/latest/, source: https://github.com/threatpatrols/hibp-downloader) manages to download only the ranges that have changed, based on the ETag.

@rwiesbach

Update: hibp-downloader takes days for the initial download (on the same connection, the official downloader takes about 1 hour), and resume is broken, at least on Windows.
