Recurrence by email address / username #9
Comments
Hello! Yes, I did capture username/email tuples in my data. It is a great idea; however, a large-scale analysis on both username and password is extremely time-consuming, because it requires a join operation on 1 billion rows. It is also not as impactful as you might think.
So I've decided not to process that metric, because it would be too computationally heavy for minimal impact. If you disagree, please feel free to say so! Cheers!
Interesting. For the emails used many thousands of times, I wonder if those should be blacklisted (along with any accounts created using those as secondary accounts), since they are probably fraud-related. What if you limited it to accounts that appear within a smaller range of occurrences, say 10 to 500 times? This could substantially reduce the computational cost and would still seem to provide important information about password reuse. Thanks for doing the important work you do!
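The range-limiting idea above can be sketched as a two-pass scan that avoids a full join. This is only an illustration, not the maintainer's pipeline: it assumes the data is available as a re-iterable sequence of `(username, password)` pairs, and the function names `usernames_in_range` and `passwords_for` are hypothetical.

```python
from collections import Counter, defaultdict

def usernames_in_range(pairs, low=10, high=500):
    """First pass: count occurrences per username, keep those in [low, high]."""
    counts = Counter(username for username, _ in pairs)
    return {u for u, n in counts.items() if low <= n <= high}

def passwords_for(pairs, selected):
    """Second pass: collect passwords only for the selected usernames."""
    by_user = defaultdict(list)
    for username, password in pairs:
        if username in selected:
            by_user[username].append(password)
    return dict(by_user)

# Tiny made-up sample; the thresholds are lowered so the filter is visible.
sample = (
    [("a@example.com", "hunter2")] * 3
    + [("b@example.com", "x")]
    + [("c@example.com", "p%d" % i) for i in range(2)]
)
selected = usernames_in_range(sample, low=2, high=3)
grouped = passwords_for(sample, selected)
```

Only the per-username counts and the passwords of the selected usernames are held in memory, so the cost depends on how many usernames fall inside the chosen range. At a billion rows the counting pass would probably still need an on-disk or sharded counter.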
I've been frantically checking passwords from mystery lists. I was really excited that there was something to possibly explain that, but it looks like just a fraction of these passwords are from these spam accounts.
I need the commands for this. How do I search for passwords?
Did you capture usernames / email addresses in your data set? Can you determine uniqueness or lack thereof by email addresses? For example, what fraction of the passwords associated with a specific username (email address if relevant) are unique, and how does that vary with the number of duplicates of the username (i.e., reuse of passwords vs # of times the username is matched in the data set). Thanks!
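One possible reading of the metric asked about here, as a rough sketch: for each username, take the number of distinct passwords divided by the number of times the username occurs, then average that fraction over all usernames with the same occurrence count. The input format (an iterable of `(username, password)` pairs) and the function name `unique_fraction_by_count` are assumptions made for illustration.

```python
from collections import defaultdict

def unique_fraction_by_count(pairs):
    """Map occurrence count -> mean fraction of distinct passwords per username."""
    passwords = defaultdict(list)
    for username, password in pairs:
        passwords[username].append(password)

    fractions = defaultdict(list)
    for pw_list in passwords.values():
        n = len(pw_list)                      # times this username appears
        fractions[n].append(len(set(pw_list)) / n)

    # Average the per-username fractions within each occurrence count.
    return {n: sum(v) / len(v) for n, v in sorted(fractions.items())}

# Made-up demo data: u1 reuses one password, u2 uses two different ones.
demo = [("u1", "a"), ("u1", "a"), ("u2", "a"), ("u2", "b"), ("u3", "z")]
result = unique_fraction_by_count(demo)
```

A value near 1.0 for a given occurrence count would suggest those usernames rarely repeat a password, while a value near `1/n` would suggest the same password is reused for almost every occurrence.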