Feature/3922 change recaptcha to honeypot #2404

Open
wants to merge 15 commits into
base: develop

Conversation

amdomanska
Contributor

@amdomanska amdomanska commented Jul 26, 2024

Replace Google ReCaptcha with Honeypot.

Please don't delete any sections when completing this PR template; instead enter N/A for checkboxes or sections which are not applicable, unless otherwise stated below

See #3922

Description of Honeypot Technique Implementation

All code related to Google ReCaptcha has been removed from the codebase.

General Overview

The Honeypot Technique is a common method used to distinguish between human users and automated bots during form submissions. This technique involves adding a hidden field to the form that is visible to bots but not to human users. The idea is to trap bots that automatically fill in this field, while legitimate users will leave it empty.

To enhance the effectiveness of this technique, we use two key elements:

  • Bot-trap field: A hidden field that bots are likely to fill out.
  • Timer: Measures the time taken to complete the form. If the form is submitted too quickly, it indicates that a bot may have filled it out.

Implementation Details

  • Bot-trap Field:
    • The bot-trap field is implemented as an "email" field in the form template. This choice is intentional: bots often target fields with names like "email" or "contact" because they are so common. A comment in the code makes the purpose of the field clear to developers. The trap is placed at the very beginning of the form, while the genuine email field has been renamed to sender-email and placed after the trap.

Hiding the Bot-trap Field from Screen Readers and Keyboard Users

The bot-trap field is designed to be hidden from human users while remaining part of the form for detection purposes. To achieve this, we use a combination of CSS properties and HTML attributes to ensure the field is not accessible or visible in various contexts:

  • Positioning and Size:

    • position: absolute; and left: -9999px; top: -9999px;: This combination of positioning moves the field far outside the visible area of the browser window. The large negative offset ensures that the field is not within the viewable area of the page.
    • height: 1px; width: 1px;: These properties reduce the field's size to just 1 pixel by 1 pixel, making it effectively invisible to users while still being part of the document.
  • Visibility:

    • overflow: hidden;: This property prevents any content within the field from being displayed outside of its bounds, ensuring no accidental visibility of the field’s content.
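    • clip: rect(0, 0, 0, 0);: This CSS property clips the field to a rectangle of zero width and height, so nothing is painted on screen. Clipping alone does not remove an element from the accessibility tree (the same rule is used by "visually hidden" utility classes so that screen readers can still read the content), so screen-reader behaviour has to be confirmed by testing rather than assumed.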
  • Additional Reset Properties:

    • border: 0; padding: 0; margin: 0;: These properties remove any default styling that could affect the field’s appearance or positioning, ensuring the field does not have any additional spacing or borders.
  • Keyboard Navigation:

    • tabindex="-1": This attribute removes the field from the tab order, so users navigating with the Tab key cannot land on it by accident.

By applying these CSS styles and HTML attributes, the bot-trap field is hidden visually and removed from the tab order. This prevents human users from interacting with the field while still letting the honeypot detect automated bots.
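As a rough illustration of the declarations listed above, the trap field's markup could be produced like this. This is a sketch only: the helper name `render_trap_field` and the use of an inline `style` attribute are assumptions made for the example; the PR itself may well use a stylesheet class in the form template.

```python
from html import escape

# CSS declarations taken from the description above, kept in one place.
TRAP_STYLE = {
    "position": "absolute",
    "left": "-9999px",
    "top": "-9999px",
    "height": "1px",
    "width": "1px",
    "overflow": "hidden",
    "clip": "rect(0, 0, 0, 0)",
    "border": "0",
    "padding": "0",
    "margin": "0",
}


def render_trap_field(name="email"):
    """Return the HTML for a visually hidden, non-tabbable trap input.

    Hypothetical helper: the name and the inline style are illustrative only.
    """
    style = "; ".join(f"{prop}: {value}" for prop, value in TRAP_STYLE.items())
    return (
        f'<input type="text" name="{escape(name)}" '
        f'style="{escape(style)}" tabindex="-1">'
    )
```

Calling `render_trap_field()` yields a single `<input>` tag named `email` that carries every declaration above plus `tabindex="-1"`.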

  • Timer Configuration:

    • After testing with colleagues and confirming with @richard-jones during a Mattermost discussion, we set the timer threshold to 5 seconds. This value was chosen to balance the sensitivity of detecting bots and avoiding false positives.
  • Honeypot Validation:

    • The honeypot field is validated before the form submission is processed, so a submission is classified as coming from a bot before any of its other data is handled.
  • Server-Side Validation:

    • The honeypot field is validated server-side rather than client-side to prevent manipulation by sophisticated bots. This approach ensures the integrity of the detection mechanism and avoids potential circumvention by altering client-side scripts.
  • Unit Tests:

    • Unit tests have been added to verify that the form is correctly identified as either filled by a bot or a human. These tests check for scenarios including:
      • An empty bot-trap email field with a timer within the acceptable range.
      • A filled bot-trap email field with a timer exceeding the threshold.
    • These tests ensure compliance with our requirements and verify the accuracy of the bot detection mechanism.
  • User Notification:

    • When a bot is detected, users will be prompted with a message to contact us if they believe they are being incorrectly identified as a bot.

    image

  • False Positives:

    • While the honeypot technique is effective, there may be cases where legitimate users are mistakenly flagged as bots (false positives). Potential causes for false positives include:
      • Users who fill out the form extremely quickly (e.g., due to autofill tools).
      • Users with accessibility tools that may inadvertently interact with the hidden field.
    • We anticipate receiving some helpdesk tickets related to false positives, and monitoring these cases is necessary to refine our approach as needed.
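The classification rule and the unit-test scenarios described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the function name `is_bot_submission` and the constant `THRESHOLD_MS` are hypothetical, and the threshold is taken to be 5 seconds expressed in milliseconds.

```python
import unittest

THRESHOLD_MS = 5000  # 5 seconds, as described above; the constant name is hypothetical


def is_bot_submission(trap_email, elapsed_ms, threshold_ms=THRESHOLD_MS):
    """Flag a submission if the hidden trap field is filled or the form came back too fast."""
    return bool(trap_email) or elapsed_ms < threshold_ms


class HoneypotTest(unittest.TestCase):
    def test_empty_trap_and_slow_enough_is_human(self):
        self.assertFalse(is_bot_submission("", 6304.2))

    def test_filled_trap_is_bot_even_when_slow(self):
        self.assertTrue(is_bot_submission("bot@example.com", 6304.2))

    def test_empty_trap_but_too_fast_is_bot(self):
        self.assertTrue(is_bot_submission("", 1200))
```

The tests can be run with `python -m unittest`. The third case is the one most likely to produce false positives, since a person using autofill can legitimately submit in under the threshold.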

Categorisation

This PR...

  • has scripts to run
  • has migrations to run
  • adds new infrastructure
  • changes the CI pipeline
  • affects the public site
  • affects the editorial area
  • affects the publisher area
  • affects the monitoring

Basic PR Checklist

Instructions for developers:

  • For each checklist item, if it is N/A to your PR check the N/A box
  • For each item that you have done and confirmed for yourself, check Developer box (including if you have checked the N/A box)

Instructions for reviewers:

  • For each checklist item that has been confirmed by the Developer, check the Reviewer box if you agree
  • For multiple reviewers, feel free to add your own checkbox with your github username next to it if that helps with review tracking

Code Style

  • No deprecated methods are used

    • N/A
    • Developer
    • Reviewer
  • No magic strings/numbers - all strings are in constants or messages files

    • N/A
    • Developer
    • Reviewer
  • ES queries are wrapped in a Query object rather than inlined in the code

    • N/A
    • Developer
    • Reviewer
  • Where possible our common library functions have been used (e.g. dates manipulated via dates)

    • N/A
    • Developer
    • Reviewer
  • Cleaned up commented out code, etc

    • N/A
    • Developer
    • Reviewer
  • Urls are constructed with url_for not hard-coded

    • N/A
    • Developer
    • Reviewer

Testing

  • Unit tests have been added/modified

    • N/A
    • Developer
    • Reviewer
  • Functional tests have been added/modified

    • N/A
    • Developer
    • Reviewer
  • Code has been run manually in development, and functional tests followed locally

    • N/A
    • Developer
    • Reviewer
  • Have CSS/style changes been implemented? If they are of a global scope (e.g. on base HTML elements) have the downstream impacts of the change in other areas of the system been considered?

    • N/A
    • Developer
    • Reviewer

Documentation

Release Readiness

Testing

The best approach is to ask as many people as possible to "register" on the test server and provide feedback on their experience.

Testing Process:

  • User Registration: Participants should complete the registration process on the test server.
  • Observation: Users should not interact with or be aware of the honeypot fields in any way. Specifically, they should not:
    • See the honeypot field.
    • Focus on the honeypot field using keyboard navigation.
    • Hear the honeypot field through screen readers.
    • Be incorrectly identified as a bot.

Deployment

What deployment considerations are there? (delete any sections you don't need)

Configuration changes

What configuration changes are included in this PR, and do we need to set specific values for production

Scripts

What scripts need to be run from the PR (e.g. if this is a report generating feature), and when (once, regularly, etc).

Migrations

What migrations need to be run to deploy this

Monitoring

What additional monitoring is required of the application as a result of this feature

New Infrastructure

What new infrastructure does this PR require (e.g. new services that need to run on the back-end).

Continuous Integration

What CI changes are required for this

@amdomanska amdomanska marked this pull request as ready for review July 26, 2024 09:57
@amdomanska amdomanska requested review from Steven-Eardley, philipkcl, richard-jones and RK206 and removed request for Steven-Eardley and philipkcl July 26, 2024 09:57
portality/static/js/honeypot.js (outdated, resolved)
portality/view/account.py (outdated, resolved)
@philipkcl
Contributor

philipkcl commented Jul 31, 2024

Without ReCaptcha, the checks are easy to bypass.

A bot can create an account record with a request like the one below.

curl 'http://localhost:5004/account/register' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  --data-raw 'next=%2Fregister&email=&name=xxxx4&sender_email=xxx4%40xxx.com&next=&hptimer=6304.200000047684' | less

If we have to remove ReCaptcha, the timer should at least be kept server-side so that an attacker cannot modify it.
Even with a server-side timer, though, an attacker can still bypass it by sleeping and sending multiple requests.

ReCaptcha is not perfect, but it is a very good defence against bots.
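One possible reading of the server-side timer suggestion, as a rough sketch only: the server stamps the form with a signed issue time when it renders it, and checks the elapsed time itself on submission, so the client cannot simply post a large `hptimer` value. Everything named here (`SECRET_KEY`, `issue_token`, `submitted_too_fast`) is hypothetical and not part of this PR.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"  # hypothetical; a real app would load this from config
MIN_FILL_SECONDS = 5       # mirrors the 5 second threshold in the PR description


def _sign(issued):
    """HMAC-SHA256 signature of the issue time, hex encoded."""
    return hmac.new(SECRET_KEY, issued.encode(), hashlib.sha256).hexdigest()


def issue_token(now=None):
    """Return 'timestamp:signature' to embed in a hidden field when the form is rendered."""
    issued = str(int(time.time() if now is None else now))
    return f"{issued}:{_sign(issued)}"


def submitted_too_fast(token, now=None):
    """True if the token is missing, tampered with, or younger than the threshold."""
    try:
        issued, sig = token.split(":", 1)
        if not hmac.compare_digest(sig, _sign(issued)):
            return True
        elapsed = (time.time() if now is None else now) - int(issued)
    except (AttributeError, TypeError, ValueError):
        return True  # absent or malformed token is treated as suspicious
    return elapsed < MIN_FILL_SECONDS
```

As noted above, this only stops a client from faking the timer value; a bot that fetches the form, waits, and then submits would still pass.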

@RK206
Contributor

RK206 commented Sep 4, 2024

@amdomanska I am getting the following error when I test locally, and I'm not sure whether it is specific to my environment. self.hptimer.data is None in the is_bot method.

127.0.0.1 - - [04/Sep/2024 11:31:43] "POST /account/register HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/Users/rama/CottageLabs/venv/doad-review-3.8.10/lib/python3.8/site-packages/flask/app.py", line 2091, in __call__
    return self.wsgi_app(environ, start_response)
  File "/Users/rama/CottageLabs/venv/doad-review-3.8.10/lib/python3.8/site-packages/flask/app.py", line 2076, in wsgi_app
    response = self.handle_exception(e)
  File "/Users/rama/CottageLabs/venv/doad-review-3.8.10/lib/python3.8/site-packages/flask_cors/extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/Users/rama/CottageLabs/venv/doad-review-3.8.10/lib/python3.8/site-packages/flask/app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/rama/CottageLabs/venv/doad-review-3.8.10/lib/python3.8/site-packages/flask/app.py", line 1519, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/rama/CottageLabs/venv/doad-review-3.8.10/lib/python3.8/site-packages/flask_cors/extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/Users/rama/CottageLabs/venv/doad-review-3.8.10/lib/python3.8/site-packages/flask/app.py", line 1517, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/rama/CottageLabs/venv/doad-review-3.8.10/lib/python3.8/site-packages/flask_debugtoolbar/__init__.py", line 142, in dispatch_request
    return view_func(**req.view_args)
  File "/Users/rama/CottageLabs/DOAJ/review-code/portality/decorators.py", line 82, in decorated_view
    return fn(*args, **kwargs)
  File "/Users/rama/CottageLabs/DOAJ/review-code/portality/decorators.py", line 124, in decorated_view
    return fn(*args, **kwargs)
  File "/Users/rama/CottageLabs/DOAJ/review-code/portality/view/account.py", line 342, in register
    if request.method == 'POST' and not form.is_bot():
  File "/Users/rama/CottageLabs/DOAJ/review-code/portality/view/account.py", line 328, in is_bot
    return (self.email.data != "" or self.hptimer.data < app.config.get("HONEYPOT_TIMER_THRESHOLD", 5000))
TypeError: '<' not supported between instances of 'NoneType' and 'int'
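The traceback shows the comparison failing because the timer value is `None`. A defensive version of the timer check could parse the raw value first and decide explicitly what a missing timer means. This is a sketch under stated assumptions: `parse_timer` and `timer_too_fast` are hypothetical names, and treating a missing timer as "too fast" is one possible policy, not necessarily the one this PR adopts.

```python
from decimal import Decimal, InvalidOperation

DEFAULT_THRESHOLD_MS = 5000  # same default as HONEYPOT_TIMER_THRESHOLD in the traceback


def parse_timer(raw):
    """Convert the submitted timer value to a Decimal, or None if absent or invalid."""
    if raw is None:
        return None
    try:
        value = Decimal(str(raw).strip())
    except InvalidOperation:
        return None
    # Reject NaN and infinities, which Decimal would otherwise accept.
    return value if value.is_finite() else None


def timer_too_fast(raw, threshold_ms=DEFAULT_THRESHOLD_MS):
    """True if the timer is below the threshold or could not be read at all."""
    timer = parse_timer(raw)
    if timer is None:
        return True  # assumption: no usable timer is treated as suspicious
    return timer < threshold_ms
```

With this shape, a form that never populates the timer produces a clear "flagged" result instead of a 500 error; whether such a form should be flagged or exempted is a separate decision.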

@RK206
Contributor

RK206 commented Sep 16, 2024

I am using Firefox 128.0.3 on my Mac.

I have the latest commit on my system; see the git log below.

rama$ git log
commit 1a7740288554d2268587023dafa39ab284712a8d (HEAD -> feature/3922_change_recaptcha_to_honeypot, origin/feature/3922_change_recaptcha_to_honeypot)
Author: Aga <[email protected]>
Date:   Thu Aug 1 11:24:36 2024 +0100

    change timer from Integer to Decimal

@RK206
Contributor

RK206 commented Sep 17, 2024

I realized there are two ways to reach the registration page. One is to go to /account/register without logging in; I'm not sure how to reach this page through the UI. On this page I do not get the error mentioned above, but I always get the message: Are you sure you're a human? If you're having trouble logging in, please [contact us](http://localhost:5004/contact).

Screenshot 2024-09-17 at 12 11 42 PM

The other way is to log in as ManEd, go to Users and create a user. The form looks different from the one above, and it always fails with TypeError: '<' not supported between instances of 'NoneType' and 'int'

Screenshot 2024-09-17 at 12 04 29 PM

Hope this helps.

@amdomanska
Contributor Author

@RK206 I committed the changes to the code.

  1. The bot-trap is disabled in the admin's New User registration form - if you access the /account/register page when you're logged in you shouldn't see the error anymore. Please review this change.
  2. I added more information to the flash message in debug mode. If you could test the form again, I'd be grateful. Please remember to set DEBUG = True in your dev.cfg and to access the /account/register page as an anonymous user (logged out). If you continue to see the "Are you a human?" error, please pass on the field values shown in the error banner.
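For readers following along, a debug-aware message of the kind described in point 2 might look roughly like this. The function name `bot_flash_message` and the exact debug format are assumptions made for illustration; the actual message in the PR may differ.

```python
BASE_MESSAGE = (
    "Are you sure you're a human? "
    "If you're having trouble logging in, please contact us."
)


def bot_flash_message(trap_value, timer_ms, debug=False):
    """Build the user-facing message, appending field values only in debug mode."""
    if not debug:
        return BASE_MESSAGE
    # repr() makes empty strings and None easy to tell apart in the banner.
    return f"{BASE_MESSAGE} [debug] email={trap_value!r}, hptimer={timer_ms!r}"
```

Keeping the field values out of the non-debug message avoids telling a bot which check it failed.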

Contributor

@RK206 RK206 left a comment


I have tested this as thoroughly as I can, and it seems to be working as expected.
