Not ignoring googlebot #269

Open
fabrouy opened this issue Apr 2, 2015 · 3 comments

Comments

fabrouy commented Apr 2, 2015

Hey guys,

I'm using exception_notification as an engine; my configuration is below. Am I using it correctly to ignore Google's crawlers? I keep getting emails for requests from this IP range: 66.249.64.0 - 66.249.95.255.

ExceptionNotification.configure do |config|
  config.add_notifier :email, {
    :email_prefix         => "xxxxxxxx",
    :sender_address       => %{xxxxxxxx},
    :exception_recipients => %w{yyyyyyyy},
    :ignore_crawlers      => %w{Googlebot bingbot}
  }
end

Thanks!

fabrouy commented Apr 18, 2015

exception_notification/lib/exception_notification/rack.rb defines this at the bottom:

def from_crawler(env, ignored_crawlers)
  agent = env['HTTP_USER_AGENT']
  Array(ignored_crawlers).any? do |crawler|
    agent =~ Regexp.new(crawler)
  end
end
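
For what it's worth, here's a quick sanity check (just a sketch; the agent string is the one Google documents for Googlebot) suggesting the regex matching itself should catch a well-behaved Googlebot:

agent = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'

# Same check the gem performs: each ignore_crawlers entry is used as a regex
%w{Googlebot bingbot}.any? { |crawler| agent =~ Regexp.new(crawler) }
# => true

So the requests getting through presumably carry a different (or empty) user agent, which is exactly why an IP-based check would help.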

It would be great if we could configure it to ignore certain IP ranges. Something like:

ExceptionNotification.configure do |config|
  config.add_notifier :email, {
    :email_prefix         => "xxxxxxxx",
    :sender_address       => %{xxxxxxxx},
    :exception_recipients => %w{yyyyyyyy},
    # If the regex matches any of these, great; if not, check the IP.
    :ignore_crawlers      => %w{Googlebot bingbot},
    :ignore_ip_ranges     => [
      { from: "64.233.160.0", to: "64.233.191.255" },
      { from: "66.249.64.0",  to: "66.249.95.255" }
    ]
  }
end

Then, if nothing matches, test against REMOTE_ADDR:

require 'ipaddr'
# stdlib; converts dotted-quad addresses to integers we can compare

def from_crawler(env, ignored_crawlers, ignore_ip_ranges)
  agent = env['HTTP_USER_AGENT']
  remote_addr = env['REMOTE_ADDR']
  found_match = Array(ignored_crawlers).any? do |crawler|
    agent =~ Regexp.new(crawler)
  end
  found_match || Array(ignore_ip_ranges).any? do |ip_range|
    from = IPAddr.new(ip_range[:from]).to_i
    to   = IPAddr.new(ip_range[:to]).to_i
    # REMOTE_ADDR is a string, so convert it before comparing
    (from..to).cover?(IPAddr.new(remote_addr).to_i)
  end
end

I have not tested this, but I think it should work. What do you guys think?
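
As an aside, if this ever gets implemented, Ruby's stdlib IPAddr can parse CIDR networks directly, which would make both the config and the check shorter. A sketch (the /19 networks are my translation of the ranges above, and ignore_ip_networks is a made-up name):

require 'ipaddr'

# Both ranges above happen to be /19 networks
ignore_ip_networks = %w{64.233.160.0/19 66.249.64.0/19}.map { |cidr| IPAddr.new(cidr) }

def from_ignored_network?(remote_addr, networks)
  networks.any? { |network| network.include?(remote_addr) }
end

from_ignored_network?('66.249.66.1', ignore_ip_networks)   # => true
from_ignored_network?('203.0.113.7', ignore_ip_networks)   # => false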

@jweslley (Collaborator)

fabrouy commented Apr 18, 2015

Thanks for pointing that out @jweslley.

Though I think this is a very common problem that deserves its own space and default configs. IMO it is better to filter by IP than to match a regex against the request's user agent, which can be faked.

Otherwise you could end up ignoring a request made by a malicious user faking a crawler's user agent. The IP ranges used by Google, Bing, Yahoo, etc. are well known.
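
For what it's worth, the check Google itself recommends for verifying Googlebot is a reverse DNS lookup on the requesting IP, followed by a forward lookup to confirm it resolves back to the same address. A sketch using Ruby's stdlib Resolv (the helper name is mine):

require 'resolv'

def verified_googlebot?(remote_addr)
  # Reverse lookup: a real Googlebot resolves to *.googlebot.com or *.google.com
  host = Resolv.getname(remote_addr)
  return false unless host =~ /\.(googlebot|google)\.com\z/
  # Forward lookup: the host must resolve back to the same IP
  Resolv.getaddress(host) == remote_addr
rescue Resolv::ResolvError
  false
end

DNS lookups are slow compared to a static range check, so the result would need to be cached per IP.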
