Not ignoring googlebot #269

Open
fabrouy opened this issue Apr 2, 2015 · 3 comments

Comments

fabrouy commented Apr 2, 2015

Hey guys,

I'm using exception_notification as an engine; my configuration is below. Am I using it correctly to ignore Google's crawlers? I keep getting emails for requests from this IP range: 66.249.64.0 - 66.249.95.255.

ExceptionNotification.configure do |config|
  config.add_notifier :email, {
    :email_prefix         => "xxxxxxxx",
    :sender_address       => %{xxxxxxxx},
    :exception_recipients => %w{yyyyyyyy},
    :ignore_crawlers      => %w{Googlebot bingbot}
  }
end

Thanks!

fabrouy commented Apr 18, 2015

exception_notification/lib/exception_notification/rack.rb defines this at the bottom:

def from_crawler(env, ignored_crawlers)
  agent = env['HTTP_USER_AGENT']
  Array(ignored_crawlers).any? do |crawler|
    agent =~ Regexp.new(crawler)
  end
end
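
For what it's worth, here's a quick sanity check (just a sketch; the agent string is the one Google documents for Googlebot) suggesting the regex matching itself should catch a well-behaved Googlebot:

agent = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'

# Same check the gem performs: each ignore_crawlers entry is used as a regex
%w{Googlebot bingbot}.any? { |crawler| agent =~ Regexp.new(crawler) }
# => true

So the requests getting through presumably carry a different (or empty) user agent, which is exactly why an IP-based check would help.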

It would be great if we could configure it to ignore certain IP ranges. Something like:

ExceptionNotification.configure do |config|
  config.add_notifier :email, {
    :email_prefix         => "xxxxxxxx",
    :sender_address       => %{xxxxxxxx},
    :exception_recipients => %w{yyyyyyyy},
    # If the regex matches any of these, great; if not, check the IP.
    :ignore_crawlers      => %w{Googlebot bingbot},
    :ignore_ip_ranges     => [
      { from: "64.233.160.0", to: "64.233.191.255" },
      { from: "66.249.64.0",  to: "66.249.95.255" }
    ]
  }
end

Then, if nothing matches, test against REMOTE_ADDR:

require 'ipaddr'
# stdlib; converts dotted-quad addresses to integers we can compare

def from_crawler(env, ignored_crawlers, ignore_ip_ranges)
  agent = env['HTTP_USER_AGENT']
  remote_addr = env['REMOTE_ADDR']
  found_match = Array(ignored_crawlers).any? do |crawler|
    agent =~ Regexp.new(crawler)
  end
  found_match || Array(ignore_ip_ranges).any? do |ip_range|
    from = IPAddr.new(ip_range[:from]).to_i
    to   = IPAddr.new(ip_range[:to]).to_i
    # REMOTE_ADDR is a string, so convert it before comparing
    (from..to).cover?(IPAddr.new(remote_addr).to_i)
  end
end

I have not tested this, but I think it should work. What do you guys think?
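
As an aside, if this ever gets implemented, Ruby's stdlib IPAddr can parse CIDR networks directly, which would make both the config and the check shorter. A sketch (the /19 networks are my translation of the ranges above, and ignore_ip_networks is a made-up name):

require 'ipaddr'

# Both ranges above happen to be /19 networks
ignore_ip_networks = %w{64.233.160.0/19 66.249.64.0/19}.map { |cidr| IPAddr.new(cidr) }

def from_ignored_network?(remote_addr, networks)
  networks.any? { |network| network.include?(remote_addr) }
end

from_ignored_network?('66.249.66.1', ignore_ip_networks)   # => true
from_ignored_network?('203.0.113.7', ignore_ip_networks)   # => false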

@jweslley (Collaborator)

fabrouy commented Apr 18, 2015

Thanks for pointing that out @jweslley.

Though I think this is a very common problem that deserves its own space and default configs. IMO it is better to filter by IP than to match a regex against the request's user agent, which can be faked.

Otherwise you could end up ignoring a request made by a malicious user faking a crawler's user agent. The IP ranges used by Google, Bing, Yahoo, etc. are well known.
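
For what it's worth, the check Google itself recommends for verifying Googlebot is a reverse DNS lookup on the requesting IP, followed by a forward lookup to confirm it resolves back to the same address. A sketch using Ruby's stdlib Resolv (the helper name is mine):

require 'resolv'

def verified_googlebot?(remote_addr)
  # Reverse lookup: a real Googlebot resolves to *.googlebot.com or *.google.com
  host = Resolv.getname(remote_addr)
  return false unless host =~ /\.(googlebot|google)\.com\z/
  # Forward lookup: the host must resolve back to the same IP
  Resolv.getaddress(host) == remote_addr
rescue Resolv::ResolvError
  false
end

DNS lookups are slow compared to a static range check, so the result would need to be cached per IP.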
