Commit

Ignore other domain images. Close #41
c4software committed Sep 8, 2017
1 parent c49a98b commit 18683d0
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions crawler.py
@@ -191,6 +191,12 @@ def __crawling(self):
 if not self.exclude_url(image_link):
     continue
 
+# Ignore other domain images
+image_link_parsed = urlparse(image_link)
+if image_link_parsed.netloc != self.target_domain:
+    continue
+
+
 # Test if images as been already seen and not present in the
 # robot file
 if self.can_fetch(image_link):
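
For readers who want to try the added check in isolation, here is a minimal sketch of the same comparison outside the crawler. The helper name `is_same_domain` and the sample URLs are hypothetical and not part of crawler.py; the sketch assumes Python 3, where `urlparse` lives in `urllib.parse`.

```python
from urllib.parse import urlparse


def is_same_domain(image_link, target_domain):
    """Return True when the link's network location equals target_domain.

    Hypothetical helper: it mirrors the comparison added in this commit,
    but it is not a function of crawler.py.
    """
    return urlparse(image_link).netloc == target_domain


# Sample links are made up for illustration.
links = [
    "https://example.com/img/logo.png",        # same domain: kept
    "https://cdn.example.org/img/banner.png",  # other domain: ignored
    "/img/relative.png",                       # no netloc: ignored
]

kept = [link for link in links if is_same_domain(link, "example.com")]
print(kept)
```

Because the comparison is a strict equality on `netloc`, a relative link (empty `netloc`), a `www.` prefix, or an explicit port would all count as a different domain in this sketch. Whether that matters in the crawler depends on how `image_link` is normalized before this point, which the hunk does not show.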