diff --git a/.gitignore b/.gitignore
index dc6e8bb6781..4db22f1d8a0 100644
--- a/.gitignore
+++ b/.gitignore
@@ -12,4 +12,4 @@ dist
 .idea
 
 # Windows
-Thumbs.db
\ No newline at end of file
+Thumbs.db
diff --git a/docs/topics/feed-exports.rst b/docs/topics/feed-exports.rst
index e81db64890e..367d8de02a8 100644
--- a/docs/topics/feed-exports.rst
+++ b/docs/topics/feed-exports.rst
@@ -8,7 +8,7 @@ Feed exports
 
 One of the most frequently required features when implementing scrapers is
 being able to store the scraped data properly and, quite often, that means
-generating a "export file" with the scraped data (commonly called "export
+generating an "export file" with the scraped data (commonly called "export
 feed") to be consumed by other systems.
 
 Scrapy provides this functionality out of the box with the Feed Exports, which
@@ -21,7 +21,7 @@ Serialization formats
 =====================
 
 For serializing the scraped data, the feed exports use the :ref:`Item exporters
-<topics-exporters>` and these formats are supported out of the box:
+<topics-exporters>`. These formats are supported out of the box:
 
 * :ref:`topics-feed-format-json`
 * :ref:`topics-feed-format-jsonlines`
diff --git a/docs/topics/item-pipeline.rst b/docs/topics/item-pipeline.rst
index 973c7751659..dd2d799890b 100644
--- a/docs/topics/item-pipeline.rst
+++ b/docs/topics/item-pipeline.rst
@@ -5,14 +5,14 @@ Item Pipeline
 =============
 
 After an item has been scraped by a spider, it is sent to the Item Pipeline
-which process it through several components that are executed sequentially.
+which processes it through several components that are executed sequentially.
 
 Each item pipeline component (sometimes referred as just "Item Pipeline") is a
 Python class that implements a simple method. They receive an item and perform
 an action over it, also deciding if the item should continue through the
 pipeline or be dropped and no longer processed.
 
-Typical use for item pipelines are:
+Typical uses of item pipelines are:
 
 * cleansing HTML data
 * validating scraped data (checking that the items contain certain fields)
@@ -167,7 +167,7 @@ Duplicates filter
 -----------------
 
 A filter that looks for duplicate items, and drops those items that were
-already processed. Let say that our items have an unique id, but our spider
+already processed. Let's say that our items have a unique id, but our spider
 returns multiples items with the same id::
 
 
@@ -198,6 +198,6 @@ To activate an Item Pipeline component you must add its class to the
     }
 
 The integer values you assign to classes in this setting determine the
-order they run in- items go through pipelines from order number low to
-high. It's customary to define these numbers in the 0-1000 range.
+order in which they run: items go through from lower valued to higher
+valued classes. It's customary to define these numbers in the 0-1000 range.
diff --git a/docs/topics/link-extractors.rst b/docs/topics/link-extractors.rst
index 9758c2f353f..f2f296fbaac 100644
--- a/docs/topics/link-extractors.rst
+++ b/docs/topics/link-extractors.rst
@@ -82,7 +82,7 @@ LxmlLinkExtractor
         module.
     :type deny_extensions: list
 
-    :param restrict_xpaths: is a XPath (or list of XPath's) which defines
+    :param restrict_xpaths: is an XPath (or list of XPath's) which defines
         regions inside the response where links should be extracted from.
         If given, only the text selected by those XPath will be scanned for
         links. See examples below.