Home
The nytimes-articles gem is used for searching articles with the New York Times Article Search API. For instance:
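require 'rubygems'
require 'nytimes_articles'
include Nytimes::Articles
Base.api_key = 'YOUR API KEY'
# Simple keyword search
articles = Article.search 'ice cream'
# Phrase search on the title, limited to the last three weeks
# (3.weeks.ago assumes ActiveSupport is loaded)
articles = Article.search :title => '"ice cream"', :since => 3.weeks.ago, :fields => :all
# Search by author and request geo and people facet counts
articles = Article.search :author => 'Sewell Chan', :facets => [:geo, :people], :fields => :all
# Keyword search restricted to a facet value, newest first
articles = Article.search 'ice cream', :only_facets => {:geo => 'BROOKLYN'}, :rank => :newest, :fields => :all, :facets => [:geo, :people]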
To get started, all you need is this gem and a valid key from developer.nytimes.com. This gem overlays some Ruby conventions and style over the basic RESTful scheme used by the API. For more details of what you can search, see Article Search below.
Article.search is basically the only method you ever need. It executes a search against the Article Search API and returns a ResultSet of 10 articles. In its simplest form, it can be invoked with just a string, like so:
Article.search 'dog food'
which runs a text search against several text fields in the article and returns only the most basic fields for each article. The method also accepts a large number of optional parameters. All of the searchable fields, and then some, can be returned as display fields in the articles retrieved from a search (see the :fields argument below).
If a string is passed as the first argument, the text is used to search against the title, byline and body fields of articles. This text accepts the following boolean syntax:
- dog food – similar to doing a boolean AND search on both terms
- "ice cream" – matches the words as a phrase in the text
- ice -cream – to search text that doesn’t contain a term, prefix the term with a minus sign
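To make the three forms concrete, here is a minimal sketch that sorts the tokens of a query string into required terms, phrases and excluded terms. The helper classify_query is hypothetical and not part of the gem; the API does the real parsing on the server, so this only illustrates how the syntax reads.

```ruby
# Hypothetical helper (not part of the nytimes-articles gem): splits a query
# string into the three kinds of terms described above.
def classify_query(query)
  result = { phrases: [], excluded: [], required: [] }

  # Pull out quoted phrases first so their words are not treated as single terms.
  remainder = query.gsub(/"([^"]*)"/) do
    result[:phrases] << Regexp.last_match(1)
    ' '
  end

  remainder.split.each do |token|
    if token.start_with?('-') && token.length > 1
      result[:excluded] << token[1..-1]   # minus prefix: term must not appear
    else
      result[:required] << token          # plain terms are ANDed together
    end
  end

  result
end
```

For example, classify_query('ice -cream') puts ice in the required list and cream in the excluded list.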
Should you wish to target text against specific text fields associated with the article, the following named parameters are supported:
- :abstract – A summary of the article, written by Times indexers
- :body – A portion of the beginning of the article. Note: Only a portion of the article body is included in responses. But when you search against the body field, you search the full text of the article.
- :byline – The article byline, including the author’s name
- :lead_paragraph – The first paragraph of the article (as it appeared in the printed newspaper)
- :nytd_byline – The article byline, formatted for NYTimes.com
- :nytd_lead_paragraph – The first paragraph of the article (as it appears on NYTimes.com)
- :nytd_title – The article title on NYTimes.com (this field may or may not match the title field; headlines may be shortened and edited for the Web)
- :text – The text field consists of title + byline + body (combined in an OR search) and is the default field for keyword searches.
- :title – The article title (headline); corresponds to the headline that appeared in the printed newspaper
- :url – The URL of the article on NYTimes.com
Beyond query searches, the NY Times API also allows you to search against controlled vocabulary metadata associated with the article. This is powerful if you want precise matching against specific people, places, etc. (e.g., “I want stories about Ford the former president, not Ford the automotive company”). The following Facet constants are supported:
- Facet::CLASSIFIERS – Taxonomic classifiers that reflect Times content categories, such as Top/News/Sports
- Facet::COLUMN – A Times column title (if applicable), such as Weddings or Ideas & Trends
- Facet::DATE – The publication date in YYYYMMDD format
- Facet::DAY_OF_WEEK – The day of the week (e.g., Monday, Tuesday) the article was published (compare PUB_DAY, which is the numeric date rather than the day of the week)
- Facet::DESCRIPTION – Descriptive subject terms assigned by Times indexers (must be in UPPERCASE)
- Facet::DESK – The Times desk that produced the story (e.g., Business/Financial Desk)
- Facet::GEO – Standardized names of geographic locations, assigned by Times indexers (must be in UPPERCASE)
- Facet::MATERIAL_TYPE – The general article type, such as Biography, Editorial or Review
- Facet::ORGANIZATION – Standardized names of organizations, assigned by Times indexers (must be in UPPERCASE)
- Facet::PAGE – The page the article appeared on (in the printed paper)
- Facet::PERSON – Standardized names of people, assigned by Times indexers. When used in a request, values must be UPPERCASE.
- Facet::PUB_DAY – The day (DD) segment of date, separated for use as facets
- Facet::PUB_MONTH – The month (MM) segment of date, separated for use as facets
- Facet::PUB_YEAR – The year (YYYY) segment of date, separated for use as facets
- Facet::SECTION_PAGE – The full page number of the printed article (e.g., D00002)
- Facet::SOURCE – The originating body (e.g., AP, Dow Jones, The New York Times)
- Facet::WORKS_MENTIONED – Literary works mentioned in the article
- Facet::NYTD_BYLINE – The article byline, formatted for NYTimes.com
- Facet::NYTD_DESCRIPTION – Descriptive subject terms, assigned for use on NYTimes.com (to get standardized terms, use the TimesTags API). When used in a request, values must be Mixed Case.
- Facet::NYTD_GEO – Standardized names of geographic locations, assigned for use on NYTimes.com (to get standardized terms, use the TimesTags API). When used in a request, values must be Mixed Case.
- Facet::NYTD_ORGANIZATION – Standardized names of organizations, assigned for use on NYTimes.com (to get standardized terms, use the TimesTags API). When used in a request, values must be Mixed Case.
- Facet::NYTD_PERSON – Standardized names of people, assigned for use on NYTimes.com (to get standardized terms, use the TimesTags API). When used in a request, values must be Mixed Case.
- Facet::NYTD_SECTION – The section the article appears in (on NYTimes.com)
- Facet::NYTD_WORKS_MENTIONED – Literary works mentioned (titles formatted for use on NYTimes.com)
Note that for your convenience you can also search with symbol versions of the constants (:geo => ['MANHATTAN']). Even pluralization is supported. To get the string API version of the facet, use Facet#symbol_name.
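Because several of the indexer-assigned facets only match UPPERCASE values, it can help to normalize values before passing them to a search. The following is a minimal sketch under stated assumptions: normalize_facets is a hypothetical helper, not part of the gem, and the symbol keys :description, :geo, :organization and :person are assumed names for the four facets documented above as requiring UPPERCASE.

```ruby
# Hypothetical helper, not part of the gem. Upcases values for the facets
# documented as UPPERCASE-only and leaves all other facet values untouched.
def normalize_facets(facets)
  # Assumed symbol names for DESCRIPTION, GEO, ORGANIZATION and PERSON.
  uppercase_facets = [:description, :geo, :organization, :person]

  facets.each_with_object({}) do |(name, values), out|
    list = Array(values).map(&:to_s)   # accept a single value or an array
    out[name] = uppercase_facets.include?(name) ? list.map(&:upcase) : list
  end
end
```

The NYTD facets expect Mixed Case values, so the sketch passes them through unchanged rather than guessing at capitalization.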
The following two search fields are used for facet searching:
- :only_facets – takes a single value or array of facets to search. Facets can either be specified as array pairs (like [Facet::GEO, 'CALIFORNIA']) or facets returned from a previous search can be passed directly. A single string can be passed as well if you have a hand-crafted string.
- :except_facets – similar to :only_facets but is used to specify a list of facets to exclude.
- :begin_date, :end_date – these parameters specify a start and end date for search results. BOTH of these must be provided or the API will return an error. Each accepts either a Time/Date argument or a string in the format YYYYMMDD. For convenience, the following alternatives are provided:
  - :before – an alternative to :end_date. Automatically adds a :begin_date of sometime in 1980 if no :since argument is also provided.
  - :since – an alternative to :begin_date. Automatically adds an :end_date of Time.now if no :before argument is provided.
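The date defaults described above can be sketched as follows. This is not the gem's code: date_range and format_date are hypothetical helpers, the today: keyword exists only to make the example deterministic, and the 1980-01-01 fallback is an assumption, since the text only says "sometime in 1980".

```ruby
require 'date'

# Hypothetical helper: formats a Date/Time as YYYYMMDD; strings are assumed
# to be in that format already and are passed through.
def format_date(value)
  return value if value.is_a?(String)
  value.strftime('%Y%m%d')
end

# Hypothetical helper mirroring the :since / :before defaults described above.
def date_range(since: nil, before: nil, today: Date.today)
  return {} if since.nil? && before.nil?   # no date restriction requested

  begin_date = since || Date.new(1980, 1, 1)   # assumed "sometime in 1980"
  end_date = before || today                   # open-ended ranges stop at now

  { begin_date: format_date(begin_date), end_date: format_date(end_date) }
end
```

Either way, both dates end up being sent, which satisfies the API's requirement that :begin_date and :end_date are provided together.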
- :fee – if set to true, only returns articles that must be purchased. If false, returns only free articles. If not specified, returns all articles.
- :has_thumbnail – returns only articles that have thumbnail images associated. Note that to see the thumbnails, you must specify either :thumbnail or :all in the :fields argument.
- :has_multimedia – to be implemented
The :facets argument can be used to specify up to 5 facet fields to be returned alongside the search. These provide overall counts of how often each facet term appears in the search results. FIXME provide list of available facets as well as description of :nytd parameter.
The :fields parameter indicates which fields are returned with each article from the search results. If not specified, only the following fields are returned for each article: body, byline, date, title, and url. To return specific fields, any of the search fields from above can be explicitly specified in a comma-delimited list, as well as the additional display-only (not searchable) fields below (these are strings or symbols):
- :all – return all fields for the article
- :none – display only the facet breakdown and no article results
- :multimedia – return any related multimedia links for the article
- :thumbnail – return information for a related thumbnail image (if the article has one)
- :word_count – the word count of the article
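As a rough illustration of the comma-delimited list mentioned above, the sketch below joins a mix of strings and symbols into a single string. fields_param is a hypothetical helper, not part of the gem, and how the gem itself translates :all and :none for the API is not covered here.

```ruby
# Hypothetical helper, not part of the gem: turns a :fields argument
# (a single value or an array of strings/symbols) into a comma-delimited string.
def fields_param(fields)
  return nil if fields.nil?   # nothing specified: rely on the default fields

  Array(fields).map(&:to_s).uniq.join(',')
end
```

For instance, fields_param([:title, 'url', :thumbnail]) yields the string title,url,thumbnail.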