Data fetching and scraping overview

Mark Klara edited this page Mar 24, 2024 · 4 revisions

Data Fetcher

The data fetcher is a helper class that kicks off the network fetch for each data request and calls the relevant scraping methods to extract the necessary data.

Below is a list of the data that is fetched (as of 7/1/2023), and which service each item is scraped from:

Active APIs

MSN Money

NOTE: The MSN Money API uses custom IDs for stocks, so we first make a query to map Ticker -> ID, and then use that ID to make the actual API request to scrape. For example, at the time of this writing, META stock mapped to ID a1slm7.

(example URLs: https://services.bingapis.com/contentservices-finance.csautosuggest/api/v1/Query?query=meta&market=en-us and https://services.bingapis.com/contentservices-finance.financedataservice/api/v1/KeyRatios?stockId=a1slm7)
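
The two-step lookup described above can be sketched as a pair of URL builders. This is only an illustration, not the project's actual code: the function and constant names are hypothetical, and the base URLs and query parameters are copied from the example URLs. Parsing the responses is left out because their format is not described here.

```python
from urllib.parse import urlencode

# Base URLs copied from the example URLs above.
AUTOSUGGEST_URL = (
    "https://services.bingapis.com/"
    "contentservices-finance.csautosuggest/api/v1/Query"
)
KEY_RATIOS_URL = (
    "https://services.bingapis.com/"
    "contentservices-finance.financedataservice/api/v1/KeyRatios"
)


def ticker_lookup_url(ticker: str) -> str:
    """Step 1: build the URL that maps a ticker symbol to an MSN Money stock ID."""
    params = {"query": ticker.lower(), "market": "en-us"}
    return AUTOSUGGEST_URL + "?" + urlencode(params)


def key_ratios_url(stock_id: str) -> str:
    """Step 2: build the key-ratios URL for a stock ID obtained in step 1."""
    return KEY_RATIOS_URL + "?" + urlencode({"stockId": stock_id})
```

For META, `ticker_lookup_url("META")` and `key_ratios_url("a1slm7")` reproduce the two example URLs above.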

We use MSN Money for historical PE ratio (price to earnings ratio) data, which is hard to find on the other providers. We calculate the min/max value of the past 5 years of data.

`pe_low`: The min of the last 5 years of PE ratio data
`pe_high`: The max of the last 5 years of PE ratio data
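
As a rough sketch of the min/max calculation, assuming the scraped PE history has already been reduced to a mapping of year to PE ratio (that shape, the function name, and the sample numbers below are all hypothetical):

```python
from typing import Dict, Tuple


def pe_range(pe_by_year: Dict[int, float], years: int = 5) -> Tuple[float, float]:
    """Return (pe_low, pe_high) over the most recent `years` years of data."""
    # Sort the years ascending and keep only the most recent ones.
    recent_years = sorted(pe_by_year)[-years:]
    recent_values = [pe_by_year[year] for year in recent_years]
    if not recent_values:
        raise ValueError("no PE ratio data available")
    return min(recent_values), max(recent_values)


# Made-up sample data, for illustration only.
sample = {2017: 30.0, 2018: 22.5, 2019: 31.2, 2020: 28.4,
          2021: 24.1, 2022: 12.9, 2023: 27.8}
pe_low, pe_high = pe_range(sample)  # uses 2019 through 2023 only
```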

Defunct APIs

Yahoo Finance Quote

(example URL: https://query1.finance.yahoo.com/v6/finance/quote?symbols=meta)

NOTE: Yahoo made this API private in mid-2023. It is currently failing, and we need to find an alternative.

`name`: The company name (for the given ticker)
`average_volume`: The average trade volume
`current_price`: The current stock price
`TTM EPS`: The trailing-twelve-month earnings per share
`market cap`: The market capitalization of the stock

MorningStar

(example URL: https://financials.morningstar.com/finan/financials/getKeyStatPart.html?&t=meta&region=usa&culture=en-US&cur=&order=asc)

Unused now :'(

This used to provide the "key stats" data to power the "big 5 number" growth rates.

Sadly, MorningStar changed their APIs, making it infeasible to scrape what we need. We now use StockRow instead.

Yahoo Finance Quote Summary

(example URL: https://query1.finance.yahoo.com/v10/finance/quoteSummary/meta?modules=assetProfile)

This appears to be an "unofficial" API from Yahoo Finance, with many different modules.

As of 7/1/2023, we request the following modules, which are used to compute the ROIC history for the company. See calculate_roic in RuleOneInvestingCalculations.py for details on how this is computed.

  • balanceSheetHistory:
`cash`: The amount of cash the company has on their balance sheet.
`longTermDebt`: The total amount of long-term debt the company has.
`totalStockholderEquity`: The "total stockholder equity" (the company's assets after all liabilities have been paid, also known as "total book value").
  • incomeStatementHistory:
`netIncome`: The company's "net income", or revenue minus expenses, interest, and tax.
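
To show how these four fields could combine into a single ROIC value, here is a minimal sketch using one common formulation: net income divided by invested capital, taken as equity plus long-term debt minus cash. This formula is an assumption and may not match what calculate_roic actually does; the function name is hypothetical.

```python
from typing import Optional


def roic_sketch(net_income: float, cash: float, long_term_debt: float,
                stockholder_equity: float) -> Optional[float]:
    """Return ROIC as a fraction (0.2 means 20%), or None if it is undefined.

    ASSUMPTION: invested capital = equity + long-term debt - cash.
    The real calculate_roic may define this differently.
    """
    invested_capital = stockholder_equity + long_term_debt - cash
    if invested_capital == 0:
        return None  # avoid dividing by zero
    return net_income / invested_capital
```

Running this once per year of balanceSheetHistory and incomeStatementHistory data would give a ROIC history under the same assumption.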

StockRow "key stats"

(example URL: https://stockrow.com/api/companies/meta/new_key_stats.json)

NOTE: We used to use MorningStar for this and kept morningstar.py for historical reasons, but they changed their API structure, making it infeasible to scrape what we did before.

This data powers most of the main "big 5 numbers" from the Rule One Investing book. The only one it doesn't contain is the ROIC growth rates, which are calculated from the Yahoo Finance Quote Summary.

`eps`: The 1 year, 3 year, 5 year, and max growth rates for Earnings Per Share
`sales`: The 1 year, 3 year, 5 year, and max growth rates for Revenue (i.e. Sales)
`equity`: The 1 year, 3 year, 5 year, and max growth rates for Equity (i.e. Book Value Per Share)
`cash`: The 1 year, 3 year, 5 year, and max growth rates for Free Cash Flow

We also grab some one-off numbers needed for various calculations or other metrics related to how management is managing the company's assets:

`total_debt`: The total debt the company has (NOTE: probably should be replaced with just 'long term debt').
`free_cash_flow`: The most recent Free Cash Flow number (displayed in the UI, and also used to compute `debt_payoff_time`).
`debt_payoff_time`: The computed time (in years) it would take to pay off the `total_debt` using the current `free_cash_flow`.
`debt_equity_ratio`: The most recent quarter's debt-to-equity ratio.
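
The `debt_payoff_time` description amounts to dividing `total_debt` by `free_cash_flow`. A minimal sketch follows; returning None when free cash flow is zero or negative is an assumption made here, not necessarily how the project handles that case.

```python
from typing import Optional


def debt_payoff_time(total_debt: float, free_cash_flow: float) -> Optional[float]:
    """Years needed to pay off total_debt at the current free_cash_flow."""
    if free_cash_flow <= 0:
        # ASSUMPTION: with no positive cash flow the debt can never be
        # paid off, so report the payoff time as undefined.
        return None
    return total_debt / free_cash_flow
```

For example, a company with 300 of total debt and 120 of free cash flow (in the same units) would need 2.5 years.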