Add docs #328
Open
lahdjirayhan wants to merge 41 commits into JustAnotherArchivist:master from lahdjirayhan:add-docs
Commits
c39db6e Quickstart docs, add mustache template (lahdjirayhan)
15eba41 Modify mustache file (lahdjirayhan)
df049bc Update .gitignore to not track documentation builds (lahdjirayhan)
6979c5d Delete build directory (lahdjirayhan)
a2d2750 Add initial documentation and docstring (lahdjirayhan)
ee09d9a Add example/tutorial (lahdjirayhan)
80d1981 Merge branch 'master' into add-docs (lahdjirayhan)
a61347e Update .gitignore to not track .vscode (lahdjirayhan)
aa38bf0 Modify my docstring to match owner expectation (lahdjirayhan)
2dade08 Rewrite index.rst (lahdjirayhan)
a844c45 Merge branch 'master' into add-docs (lahdjirayhan)
c2eabfb Add examples (lahdjirayhan)
00e08d8 Add docstrings on Twitter module (lahdjirayhan)
b49fef6 Add docstrings on Instagram module (lahdjirayhan)
34cf780 Add docstrings on Telegram module (lahdjirayhan)
c62a9b4 Add docstring to Reddit module (lahdjirayhan)
4e2d184 Add docstring to VK module (lahdjirayhan)
a733e26 Fix docstring formatting (lahdjirayhan)
75b287b Merge branch 'master' into backup-add-docs (lahdjirayhan)
ab1dbe9 Try autosummary (lahdjirayhan)
b5dcf41 Update .gitignore to not track autogenerated _autosummary (lahdjirayhan)
26fedeb Add templates (lahdjirayhan)
44ca124 Slight fix (lahdjirayhan)
d2ba2c9 Modify template to remove double init in docs (lahdjirayhan)
319b575 Add docs to facebook module (lahdjirayhan)
ccbe847 Add docs in weibo module (lahdjirayhan)
59f69e5 Add docs to base Scraper class' get_items (lahdjirayhan)
294f6b7 Modify index.rst to have some toctree structure for entire package (lahdjirayhan)
ca5bf06 Merge branch 'master' into add-docs (lahdjirayhan)
c9a5c08 Update/add docstrings (lahdjirayhan)
845ff32 Merge branch 'master' into add-docs (lahdjirayhan)
a10a195 Update/add docstrings again (lahdjirayhan)
8e697a3 Add/update docs for mastodon objects (lahdjirayhan)
955bee8 Add/update docs for twitter (lahdjirayhan)
31d495e Update .gitignore to not track venv folder (lahdjirayhan)
36f4d0e Detect snscrape version in docs using importlib (lahdjirayhan)
2cb811b Update index.rst to add mastodon (lahdjirayhan)
fe818fa Fix incorrect docstring on TwitterTweetScraper (lahdjirayhan)
80627eb Merge branch 'master' into add-docs (lahdjirayhan)
0eebb3b Retrieve everything except project name from importlib.metadata (lahdjirayhan)
0832e95 Fix typo in index.rst (lahdjirayhan)
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,3 +2,7 @@ __pycache__/ | |
/dist/ | ||
/snscrape.egg-info/ | ||
/.eggs/ | ||
/docs/_build/** | ||
/docs/_autosummary/** | ||
.vscode/ | ||
venv/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# Minimal makefile for Sphinx documentation | ||
# | ||
|
||
# You can set these variables from the command line, and also | ||
# from the environment for the first two. | ||
SPHINXOPTS ?= | ||
SPHINXBUILD ?= sphinx-build | ||
SOURCEDIR = . | ||
BUILDDIR = _build | ||
|
||
# Put it first so that "make" without argument is like "make help". | ||
help: | ||
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) | ||
|
||
.PHONY: help Makefile | ||
|
||
# Catch-all target: route all unknown targets to Sphinx using the new | ||
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). | ||
%: Makefile | ||
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
{{ fullname | escape | underline}} | ||
|
||
.. currentmodule:: {{ module }} | ||
|
||
.. autoclass:: {{ objname }} | ||
:members: | ||
:show-inheritance: | ||
:inherited-members: | ||
|
||
{% block methods %} | ||
|
||
|
||
{% if methods %} | ||
.. rubric:: {{ _('Methods') }} | ||
|
||
.. autosummary:: | ||
{% for item in methods %} | ||
~{{ name }}.{{ item }} | ||
{%- endfor %} | ||
{% endif %} | ||
{% endblock %} | ||
|
||
{% block attributes %} | ||
{% if attributes %} | ||
.. rubric:: {{ _('Attributes') }} | ||
|
||
.. autosummary:: | ||
{% for item in attributes %} | ||
~{{ name }}.{{ item }} | ||
{%- endfor %} | ||
{% endif %} | ||
{% endblock %} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
{{ fullname | escape | underline}} | ||
|
||
.. automodule:: {{ fullname }} | ||
|
||
{% block attributes %} | ||
{% if attributes %} | ||
.. rubric:: Module Attributes | ||
|
||
.. autosummary:: | ||
:toctree: | ||
{% for item in attributes %} | ||
{{ item }} | ||
{%- endfor %} | ||
{% endif %} | ||
{% endblock %} | ||
|
||
{% block functions %} | ||
{% if functions %} | ||
.. rubric:: {{ _('Functions') }} | ||
|
||
.. autosummary:: | ||
:toctree: | ||
{% for item in functions %} | ||
{{ item }} | ||
{%- endfor %} | ||
{% endif %} | ||
{% endblock %} | ||
|
||
{% block classes %} | ||
{% if classes %} | ||
.. rubric:: {{ _('Classes') }} | ||
|
||
.. autosummary:: | ||
:toctree: | ||
:template: custom-class-template.rst | ||
{% for item in classes %} | ||
{{ item }} | ||
{%- endfor %} | ||
{% endif %} | ||
{% endblock %} | ||
|
||
{% block exceptions %} | ||
{% if exceptions %} | ||
.. rubric:: {{ _('Exceptions') }} | ||
|
||
.. autosummary:: | ||
:toctree: | ||
{% for item in exceptions %} | ||
{{ item }} | ||
{%- endfor %} | ||
{% endif %} | ||
{% endblock %} | ||
|
||
{% block modules %} | ||
{% if modules %} | ||
.. rubric:: Modules | ||
|
||
.. autosummary:: | ||
:toctree: | ||
:template: custom-module-template.rst | ||
:recursive: | ||
{% for item in modules %} | ||
{{ item.split('.')[-1] }} | ||
{%- endfor %} | ||
{% endif %} | ||
{% endblock %} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
.. This file should contain API reference. Ideally, an automatic discovery/summary. | ||
|
||
API Reference | ||
============= | ||
|
||
.. autosummary:: | ||
:toctree: _autosummary | ||
:template: custom-module-template.rst | ||
:recursive: | ||
|
||
snscrape |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
# Configuration file for the Sphinx documentation builder. | ||
# | ||
# This file only contains a selection of the most common options. For a full | ||
# list see the documentation: | ||
# https://www.sphinx-doc.org/en/master/usage/configuration.html | ||
|
||
# -- Path setup -------------------------------------------------------------- | ||
|
||
# If extensions (or modules to document with autodoc) are in another directory, | ||
# add these directories to sys.path here. If the directory is relative to the | ||
# documentation root, use os.path.abspath to make it absolute, like shown here. | ||
# | ||
import os | ||
import sys | ||
sys.path.insert(0, os.path.abspath('..')) | ||
|
||
# Tools for importing snscrape at build time | ||
# Avoid name conflict with sphinx configuration variable "version" | ||
from importlib import import_module | ||
from importlib.metadata import metadata | ||
|
||
|
||
# -- Project information ----------------------------------------------------- | ||
|
||
# Project name | ||
project = 'snscrape' | ||
|
||
# Metadata | ||
_metadata = metadata(project) | ||
|
||
# Version in format 0.4.0.20211208 | ||
release = _metadata['version'] | ||
author = _metadata['author'] | ||
|
||
_major, _minor, _patch, _yyyymmdd = release.split('.') | ||
|
||
YEAR = _yyyymmdd[0:4] | ||
copyright = f'{YEAR}, {author}' | ||
|
||
|
||
# -- General configuration --------------------------------------------------- | ||
|
||
# Add any Sphinx extension module names here, as strings. They can be | ||
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom | ||
# ones. | ||
extensions = [ | ||
'sphinx.ext.napoleon', | ||
'sphinx.ext.autodoc', | ||
'sphinx.ext.autosummary', | ||
# 'sphinx_autodoc_typehints' | ||
] | ||
|
||
# Add any paths that contain templates here, relative to this directory. | ||
templates_path = ['_templates'] | ||
|
||
# List of patterns, relative to source directory, that match files and | ||
# directories to ignore when looking for source files. | ||
# This pattern also affects html_static_path and html_extra_path. | ||
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] | ||
|
||
# -- Custom extension options ------------------------------------------------ | ||
|
||
# Put type hint in description instead of signature | ||
# Note: the docstrings are overridden if autodoc_typehints is used | ||
autodoc_typehints = 'description' | ||
|
||
# Set 'both' to use both class and __init__ docstrings. | ||
autoclass_content = 'both' | ||
|
||
# Might want to look at it: | ||
# https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#confval-autodoc_type_aliases | ||
# autodoc_type_aliases = {} | ||
|
||
# Turn on autosummary | ||
autosummary_generate = True | ||
|
||
# -- Options for HTML output ------------------------------------------------- | ||
|
||
# The theme to use for HTML and HTML Help pages. See the documentation for | ||
# a list of builtin themes. | ||
# | ||
html_theme = 'nature' | ||
|
||
# Add any paths that contain custom static files (such as style sheets) here, | ||
# relative to this directory. They are copied after the builtin static files, | ||
# so a file named "default.css" will overwrite the builtin "default.css". | ||
html_static_path = ['_static'] |
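A note on the version handling in the configuration file above: the four-way unpacking of `release.split('.')` raises `ValueError` whenever the version string does not have exactly four dot-separated parts. Below is a minimal sketch of a more tolerant way to derive the copyright year, assuming the version contains an eight-digit `YYYYMMDD` component as in `0.4.0.20211208`. The helper name `copyright_year` and the development-style version string are hypothetical and not part of this PR.

```python
import re


def copyright_year(release: str, default: str = "") -> str:
    """Return the YYYY prefix of the first eight-digit component in a version string.

    Hypothetical helper: it assumes the date component looks like YYYYMMDD
    and falls back to `default` when no such component is present.
    """
    for part in release.split("."):
        match = re.fullmatch(r"(\d{4})\d{4}", part)
        if match:
            return match.group(1)
    return default


# Works for the four-part format mentioned in conf.py ...
print(copyright_year("0.4.0.20211208"))
# ... and for a (hypothetical) version string with extra components.
print(copyright_year("0.4.0.20211208.dev3+gabc1234"))
```

Scanning for the date component instead of unpacking a fixed number of parts keeps the docs build from failing on version strings with more or fewer components.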
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
{{! Modified Google Docstring Template }} | ||
{{summaryPlaceholder}} | ||
{{extendedSummaryPlaceholder}} | ||
{{#parametersExist}} | ||
Args: | ||
{{#args}} | ||
{{var}}: {{descriptionPlaceholder}} | ||
{{/args}} | ||
{{#kwargs}} | ||
{{var}}: {{descriptionPlaceholder}}. Defaults to {{&default}}. | ||
{{/kwargs}} | ||
{{/parametersExist}} | ||
{{#exceptionsExist}} | ||
Raises: | ||
{{#exceptions}} | ||
{{type}}: {{descriptionPlaceholder}} | ||
{{/exceptions}} | ||
{{/exceptionsExist}} | ||
{{#returnsExist}} | ||
Returns: | ||
{{#returns}} | ||
{{descriptionPlaceholder}} | ||
{{/returns}} | ||
{{/returnsExist}} | ||
{{#yieldsExist}} | ||
Yields: | ||
{{#yields}} | ||
{{typePlaceholder}}: {{descriptionPlaceholder}} | ||
{{/yields}} | ||
{{/yieldsExist}} |
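To illustrate what the template above is meant to produce, here is a sketch of a filled-in Google-style docstring with the same section order (`Args`, `Raises`, `Returns`) and the same "Defaults to" wording for keyword arguments. The function `clamp` is a made-up example and has nothing to do with snscrape.

```python
def clamp(value, low=0, high=10):
    """Restrict a number to a closed range.

    Args:
        value: The number to restrict.
        low: Lower bound of the range. Defaults to 0.
        high: Upper bound of the range. Defaults to 10.

    Raises:
        ValueError: If low is greater than high.

    Returns:
        The value itself if it lies within the range, otherwise the nearest bound.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))


print(clamp(15))
```

Docstrings in this layout are the kind that `sphinx.ext.napoleon`, enabled in `conf.py`, converts into reStructuredText fields.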
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,112 @@ | ||
.. snscrape documentation master file, created by | ||
sphinx-quickstart on Sat Dec 11 06:18:23 2021. | ||
You can adapt this file completely to your liking, but it should at least | ||
contain the root `toctree` directive. | ||
|
||
Welcome to snscrape's documentation! | ||
==================================== | ||
|
||
``snscrape`` is a scraper for social networking services (SNS). It scrapes user profiles, hashtags, searches, and similar sources, and returns the discovered items, usually posts. ``snscrape`` supports several services: | ||
|
||
================== ======================================================= | ||
Platform Can scrape for items in: | ||
================== ======================================================= | ||
Twitter User profile, hashtag, search, thread, list, trending | ||
Instagram User profile, hashtag, location | ||
Reddit User profile, subreddit, search (via Pushshift) | ||
Facebook User profile, group, community (for visitor posts) | ||
Telegram Channel | ||
VKontakte User profile | ||
Weibo (Sina Weibo) User profile | ||
Mastodon User profile, thread | ||
================== ======================================================= | ||
|
||
``snscrape`` works without logging in or authenticating. The drawback is that some platforms may, now or in the future, impose limits on unauthenticated requests coming from your IP address. Such IP-based limits are usually temporary. | ||
|
||
``snscrape`` can be used either from the command line (CLI) or imported as a library. | ||
|
||
CLI usage | ||
--------- | ||
|
||
The generic syntax of snscrape's CLI is: | ||
|
||
.. code-block:: console | ||
|
||
snscrape [GLOBAL-OPTIONS] SCRAPER-NAME [SCRAPER-OPTIONS] [SCRAPER-ARGUMENTS...] | ||
|
||
``snscrape --help`` and ``snscrape SCRAPER-NAME --help`` provide details on the options and arguments. ``snscrape --help`` also lists all available scrapers. | ||
|
||
The default output of the CLI is the URL of each result. | ||
|
||
Some noteworthy global options are: | ||
|
||
* ``--jsonl`` to get output as JSONL. This includes all information extracted by ``snscrape`` (e.g. message content, datetime, images; details vary by scraper). | ||
* ``--max-results NUMBER`` to only return the first ``NUMBER`` results. | ||
* ``--with-entity`` to get an item on the entity being scraped, e.g. the user or channel. This is not supported on all scrapers. (You can use this together with ``--max-results 0`` to only fetch the entity info.) | ||
|
||
**Examples** | ||
|
||
Collect all tweets by Jason Scott (@textfiles): | ||
|
||
.. code-block:: console | ||
|
||
snscrape twitter-user textfiles | ||
|
||
It's usually useful to redirect the output to a file for further processing, e.g. in bash using the filename ``twitter-@textfiles``: | ||
|
||
.. code-block:: console | ||
|
||
snscrape twitter-user textfiles >twitter-@textfiles | ||
|
||
|
||
To get the latest 100 tweets with the hashtag #archiveteam: | ||
|
||
.. code-block:: console | ||
|
||
snscrape --max-results 100 twitter-hashtag archiveteam | ||
|
||
|
||
Library usage | ||
------------- | ||
|
||
The general steps are: | ||
|
||
#. **Instantiate a scraper object.** | ||
``snscrape`` provides a scraper class for each kind of source. For example, :class:`TwitterSearchScraper` gathers tweets matching a search query, and :class:`TwitterUserScraper` gathers tweets from a specified user. | ||
#. **Call the scraper's** ``get_items()`` **method.** | ||
``get_items()`` returns an iterator that yields one item at a time. | ||
|
||
|
||
Each scraper class accepts different options and arguments. Refer to the class signature for details; in a Jupyter notebook, for example, you can display it with:: | ||
|
||
?TwitterSearchScraper | ||
|
||
**Examples** | ||
|
||
Collect tweets by searching for "omicron variant", limit the results to the first 100 tweets, and save them in a list: | ||
|
||
.. code-block:: python | ||
|
||
from snscrape.modules.twitter import TwitterSearchScraper | ||
scraper = TwitterSearchScraper('omicron variant') | ||
|
||
result = [] | ||
|
||
for item in scraper.get_items(): | ||
    result.append(item) | ||
    if len(result) >= 100: | ||
        break | ||
|
||
API reference | ||
============= | ||
|
||
.. toctree:: | ||
:maxdepth: 5 | ||
|
||
api-reference | ||
|
||
Indices and tables | ||
================== | ||
|
||
* :ref:`genindex` | ||
* :ref:`modindex` | ||
* :ref:`search` |
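A note on the library example in `index.rst`: an index check against `100` after appending is easy to get off by one, since `enumerate()` starts at 0. Below is a minimal sketch of an alternative that caps the number of results with `itertools.islice`. The helper `take` and the generator `fake_items` are hypothetical stand-ins so the snippet runs without snscrape or network access; with a real scraper, `scraper.get_items()` would be passed in place of `fake_items()`.

```python
from itertools import islice


def take(items, n):
    """Return at most the first n elements of any iterable as a list."""
    return list(islice(items, n))


def fake_items():
    """Stand-in for scraper.get_items(): an endless generator of dummy items."""
    i = 0
    while True:
        yield f"item-{i}"
        i += 1


# With snscrape installed, this would be: take(scraper.get_items(), 100)
result = take(fake_items(), 100)
print(len(result))
```

Because `islice` stops pulling from the iterator once `n` items have been consumed, no further items are requested from the underlying generator.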
Review comment: get_item()?