outputdir + quickstart instructions
adbar committed Apr 23, 2020
1 parent 2956cc7 commit 311da8c
Showing 3 changed files with 40 additions and 2 deletions.
3 changes: 2 additions & 1 deletion docs/index.rst
@@ -89,7 +89,7 @@ On the command-line:
$ trafilatura -u "https://github.blog/2019-03-29-leader-spotlight-erin-spiceland/"
# outputs main content and comments as plain text ...
For more information please refer to the `usage documentation <usage.html>`_.
For more information please refer to `quickstart <quickstart.html>`_, `usage documentation <usage.html>`_ and `tutorial <tutorial1.html>`_.


License
@@ -160,6 +160,7 @@ Further documentation
corefunctions
evaluation
installation
quickstart
usage
validation
tutorial1
35 changes: 35 additions & 0 deletions docs/quickstart.rst
@@ -0,0 +1,35 @@
Quickstart
==========


With Python
-----------

.. code-block:: python

    >>> import trafilatura
    >>> downloaded = trafilatura.fetch_url('https://github.blog/2019-03-29-leader-spotlight-erin-spiceland/')
    >>> trafilatura.extract(downloaded)
    # outputs main content and comments as plain text ...
    >>> trafilatura.extract(downloaded, xml_output=True, include_comments=False)
    # outputs main content without comments as XML ...

For arguments of the ``extract`` function see `core functions <corefunctions.html>`_.
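
The two calls in the Python quickstart can each come back empty: to the best of my knowledge ``fetch_url`` returns ``None`` when the download fails and ``extract`` returns ``None`` when no main content is found. The following is a minimal sketch of a guard around that flow. ``safe_extract`` is a hypothetical helper and not part of trafilatura; the fetch and extract functions are passed in as parameters so that the snippet runs without the library or network access.

```python
from typing import Callable, Optional


def safe_extract(url: str,
                 fetch: Callable[[str], Optional[str]],
                 extract: Callable[..., Optional[str]],
                 **options) -> Optional[str]:
    """Download `url` and extract its main text; return None if either step fails.

    `fetch` and `extract` are injected (hypothetical design choice) so that
    the helper can be exercised without trafilatura or a network connection.
    """
    downloaded = fetch(url)
    if downloaded is None:
        # Assumption: a failed download is signalled by None.
        return None
    # Extra keyword arguments (e.g. include_comments=False) are passed through.
    return extract(downloaded, **options)


# With trafilatura installed, a call would look like this:
#   import trafilatura
#   text = safe_extract(url, trafilatura.fetch_url, trafilatura.extract,
#                       include_comments=False)
```

Checking the result for ``None`` before further processing avoids passing an empty download or an empty extraction down the pipeline.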


On the command-line
-------------------

.. code-block:: bash

    $ trafilatura -u "https://github.blog/2019-03-29-leader-spotlight-erin-spiceland/"
    # outputs main content and comments as plain text ...
    $ trafilatura --xml --nocomments -u "URL..."
    # outputs main content without comments as XML ...
    $ trafilatura -h
    usage: trafilatura [-h] [-f] [--formatting] [-i INPUTFILE] [-o OUTPUTDIR]
                       [--nocomments] [--notables] [--csv] [--xml] [--xmltei]
                       [--validate] [-u URL] [-v]

For more information please refer to `usage documentation <usage.html>`_ and `tutorials <tutorial1.html>`_.
4 changes: 3 additions & 1 deletion docs/usage.rst
@@ -86,14 +86,16 @@ The ``-i/--inputfile`` option allows for bulk download and processing of a list

For all usage instructions see ``trafilatura -h``:

``usage: trafilatura [-h] [-f] [--formatting] [-i INPUTFILE] [--nocomments] [--notables] [--xml] [--xmltei] [-u URL] [-v]``
``usage: trafilatura [-h] [-f] [--formatting] [-i INPUTFILE] [-o OUTPUTDIR] [--nocomments] [--notables] [--xml] [--xmltei] [-u URL] [-v]``

optional arguments:
  -h, --help            show this help message and exit
  -f, --fast            fast (without fallback detection)
  --formatting          include text formatting (bold, italic, etc.)
  -i INPUTFILE, --inputfile INPUTFILE
                        name of input file for batch processing
  -o OUTPUTDIR, --outputdir OUTPUTDIR
                        write results in a specified directory (relative path)
  --nocomments          don't output any comments
  --notables            don't output any table elements
  --csv                 CSV output
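
The ``-o/--outputdir`` option writes results into a directory given as a relative path. As a rough illustration of that idea in Python, the sketch below stores already extracted texts as one file per URL. ``write_results`` is a hypothetical helper, and the file-naming scheme (a truncated SHA-1 hash of the URL plus ``.txt``) is an assumption made for this example, not the naming that trafilatura itself uses.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def write_results(results: Dict[str, str], outputdir: str) -> List[Path]:
    """Write each extracted text to its own file inside `outputdir`.

    `results` maps a URL to its extracted text. The directory is created
    if it does not exist yet. File names are derived from a hash of the
    URL (an assumption for this sketch, not trafilatura's actual scheme).
    """
    target = Path(outputdir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for url, text in results.items():
        # A hash keeps file names short and free of characters such as "/".
        name = hashlib.sha1(url.encode("utf-8")).hexdigest()[:16] + ".txt"
        path = target / name
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written
```

Deriving the file name from the URL makes repeated runs overwrite the same file instead of piling up duplicates, at the cost of names that are not human-readable.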
