v1.0a7 #26

Merged 3 commits on Jul 4, 2024
2 changes: 1 addition & 1 deletion CITATION.cff
@@ -13,7 +13,7 @@ authors:
email: [email protected]
identifiers:
- type: doi
value: 10.5281/zenodo.12528448
value: 10.5281/zenodo.12528447
description: The concept DOI of the work.
- type: url
value: "https://pypi.org/project/waybacktweets/"
7 changes: 4 additions & 3 deletions README.md
@@ -1,8 +1,9 @@
# Wayback Tweets

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.12528448.svg)](https://doi.org/10.5281/zenodo.12528448) [![PyPI](https://img.shields.io/pypi/v/waybacktweets)](https://pypi.org/project/waybacktweets) [![docs](https://github.com/claromes/waybacktweets/actions/workflows/docs.yml/badge.svg)](https://github.com/claromes/waybacktweets/actions/workflows/docs.yml) [![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://waybacktweets.streamlit.app)
[![PyPI](https://img.shields.io/pypi/v/waybacktweets)](https://pypi.org/project/waybacktweets) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.12528447.svg)](https://doi.org/10.5281/zenodo.12528447) [![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://waybacktweets.streamlit.app) [![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1zRqi6uTMiGi5z8GQ-PC0tbpCJWULCqMO?usp=sharing)

Retrieves archived tweets CDX data from the Wayback Machine, performs necessary parsing (see [Field Options](https://claromes.github.io/waybacktweets/field_options.html)), and saves the data in HTML (for easy viewing of the tweets using the `iframe` tag), CSV, and JSON formats.

Retrieves archived tweets CDX data from the Wayback Machine, performs necessary parsing (see [Field Options](https://claromes.github.io/waybacktweets/field_options.html)), and saves the data in HTML, for easy viewing of the tweets using the iframe tags, CSV, and JSON formats.

## Installation

@@ -57,7 +58,7 @@ if archived_tweets:
## Acknowledgements

- Tristan Lee (Bellingcat's Data Scientist) for the idea of the application.
- Jessica Smith (Snowflake's Marketing Specialist) and Streamlit/Snowflake teams for the additional server resources on Streamlit Cloud.
- Jessica Smith (Snowflake's Community Growth Specialist) and Streamlit/Snowflake team for the additional server resources on Streamlit Cloud.
- OSINT Community for recommending the application.

> [!NOTE]
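Reviewer note (illustrative, not part of this diff): the README text above describes retrieving archived tweets CDX data from the Wayback Machine. A minimal sketch of that step against the public Wayback CDX Server API might look like the following. The helper names `build_cdx_url` and `rows_to_records` and the `twitter.com/<username>/status/*` URL pattern are assumptions made for illustration; they are not taken from the `waybacktweets` source.

```python
from urllib.parse import urlencode

# Public CDX Server endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(username, limit=None, timestamp_from=None, timestamp_to=None):
    """Build a CDX query URL for a user's archived tweet URLs (hypothetical helper)."""
    params = {
        # Assumed URL pattern; the trailing wildcard matches every status URL.
        "url": f"twitter.com/{username}/status/*",
        "output": "json",
    }
    if timestamp_from:
        params["from"] = timestamp_from
    if timestamp_to:
        params["to"] = timestamp_to
    if limit is not None:
        params["limit"] = str(limit)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def rows_to_records(rows):
    """Convert CDX JSON output (a header row followed by data rows) into dicts."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]
```

The URL builder is kept separate from any network call so that it can be checked without contacting web.archive.org, which matters given the rate limiting mentioned in the docs.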
28 changes: 11 additions & 17 deletions app/app.py
@@ -34,7 +34,7 @@
layout="centered",
menu_items={
"About": f"""
[![GitHub release (latest by date including pre-releases)](https://img.shields.io/github/v/release/claromes/waybacktweets?include_prereleases)](https://github.com/claromes/waybacktweets/releases) [![License](https://img.shields.io/github/license/claromes/waybacktweets)](https://github.com/claromes/waybacktweets/blob/main/LICENSE.md) [![Star](https://img.shields.io/github/stars/claromes/waybacktweets?style=social)](https://github.com/claromes/waybacktweets)
[![License](https://img.shields.io/github/license/claromes/waybacktweets)](https://github.com/claromes/waybacktweets/blob/main/LICENSE.md)

The application is a prototype hosted on Streamlit Cloud, serving as an alternative to the command line tool.

@@ -168,16 +168,12 @@ def scroll_page():

# ------ User Interface Settings ------ #

st.info(
"🥳 [**Pre-release 1.0x: Python module, CLI, and new Streamlit app**](https://github.com/claromes/waybacktweets/releases)" # noqa: E501
)

st.image(TITLE, use_column_width="never")
st.caption(
"[![GitHub release (latest by date including pre-releases)](https://img.shields.io/github/v/release/claromes/waybacktweets?include_prereleases)](https://github.com/claromes/waybacktweets/releases) [![Star](https://img.shields.io/github/stars/claromes/waybacktweets?style=social)](https://github.com/claromes/waybacktweets)" # noqa: E501
"[![GitHub release (latest by date including pre-releases)](https://img.shields.io/github/v/release/claromes/waybacktweets?include_prereleases)](https://github.com/claromes/waybacktweets/releases) [![sponsor](https://img.shields.io/badge/Donate-via%20Sponsors-ff69b4.svg?logo=github)](https://github.com/sponsors/claromes)" # noqa: E501
)
st.write(
"Retrieves archived tweets CDX data in HTML (for easy viewing of the tweets using the `iframe` tag), CSV, and JSON formats." # noqa: E501
"Retrieves archived tweets CDX data in HTML (for easy viewing of the tweets using the iframe tag), CSV, and JSON formats." # noqa: E501
)

st.write(
@@ -291,15 +287,15 @@ def scroll_page():

# -- Rendering -- #

if csv_data and json_data and html_content:
st.session_state.count = len(df)
st.write(f"**{st.session_state.count} URLs have been captured**")
st.session_state.count = len(df)
st.write(f"**{st.session_state.count} URLs have been captured**")

# -- HTML -- #
tab1, tab2, tab3 = st.tabs(["HTML", "CSV", "JSON"])

st.header("HTML", divider="gray", anchor=False)
# -- HTML -- #
with tab1:
st.write(
f"Visualize tweets more efficiently through `iframes`. Download the @{st.session_state.current_username}'s archived tweets in HTML." # noqa: E501
f"Visualize tweets more efficiently through iframe tags. Download the @{st.session_state.current_username}'s archived tweets in HTML." # noqa: E501
)

col5, col6 = st.columns([1, 18])
@@ -317,8 +313,7 @@ def scroll_page():
)

# -- CSV -- #

st.header("CSV", divider="gray", anchor=False)
with tab2:
st.write(
"Check the data returned in the dataframe below and download the file."
)
@@ -340,8 +335,7 @@ def scroll_page():
st.dataframe(df, use_container_width=True)

# -- JSON -- #

st.header("JSON", divider="gray", anchor=False)
with tab3:
st.write(
"Check the data returned in JSON format below and download the file."
)
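Reviewer note (illustrative, not part of this diff): the three tabs introduced above each present one export of the same records (HTML with `iframe` tags, CSV, and JSON). A minimal standard-library sketch of producing those three outputs is shown below. The function names and the `archived_tweet_url` field used here are assumptions for illustration and do not reproduce the project's own exporter.

```python
import csv
import html
import io
import json


def to_csv(records):
    """Serialize a list of dicts to CSV text, using the first record's keys as columns."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize the same records to pretty-printed JSON text."""
    return json.dumps(records, indent=2, ensure_ascii=False)


def to_html(records, url_field="archived_tweet_url"):
    """Render one iframe per record; url_field is an assumed field name."""
    frames = "\n".join(
        '<iframe src="{}" width="600" height="400" loading="lazy"></iframe>'.format(
            html.escape(record[url_field], quote=True)  # escape & and quotes in URLs
        )
        for record in records
    )
    return f"<!DOCTYPE html>\n<html>\n<body>\n{frames}\n</body>\n</html>\n"
```

Each function returns a string, so a UI layer could pass the result directly to a download button or a preview widget.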
1 change: 1 addition & 0 deletions docs/conf.py
@@ -20,6 +20,7 @@
"sphinx_new_tab_link",
"sphinx_click.ext",
"sphinx_autodoc_typehints",
"sphinxcontrib.youtube",
]

templates_path = ["_templates"]
2 changes: 1 addition & 1 deletion docs/contribute.rst
@@ -19,7 +19,7 @@ These are the prerequisites:
- Python 3.10+
- Poetry

Install from the source, following the :ref:`installation` instructions.
Install from the source, following the :ref:`installation_from_source` instructions.

Brief explanation about the code under the Wayback Tweets directory:

22 changes: 22 additions & 0 deletions docs/handson.rst
@@ -0,0 +1,22 @@
Hands-On Examples
====================

- **Notebook**

This notebook demonstrates how to fetch, parse, and export archived tweets for a specific user using the ``waybacktweets`` library.

.. image:: https://colab.research.google.com/assets/colab-badge.svg
:target: https://colab.research.google.com/drive/1zRqi6uTMiGi5z8GQ-PC0tbpCJWULCqMO?usp=sharing
:alt: Open In Collab

.. raw:: html

<br>
<br>

- **Video**

Demonstration of how to use Wayback Tweets and other tools to retrieve tweets (in Spanish)

.. youtube:: qy3wOnUxe6A
:width: 100%
8 changes: 5 additions & 3 deletions docs/index.rst
@@ -9,10 +9,11 @@ Wayback Tweets

Pre-release: |release|

Retrieves archived tweets CDX data from the Wayback Machine, performs necessary parsing (see :ref:`field_options`), and saves the data in HTML (for easy viewing of the tweets using the ``iframe`` tag), CSV, and JSON formats.
Retrieves archived tweets CDX data from the Wayback Machine, performs necessary parsing (see :ref:`field_options`), and saves the data in HTML, for easy viewing of the tweets using the iframe tags, CSV, and JSON formats.

.. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.12528448.svg
:target: https://doi.org/10.5281/zenodo.12528448
.. image:: https://img.shields.io/badge/Donate-via%20Sponsors-ff69b4.svg?logo=github
:target: https://github.com/sponsors/claromes
:alt: GitHub Sponsors

.. note::
Intensive queries can lead to rate limiting, resulting in a temporary ban of a few minutes from web.archive.org.
@@ -30,6 +31,7 @@ User Guide
field_options
outputs
exceptions
handson
contribute
todo

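Reviewer note (illustrative, not part of this diff): the note in `docs/index.rst` warns that intensive queries can lead to rate limiting by web.archive.org. One common way for a caller to cope with that is retrying with exponential backoff. The sketch below is an assumption about how a client might do this; `fetch_with_backoff` is a hypothetical helper, and nothing here describes what `waybacktweets` itself does on a rate limit.

```python
import time


def fetch_with_backoff(fetch, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on ConnectionError, wait base_delay * 2**attempt and try again.

    fetch is any zero-argument callable that performs the request.
    sleep is injectable so the waiting can be replaced in tests.
    """
    for attempt in range(retries + 1):
        try:
            return fetch()
        except ConnectionError:
            if attempt == retries:
                raise  # out of retries: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)
```

Because the docs mention a ban lasting a few minutes, a real caller would likely pick a `base_delay` measured in tens of seconds or more rather than the default used here.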
38 changes: 29 additions & 9 deletions docs/installation.rst
@@ -1,8 +1,7 @@
.. _installation:

Installation
================

**It is compatible with Python versions 3.10 and above.**

Using pip
------------
@@ -11,47 +10,68 @@ Using pip

pip install waybacktweets

Using Poetry
------------

.. code-block:: shell

poetry add waybacktweets

.. _installation_from_source:

From source
-------------

Clone the repository:
**Clone the repository:**

.. code-block:: shell

git clone git@github.com:claromes/waybacktweets.git

Change directory:
**Change directory:**

.. code-block:: shell

cd waybacktweets

Install poetry, if you haven't already:
**Install Poetry, if you haven't already:**

.. code-block:: shell

pip install poetry


Install the dependencies:
**Install the dependencies:**

.. code-block:: shell

poetry install

Run the CLI:
**Install the pre-commit:**

.. code-block:: shell

poetry run pre-commit install

**Run the CLI:**

.. code-block:: shell

poetry run waybacktweets [SUBCOMMANDS]

Run the Streamlit App:
**Starts a new shell and activates the virtual environment:**

.. code-block:: shell

poetry shell

**Run the Streamlit App:**

.. code-block:: shell

streamlit run app/app.py

Build the docs:
**Build the docs:**

.. code-block:: shell

8 changes: 2 additions & 6 deletions legacy_app/legacy_app.py
@@ -14,11 +14,7 @@
layout="centered",
menu_items={
"About": """
## 🏛️ Wayback Tweets

Tool that displays, via Wayback CDX Server API, multiple archived tweets on Wayback Machine to avoid opening each link manually. Users can apply filters based on specific years and view tweets that do not have the original URL available.

This tool is a prototype, please feel free to send your [feedbacks](https://github.com/claromes/waybacktweets/issues). Created by [@claromes](https://claromes.com).
This is the legacy application of [Wayback Tweets](https://waybacktweets.streamlit.app/).

-------
""", # noqa: E501
@@ -386,7 +382,7 @@ def next_page():

# UI
st.title(
"Wayback Tweets [![Star](https://img.shields.io/github/stars/claromes/waybacktweets?style=social)](https://github.com/claromes/waybacktweets)", # noqa: E501
"Wayback Tweets", # noqa: E501
anchor=False,
help="v0.4.3",
)
22 changes: 21 additions & 1 deletion poetry.lock

3 changes: 2 additions & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "waybacktweets"
version = "1.0a6"
version = "1.0a7"
description = "Retrieves archived tweets CDX data from the Wayback Machine, performs necessary parsing, and saves the data."
authors = ["Claromes <[email protected]>"]
license = "GPLv3"
@@ -46,6 +46,7 @@ sphinxcontrib-mermaid = "^0.9.2"
sphinx-new-tab-link = "^0.4.0"
sphinx-click = "^6.0.0"
sphinx-autodoc-typehints = "^2.1.1"
sphinxcontrib-youtube = "^1.4.1"

[tool.poetry.group.dev.dependencies]
streamlit = "1.36.0"
2 changes: 1 addition & 1 deletion waybacktweets/_cli.py
@@ -97,7 +97,7 @@ def _parse_date(
"verbose",
is_flag=True,
default=False,
help="Shows the error log.",
help="Shows the log.",
)
def main(
username: str,