Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write test suite #11

Open
jordan-gillard opened this issue Feb 25, 2024 · 11 comments
Open

Write test suite #11

jordan-gillard opened this issue Feb 25, 2024 · 11 comments
Assignees
Labels
enhancement New feature or request

Comments

@jordan-gillard
Copy link
Collaborator

I need to preface this issue by saying I love the work you do and that I think this software is very cool.

I notice that this repo doesn't have unit tests. The result of this is that breaking changes and regressions are difficult to catch. I know you guys want to publish EDGAR to pypi, so this is really no bueno. Ideally we'd want a robust test suite that runs in CI for every PR. I think this is especially relevant to PR #10, since it involves major code changes and therefore the strong potential to break existing behavior.

I'm interested in helping Bellingcat, so I'd be happy to work on this or take on an advisory role.

@GalenReich
Copy link
Collaborator

Hi Jordan, thanks for opening this issue! We would really welcome a PR for a test suite 🙌 Hopefully the refactor in #10 (now merged) will help with that as there are now more atomic functions.

If you wanted to join, we have a channel in our Discord dedicated to development of this tool https://discord.gg/bellingcat

@JackCollins91
Copy link
Contributor

Hi @jordan-gillard and @GalenReich
I'd be happy to write a starting test suite.

My idea would be to engineer it such that the test functions go as far as formulating the URLs of the API calls and then validates that the URLs are properly formed, but validate nothing about what is returned to that call.

This way the test suite is not dependent on the EDGAR API to pass.

Then I could introduce a test which does try the API, but only raises a warning if it doesn't work. This way if there's an issue with the EDGAR API, we know there's a problem, but it doesn't cause unit testing to outright fail.

Let me know your thoughts. It is also fine to design the tests to go as far as hitting the API and retrieving results, we just need to establish some queries that we know will always return a certain result.

@rkiddy
Copy link

rkiddy commented Jun 26, 2024

@jordan-gillard Is there anything in process on this? I am not seeing anything in your fork. Which mocking modules do you usually use? I can start something but would want to do something compatible with where you are going.

@GalenReich
Copy link
Collaborator

I'm going to unassign @jordan-gillard as we haven't heard from him in a little while, and it seems that there is some enthusiasm here!

Jordan, if you want to get back involved in an advisory role, just hop back in, we'd be grateful to have you!

For @JackCollins91 and @rkiddy, we have used pytest in other repos, so that might be the way to go here (though I don't have a strong opinion if you wanted to advocate for a different approach). I like Jack's idea of having something to test the response from the API - sudden changes to 3rd party APIs cause problems for us across our tools. Setting the start_date and end_date fields sufficiently in the past should give consistent results.

I'm going to assign Jack to this issue as he's posted a bit of starting detail, but I'd encourage you both to collaborate on this. Though if you'd rather work independently, I'd also welcome a PR to add tests to https://ShadowFinder.

Let me know if you'd like to collaborate on this or be assigned elsewise.

@JackCollins91
Copy link
Contributor

Hey @rkiddy, I'm happy to implement this, but wouldn't have time until about September (maybe sooner, but not certain). If you're enthusiastic and have time before then, you're welcome to take over this PR and I'd be happy to chat/code review or advise, etc.

If not, no problem - i'll still get to this as soon as I clear my present crunch time.

Cheers

@jordan-gillard
Copy link
Collaborator Author

jordan-gillard commented Jun 29, 2024

Hi folks! So sorry for being MIA. I've been swamped with some new work responsibilities.

@JackCollins91 I initially saw your comment on my phone in the morning, and I thought it came from Discord! I couldn't find it (obviously) and then brushed it off 😅

@rkiddy In my experience, almost all Python testing can be accomplished with Python's built-in unittest library and pytest.

As for testing, I think anything is better than nothing. When adding tests to existing code it often begins with integration tests. I definitely agree in @JackCollins91's assessment that we shouldn't use the actual EDGAR API. You could mock out network calls so that they always return a predetermined response. That's usually the way to go. This project uses the requests library, and mocking responses is pretty straight forward. I'd use a pytest fixture and the requests-mock library.

For example, if you were testing fetch_rss_feed in rss.py, you'd want to mock out the requests.get call here:

response = requests.get(RSS_FEED_URL, headers=headers)

Your test file, lets call it test_rss.py, could accomplish this with a pytest fixture:

@pytest.fixture
def mock_requests_get():
    with requests_mock.Mocker() as m:
        yield m

and then the test would use the fixture and set the response you are testing against:

def test_fetch_rss_feed(mock_requests_get):
    """Tests that we do x when the api returns y
    yadda yadda yadda
    """
    # GIVEN
    sample_rss_feed_response = """
    <rss>
        <channel>
            <item>
                <title>Something something idk</title>
            </item>
        </channel>
    </rss>
    """
    mock_requests_get.get(RSS_FEED_URL, text=sample_rss_feed_response)

    # WHEN
    fetch_rss_feed(...)

    # THEN (though you'd surely want to test more than this)
    mock_requests_get.assert_called_once_with(RSS_FEED_URL, headers=headers)

Please don't wait on me to add tests! I will work on it today but tbh I don't even remember what I was up to when I last checked out the repo 😆

@rkiddy
Copy link

rkiddy commented Jul 2, 2024

Sorry if this seems lame, but can someone demonstrate how to run something in the local source. I do a lot with python, but mostly with flask apps and I am not familiar with this packaging style.

I tried:

% virtualenv .venv
% ./venv/bin/python3 -m pip install edgar-tool

But then, how do I execute local code? I made a change to the help string in both edger_tool/rss.py and in edgar_tool/.venv/lib/python3.10/site-packages/edgar_tool/rss.py and running this shows no change:

./.venv/bin/edgar-tool rss --help

I am sure I am confused about something here. If I can run something local, I can write tests.

@JackCollins91
Copy link
Contributor

Hey @rkiddy no worries, mate.
I'm not sure what machine you're using, but here's what I did:

  1. download/install miniconda
  2. create a virtual environment with minicinda, not venv
    conda create -n edgar
    conda activate edgar
  3. cd <your Edgar repo path>
  4. pip install the dependencies (and poetry if that's not included)
  5. now you can run the CLI in a local dev mode by running CLU like:
    poetry run edgar-tool ...

Let me know if that works:)

@GalenReich
Copy link
Collaborator

The project uses poetry to handle virtual environments, so even using conda or virtualenv isn't necessary (though to install poetry you're welcome to do what you like - poetry recommmends using pipx for the install pipx install poetry!).

As per the README:

Install poetry if you haven't already

pip install poetry

Install dependencies

poetry install

Run the tool

poetry run edgar-tool --help

When you do poetry install it creates a virtual environment managed by poetry (if you use an IDE that lets you select a python interpreter it will often autodetect the poetry environment). All dependencies will be installed in the poetry virtual environment. Then you can run commands within the virtual environment with poetry run .... Hope that helps! 🎉

@rkiddy
Copy link

rkiddy commented Jul 4, 2024

I might not be able to help. I think I must use python in a not very pythonic way.

% cd edgar-tool
% cat test_rss.py
import pytest
import requests_mock
import rss


@pytest.fixture
def mock_requests_get():
    with requests_mock.Mocker() as m:
        yield m


def test_fetch_rss_feed(mock_requests_get):
    """Tests that we do x when the api returns y
    yadda yadda yadda
    """
    # GIVEN
    sample_rss_feed_response = """
    <rss>
        <channel>
            <item>
                <title>Something something idk</title>
            </item>
        </channel>
    </rss>
    """
    mock_requests_get.get(rss.RSS_FEED_URL, text=sample_rss_feed_response)

    # WHEN
    fetch_rss_feed(...)

    # THEN (though you'd surely want to test more than this)
    mock_requests_get.assert_called_once_with(RSS_FEED_URL, headers=headers)

% virtualenv .venv
% alias vp='./.venv/bin/python'
% vp -m pip install pytest
% vp -m pip install requests_mock
% ./.venv/bin/pytest test_rss.py 
...
E   ModuleNotFoundError: No module named 'xmltodict'
% vp -m pip install xmltodict
% ./.venv/bin/pytest test_rss.py 
...
E   ModuleNotFoundError: No module named 'jsonlines'
% vp -m pip install jsonlines
% ./.venv/bin/pytest test_rss.py 
...
E   ImportError: cannot import name 'fetch_rss_feed' from 'edgar_tool' (/home/ray/Projects/edgar_bc/edgar_tool/__init__.py)
% ./.venv/bin/pytest test_rss.py 
...
E   ModuleNotFoundError: No module named 'rss'
% vp -m pip install edgar_tool
% ./.venv/bin/pytest test_rss.py 
...
E   ModuleNotFoundError: No module named 'rss'

@jordan-gillard
Copy link
Collaborator Author

I might not be able to help. I think I must use python in a not very pythonic way.
...

@rkiddy You are using virtualenv, but the recommended workflow @GalenReich mentions in #11 (comment) is to use Poetry. I strongly encourage you to use the development workflow recommended in the project's README.md file. You'll have to click the line that says Expand to view information for developers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants