Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write end-to-end test suite #45

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

JackCollins91
Copy link
Contributor

@JackCollins91 JackCollins91 commented Nov 23, 2024

Closes Issue #11
Replaces PR #39 (which I will close unmerged after creating this).

This PR introduces two unit tests that provide end-to-end coverage for the RSS and SEC search functionality. Both tests operate by executing a simple CLI command and verifying that the output file is:

  • Created successfully.
  • Matches the expected data structure.

To maintain tidiness, the patch decorator is used to prevent actual output files from being created during testing. These tests ensure that any application relying on edgar-tool's data structure will remain unaffected by internal changes to the edgar-tool logic.


Notes

  1. Test Warnings Instead of Failures
    These tests may fail due to external factors, such as:

    • Internet connectivity issues.
    • Changes to the SEC’s data structure.

    To address this, the tests issue warnings instead of outright failures. This prevents us from being blocked by unit test failures caused by changes outside the scope of the edgar-tool code.
    Question: Is this approach too lenient?

  2. Asserting Actual Values vs. Structure
    While we discussed asserting actual values in the returned data, this could result in overly rigid tests. Changes to the SEC’s backend could cause slight variations in return values, making strict assertions problematic.

    Instead, I propose a middle ground: asserting only the correctness of the data structure. This approach is particularly necessary for the RSS feed, where data changes frequently.
    For older SEC data, we might expect greater stability, but I am not confident enough to assert this. I'm open to revisiting this approach if desired.

  3. Future Changes Related to Create new URL encoding function to align with SEC API #40
    The changes in Create new URL encoding function to align with SEC API #40 are currently in progress. While the URL generation code resides in edgar_tool\url_generator.py, it has not yet been integrated into edgar_tool\cli.py.

    A future PR will cut over text-search functionality to use the new url_generator.
    The tests introduced in this PR do not duplicate any functionality that would become redundant after the cutover. Instead, they ensure that the output data structure remains unchanged following this update.

@JackCollins91
Copy link
Contributor Author

Review appreciated from @jordan-gillard and @GalenReich whenever available :)

Copy link
Collaborator

@jordan-gillard jordan-gillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MAN. So so sorry for the incredibly late review. I promise to follow up much quicker in the future. Hope all is well & happy new year!

Comment on lines +69 to +75
except Exception as e:
# Log a warning instead of failing the test
warnings.warn(
f"An exception occurred: {str(e)}\n"
"There might be an issue with accessing the SEC website or the SEC's return payload.",
UserWarning
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we let it fail? We can also configure the test suite to run daily and not just on PRs. A warning log can easily go unnoticed.

@@ -27,3 +29,90 @@ def test_cli_should_return_help_string_when_passed_no_args():
# THEN
assert result.returncode == 0
assert result.stdout.strip() == expected.strip()

@patch("edgar_tool.text_search.write_results_to_file")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you think about using pytest's tmp_path fixture? It was made for this exact thing! 🪄

def test_example(tmp_path):
    # GIVEN
    output_file = tmp_path / "results.csv"

    # WHEN
    SecEdgarScraperCli.text_search(
        "John Doe",
        output=str(output_file),
        start_date="2021-01-01",
        end_date="2021-01-31"
    )

    # THEN
    written_data = output_file.read_text()
    # And you can make your assertions directly against the content
    #  of the file - it's truly end-to-end
    assert "'root_form', 'form_name'" in written_data

f"An exception occurred: {str(e)}\n"
"There might be an issue with accessing the RSS feed or the return payload.",
UserWarning
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to run those git-commit hooks 🙃

Suggested change
)
)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants