Docs do not explain how to iterate and experiment when drafting a scraper #134

Open
palewire opened this issue Aug 8, 2021 · 5 comments

@palewire
Contributor

palewire commented Aug 8, 2021

There's a little bit of a gap in the contributing docs. Say you've got the repo installed. Say you want to write a scraper. Say you have identified a site to scrape. Say you have even started writing your Python Site class to implement a scraper.

Now, how do you run your scraper as you iterate and experiment?

My guess, based on past experience, is that you would install the CLI in development mode using setup.py. But there are other techniques as well, like test-driven development, where you add features as you write new unit tests.

How do you do it? How would you recommend a newbie take this on?

@palewire
Contributor Author

palewire commented Aug 8, 2021

In the past, I've used a technique like this:

pipenv run pip install --editable .
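
One way to sanity-check an editable install like the one above is to confirm that Python resolves the package to your working tree rather than to a copy in site-packages. The following is a minimal sketch using only the standard library; the package name `my_scrapers` is a placeholder (the thread does not name the importable package), and `module_location` and `is_under` are hypothetical helper names.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_location(name: str) -> Optional[Path]:
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    # Built-in and frozen modules have no file location on disk.
    if spec is None or not spec.has_location or spec.origin is None:
        return None
    return Path(spec.origin).resolve()


def is_under(path: Path, root: Path) -> bool:
    """Report whether `path` sits somewhere below the directory `root`."""
    try:
        path.relative_to(root.resolve())
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    # "my_scrapers" is a placeholder; substitute the package you installed.
    location = module_location("my_scrapers")
    if location is None:
        print("package is not importable in this environment")
    else:
        # Run from the repo root: True suggests the editable install is active.
        print(location, is_under(location, Path.cwd()))
```

If the printed path points into your checkout, edits to the scraper source should take effect the next time you run the CLI, with no reinstall needed.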

@zstumgoren
Member

@palewire Great point. I definitely think it would be helpful for the docs to fill in that workflow gap. I'd vote for docs that initially suggest a simple editable install for local manual testing, as opposed to requiring TDD up front, in the hope that this makes contributing less of a slog for folks who haven't yet done formal unit testing.

In terms of documenting the workflow for a parallel editable install, what do you think about a basic example that uses the built-in venv? Or even a global/system install?

@palewire
Contributor Author

palewire commented Aug 9, 2021

This is a situation where I don't know the different options well enough to have a discerning opinion. The --editable trick worked well for me, but it may have shortcomings I'm unaware of.

@zstumgoren
Member

@palewire Oh, forgot to directly address your question about my workflow. I tend to use TDD (with pytest runs and then tox once the dust has settled), along with manual testing in an editable install. But I'd worry a little about requiring that workflow (which can be a PITA) for all potential contributors. I think if you've written code and tested manually and/or perhaps even have a few basic unit tests that you checked with pytest, we could rely on CI or testing by core maintainers for the last mile of automated testing. Feels like that could lower the bar to entry for contributions, but let me know what you think.
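
For a sense of what "a few basic unit tests" could look like, here is a rough sketch, not the project's actual API. It assumes the scraper's parsing step can be called separately from the network fetch, so the test runs against an inline HTML sample. The names `parse_pdf_links`, `SAMPLE_HTML`, and the sample URLs are all hypothetical, and the parsing uses only the standard library to keep the example self-contained.

```python
from html.parser import HTMLParser
from typing import List


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag encountered."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def parse_pdf_links(html: str) -> List[str]:
    """Hypothetical parsing step: return hrefs that point at PDF files."""
    collector = _LinkCollector()
    collector.feed(html)
    return [link for link in collector.links if link.lower().endswith(".pdf")]


# Made-up sample page standing in for a saved copy of the target site.
SAMPLE_HTML = """
<ul>
  <li><a href="/docs/agenda-2021-08-08.pdf">Agenda</a></li>
  <li><a href="/about.html">About</a></li>
  <li><a href="/docs/MINUTES.PDF">Minutes</a></li>
</ul>
"""


def test_parse_pdf_links():
    # pytest collects any function whose name starts with "test_".
    assert parse_pdf_links(SAMPLE_HTML) == [
        "/docs/agenda-2021-08-08.pdf",
        "/docs/MINUTES.PDF",
    ]
```

Saved in a file such as `test_parsing.py` (name assumed), this runs with a plain `pytest` invocation and needs no network access, which keeps the feedback loop short while iterating.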

@zstumgoren
Member

@palewire I think editable is a solid option, fwiw, and as mentioned above, makes it easier to get up and running. But shout back if you think we should recommend the TDD + editable workflow or some other variation that requires folks to get more deeply into unit testing before submitting PRs.

@zstumgoren zstumgoren reopened this Aug 9, 2021