Add docs #328
base: master
Conversation
The docs currently reside in index. Most of the content is the result of sphinx-quickstart; I will move it later. The docstrings are still in a very early phase. I wasn't experienced enough with the library to write thorough docstrings.
also move/restructure the rst
Thank you for your attempt at this! The lack of documentation is definitely a major issue, so I do appreciate this!

One thing that immediately sticks out for me is something I'm deeply allergic to, though I'm not sure I spelt it out anywhere before, so apologies for that: duplication of information. It makes future maintenance harder because one has to remember to update things in multiple places. This is inevitably forgotten eventually, which then leads to inconsistencies and confusion. There are three primary examples here:
It probably makes sense to split the documentation writing into three parts: type hints and docstrings for all public API, separate documentation (like a general introduction, CLI vs Python layer, and the Twitter example), and everything directly related to Sphinx (configuration etc.). Most of my points above fall into the last part. Perhaps you'd like to focus on just one for now?

A couple of comments on style. For docstrings, it should be this, in line with the few existing ones:

def func(foo: str) -> str:
	'''Basic description

	Args:
		foo: a description of foo's significance

	Returns:
		a description of the return value
	'''

	return foo

That is: single quotes, no empty line at the end of the docstring, and an empty line between the docstring and the code. Further, I think the class description should go into a class-level docstring, and only

For the rST files, the same style as for my Python code should be used: tabs for indentation, spaces for alignment. Lines should be broken only where it makes sense (i.e. one paragraph = one line); I'm not a fan of breaking text into lines at random points based on 1980s-era screen widths. :-)

That's all I can think of right now. Looking forward to seeing where this is heading! :-)
And @TheTechRobo's comment is correct. #6 is about high-level documentation of the existing scrapers, what they target, returned item fields, etc., while #7 is about the usage of snscrape from Python. There's some overlap though, naturally.
as noted in PR thread JustAnotherArchivist#328
Thank you for your feedback. This is one of my first attempts to contribute to a project that's not my own, so I understand that I need to adjust my style accordingly. I also agree with the points you brought up about avoiding duplication of information; I dislike unnecessary duplication myself. It's just that I wasn't (and still am not) experienced with setting up Sphinx, and I think that showed in my previous attempts. I need to look further and try to configure Sphinx better. I do have some questions, though:
class InstagramUserScraper(InstagramCommonScraper):
	...

	def __init__(self, name, **kwargs):
		super().__init__(mode="User", **kwargs)
		...
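For instance, would something along these lines be the right direction, with the description living in the class-level docstring? The parent class here is just a stand-in so the snippet runs on its own, and the docstring wording is made up.

class InstagramCommonScraper:  # stand-in for the real base class, only so this example is self-contained
	def __init__(self, mode, **kwargs):
		self._mode = mode


class InstagramUserScraper(InstagramCommonScraper):
	'''Scrapes the posts of a single Instagram user.

	Args:
		name: the username of the profile to scrape
	'''

	def __init__(self, name, **kwargs):
		super().__init__(mode="User", **kwargs)
		self._name = name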
Yeah, I have little experience with Sphinx myself as well. Incidentally, that's part of why I put off working on the docs for such a long time. But I'm sure we can find a good solution. Good points on the Instagram and Reddit scrapers. Yes, both of those are a mess and should be refactored. I'll look into that. On that note, I should also revisit the public vs private parts of the code. For example, I don't consider
Add initial edits into conf.py. I'm trying things out right now.
copy-pasted from https://stackoverflow.com/a/62613202
The previous commit (the template copy-pasted from SO) has the following behavior when tested on my local device:
1. It generated all necessary stub .rst files under /_autosummary. (good)
2. It generated HTML for every module, submodule, and class. (good)
3. (the bug) The snscrape.modules page lists -- but does not link to any of -- its submodules. The individual HTML pages for each submodule exist; they are just not linked in the toctree. I can open each resulting HTML file manually in my browser.
I did some trial and error; this commit reflects the minimal change that makes the snscrape.modules page link to its submodules. At least it works.
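Roughly, the setup from that answer looks like the following in conf.py. This is a simplified sketch rather than the exact contents of my configuration; the extension list and paths are assumptions.

# docs/conf.py -- simplified sketch, not the exact configuration in this PR
import os
import sys
sys.path.insert(0, os.path.abspath('..'))  # make the snscrape package importable

project = 'snscrape'

extensions = [
	'sphinx.ext.autodoc',      # pull API documentation from docstrings
	'sphinx.ext.autosummary',  # generate the per-module stub pages
	'sphinx.ext.napoleon',     # parse Google-style Args:/Returns:/Yields: sections
]

# Generate the stub .rst files under _autosummary at build time.
autosummary_generate = True

# Custom templates live here, e.g. a module template whose autosummary
# directive uses :recursive: so submodules get their own stub pages.
templates_path = ['_templates']

The root index.rst then points an .. autosummary:: directive with :toctree: _autosummary and :recursive: at the top-level snscrape package; the linking problem described above comes down to what the generated module pages put into their own toctrees.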
I have questions:
Just something that confused me for a second
The Yields section belongs to get_items instead of __init__.
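Something like this, i.e. an entirely made-up scraper just to show where the section goes:

class ExampleScraper:  # made-up class purely for illustrating the Yields placement
	'''Scrapes items from a hypothetical network.'''

	def __init__(self, name):
		self._name = name

	def get_items(self):
		'''Fetch the items.

		Yields:
			the scraped items, one at a time
		'''

		for i in range(3):  # stand-in for real network requests
			yield f'{self._name} item {i}'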
A quick skim found this
I'm getting this:
Oh, I think it's because I'm on the dev version. You should add a check for that.
It does not happen on my end. The following is the snscrape version:
>>> from importlib.metadata import metadata
>>> M = metadata('snscrape')
>>> M['version']
'0.3.5.dev231+g0832e95'
I'm not sure why my version is still at 0.3.5.dev ... surely it should be at 0.4.x by now? I've merged master to my branch every now and then, too. Is this okay? I installed
@lahdjirayhan The version is only updated when you run
Also, apologies, I didn't notice that you marked this as ready for review. I'll take a look soon!
I really don't see anything other than these minor style questions. This is great!
'''
Args:
To match up with the rest of the docstrings, shouldn't there not be a newline there? (Very minor, I know, but...)
Args, Kwargs, Returns, Yields, etc. are usually positioned unindented at the left according to Google style. In addition to that, I feel uncomfortable putting Args: right after ''' because that first line is usually reserved for the docstring summary.
On the other hand, I don't think the __init__ docstring needs more explanation. The relevant information for each class is already given in the class docstring.
'''
Args:
Same question here.
@JustAnotherArchivist Thanks for the info on the version. I'll try it out as soon as I can. Update: Nope, I've tried rerunning
A very quick skim leaves me saying LGTM.
@lahdjirayhan I'm not sure that merging the branch in also gets the tags. That could be the problem.
#. **Instantiate a scraper object.**
	``snscrape`` provides various object classes that implement their own specific ways. For example, :class:`TwitterSearchScraper` gathers tweets via search query, and :class:`TwitterUserScraper` gathers tweets from a specified user.
#. **Call the scraper's** ``get_item()`` **method.**
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_item()?
``get_item()`` is an iterator and yields one item at a time.
Same here
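For what it's worth, the flow those quoted steps describe boils down to something like this; the query string and the printed fields are made up for the example, and note that the method is get_items() (plural) in the codebase, which is what the comments above are getting at:

import snscrape.modules.twitter as sntwitter

# Scrape a handful of tweets matching a search query (query text is illustrative).
scraper = sntwitter.TwitterSearchScraper('from:jack since:2021-01-01')
for i, tweet in enumerate(scraper.get_items()):
	if i >= 5:  # only look at the first few items
		break
	print(tweet.date, tweet.content)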
I think I can try writing some docs (docstrings, examples). I may not be able to document all scrapers and all the machinery in it at the moment, just the ones I've used or can fairly understand. That being said, I think having some sort of documentation on this library is still a good thing to do.
Should I continue working on this, @JustAnotherArchivist? I apologize in advance if this PR feels like it comes out of nowhere.
Sidenote: I'm not sure about the difference between #6 and #7, and which of them this PR is solving.