documentation site #11

Merged 26 commits on Mar 13, 2024
Commits
2fa0dc8
add build/ to gitignore
lukavdplas Feb 29, 2024
4c74bc0
add mkdocs to requirements
lukavdplas Feb 29, 2024
3bc77e5
generate empty documentation site
lukavdplas Feb 29, 2024
888c6c8
add mkdocstrings-python to requirements
lukavdplas Feb 29, 2024
5ab11a7
update mkdocs config
lukavdplas Feb 29, 2024
cf49670
add intro from readme
lukavdplas Feb 29, 2024
26f55ad
scaffold file structure, add installation instructions
lukavdplas Feb 29, 2024
2edcd97
generate documentation from docstrings
lukavdplas Feb 29, 2024
45f2154
docstrings for core module
lukavdplas Feb 29, 2024
6e4fb1c
typing improvements, docstrings for document csv reader
lukavdplas Mar 1, 2024
1aee3fe
docstrings for xlsx reader
lukavdplas Mar 1, 2024
1ff859f
docstrings and typing for xml reader
lukavdplas Mar 1, 2024
2e02484
typing and docstrings for html reader
lukavdplas Mar 1, 2024
31b3e69
extractor docstrings
lukavdplas Mar 1, 2024
36f3c22
more documentation for xml extractor
lukavdplas Mar 1, 2024
c7f2d94
document kwargs in extractor subclasses
lukavdplas Mar 1, 2024
45d8876
document supported extractors on reader classes
lukavdplas Mar 1, 2024
9e68def
update NotImplementedError messages
lukavdplas Mar 1, 2024
73a88e6
add CSV example
lukavdplas Mar 1, 2024
4b944fc
add usage document
lukavdplas Mar 4, 2024
84a6444
add module names to API documentation
lukavdplas Mar 4, 2024
c747bbe
more typedefs
lukavdplas Mar 4, 2024
c7ec435
add basic XML test
lukavdplas Mar 4, 2024
3f7ad29
add test for html reader
lukavdplas Mar 4, 2024
9b7d1b4
add xlsx reader test
lukavdplas Mar 4, 2024
bae6c28
remove WIP statement from documentation
lukavdplas Mar 4, 2024
add intro from readme
lukavdplas committed Feb 29, 2024
commit cf496704670678d54161760fe6f5ed3412bfc9f7
8 changes: 7 additions & 1 deletion docs/index.md
@@ -1,3 +1,9 @@
 # I-analyzer Readers documentation
 
-Welcome! This documentation is a work in progress.
+**This documentation is a work in progress.**
+
+`ianalyzer-readers` is a python module to extract data from XML, HTML, CSV or XLSX files.
+
+This module was originally created for [I-analyzer](https://github.com/UUDigitalHumanitieslab/I-analyzer), a web application that extracts data from a variety of datasets, indexes them and presents a search interface. To do this, we wanted a way to extract data from source files without having to write a new script "from scratch" for each dataset, and an API that would work the same regardless of the source file type.
+
+The basic usage is that you will use the utilities in this package to create a `Reader` class tailored to a dataset. You specify what your data looks like, and then call the `documents()` method of the reader to get an iterator of documents - where each document is a flat dictionary of key/value pairs.
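
The usage pattern described in the added intro (define a reader tailored to a dataset, then iterate over `documents()` to get flat dictionaries) can be illustrated with a minimal, self-contained sketch. It does not use the `ianalyzer-readers` API: `SketchCSVReader`, `BookReader`, the `fields` mapping and the sample data are all hypothetical stand-ins built on the standard library, meant only to show the shape of the idea.

```python
import csv
import io
from typing import Dict, Iterator


class SketchCSVReader:
    """Hypothetical stand-in for a reader base class.

    This is NOT the ianalyzer-readers API; it only mimics the pattern
    of "describe the data, then iterate over documents()".
    """

    # Assumption for this sketch: maps output field name -> CSV column name.
    fields: Dict[str, str] = {}

    def __init__(self, source: str) -> None:
        # The source is CSV text held in memory, to keep the sketch runnable.
        self.source = source

    def documents(self) -> Iterator[Dict[str, str]]:
        """Yield one flat dictionary of key/value pairs per CSV row."""
        for row in csv.DictReader(io.StringIO(self.source)):
            yield {name: row[column] for name, column in self.fields.items()}


class BookReader(SketchCSVReader):
    """A reader tailored to one (made-up) dataset of books."""

    fields = {"title": "Title", "author": "Author"}


SAMPLE = "Title,Author,Year\nEmma,Jane Austen,1815\nDracula,Bram Stoker,1897\n"

docs = list(BookReader(SAMPLE).documents())
for doc in docs:
    print(doc)
```

The subclass only declares what the data looks like; the extraction loop lives in the base class, so the same `documents()` call would work for any dataset described this way. The package's own reader and extractor classes should be taken from its API documentation rather than from this sketch.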