feat(loaders): loaders for files and pdfs #55
TODO: Clippy fails due to some elided lifetimes
@0xMochan can you merge main into your branch? There are some updates to the CI pipeline necessary to support feature flags.
@0xMochan seems like there are a lot of fmt/clippy errors still (unrelated to lifetimes), could you resolve those before I start my review? Thanks!
Couple of things for both loaders:
1. We should look into splitting the `new` constructor into `with_glob` and `with_dir`. Playing with it right now and I'm noticing that, for example, `let loader = FileLoader::new("my-dir/")?;` does not work, but using `"my-dir/*"` does work, which is not very clear.
2. Implement `IntoIterator`; see my detailed comment on `file.rs`.
3. Docstrings!!!
4. An example or two would go a long way towards evaluating the DevEx of the feature. For the `FileLoader` one, a trivial self-contained example could simply load the `Cargo.toml` from the workspace.
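To make item 1 concrete, here is a minimal sketch of what split constructors could look like. It is not the PR's code: the `FileLoader` below is a hypothetical stand-in that only collects paths, the signatures are assumptions, and `with_glob` is a toy that understands only `<dir>/*` and `<dir>/*.<ext>` so the example stays std-only (a real implementation would presumably delegate to a glob library).

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the PR's `FileLoader`; it only collects paths.
pub struct FileLoader {
    paths: Vec<PathBuf>,
}

impl FileLoader {
    /// Collects every regular file directly inside `dir` (non-recursive).
    /// A trailing slash such as "my-dir/" is accepted.
    pub fn with_dir(dir: impl AsRef<Path>) -> io::Result<Self> {
        let mut paths = Vec::new();
        for entry in fs::read_dir(dir)? {
            let path = entry?.path();
            if path.is_file() {
                paths.push(path);
            }
        }
        paths.sort();
        Ok(Self { paths })
    }

    /// Toy glob: only `<dir>/*` and `<dir>/*.<ext>` are understood.
    /// Anything else is rejected instead of being silently misread.
    pub fn with_glob(pattern: &str) -> io::Result<Self> {
        let (dir, file_pattern) = pattern.rsplit_once('/').ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidInput, "expected `<dir>/<pattern>`")
        })?;
        let mut loader = Self::with_dir(dir)?;
        match file_pattern {
            "*" => {}
            p if p.starts_with("*.") => {
                let ext = &p[2..];
                loader
                    .paths
                    .retain(|path| path.extension().and_then(|e| e.to_str()) == Some(ext));
            }
            _ => {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidInput,
                    "unsupported pattern in this sketch",
                ))
            }
        }
        Ok(loader)
    }

    /// The collected paths, sorted.
    pub fn paths(&self) -> &[PathBuf] {
        &self.paths
    }
}
```

With two named constructors, `FileLoader::with_dir("my-dir/")` and `FileLoader::with_glob("my-dir/*")` each state their intent, so a directory path is no longer interpreted as a glob that matches nothing useful.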
Updates
Questions / Concerns
Looking good! Couple things to change in the docstrings, as well as some more open questions/comments.
This PR adds loader structs that help load files from the disk. It also adds an optional dependency, `lopdf`, for working w/ PDFs.

Implementation
FileLoader
PdfLoader
Both of these structs implement a typestate pattern that enforces state transitions to happen in a specific order (defined by a state tree). This means that once a `glob` has been processed, you can only iterate on those files by reading them, or by applying other specific methods like `ignore_errors`, until you end with an `iter` to finally output the iterator.

`iterator` versus `iter_generator`
It was impossible to go the `iter_generator` route due to lifetimes. Since the `ignore_errors` method introduced a lifetime, adding a function that generates an iterator would have resulted in adding the `'static` lifetime, which seems like the wrong route.

Traits / Reusability
The traits defining the main `*Loader` methods were removed since they required the user to import those traits in order to use the API. This also resulted in some duplication between the `FileLoader` and `PdfLoader` APIs, but one way around this is a 2-layer setup (similar to how `Readable` and `Loadable` are currently set up), as it allows for reusable code w/o relying on global traits. I'm curious whether I should apply this to everything.

PDF specific
`Vec<(usize, String)>` since the data needs to be owned (it's impossible to return a nested iterator afaik).

`Readable` reusability
I do reuse `Readable` (bad name lol) but the `Error` type is hardcoded, making it really awkward to use in `pdf.rs`. I tried to add `type Error;` to the trait, but when `Readable` is implemented for `Result<PathBuf, FileLoaderError>`, I can't use that type since it's inside the implementation. If I use generics, it's also not possible due to some fringe type error.
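For readers unfamiliar with the typestate pattern mentioned under Implementation, the following is a rough, std-only sketch of the idea. It is not the PR's code: `Loader`, `from_paths`, and the three state structs are hypothetical names, and only `read`, `ignore_errors`, and `iter` echo methods named in the description. Each state is its own type, so a method is callable only in the state that defines it.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// State 1: paths have been collected but nothing has been read yet.
pub struct Paths(Vec<PathBuf>);
/// State 2: every path has been read; each item may have failed.
pub struct Contents(Vec<io::Result<String>>);
/// State 3: failed reads have been dropped.
pub struct CleanContents(Vec<String>);

/// Hypothetical loader, generic over its current state.
pub struct Loader<S> {
    state: S,
}

impl Loader<Paths> {
    pub fn from_paths(paths: Vec<PathBuf>) -> Self {
        Loader { state: Paths(paths) }
    }

    /// Transition Paths -> Contents by reading every file eagerly.
    pub fn read(self) -> Loader<Contents> {
        let contents = self.state.0.into_iter().map(fs::read_to_string).collect();
        Loader { state: Contents(contents) }
    }
}

impl Loader<Contents> {
    /// Transition Contents -> CleanContents, discarding failed reads.
    pub fn ignore_errors(self) -> Loader<CleanContents> {
        let ok = self.state.0.into_iter().filter_map(Result::ok).collect();
        Loader { state: CleanContents(ok) }
    }

    /// Terminal step: consume the loader and yield each read result.
    pub fn iter(self) -> impl Iterator<Item = io::Result<String>> {
        self.state.0.into_iter()
    }
}

impl Loader<CleanContents> {
    /// Terminal step: consume the loader and yield only successful reads.
    pub fn iter(self) -> impl Iterator<Item = String> {
        self.state.0.into_iter()
    }
}
```

In this shape, calling `ignore_errors()` before `read()` fails to compile because `Loader<Paths>` has no such method, which is the ordering guarantee the description refers to. The sketch reads files eagerly and returns owned data, so it sidesteps the lifetime issue discussed above and does not attempt to reproduce it.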