Fix/quarto identify source files #200
765d2e9 to 0e784a9
Regarding the failing check on Windows: I'm a bit helpless here. I don't have a Windows system to debug the issue and don't know exactly why paths are handled differently between the systems. One Do you have any idea, @wlandau?
From the build log:
So I suspect we need
006b5e7 to bf0f86f
Thanks for your feedback and sorry for my late reply!
I think that won't help much. There is already a On Windows, As I don't have a Windows system for testing, I can't verify whether this is only an issue in CI or also in a real-world scenario. Would you mind adding a
In the latest commit, I changed how the same files are detected (see here). But I don't think that will actually solve the problem. If it doesn't help, I will remove the commit again. However, currently all tests on Windows and Mac are failing on the CI because the package
IIRC this is an issue with GitHub Actions and
I think that helps, but let's please avoid the native pipe because
You can fix it by merging e23a780 into your branch.
52b8398 to d6f1772
Thanks! That worked out!
Unfortunately, that didn't help. See here. Even if that worked, the file
Done!
out <- list()

# Collect data about source files.
out$sources <- tar_quarto_files_get_source_files(info$fileInformation)
As currently written, this function is based on detecting calls to `tar_read()` and `tar_load()`. `tar_render()` source files do not necessarily call these functions. Is there a more reliable way to detect source files? If we know they're all qmd files that will be run/rendered, can we just include all of them?
I'm not sure whether I understand your comment correctly. Is your question whether files that do not use `tar_read()` or `tar_load()` are still somehow detected and used as a dependency?
The doc says the following:

> A named list of important file paths in a Quarto project or document:
> - `sources`: source files with `tar_load()`/`tar_read()` target dependencies in R code chunks.
> - `output`: output files that will be generated during `quarto::quarto_render()`.
> - `input`: pre-existing files required to render the project or document, such as `_quarto.yml`.
I followed this documentation and collected in `sources` only the files that contain a `tar_read` or a `tar_load` call. All other included files that do not call `tar_read/load` are in `input`. See this test.
To be honest, I'm not sure whether the differentiation between these three list items is necessary for targets, as all of them are simply treated as file dependencies (at least that is how I understand the code). In addition, it would be really easy for `tar_quarto_files` to detect which targets are loaded because `fileInformation` contains the code cells. This way, the call to `knitr_deps` in `tar_quarto_raw` could be eliminated. But I didn't change that, so as not to alter the interface of these functions.
Did I understand your question correctly?
If your question is whether files that are included the traditional knitr (`tar_render`) way (a chunk header such as `{r child="file1"}`) are also detected: yes, they are.
> I followed this documentation and collected only files that have tar_read or a tar_load in them in sources.

Sources can have `tar_read()`/`tar_load()`, but they don't have to, and they don't need to be distinguished from source files that don't have `tar_read()`/`tar_load()`. `tarchetypes` just needs to know which files to scan for `tar_read()`/`tar_load()` using static code analysis, and it needs to know the inputs to `quarto_render()`.
> To be honest, I'm not sure whether the differentiation between these three list items is necessary for targets as all of them are simply treated as file dependencies (at least this way I understand the code).

The sources are the files scanned with `knitr_deps()`. The other input files do not necessarily even have code chunks that can be `parse()`'d as valid R code.
> In addition, it would be really easy for tar_quarto_files to detect which targets are loaded because fileInformation contains the code cells.

That part could be useful for just extracting the code chunks, but we can already use `knitr::knit(tangle = TRUE)`, and it is a unified approach that also works with R Markdown and `knitr`.
> This way, the call to knitr_deps in tar_quarto_raw could be eliminated.

Regardless of where we get the text of the code chunks, `knitr_deps()` is still needed because it runs static code analysis to robustly detect `tar_load()`/`tar_read()` calls and the targets that they load/read. It is much more reliable to walk the AST than to manipulate text.
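To illustrate the point about walking the AST rather than manipulating text, here is a minimal base R sketch. It is not the implementation of `knitr_deps()`: the helper name `find_target_calls()` is hypothetical, and it only handles the simple case where the first argument of `tar_read()`/`tar_load()` is a bare symbol.

```r
# Hypothetical helper (not part of tarchetypes): collect the target names
# passed to tar_read()/tar_load() by walking the parsed expression tree.
find_target_calls <- function(expr) {
  found <- character(0)
  if (!is.call(expr)) {
    return(found)
  }
  fn <- expr[[1]]
  # Strip a namespace prefix such as targets::tar_read.
  if (is.call(fn) && length(fn) == 3 &&
      (identical(fn[[1]], as.name("::")) ||
         identical(fn[[1]], as.name(":::")))) {
    fn <- fn[[3]]
  }
  is_target_call <- is.name(fn) &&
    as.character(fn) %in% c("tar_read", "tar_load")
  # Simplifying assumption: only a bare symbol as first argument is handled.
  if (is_target_call && length(expr) >= 2 && is.name(expr[[2]])) {
    found <- as.character(expr[[2]])
  }
  # Recurse into every element of the call, skipping empty arguments
  # such as the first index in x[, 1].
  for (i in seq_along(expr)) {
    if (identical(expr[[i]], quote(expr = ))) next
    found <- c(found, find_target_calls(expr[[i]]))
  }
  unique(found)
}

chunk <- "
x <- 1 # tar_read(not_a_target) in a comment is ignored
summary(targets::tar_read(model))
tar_load(data)
msg <- 'tar_read(also_ignored)'
"
exprs <- parse(text = chunk, keep.source = FALSE)
deps <- unique(unlist(lapply(exprs, find_target_calls)))
print(deps)
```

Because the code is parsed first, the mentions of `tar_read()` inside the comment and inside the string literal are never treated as calls, which a plain text search would struggle to guarantee.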
Thank you for your feedback and the clarifications!
> Sources can have tar_read()/tar_load(), but they don't have to, and they don't need to be distinguished from source files that don't have tar_read()/tar_load(). tarchetypes just needs to know which files to scan for tar_read()/tar_load() using static code analysis, and it needs to know the inputs to quarto_render().

Thanks for clarifying this. I had not understood the documentation that way. I changed the documentation to make this point clearer. Additionally, I reworked the code and the tests to reflect this. Things are much simpler now.
Regarding `knitr_deps()`: Sorry for not being clear here. My intention was not to remove the logic behind `knitr_deps()` entirely, but to avoid reading the files again, since `quarto::quarto_inspect` has already done this. For example, one could create an AST from the code snippets provided by quarto inspect. However, in all my reports, report generation takes much longer than tarchetypes needs to decide whether to rerun the report. Thus, it probably doesn't matter, and the easiest solution is to keep everything as is.
Previously, all input files were regarded as source files. However, this might be wrong: In Quarto, other files can be imported via `{{< include file.qmd >}}`. Thus, these files don't have to contain code cells. In addition, included files that contained code cells were not added to the list of source files (even though they might contain `tar_read` statements).
and move code into its own function.
In the project case, the config file is the only missing file in `fileInformation`. Thus, we can simplify the code and only add the config file to the `source` vector. All other files are handled via the `fileInformation` loop.
8309533 to 56fe256
56fe256 to f1fdf0a
After the discussion and clarifications above, I still think this MR has some benefits:
Taking a step back here, would you elaborate on the specifics? What is
I tried out https://github.com/mutlusun/tarchetypes/tree/fix/quarto-identify-source-files, and it works great. Thanks for the contribution! I made minor commits to touch things up, most notably I removed
With
and
To remove the double
But then I got:
This is definitely an issue with Quarto 1.6.33.
@mutlusun, feel free to post another PR to add yourself as a contributor (
Dear @wlandau, thank you for the kind words and for merging this MR!
I ran into the same issue. I was on Quarto 1.35 locally and did not get any errors. After the CI failed, I switched to the latest stable version (1.57) and ran into the same errors. I was thinking about investigating these errors, but have had no time so far. I will probably open an issue there.
Prework
Related GitHub issues and pull requests
Summary
According to the documentation, the `source` field contains files that contain a `tar_read` or `tar_load` command. Currently, all input files are regarded as source files. However, this might be wrong: In Quarto, other files can be imported via `{{< include file.qmd >}}`. Thus, the input files don't have to contain code cells. In addition, included files that contained code cells were not added to the list of source files (even though they might contain `tar_read` statements).
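
For readers unfamiliar with include shortcodes, the sketch below shows how a document can pull in other files that never appear as top-level inputs. It is an illustration only: `find_includes()` is a hypothetical helper, the file names are made up, and this PR relies on the `fileInformation` reported by Quarto's inspect output rather than on a regular expression like this one.

```r
# Hypothetical helper (not part of tarchetypes or Quarto): list the files
# referenced by include shortcodes in the lines of a .qmd document.
find_includes <- function(lines) {
  pattern <- "\\{\\{<\\s*include\\s+([^\\s>]+)\\s*>\\}\\}"
  hits <- regmatches(lines, gregexpr(pattern, lines, perl = TRUE))
  shortcodes <- unlist(hits)
  # Keep only the captured file path from each matched shortcode.
  sub(pattern, "\\1", shortcodes, perl = TRUE)
}

# Made-up document content for illustration.
qmd <- c(
  "---",
  "title: Report",
  "---",
  "{{< include _intro.qmd >}}",
  "Some prose without any code cells.",
  "{{< include sections/results.qmd >}}"
)
includes <- find_includes(qmd)
print(includes)
```

Either included file may or may not contain code cells with `tar_read` statements, which is why included files need to be considered when collecting sources.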