Fix/quarto identify source files #200

mutlusun · 2024-10-12T19:06:58Z

Prework

I understand and agree to this repository's code of conduct.
I understand and agree to this repository's contributing guidelines.
I have already submitted an issue or discussion thread to discuss my idea with the maintainer.

Related GitHub issues and pull requests

Ref: --

Summary

According to the documentation, the source field contains files that contain a tar_read or tar_load command.

Currently, all input files are regarded as source files. However, this might be wrong: In quarto, other files can be imported via {< inlucde file.qmd >}. Thus, the input files don't have to contain code cells. In addition, included files that contained code cells were not added to the list of source files (even though the might contain tar_read statements).

mutlusun · 2024-10-14T07:30:13Z

Regarding the failing check on windows: I'm a bit helpless here. I don't have a windows system to debug the issue and don't know exactly why paths are handled differently between the systems.

One file1.qmd contains only "file1.qmd" the other file1.qmd contains also the way up until "root" (../../../../../) and then descends again into the current directory. Probably, the two paths are created on different ways which leads to different results.

Do you have any idea, @wlandau ?

wlandau · 2024-10-21T18:13:21Z

From the build log:

── Failure ('test-tar_quarto.R:521:3'): tar_quarto() reruns if target changes in included file ──
  sort(basename(out)) (`actual`) not equal to sort(c("main.html", "main.qmd", "file1.qmd")) (`expected`).
  
  `actual`:   "file1.qmd" "file1.qmd" "main.html" "main.qmd"
  `expected`: "file1.qmd"             "main.html" "main.qmd"

So I suspect we need out$input <- unique(out$input) just after the loop at https://github.com/mutlusun/tarchetypes/blob/0e784a9d44e8f158d6e21c1800e87e19719ace4e/R/tar_quarto_files.R#L79.

mutlusun · 2024-10-29T11:18:53Z

Thanks for your feedback and sorry for my late reply!

So I suspect we need out$input <- unique(out$input) just after the loop at https://github.com/mutlusun/tarchetypes/blob/0e784a9d44e8f158d6e21c1800e87e19719ace4e/R/tar_quarto_files.R#L79.

I think that won't help much. There is already a unique that reduces the files at the main function here.

On windows, fs::path_rel does not seem to be able to produce the correct file path consistently. One file has a correct relative path like report/file1.qmd and the other gets a relative path like ../../../../run/dir/dir/report/file1.qmd. Because of these differing paths, the same file won't be recognized as the same file. See here for a CI run that prints the whole path of the respective files.

As I don't have a windows system for testing, I can't verify if this is only an issue in CI or also in a real case scenario. Would you mind, adding a unique in the windows test job? Do you see another solution?

mutlusun · 2024-10-29T11:21:19Z

In the latest commit, I changed how the same files are detected (see here). But I don't think that will actually solve the problem. If it doesn't help, I will remove the commit again.

However, currently all tests on Windows and Mac are failing on the CI because the package autometric does not exist. Do you know why?

wlandau · 2024-10-30T15:21:47Z

On windows, fs::path_rel does not seem to be able to produce the correct file path consistently. One file has a correct relative path like report/file1.qmd and the other gets a relative path like ../../../../run/dir/dir/report/file1.qmd. Because of these differing paths, the same file won't be recognized as the same file. See here for a CI run that prints the whole path of the respective files.

IIRC this an issue with GitHub Actions and fs. You can work around it by testing against file.path(basename(dirname(path)), basename(path)) instead of the raw path.

In the latest commit, I changed how the same files are detected (see here).

I think that helps, but let's please avoid the native pipe because tarchetypes has Depends: R (>= 3.5.0).

However, currently all tests on Windows and Mac are failing on the CI because the package autometric does not exist. Do you know why?

You can fix it by merging e23a780 into your branch.

mutlusun · 2024-10-30T20:49:11Z

You can fix it by merging e23a780 into your branch.

Thanks! That worked out!

IIRC this an issue with GitHub Actions and fs. You can work around it by testing against file.path(basename(dirname(path)), basename(path)) instead of the raw path.

Unfortunately, that didn't help. See here. Even if that would work, the file file1.qmd would appear twice and a unique would be necessary. Thus, I solved it now by using unique for the the windows platform. Interestingly, this issue exists only on this single test.

I think that helps, but let's please avoid the native pipe because tarchetypes has Depends: R (>= 3.5.0).

Done!

wlandau · 2024-11-04T17:02:51Z

R/tar_quarto_files.R

+  out <- list()
+
+  # Collect data about source files.
+  out$sources <- tar_quarto_files_get_source_files(info$fileInformation)


As currently written, this function is based on detecting calls to tar_read() and tar_load(). tar_render() source files do not necessarily call these functions. Is there a more reliable way to detect source files? If we know they're all qmd files that will be run/rendered, can we just include all of them?

I'm not sure whether I do understand your comment correctly. Is your question whether files that do not use tar_read() or tar_load() are still somehow detected and used as a dependency?

The doc says the following:

A named list of important file paths in a Quarto project or document:

sources: source files with tar_load()/tar_read() target dependencies in R code chunks.

output: output files that will be generated during quarto::quarto_render().

input: pre-existing files required to render the project or document, such as _quarto.yml.

I followed this documentation and collected only files that have tar_read or a tar_load in them in sources. All other included files that do not call tar_read/load are in input. See this test.

To be honest, I'm not sure whether the differentiation between these three list items is necessary for targets as all of them are simply treated as file dependencies (at least this way I understand the code). In addition, it would be really easy for tar_quarto_files to detect which targets are loaded because fileInformation contains the code cells. This way, the call to knitr_deps in tar_quarto_raw could be eliminated. But I didn't change that to not alter the interface of these functions.

Did I understand your question correctly?

If your question is whether also files are detected that are included by the traditional knitr (tar_render) way ( ```{r child="file1"): this is the case.

I followed this documentation and collected only files that have tar_read or a tar_load in them in sources.

Sources can have tar_read()/tar_load(), but they don't have to, and they don't need to be distinguished from source files that don't have tar_read()/tar_load(). tarchetypes just needs to know which files to scan for tar_read()/tar_load() using static code analysis, and it needs to know the inputs to quarto_render().

To be honest, I'm not sure whether the differentiation between these three list items is necessary for targets as all of them are simply treated as file dependencies (at least this way I understand the code).

The sources are the files scanned with knitr_deps(). The other input files do not necessarily even have code chunks that can be parse()'d as valid R code.

In addition, it would be really easy for tar_quarto_files to detect which targets are loaded because fileInformation contains the code cells.

That part could be useful for just extracting the code chunks, but we already can use knits::knit(tangle = TRUE), and it is a unified approach that also works with R Markdown and knitr.

This way, the call to knitr_deps in tar_quarto_raw could be eliminated.

Regardless of where we get the text of the code chunks, knitr_deps() is still needed because it runs static code analysis to robustly detect tar_load()/tar_read() calls and the targets that they load/read. It is much more reliable to walk the AST than manipulate text.

Thank you for your feedback and the clarifications!

Sources can have tar_read()/tar_load(), but they don't have to, and they don't need to be distinguished from source files that don't have tar_read()/tar_load(). tarchetypes just needs to know which files to scan for tar_read()/tar_load() using static code analysis, and it needs to know the inputs to quarto_render().

Thanks for clarifying this. I did not understand the documentation in such a way. I changed the documentation to make this point clearer. Additionally, I reworked the code and the tests to reflect this. Things are much simpler now.

Regarding knitr_deps(): Sorry for not being clear here. My intention was not to completely remove the logic behind knitr_deps() but to not read the files once again as quarto::quarto_inspect has already done this. For example, one could create an AST from the code snippets provided by quarto inspect. However, in all my reports, report generation takes much longer than tarchetypes to decide whether to rerun the report or not. Thus, it probably doesn't matter and it is the easiest solution to keep everything as is.

Previously, all input files were regarded as source files. However, this might be wrong: In quarto, other files can be imported via `{< inlucde file.qmd >}`. Thus, these file don't have to contain code cells. In addition, included files that contained code cells were not added to the list of source files (even though the might contain `tar_read` statements).

and move code into its own function.

In the project case, the config file is the only missing file in `fileInformation`. Thus, we can simplify the code and only add the config file to the `source` vector. All other files are handled via the `fileInformation` loop.

mutlusun · 2024-11-08T13:49:15Z

After the discussion and clarifications above, I still think this MR has some benefits:

input files are now correctly identified: input does not contain code files any more but other files related to render the project (e.g., _quarto.yml and quarto extensions).
source files now contain all included report files.
test suite is extended

wlandau · 2024-11-10T15:19:58Z

input files are now correctly identified: input does not contain code files any more but other files related to render the project (e.g., _quarto.yml and quarto extensions).

source files now contain all included report files.

Taking a step back here, would you elaborate on the specifics? What is tar_quarto_files() currently doing wrong, and where are the right places to look in quarto_inspect()? Also, what version of quarto (CLI and R package) are you using? These may have changed recently.

wlandau · 2024-11-11T10:33:33Z

I tried out https://github.com/mutlusun/tarchetypes/tree/fix/quarto-identify-source-files, and it works great. Thanks for the contribution!

I made minor commits to touch things up, most notably I removed text3.qmd from one of the tests because Quarto was failing locally. I'm pretty sure this is a quarto issue. Using:

report
├── main.qmd
├── subdir
│   ├── b
│   │   └── text3.qmd
│   └── text2.qmd
└── text1.qmd

With report/main.qmd:

---
title: main
output_format: html
---

{{< include "text1.qmd" >}}

report/text1.qmd:

# First File

Some text here.

{{< include "subdir/text2.qmd" >}}

subdir/text2.qmd:

# Second File

Some text here.

{{< include "subdir/b/text3.qmd" >}}

and subdir/b/text3.qmd:

# Third File

Some text here.

quarto inspect report/main.qmd gives me:

ERROR: NotFound: No such file or directory (os error 2): readfile '/Users/CENSORED/Desktop/report/subdir/subdir/b/text3.qmd'
Path: /Users/CENSORED/Desktop/report/subdir/subdir/b/text3.qmd

Stack trace:
Path: /Users/CENSORED/Desktop/report/subdir/subdir/b/text3.qmd
    at readTextFileSync (ext:deno_fs/30_fs.js:864:10)
    at Object.Deno.readTextFileSync (file:///Applications/quarto/bin/quarto.js:5124:25)
    at mappedStringFromFile (file:///Applications/quarto/bin/quarto.js:16967:24)
    at Object.markdownForFile (file:///Applications/quarto/bin/quarto.js:40523:32)
    at mdForFile (file:///Applications/quarto/bin/quarto.js:39886:29)
    at inner (file:///Applications/quarto/bin/quarto.js:39925:67)
    at inner (file:///Applications/quarto/bin/quarto.js:39925:23)
    at async inner (file:///Applications/quarto/bin/quarto.js:39925:17)
    at async projectResolveCodeCellsForFile (file:///Applications/quarto/bin/quarto.js:39945:5)
    at async inspectConfig (file:///Applications/quarto/bin/quarto.js:108701:13)

To remove the double subdir in the path, I tried changing text2.qmd to:

# Second File

Some text here.

{{< include "b/text3.qmd" >}}

But then I got:

ERROR: Include directive failed.
  in file report/main.qmd, 
  in file text1.qmd, 
  in file subdir/text2.qmd, 
  could not find file /Users/CENSORED/Desktop/report/b/text3.qmd.

Stack trace:
  in file report/main.qmd, 
  in file text1.qmd, 
  in file subdir/text2.qmd, 
  could not find file /Users/CENSORED/Desktop/report/b/text3.qmd.
    at retrieveInclude (file:///Applications/quarto/bin/quarto.js:38818:19)
    at retrieveInclude (file:///Applications/quarto/bin/quarto.js:38840:23)
    at retrieveInclude (file:///Applications/quarto/bin/quarto.js:38840:23)
    at standaloneInclude (file:///Applications/quarto/bin/quarto.js:38854:11)
    at processMarkdownIncludes (file:///Applications/quarto/bin/quarto.js:39084:38)
    at expandIncludes (file:///Applications/quarto/bin/quarto.js:39113:11)
    at async projectResolveFullMarkdownForFile (file:///Applications/quarto/bin/quarto.js:39984:25)
    at async fileExecutionEngine (file:///Applications/quarto/bin/quarto.js:41500:26)
    at async inspectConfig (file:///Applications/quarto/bin/quarto.js:108671:24)
    at async Command.actionHandler (file:///Applications/quarto/bin/quarto.js:108740:20)

This is definitely an issue with Quarto 1.6.33.

wlandau · 2024-11-11T10:34:20Z

@mutlusun, feel free to post another PR to add yourself as a contributor (person(role = "ctb")) in the DESCRIPTION file.

mutlusun · 2024-11-11T11:51:56Z

Dear @wlandau , thank you for the kind words and merging this MR!

@mutlusun, feel free to post another PR to add yourself as a contributor (person(role = "ctb")) in the DESCRIPTION file.
Thank you, I'll do!

I ran into the same issue. I was at quarto 1.35 locally and did not receive any errors. After the CI failed, I switched to the latest stable version (1.57) and ran into the same errors. I was thinking about investigating these errors, but had no time so far. Probably, I will open an issue there.

mutlusun force-pushed the fix/quarto-identify-source-files branch 2 times, most recently from 765d2e9 to 0e784a9 Compare October 14, 2024 07:02

mutlusun force-pushed the fix/quarto-identify-source-files branch 2 times, most recently from 006b5e7 to bf0f86f Compare October 29, 2024 11:03

mutlusun force-pushed the fix/quarto-identify-source-files branch 2 times, most recently from 52b8398 to d6f1772 Compare October 30, 2024 20:40

wlandau reviewed Nov 4, 2024

View reviewed changes

mutlusun added 10 commits November 8, 2024 14:24

refactor(quarto-files): deduplicate code

f3491ce

and move code into its own function.

fix(quarto-files): simplify input file detection

e5fb3bf

In the project case, the config file is the only missing file in `fileInformation`. Thus, we can simplify the code and only add the config file to the `source` vector. All other files are handled via the `fileInformation` loop.

test(quarto-files): fix failing test on windows

bbf96f6

test(quarto): extend tests for quarto

2379445

docs(quarto): extend comments in quarto files

153e9ff

fix(quarto/files): try to identify duplicates in files

c750cc4

test(quarto): fix ci running on windows

0d0fe30

fix(quarto/files): remove native pipe usage

79de6eb

format(tests/quarto): fix lintr lints regarding line length

0dd9568

mutlusun force-pushed the fix/quarto-identify-source-files branch from 8309533 to 56fe256 Compare November 8, 2024 13:27

mutlusun added 3 commits November 8, 2024 14:42

fix(quarto/files): correctly detect source files

6108ec0

fix(quarto/files): detect input files correctly

154b86f

test(quarto/files): remove redundant test

f1fdf0a

mutlusun force-pushed the fix/quarto-identify-source-files branch from 56fe256 to f1fdf0a Compare November 8, 2024 13:46

wlandau-lilly and others added 4 commits November 11, 2024 05:03

versions

7f918f4

news, docs, and fixes

4d069e4

add Rd file

f923954

document function

95d4c9e

wlandau merged commit fd0f83e into ropensci:main Nov 11, 2024
7 of 9 checks passed

This was referenced Nov 11, 2024

chore(description): add myself as contributor #202

Merged

File in subdirectory not found quarto-dev/quarto-cli#11360

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix/quarto identify source files #200

Fix/quarto identify source files #200

mutlusun commented Oct 12, 2024

mutlusun commented Oct 14, 2024

wlandau commented Oct 21, 2024

mutlusun commented Oct 29, 2024

mutlusun commented Oct 29, 2024 •

edited

Loading

wlandau commented Oct 30, 2024

mutlusun commented Oct 30, 2024

wlandau Nov 4, 2024

mutlusun Nov 5, 2024

mutlusun Nov 5, 2024

wlandau Nov 5, 2024 •

edited

Loading

mutlusun Nov 8, 2024

mutlusun commented Nov 8, 2024

wlandau commented Nov 10, 2024

wlandau commented Nov 11, 2024

wlandau commented Nov 11, 2024

mutlusun commented Nov 11, 2024

Fix/quarto identify source files #200

Fix/quarto identify source files #200

Conversation

mutlusun commented Oct 12, 2024

Prework

Related GitHub issues and pull requests

Summary

mutlusun commented Oct 14, 2024

wlandau commented Oct 21, 2024

mutlusun commented Oct 29, 2024

mutlusun commented Oct 29, 2024 • edited Loading

wlandau commented Oct 30, 2024

mutlusun commented Oct 30, 2024

wlandau Nov 4, 2024

Choose a reason for hiding this comment

mutlusun Nov 5, 2024

Choose a reason for hiding this comment

mutlusun Nov 5, 2024

Choose a reason for hiding this comment

wlandau Nov 5, 2024 • edited Loading

Choose a reason for hiding this comment

mutlusun Nov 8, 2024

Choose a reason for hiding this comment

mutlusun commented Nov 8, 2024

wlandau commented Nov 10, 2024

wlandau commented Nov 11, 2024

wlandau commented Nov 11, 2024

mutlusun commented Nov 11, 2024

mutlusun commented Oct 29, 2024 •

edited

Loading

wlandau Nov 5, 2024 •

edited

Loading