Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

layout_summary() layout order deviates from layout order in PPTX file #596

Closed
markheckmann opened this issue Aug 31, 2024 · 3 comments · Fixed by #598
Closed

layout_summary() layout order deviates from layout order in PPTX file #596

markheckmann opened this issue Aug 31, 2024 · 3 comments · Fixed by #598

Comments

@markheckmann
Copy link
Contributor

The order of the layouts in layout_summary() may deviate from the layout order in the PPTX master.

As this may irritate the user (in this case me :D), I suggest to return a layout order that matches the one in the PPTX file.

Example

File: many_layouts.pptx => This is the internal template.pptx with 4 new layouts (layout_8 to layout_11) added at the end.

url <- "https://github.com/user-attachments/files/16825504/many_layouts.pptx"
file <- tempfile(fileext = ".pptx")
download.file(url, file)
x <- read_pptx(file)
x$slideLayouts$get_metadata()

NB: I used x$slideLayouts$get_metadata() as layout_summary() calls this function internally.

        master_file              name          filename master_name
1  slideMaster1.xml       Title Slide  slideLayout1.xml    Master_1
2  slideMaster1.xml         layout_10 slideLayout10.xml    Master_1
3  slideMaster1.xml         layout_11 slideLayout11.xml    Master_1
4  slideMaster1.xml Title and Content  slideLayout2.xml    Master_1
5  slideMaster1.xml    Section Header  slideLayout3.xml    Master_1
6  slideMaster1.xml       Two Content  slideLayout4.xml    Master_1
7  slideMaster1.xml        Comparison  slideLayout5.xml    Master_1
8  slideMaster1.xml        Title Only  slideLayout6.xml    Master_1
9  slideMaster1.xml             Blank  slideLayout7.xml    Master_1
10 slideMaster1.xml          layout_8  slideLayout8.xml    Master_1
11 slideMaster1.xml          layout_9  slideLayout9.xml    Master_1

layout_10 and layout_11 should appear at the end, like in the PPTX file. However, the ordering in x$slideLayouts$get_metadata() appears to be lexicographical over filename. Hence, 10 and 11 come before 2, yielding a layout order that deviates from the PPTX file.

Analysis

The reason for this lies in the way how the file names are read during initialization in class dir_collection (see line 10 in file ppt_class_dir_collection.R.

...
filenames <- list.files(path = dir_, pattern = "\\.xml$", full.names = TRUE)
...

As per the docs, list.files will apply an alphabetic sort here:

The files are sorted in alphabetical order, on the full path if full.names = TRUE.

markheckmann added a commit to markheckmann/officer that referenced this issue Aug 31, 2024
In class `dir_collection`, files are added to a container during
initialization using alphabetical sorting. This caused the slide
layout order to deviate from the one in the PPTX file, for example,
when calling layout_summary(). Files are now added to a container
in the order of their trailing numeric index. For example,
`slideLayout2.xml` will now preceed `slideLayout10.xml`. Before,
alphabetical sorting was used, where `slideLayout10.xml` comes before `slideLayout2.xml`.
@markheckmann
Copy link
Contributor Author

@davidgohel I changed the order in which files are added to the container in this branch in this file.

The layout order comes out as expected now:

url <- "https://github.com/user-attachments/files/16825504/many_layouts.pptx"
file <- tempfile(fileext = ".pptx")
download.file(url, file)
x <- read_pptx(file)
layout_summary(x) 
              layout   master
1        Title Slide Master_1
2  Title and Content Master_1
3     Section Header Master_1
4        Two Content Master_1
5         Comparison Master_1
6         Title Only Master_1
7              Blank Master_1
8           layout_8 Master_1
9           layout_9 Master_1
10         layout_10 Master_1
11         layout_11 Master_1

All tests are still passed. However, this is a change in a central class. So I would like to make sure, that there are no unwanted side effects.

An alternative to this approach would be to do the reordering directly in get_metadata() and other places where this may be relevant.

@davidgohel What do you think?

@davidgohel
Copy link
Owner

@markheckmann I think you are focused and accurate on these aspects.

For professional reasons, I don't have the ability to focus on the subject these days (things should be better after September/October). So if you feel able to PR, again it will be welcome.

If something break, I think the coverage test will show it (and if not, well I'll take time to fix what need to be fixed)

@markheckmann
Copy link
Contributor Author

okay, great. PR is ready

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants