Co-authored-by: Dan Roscigno <[email protected]>
1 parent 67a367d, commit 018957e
Showing 11 changed files with 1,512 additions and 884 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
COVER_IMAGE=./StarRocks.png | ||
COVER_TITLE="StarRocks 3.3" | ||
COPYRIGHT="Copyright (c) 2024 The Linux Foundation" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,6 @@ | ||
.venv | ||
tmp/** | ||
.env | ||
URLs.txt | ||
pdf/** | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,162 +1,170 @@ | ||
# Generate a PDF version of the docs | ||
|
||
# Generate PDFs from the StarRocks Docusaurus documentation site | ||
This was developed to run on a Mac system with an M2 chip. Please open an issue if you try this on another architecture and have problems. | ||
|
||
Node.js code to: | ||
1. Generate the ordered list of URLs from the documentation. This is done using code from `docusaurus-prince-pdf`. | ||
2. Convert each page to a PDF file with Gotenberg. | ||
3. Combine the individual PDF files using Ghostscript and `pdfcombine`. | ||
1. Generate the ordered list of URLs from documentation built with Docusaurus. This is done using code from [`docusaurus-prince-pdf`](https://github.com/signcl/docusaurus-prince-pdf) | ||
2. Open each page with [`puppeteer`](https://pptr.dev/) and save the content (without nav or the footer) as a PDF file | ||
3. Combine the individual PDF files using [pdftk-java](https://gitlab.com/pdftk-java/pdftk/-/blob/master/README.md?ref_type=heads) | ||
|
||
## Clone this repo | ||
## One-time setup | ||
|
||
Clone this repo to your machine. | ||
### Clone this repo | ||
|
||
## Choose the branch that you want a PDF for | ||
Clone this repo to your machine. | ||
|
||
When you launch the PDF conversion environment, it will use the active branch. So, if you want a PDF for version 3.3: | ||
### Node.js | ||
|
||
```bash | ||
git switch branch-3.3 | ||
``` | ||
This is tested with Node.js version 21. | ||
|
||
## Launch the conversion environment | ||
Use Node.js version 21. You can install Node.js using the instructions at [nodejs.org](https://nodejs.org/en/download). | ||
|
||
The conversion process uses Docker Compose. Launch the environment by running the following command from the `starrocks/docs/docusaurus/PDF/` directory. | ||
### Puppeteer | ||
|
||
The `--wait-timeout 400` will give the services 400 seconds to get to a healthy state. This is to allow both Docusaurus and Gotenberg to become ready to handle requests. On my machine it takes about 200 seconds for Docusaurus to build the docs and start serving them. | ||
Add `puppeteer` and other dependencies by running this command in the repo directory `starrocks/docs/docusaurus/PDF/`. | ||
|
||
```bash | ||
cd starrocks/docs/docusaurus/PDF | ||
docker compose up --detach --wait --wait-timeout 400 --build | ||
yarn install | ||
``` | ||
|
||
> Tip | ||
> | ||
> All of the `docker compose` commands must be run from the `starrocks/docs/docusaurus/PDF/` directory. | ||
### pdftk-java | ||
|
||
## Check the status | ||
|
||
> Tip | ||
> | ||
> If you do not have `jq` installed, just run `docker compose ps`. The output using `jq` is easier to read, but you can get by with the more basic command. | ||
On a macOS system, install `pdftk-java` using Homebrew: | ||
|
||
```bash | ||
docker compose ps --format json | jq '{Service: .Service, State: .State, Status: .Status}' | ||
brew install pdftk-java | ||
``` | ||
|
||
Expected output: | ||
## Use | ||
|
||
### Configuration | ||
|
||
There is a sample `.env` file, `.env.sample`, that you can copy to `.env`. This file specifies an image, title to place on the cover, and a Copyright notice. Here is the sample: | ||
|
||
```bash | ||
{ | ||
"Service": "docusaurus", | ||
"State": "running", | ||
"Status": "Up 14 minutes" | ||
} | ||
{ | ||
"Service": "gotenberg", | ||
"State": "running", | ||
"Status": "Up 2 hours (healthy)" | ||
} | ||
COVER_IMAGE=./StarRocks.png | ||
COVER_TITLE="StarRocks 3.3" | ||
COPYRIGHT="Copyright (c) 2024 The Linux Foundation" | ||
``` | ||
|
||
## Get the URL of the "home" page | ||
- Copy `.env.sample` to `.env` | ||
- Edit the file as needed | ||
|
||
### Check to see if Docusaurus is serving the pages | ||
> Note: | ||
> | ||
> For `COVER_IMAGE`, use a PNG or JPEG file. | ||
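
The three settings are plain `KEY=value` pairs, so they are easy to read and check from Node.js. The sketch below is illustrative only and is not the parser this tool uses: `parseEnv` and `isSupportedCover` are hypothetical names, and it assumes one setting per line with optional double quotes around the value.

```javascript
// Hypothetical helpers for illustration; not part of this repo.
// Parse simple KEY=value lines, stripping optional surrounding double quotes.
function parseEnv(text) {
  const settings = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === '' || trimmed.startsWith('#')) continue; // skip blanks and comments
    const eq = trimmed.indexOf('=');
    if (eq === -1) continue; // ignore lines without a key/value separator
    const key = trimmed.slice(0, eq).trim();
    let value = trimmed.slice(eq + 1).trim();
    if (value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
      value = value.slice(1, -1);
    }
    settings[key] = value;
  }
  return settings;
}

// The cover image should be a PNG or JPEG; this checks the file extension only.
function isSupportedCover(path) {
  return /\.(png|jpe?g)$/i.test(path);
}

const sample = [
  'COVER_IMAGE=./StarRocks.png',
  'COVER_TITLE="StarRocks 3.3"',
  'COPYRIGHT="Copyright (c) 2024 The Linux Foundation"',
].join('\n');

const settings = parseEnv(sample);
console.log(settings.COVER_TITLE, isSupportedCover(settings.COVER_IMAGE));
```

Checking the extension before the run starts is cheaper than finding an unusable cover image after several hundred pages have been rendered.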
From the `PDF` directory check the logs of the `docusaurus` service: | ||
### Build your Docusaurus site and serve it | ||
|
||
```bash | ||
docker compose logs -f docusaurus | ||
``` | ||
It seems to be necessary to run `yarn serve` rather than ~`yarn start`~ to have `docusaurus-prince-pdf` crawl the pages. I expect that there is a CSS class difference between development and production modes of Docusaurus. | ||
|
||
When Docusaurus is ready you will see this line at the end of the log output: | ||
If you are using the Docker scripts from [StarRocks](https://github.com/StarRocks/starrocks/tree/main/docs/docusaurus/scripts) then open another shell and: | ||
|
||
```bash | ||
docusaurus-1 | [SUCCESS] Serving "build" directory at: http://0.0.0.0:3000/ | ||
cd starrocks/docs/docusaurus | ||
./scripts/docker-image.sh && ./scripts/docker-build.sh | ||
``` | ||
|
||
Stop watching the logs with CTRL-c | ||
### Get the URL of the "home" page | ||
|
||
### Find the initial URL | ||
Find the URL of the first page to crawl. It must be the landing (home) page of the site, because the next step will generate a set of PDF files, one for each page of your site, by starting at the landing page and following the "Next" button at the bottom right corner of each Docusaurus page. If you start from any other page, you will only get a portion of the pages. For the Chinese language StarRocks documentation served using the `./scripts/docker-build.sh` script, the URL is: | ||
|
||
First open the docs by launching a browser to the URL at the end of the log output, which should be [http://0.0.0.0:3000/](http://0.0.0.0:3000/). | ||
|
||
Next, change to the Chinese documentation if you are generating a PDF document of the Chinese documentation. | ||
|
||
Copy the URL of the starting page of the documentation that you would like to generate a PDF for. | ||
```bash | ||
http://localhost:3000/zh/docs/introduction/StarRocks_intro/ | ||
``` | ||
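
To see why the starting page matters, here is a toy model of a crawl that follows "Next" links. It is a sketch under assumptions: the `site` map, its paths, and the `crawlFrom` function are hypothetical stand-ins, and the real crawl is done by `docusaurus-prince-pdf` against the rendered pages.

```javascript
// Toy model: each page maps to the page its "Next" button points to (null at the end).
// The paths are made up for illustration.
const site = {
  '/docs/intro/': '/docs/quick_start/',
  '/docs/quick_start/': '/docs/deployment/',
  '/docs/deployment/': '/docs/reference/',
  '/docs/reference/': null,
};

// Follow "Next" links from a starting page and return the pages visited, in order.
function crawlFrom(start, nextLinks) {
  const visited = [];
  const seen = new Set();
  let current = start;
  while (current != null && !seen.has(current)) {
    seen.add(current); // guard against accidental loops
    visited.push(current);
    current = nextLinks[current];
  }
  return visited;
}

console.log(crawlFrom('/docs/intro/', site).length);      // starts at the home page
console.log(crawlFrom('/docs/deployment/', site).length); // starts mid-site
```

Starting at `/docs/deployment/` never reaches the two earlier pages, because the crawl only moves forward.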
|
||
Save the URL. | ||
### Generate a list of pages (URLs) | ||
|
||
## Open a shell in the PDF build environment | ||
This command will crawl the docs and list the URLs in order: | ||
|
||
Launch a shell from the `starrocks/docs/docusaurus/PDF` directory: | ||
> Tip | ||
> | ||
> The rest of the commands should be run from this directory: | ||
> | ||
> ```bash | ||
> starrocks/docs/docusaurus/PDF/ | ||
> ``` | ||
> | ||
> Substitute the URL you just copied for the URL below: | ||
```bash | ||
docker compose exec -ti docusaurus bash | ||
npx docusaurus-prince-pdf --list-only \ | ||
--file URLs.txt \ | ||
-u http://localhost:3000/zh/docs/introduction/StarRocks_intro/ | ||
``` | ||
and `cd` into the `PDF` directory: | ||
<details> | ||
<summary>Expand to see URLs.txt sample</summary> | ||
|
||
This is the file format, using the StarRocks developer docs as an example: | ||
```bash | ||
cd /app/docusaurus/PDF | ||
http://localhost:3000/zh/docs/developers/build-starrocks/Build_in_docker/ | ||
http://localhost:3000/zh/docs/developers/build-starrocks/build_starrocks_on_ubuntu/ | ||
http://localhost:3000/zh/docs/developers/build-starrocks/handbook/ | ||
http://localhost:3000/zh/docs/developers/code-style-guides/protobuf-guides/ | ||
http://localhost:3000/zh/docs/developers/code-style-guides/restful-api-standard/ | ||
http://localhost:3000/zh/docs/developers/code-style-guides/thrift-guides/ | ||
http://localhost:3000/zh/docs/developers/debuginfo/ | ||
http://localhost:3000/zh/docs/developers/development-environment/IDEA/ | ||
http://localhost:3000/zh/docs/developers/development-environment/ide-setup/ | ||
http://localhost:3000/zh/docs/developers/trace-tools/Trace/ | ||
``` | ||
|
||
## Crawl the docs and generate the PDFs | ||
</details> | ||
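
Because `URLs.txt` holds one URL per line in crawl order, it can be loaded into an ordered list with a few lines of Node.js. This is a minimal sketch, not the code used by this tool; `parseUrlList` is a hypothetical name.

```javascript
// Hypothetical helper: turn the contents of URLs.txt into an ordered list of URLs.
function parseUrlList(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== '')       // drop blank lines
    .map((line) => new URL(line).href);  // throws on a malformed URL
}

const sampleList = `
http://localhost:3000/zh/docs/developers/build-starrocks/Build_in_docker/
http://localhost:3000/zh/docs/developers/build-starrocks/handbook/
`;

const urls = parseUrlList(sampleList);
console.log(urls.length, urls[0]);
```

To read the real file, pass `require('fs').readFileSync('URLs.txt', 'utf8')` to `parseUrlList`. Validating each line with `new URL()` catches a damaged entry before any pages are rendered.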
|
||
Run the command: | ||
|
||
> Tip | ||
> | ||
> The URL in the code sample is for the Chinese documentation. Remove the `/zh/` if you want English. | ||
### Generate PDF files for each Docusaurus page | ||
|
||
This reads the `URLs.txt` generated above and: | ||
1. Creates a cover page | ||
2. Creates a PDF file for each URL in the file | ||
|
||
```bash | ||
node generatePdf.js http://0.0.0.0:3000/zh/docs/introduction/StarRocks_intro/ | ||
node docusaurus-puppeteer-pdf.js | ||
``` | ||
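
For readers who want to see the general shape of this step, the following is a rough sketch of rendering a list of URLs to PDF files with `puppeteer`. It is not the contents of `docusaurus-puppeteer-pdf.js`: the function names, the file naming scheme, and the `A4` page format are assumptions made for illustration, and the cover page is left out.

```javascript
// Sketch only; not the contents of docusaurus-puppeteer-pdf.js.
// Assumes `puppeteer` is installed (for example by `yarn install`).

// Hypothetical naming scheme: zero-padded index so files sort in page order.
function pdfNameFor(index) {
  return String(index).padStart(4, '0') + '.pdf';
}

async function renderAll(urls) {
  const puppeteer = require('puppeteer'); // loaded lazily, only when rendering
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    for (let i = 0; i < urls.length; i++) {
      // Wait until network activity settles so the page is fully rendered.
      await page.goto(urls[i], { waitUntil: 'networkidle2' });
      // page.pdf() renders with print CSS media, so `@media print` rules apply.
      await page.pdf({ path: pdfNameFor(i + 1), format: 'A4' });
    }
  } finally {
    await browser.close(); // always release the browser, even after an error
  }
}

// Example call (commented out so this file loads without a running site):
// renderAll(['http://localhost:3000/zh/docs/introduction/StarRocks_intro/']);
```

Reusing one browser page for every URL avoids the cost of launching a browser per document, which matters when there are hundreds of pages.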
|
||
## Join the individual PDF files | ||
### Combine the individual PDFs | ||
|
||
> Note: | ||
> | ||
> Change the name of the PDF output file as needed, in the example this is `StarRocks_33` | ||
The previous step generated a PDF file for each Docusaurus page. Combine the individual files with `pdftk-java`: | ||
|
||
```bash | ||
cd ../../PDFoutput/ | ||
pdftk 00*pdf output StarRocks_33.pdf | ||
pdftk 0*pdf output docs.pdf | ||
``` | ||
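
The shell expands `0*pdf` to the matching file names in sorted order before `pdftk` runs, which is why the pages end up in the right sequence. The sketch below mimics that expansion in Node.js. The file names in `listing` and the `expandGlob` helper are hypothetical and only illustrate the idea.

```javascript
// Hypothetical helper: mimic what the shell does with the pattern `0*pdf`.
// Globs expand to the matching names in sorted order, so pdftk sees pages in name order.
function expandGlob(names) {
  return names.filter((name) => /^0.*pdf$/.test(name)).sort();
}

// Made-up directory listing, deliberately out of order.
const listing = ['0003.pdf', 'docs.pdf', '0001.pdf', 'URLs.txt', '0002.pdf'];

const inputs = expandGlob(listing);
console.log(['pdftk', ...inputs, 'output', 'docs.pdf'].join(' '));
```

Note that `docs.pdf` does not match the pattern, so running the combine step again does not feed the previous output back in as an input.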
|
||
## Finished file | ||
### Cleanup | ||
|
||
The individual PDF files and the combined file will be on your local machine in `starrocks/docs/PDFoutput/` | ||
There are now 900+ temporary PDF files in the directory. Remove them with: | ||
|
||
## Customizing the docs site for PDF | ||
```bash | ||
./clean | ||
``` | ||
|
||
Gotenberg generates the PDF files without the side navigation, header, and footer as these components are not displayed when the `media` is set to `print`. In our docs it does not make sense to have the breadcrumbs, edit URLs, or Feedback widget show. These are filtered out using CSS by adding `display: none` to the classes of these objects when `@media print`. | ||
## Customizing the docs site for PDF | ||
|
||
Removing the Feedback form from the PDF can be done with CSS. This snippet is added to the Docusaurus CSS file `src/css/custom.css`: | ||
Some things do not make sense to have in the PDF, like the Feedback form at the bottom of the page. Removing the Feedback form from the PDF can be done with CSS. This snippet is added to the Docusaurus CSS file `docs/docusaurus/src/css/custom.css`: | ||
|
||
```css | ||
/* When we generate PDF files we do not need to show the: | ||
- edit URL | ||
- Feedback widget | ||
- breadcrumbs | ||
*/ | ||
/* When we generate PDF files: | ||
- avoid breaks in the middle of: | ||
- code blocks | ||
- admonitions (notes, tips, etc.) | ||
- we do not need to show the: | ||
- feedback widget. | ||
- edit this page | ||
- breadcrumbs | ||
*/ | ||
@media print { | ||
.feedback_Ak7m { | ||
display: none; | ||
} | ||
|
||
.theme-doc-footer-edit-meta-row { | ||
display: none; | ||
}; | ||
.theme-code-block , .theme-admonition { | ||
break-inside: avoid; | ||
} | ||
} | ||
|
||
.breadcrumbs { | ||
@media print { | ||
.theme-edit-this-page , .feedback_Ak7m , .theme-doc-breadcrumbs { | ||
display: none; | ||
}; | ||
} | ||
} | ||
``` | ||
|
||
## Links | ||
|
||
- [`docusaurus-prince-pdf`](https://github.com/signcl/docusaurus-prince-pdf) | ||
- [`puppeteer`](https://pptr.dev/) | ||
- [`pdftk`](https://gitlab.com/pdftk-java/pdftk) | ||
- [Ghostscript](https://www.ghostscript.com/) |