feat: support for localized images
settable log levels
more defaults that "just work"
hatton committed Jul 29, 2022
1 parent b4a0a39 commit b88a02a
Showing 14 changed files with 7,760 additions and 7,513 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,5 +1,6 @@
dist/
sample/
sample_img/
i18n/
node_modules/
version.json
28 changes: 23 additions & 5 deletions README.md
@@ -38,21 +38,31 @@ means that the id is "0456aa5842946PRETEND4f37c97a0e5".
Determine where you want the markdown files and images to land. The following works well for Docusaurus instances:

```
npx notion-pull-mdx -n secret_PRETEND123456789PRETEND123456789PRETEND6789 -r 0456aa5842946PRETEND4f37c97a0e5 -m "./docs" -i "./images"
npx notion-pull-mdx -n secret_PRETEND123456789PRETEND123456789PRETEND6789 -r 0456aa5842946PRETEND4f37c97a0e5
```

Most likely, you will want to store these values in environment variables and then use them like this:

```
(windows)
npx notion-pull-mdx -n %MY_NOTION_TOKEN% -r %MY_NOTION_DOCS_ROOT_PAGE_ID% -m "./docs" -i "./static/notion_images" -p "/notion_images/"
npx notion-pull-mdx -n %MY_NOTION_TOKEN% -r %MY_NOTION_DOCS_ROOT_PAGE_ID%
```

```
(linux / mac)
npx notion-pull-mdx -n $MY_NOTION_TOKEN -r $MY_NOTION_DOCS_ROOT_PAGE_ID -m "./docs" -i "./static/notion_images" -p "/notion_images/"
npx notion-pull-mdx -n $MY_NOTION_TOKEN -r $MY_NOTION_DOCS_ROOT_PAGE_ID
```

NOTE: In the above, we are using `npx` to use the latest `notion-pull-mdx`. A more conservative approach would be to `npm i cross-var notion-pull-mdx` and then create a script in your package.json like this:

```
"scripts": {
  "pull": "cross-var notion-pull-mdx -n %NOTION_PULL_INTEGRATION_TOKEN% -r %NOTION_PULL_ROOT_PAGE%"
}
```

and then run that with `npm run pull`.

## 7. Commit

Most projects should probably commit the current markdown and image files each time you run notion-pull-mdx.
@@ -77,11 +87,19 @@ Links from one document to another in Notion are not yet converted to local link

notion-pull-mdx makes some attempt to keep the right order of things, but there are definitely cases where it isn't smart enough yet.

# Localization
# Text Localization

Localize your files in Crowdin (or whatever) based on the markdown files, not in Notion. For how to do this with Docusaurus, see [Docusaurus i18n](https://docusaurus.io/docs/i18n/crowdin).

You may also need to localize screenshots. Crowdin can also handle localizing assets, but this library currently supports a different approach. If you place, for example, `fr https://imgur.com/1234.png` in the caption of a screenshot in Notion, `notion-pull-mdx` will fetch that image and save it locally with the same name as the primary screenshot, but with "-fr" appended. So you'd get, for example, `static\img\9876.png` and `static\img\9876-fr.png`. To get the French version to show, you'd need to add that "-fr" to the markdown link when you localize the page's text in Crowdin. In the future, this change to the markdown might be made automatic so that you get the right image version without editing the link.
# Screenshot Localization

The only way we know of to provide localized images in the current Docusaurus (2.0) is to place the images in the same directory as the markdown and use relative paths for images. Most projects probably won't localize _every_ image, so we also need a way to "fall back" to the original screenshot when the localized one is missing. `notion-pull-mdx` facilitates this: if no localized version of an image is available, `notion-pull-mdx` places a copy of the original image into the correct location.

So how do you provide these localized screenshot files? Crowdin can handle localizing assets, and in the future we may support that. For now, we support a different approach. If you place, for example, `fr https://imgur.com/1234.png` in the caption of a screenshot in Notion, `notion-pull-mdx` will fetch that image and save it in the right place to be found in French mode. Getting URLs to screenshots is easy with screenshot utilities such as [Greenshot](https://getgreenshot.org/) that support uploading to imgur. Note that `notion-pull-mdx` stores a copy of all images in your source tree, so you wouldn't lose the images if imgur were to go away.

NOTE: As far as I can tell, when you run `docusaurus start`, Docusaurus 2.0 offers the language picker, but it doesn't actually work. So to test the localized version, run `docusaurus build` followed by `docusaurus serve`.

NOTE: If you localize only an image, it will not get picked up. You must also localize the page that uses the image. Otherwise, Docusaurus will use the English document, and when that asks for `./the-image-path`, it will find the image in the English section, not in your other language's section.

# Automated builds with GitHub Actions

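The README describes putting a language code and a URL (for example `fr https://imgur.com/1234.png`) into a Notion image caption, and `ImageSet` carries the result as `localizedUrls: Array<{ iso632Code: string; url: string }>`. The body of `parseImageBlock()` is not part of this diff, so the following is only a minimal sketch of how such a caption could be split, assuming a lowercase two- or three-letter code followed by whitespace and an `http(s)` URL; `parseCaption` is a hypothetical name, not this library's API.

```typescript
// Hypothetical sketch: the real caption parsing lives in parseImageBlock(), not shown here.
type LocalizedUrl = { iso632Code: string; url: string };

function parseCaption(rawCaption: string): {
  caption: string;
  localizedUrls: LocalizedUrl[];
} {
  const localizedUrls: LocalizedUrl[] = [];
  // Assumed pattern: "<code> <url>", e.g. "fr https://imgur.com/1234.png".
  const pattern = /\b([a-z]{2,3})\s+(https?:\/\/\S+)/g;
  const caption = rawCaption
    // Pull each code/URL pair out of the caption and record it.
    .replace(pattern, (_match: string, code: string, url: string) => {
      localizedUrls.push({ iso632Code: code, url });
      return "";
    })
    // Collapse the whitespace left behind by the removed pairs.
    .replace(/\s+/g, " ")
    .trim();
  return { caption, localizedUrls };
}

const parsed = parseCaption(
  "The settings dialog fr https://imgur.com/1234.png es https://imgur.com/5678.png"
);
console.log(parsed.caption); // The settings dialog
console.log(parsed.localizedUrls.map((l) => l.iso632Code).join(",")); // fr,es
```

Region-qualified locales such as `pt-BR` would not match this assumed pattern; a real parser would need to decide how to map them to Docusaurus locale folders.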
6 changes: 4 additions & 2 deletions package.json
@@ -13,9 +13,10 @@
"semantic-release": "semantic-release",
"typecheck": "tsc --noEmit",
"notion-download": "node dist/index.js",
"test": "ts-node --compiler-options \"{\\\"module\\\": \\\"commonjs\\\"}\" src/index.ts",
"cmdhelp": "ts-node --compiler-options \"{\\\"module\\\": \\\"commonjs\\\"}\" src/index.ts",
"// test out with my sample notion db": "",
"sample": "cross-var ts-node --compiler-options \"{\\\"module\\\": \\\"commonjs\\\"}\" src/index.ts -n %NOTION_PULL_INTEGRATION_TOKEN% -r %NOTION_PULL_ROOT_PAGE% -m ./sample -i ./sample_img/inner -p /inner/"
"sample": "cross-var ts-node --compiler-options \"{\\\"module\\\": \\\"commonjs\\\"}\" src/index.ts -n %NOTION_PULL_INTEGRATION_TOKEN% -r %NOTION_PULL_ROOT_PAGE% -m ./sample --log-level verbose",
"sample-with-paths": "cross-var ts-node --compiler-options \"{\\\"module\\\": \\\"commonjs\\\"}\" src/index.ts -n %NOTION_PULL_INTEGRATION_TOKEN% -r %NOTION_PULL_ROOT_PAGE% -m ./sample --img-output-path ./sample_img"
},
"repository": {
"type": "git",
@@ -51,6 +52,7 @@
"limiter": "^2.1.0",
"node-fetch": "2.6.6",
"notion-to-md": "^2.5.2",
"path": "^0.12.7",
"postinstall-postinstall": "^2.1.0",
"sanitize-filename": "^1.6.3"
},
7 changes: 5 additions & 2 deletions src/DocusaurusTweaks.ts
@@ -1,4 +1,4 @@
import chalk from "chalk";
import { logDebug } from "./log";

export function tweakForDocusaurus(input: string): {
body: string;
@@ -90,7 +90,10 @@ function notionEmbedsToMDX(input: string): {
while ((match = v.regex.exec(input)) !== null) {
const string = match[0];
const url = match[1];
console.log(chalk.green(`${string} --> ${v.output.replace("$1", url)}`));
logDebug(
"DocusaurusTweaks",
`${string} --> ${v.output.replace("$1", url)}`
);
body = body.replace(string, v.output.replace("$1", url));
imports.add(v.import);
}
3 changes: 2 additions & 1 deletion src/LayoutStrategy.ts
@@ -1,4 +1,5 @@
import * as fs from "fs-extra";
import { verbose } from "./log";
import { NotionPage } from "./NotionPage";

// Here a fuller name would be File Tree Layout Strategy. That is,
@@ -16,7 +17,7 @@ export abstract class LayoutStrategy {
public async cleanupOldFiles(): Promise<void> {
// Remove any pre-existing files that aren't around anymore; this indicates that they were removed or renamed in Notion.
for (const p of this.existingPagesNotSeenYetInPull) {
console.log(`Removing old doc: ${p}`);
verbose(`Removing old doc: ${p}`);
await fs.rm(p);
}
}
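This commit swaps `console.log` and `chalk` calls for `logDebug`, `verbose`, `info`, and `error` imported from `./log`, and the new `sample` script passes `--log-level verbose`, but `src/log.ts` itself is not shown here. A minimal sketch of how such level gating could work, assuming a simple ordered list of levels and an `info` default; the level names beyond those imported above, the ordering, `setLogLevel`, and `enabled` are all assumptions.

```typescript
// Hypothetical level-gated logger; src/log.ts is not shown in this diff.
type LogLevel = "error" | "warning" | "info" | "verbose" | "debug";

// Ordered from least to most chatty (an assumed ordering).
const levels: LogLevel[] = ["error", "warning", "info", "verbose", "debug"];
let currentLevel: LogLevel = "info"; // assumed default

function setLogLevel(level: string): void {
  const index = levels.indexOf(level as LogLevel);
  if (index === -1) {
    throw new Error(`Unknown log level "${level}"; expected one of: ${levels.join(", ")}`);
  }
  currentLevel = levels[index];
}

// A message is printed when its level is at or below the current threshold.
function enabled(level: LogLevel): boolean {
  return levels.indexOf(level) <= levels.indexOf(currentLevel);
}

function error(message: string): void {
  if (enabled("error")) console.error(message);
}
function warning(message: string): void {
  if (enabled("warning")) console.warn(message);
}
function info(message: string): void {
  if (enabled("info")) console.log(message);
}
function verbose(message: string): void {
  if (enabled("verbose")) console.log(message);
}
function logDebug(group: string, message: string): void {
  if (enabled("debug")) console.log(`[${group}] ${message}`);
}

// For example, after reading a CLI option such as --log-level verbose:
setLogLevel("verbose");
verbose("Adding image ./docs/123.png"); // printed
logDebug("processImageBlock", "{...}"); // suppressed at "verbose"
```

Keeping `logDebug` behind a group name, as the diff's calls do, makes it easy to tell which module produced a noisy line when the level is turned up to `debug`.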
56 changes: 56 additions & 0 deletions src/MakeImagePersistencePlan.ts
@@ -0,0 +1,56 @@
import { ImageSet } from "./NotionImage";
import * as Path from "path";
import { error } from "./log";

export function makeImagePersistencePlan(
  imageSet: ImageSet,
  imageOutputRootPath: string,
  imagePrefix: string
): void {
  if (imageSet.fileType?.ext) {
    // Since most images come from pasting screenshots, there isn't normally a filename. That's fine, we just make a hash of the url
    // Images that are stored by notion come to us with a complex url that changes over time, so we pick out the UUID that doesn't change. Example:
    // https://s3.us-west-2.amazonaws.com/secure.notion-static.com/d1058f46-4d2f-4292-8388-4ad393383439/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIAT73L2G45EIPT3X45%2F20220516%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20220516T233630Z&X-Amz-Expires=3600&X-Amz-Signature=f215704094fcc884d37073b0b108cf6d1c9da9b7d57a898da38bc30c30b4c4b5&X-Amz-SignedHeaders=host&x-id=GetObject

    let thingToHash = imageSet.primaryUrl;
    const m = /.*secure\.notion-static\.com\/(.*)\//gm.exec(
      imageSet.primaryUrl
    );
    if (m && m.length > 1) {
      thingToHash = m[1];
    }

    const hash = hashOfString(thingToHash);
    imageSet.outputFileName = `${hash}.${imageSet.fileType.ext}`;

    imageSet.primaryFileOutputPath = Path.posix.join(
      imageOutputRootPath?.length > 0
        ? imageOutputRootPath
        : imageSet.pathToParentDocument!,
      imageSet.outputFileName
    );

    if (imageOutputRootPath && imageSet.localizedUrls.length) {
      error(
        "imageOutputPath was declared, but one or more localizedUrls were found too. If you are going to localize screenshots, then you can't declare an imageOutputPath."
      );
    }

    imageSet.filePathToUseInMarkdown =
      (imagePrefix?.length > 0 ? imagePrefix : ".") +
      "/" +
      imageSet.outputFileName;
  } else {
    error(
      `Something wrong with the filetype extension on the blob we got from ${imageSet.primaryUrl}`
    );
  }
}

function hashOfString(s: string) {
  let hash = 0;
  for (let i = 0; i < s.length; ++i)
    hash = Math.imul(31, hash) + s.charCodeAt(i);

  return Math.abs(hash);
}
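The naming rule in `makeImagePersistencePlan()` is what lets a rerun recognize an image it already downloaded: Notion-hosted URLs carry expiring `X-Amz-*` query parameters, so the plan hashes only the UUID path segment. The snippet below re-creates the regex and `hashOfString()` from the new file to show that two signed URLs for the same Notion image map to the same file name; `stableKey` and `outputFileName` are illustrative names, and because this is a 32-bit hash, distinct keys can collide in principle.

```typescript
// Same 31-multiplier string hash as hashOfString() in MakeImagePersistencePlan.ts.
function hashOfString(s: string): number {
  let hash = 0;
  for (let i = 0; i < s.length; ++i) {
    hash = Math.imul(31, hash) + s.charCodeAt(i);
  }
  return Math.abs(hash);
}

// For Notion-hosted images, keep only the path segment after
// secure.notion-static.com/ (the UUID); other URLs are hashed whole.
function stableKey(url: string): string {
  const m = /.*secure\.notion-static\.com\/(.*)\//.exec(url);
  return m && m.length > 1 ? m[1] : url;
}

function outputFileName(url: string, ext: string): string {
  return `${hashOfString(stableKey(url))}.${ext}`;
}

const base =
  "https://s3.us-west-2.amazonaws.com/secure.notion-static.com/" +
  "d1058f46-4d2f-4292-8388-4ad393383439/Untitled.png";
const monday = `${base}?X-Amz-Date=20220516T233630Z&X-Amz-Expires=3600`;
const tuesday = `${base}?X-Amz-Date=20220517T101500Z&X-Amz-Expires=3600`;

console.log(stableKey(monday)); // d1058f46-4d2f-4292-8388-4ad393383439
console.log(outputFileName(monday, "png") === outputFileName(tuesday, "png")); // true
```

The greedy `(.*)\/` capture works for this URL shape because the signed query string encodes its slashes as `%2F`, so the last literal `/` is the one right after the UUID.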
File renamed without changes.
152 changes: 77 additions & 75 deletions src/NotionImage.ts
@@ -1,10 +1,34 @@
import * as fs from "fs-extra";
import FileType from "file-type";
import FileType, { FileTypeResult } from "file-type";
import fetch from "node-fetch";
import * as Path from "path";
import { makeImagePersistencePlan } from "./MakeImagePersistencePlan";
import { logDebug, verbose, info } from "./log";

let existingImagesNotSeenYetInPull: string[] = [];
let imageOutputPath = "not set yet";
let imagePrefix = "not set yet";
let imageOutputPath = ""; // default to putting in the same directory as the document referring to it.
let imagePrefix = ""; // default to "./"

// we parse a notion image and its caption into what we need, which includes any urls to localized versions of the image that may be embedded in the caption
export type ImageSet = {
// We get these from parseImageBlock():
primaryUrl: string;
caption?: string;
localizedUrls: Array<{ iso632Code: string; url: string }>;

// then we fill this in from processImageBlock():
pathToParentDocument?: string;
relativePathToParentDocument?: string;

// then we fill these in readPrimaryImage():
primaryBuffer?: Buffer;
fileType?: FileTypeResult;

// then we fill these in from makeImagePersistencePlan():
primaryFileOutputPath?: string;
outputFileName?: string;
filePathToUseInMarkdown?: string;
};

export async function initImageHandling(
prefix: string,
@@ -19,85 +43,54 @@ export async function initImageHandling(
// changes, it gets a new id. This way can then prevent downloading
// and image after the 1st time. The downside is currently we don't
// have the smarts to remove unused images.
await fs.mkdir(imageOutputPath, { recursive: true });
if (imageOutputPath) {
await fs.mkdir(imageOutputPath, { recursive: true });
}
}

async function saveImage(
imageSet: ImageSet,
imageFolderPath: string
): Promise<string> {
async function readPrimaryImage(imageSet: ImageSet) {
const response = await fetch(imageSet.primaryUrl);
const arrayBuffer = await response.arrayBuffer();
const buffer = Buffer.from(arrayBuffer);
const fileType = await FileType.fromBuffer(buffer);
if (fileType?.ext) {
// Since most images come from pasting screenshots, there isn't normally a filename. That's fine, we just make a hash of the url
// Images that are stored by notion come to us with a complex url that changes over time, so we pick out the UUID that doesn't change. Example:
// https://s3.us-west-2.amazonaws.com/secure.notion-static.com/d1058f46-4d2f-4292-8388-4ad393383439/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIAT73L2G45EIPT3X45%2F20220516%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20220516T233630Z&X-Amz-Expires=3600&X-Amz-Signature=f215704094fcc884d37073b0b108cf6d1c9da9b7d57a898da38bc30c30b4c4b5&X-Amz-SignedHeaders=host&x-id=GetObject

let thingToHash = imageSet.primaryUrl;
const m = /.*secure\.notion-static\.com\/(.*)\//gm.exec(
imageSet.primaryUrl
);
if (m && m.length > 1) {
thingToHash = m[1];
}
imageSet.primaryBuffer = Buffer.from(arrayBuffer);
imageSet.fileType = await FileType.fromBuffer(imageSet.primaryBuffer);
}

const hash = hashOfString(thingToHash);
const outputFileName = `${hash}.${fileType.ext}`;
const primaryFilePath = writeImageIfNew(
imageFolderPath,
outputFileName,
buffer
);

// if there are localized images, save them too, using the same
// name as the primary but with their language code attached
for (const localizedImage of imageSet.localizedUrls) {
const outputFileName = `${hash}-${localizedImage.iso632Code}.${fileType.ext}`;
console.log("Saving localized image to " + outputFileName);
const response = await fetch(localizedImage.url);
const arrayBuffer = await response.arrayBuffer();
const buffer = Buffer.from(arrayBuffer);
writeImageIfNew(imageFolderPath, outputFileName, buffer);
async function saveImage(imageSet: ImageSet): Promise<void> {
writeImageIfNew(imageSet.primaryFileOutputPath!, imageSet.primaryBuffer!);

let foundLocalizedImage = false;

// if there are localized images, save them too, using the same
// name as the primary but with their language code attached
for (const localizedImage of imageSet.localizedUrls) {
verbose(`Retrieving ${localizedImage.iso632Code} version...`);
const response = await fetch(localizedImage.url);
const arrayBuffer = await response.arrayBuffer();
const buffer = Buffer.from(arrayBuffer);
const directory = `./i18n/${
localizedImage.iso632Code
}/docusaurus-plugin-content-docs/current/${imageSet.relativePathToParentDocument!}`;
if (!foundLocalizedImage) {
foundLocalizedImage = true;
info(
"*** found at least one localized image, so /i18n directory will be created and filled with localized image files."
);
}

return primaryFilePath;
} else {
console.error(
`Something wrong with the filetype extension on the blob we got from ${imageSet.primaryUrl}`
);
return "error";
writeImageIfNew(directory + "/" + imageSet.outputFileName!, buffer);
}
}
function writeImageIfNew(
imageFolderPath: string,
outputFileName: string,
buffer: Buffer
) {
const path = imageFolderPath + "/" + outputFileName;

function writeImageIfNew(path: string, buffer: Buffer) {
imageWasSeen(path);
if (!fs.pathExistsSync(path)) {
console.log("Adding image " + path);
verbose("Adding image " + path);
fs.mkdirsSync(Path.dirname(path));
fs.createWriteStream(path).write(buffer); // async but we're not waiting
} else {
verbose(`image already filled: ${path}`);
}
return outputFileName;
}

function hashOfString(s: string) {
let hash = 0;
for (let i = 0; i < s.length; ++i)
hash = Math.imul(31, hash) + s.charCodeAt(i);

return Math.abs(hash);
}

// we parse a notion image and its caption into what we need, which includes any urls to localized versions of the image that may be embedded in the caption
type ImageSet = {
primaryUrl: string;
caption?: string;
localizedUrls: Array<{ iso632Code: string; url: string }>;
};
export function parseImageBlock(b: any): ImageSet {
const imageSet: ImageSet = {
primaryUrl: "",
@@ -142,18 +135,27 @@ export function parseImageBlock(b: any): ImageSet {

// Download the image if we don't have it, give it a good name, and
// change the src to point to our copy of the image.
export async function processImageBlock(b: any): Promise<void> {
//console.log(JSON.stringify(b));
export async function processImageBlock(
b: any,
pathToParentDocument: string,
relativePathToThisPage: string
): Promise<void> {
logDebug("processImageBlock", JSON.stringify(b));

// this is broken into all these steps to facilitate unit testing without IO
const imageSet = parseImageBlock(b);
imageSet.pathToParentDocument = pathToParentDocument;
imageSet.relativePathToParentDocument = relativePathToThisPage;

const newPath =
imagePrefix + "/" + (await saveImage(imageSet, imageOutputPath));
await readPrimaryImage(imageSet);
makeImagePersistencePlan(imageSet, imageOutputPath, imagePrefix);
await saveImage(imageSet);

// change the src to point to our copy of the image
if ("file" in b.image) {
b.image.file.url = newPath;
b.image.file.url = imageSet.filePathToUseInMarkdown;
} else {
b.image.external.url = newPath;
b.image.external.url = imageSet.filePathToUseInMarkdown;
}
// put back the simplified caption, stripped of the meta information
if (imageSet.caption) {
@@ -177,7 +179,7 @@ function imageWasSeen(path: string) {

export async function cleanupOldImages(): Promise<void> {
for (const p of existingImagesNotSeenYetInPull) {
console.log(`Removing old image: ${p}`);
verbose(`Removing old image: ${p}`);
await fs.rm(p);
}
}
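In `saveImage()`, each localized image is written under `./i18n/<code>/docusaurus-plugin-content-docs/current/` plus the page's relative path, reusing the primary image's `outputFileName`, so a translated page that keeps the same relative image link picks up the translated screenshot. A small sketch of that path rule as a pure function, assuming POSIX-style separators; `localizedImagePath` and `planLocalizedWrites` are hypothetical helpers, and unlike the string concatenation in the diff, `Path.posix.join` normalizes away the leading `./` and any doubled slashes.

```typescript
import * as Path from "path";

type LocalizedUrl = { iso632Code: string; url: string };

// Hypothetical helper mirroring the directory layout used by saveImage():
// i18n/<code>/docusaurus-plugin-content-docs/current/<relative page dir>/<file>
function localizedImagePath(
  iso632Code: string,
  relativePathToParentDocument: string,
  outputFileName: string
): string {
  return Path.posix.join(
    "i18n",
    iso632Code,
    "docusaurus-plugin-content-docs",
    "current",
    relativePathToParentDocument,
    outputFileName
  );
}

// List every download/write pair for one image without touching the network or disk.
function planLocalizedWrites(
  localizedUrls: LocalizedUrl[],
  relativePathToParentDocument: string,
  outputFileName: string
): Array<{ url: string; path: string }> {
  return localizedUrls.map((l) => ({
    url: l.url,
    path: localizedImagePath(l.iso632Code, relativePathToParentDocument, outputFileName),
  }));
}

const plan = planLocalizedWrites(
  [{ iso632Code: "fr", url: "https://imgur.com/1234.png" }],
  "getting-started",
  "123456.png"
);
console.log(plan[0].path);
// i18n/fr/docusaurus-plugin-content-docs/current/getting-started/123456.png
```

Separating path planning from the fetch-and-write step follows the same idea as the comment in `processImageBlock()` about splitting the work into steps that can be unit tested without IO.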