feat(fs-bq-import-collection): add transformFunction option (#2251)
cabljac authored Jan 31, 2025
1 parent f8f496e commit 79d756a
Showing 7 changed files with 67 additions and 8 deletions.
32 changes: 32 additions & 0 deletions firestore-bigquery-export/guides/IMPORT_EXISTING_DOCUMENTS.md
@@ -139,3 +139,35 @@ This helps you quickly identify problematic documents and take action accordingl
To retry the failed imports, you can use the output file to manually inspect or reprocess the documents. For example, you could create a script that reads the failed paths and reattempts the import.
> **Note:** If the specified file already exists, it will be **cleared** before writing new failed batch paths.
### Using a Transform Function
You can optionally provide a transform function URL (`--transform-function-url` or `-f`) to transform document data before it is written to BigQuery. The transform function should receive document data and return the transformed data. The request payload has the following structure:
```
{
  data: [{
    insertId: int;
    json: {
      timestamp: int;
      event_id: int;
      document_name: string;
      document_id: int;
      operation: ChangeType;
      data: string;
    },
  }]
}
```
The response should be identical in structure.
Example usage of the script with the transform function option:
```shell
npx @firebaseextensions/fs-bq-import-collection --non-interactive \
-P <PROJECT_ID> \
-s <COLLECTION_PATH> \
-d <DATASET_ID> \
-f https://us-west1-my-project.cloudfunctions.net/transformFunction
```
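As a rough illustration of the payload contract described above, the following sketch shows the kind of logic a transform function might apply. It is not part of the import script: the `Row` and `TransformPayload` types and the `redactEmails` function are hypothetical names, the field types simply mirror the structure listed in this guide, and the sketch assumes that `json.data` holds the document serialized as a JSON string. Wiring this into an HTTP endpoint is left out.

```typescript
// Hypothetical types mirroring the payload structure listed in this guide.
interface Row {
  insertId: number;
  json: {
    timestamp: number;
    event_id: number;
    document_name: string;
    document_id: number;
    operation: string; // "ChangeType" in the guide
    data: string; // assumed: the document serialized as JSON
  };
}

interface TransformPayload {
  data: Row[];
}

// Returns a payload with the same structure, replacing any top-level
// "email" field in each document with a placeholder.
function redactEmails(payload: TransformPayload): TransformPayload {
  return {
    data: payload.data.map((row) => {
      const doc: unknown = JSON.parse(row.json.data);
      // Leave rows untouched if the data is not a JSON object.
      if (typeof doc !== "object" || doc === null) return row;
      const redacted = { ...(doc as Record<string, unknown>) };
      if ("email" in redacted) redacted.email = "[redacted]";
      return { ...row, json: { ...row.json, data: JSON.stringify(redacted) } };
    }),
  };
}
```

Because the response must match the request structure, the sketch copies every field through unchanged and only rewrites `json.data`.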
12 changes: 4 additions & 8 deletions firestore-bigquery-export/scripts/import/package-lock.json

24 changes: 24 additions & 0 deletions firestore-bigquery-export/scripts/import/src/config.ts
@@ -170,6 +170,21 @@ const questions = [
    type: "confirm",
    default: false,
  },
  {
    message: "What's the URL of your transform function? (Optional)",
    name: "transformFunctionUrl",
    type: "input",
    default: "",
    validate: (value) => {
      if (!value) return true;
      try {
        new URL(value);
        return true;
      } catch {
        return "Please enter a valid URL or leave empty";
      }
    },
  },
  {
    message: "Would you like to use a local firestore emulator?",
    name: "useEmulator",
@@ -213,6 +228,15 @@ export async function parseConfig(): Promise<CliConfig | CliConfigError> {
    if (program.datasetLocation === undefined) {
      errors.push("DatasetLocation is not specified.");
    }

    if (program.transformFunctionUrl) {
      try {
        new URL(program.transformFunctionUrl);
      } catch {
        errors.push("Transform function URL is invalid");
      }
    }

    if (!validateBatchSize(program.batchSize)) {
      errors.push("Invalid batch size.");
    }
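
Both the interactive prompt and `parseConfig` rely on the `URL` constructor throwing on malformed input. Note that the constructor accepts any scheme, so a value such as `ftp://example.com` passes this check. The sketch below shows one possible stricter variant that also requires `http:` or `https:`. The helper name `isHttpUrl` is hypothetical and is not part of this commit.

```typescript
// Hypothetical helper, not part of the commit: returns true only when the
// value parses as a URL and uses the http or https scheme.
function isHttpUrl(value: string): boolean {
  try {
    const parsed = new URL(value);
    return parsed.protocol === "http:" || parsed.protocol === "https:";
  } catch {
    // The URL constructor throws a TypeError on input it cannot parse.
    return false;
  }
}
```

Sharing one helper between the prompt validator and the CLI check would also keep the two code paths from drifting apart.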
1 change: 1 addition & 0 deletions firestore-bigquery-export/scripts/import/src/index.ts
@@ -81,6 +81,7 @@ const run = async (): Promise<number> => {
    wildcardIds: queryCollectionGroup,
    useNewSnapshotQuerySyntax,
    bqProjectId: bigQueryProjectId,
    transformFunction: config.transformFunctionUrl,
  });

  await initializeDataSink(dataSink, config);
4 changes: 4 additions & 0 deletions firestore-bigquery-export/scripts/import/src/program.ts
@@ -54,6 +54,10 @@ export const getCLIOptions = () => {
    "-u, --use-new-snapshot-query-syntax [true|false]",
    "Whether to use updated latest snapshot query"
  )
  .option(
    "-f, --transform-function-url <transform-function-url>",
    "URL of function to transform data before export (e.g., https://us-west1-project.cloudfunctions.net/transform)"
  )
  .option(
    "-e, --use-emulator [true|false]",
    "Whether to use the firestore emulator"
1 change: 1 addition & 0 deletions firestore-bigquery-export/scripts/import/src/types.ts
@@ -16,6 +16,7 @@ export interface CliConfig {
  rawChangeLogName: string;
  cursorPositionFile: string;
  failedBatchOutput?: string;
  transformFunctionUrl?: string;
}

export interface CliConfigError {
1 change: 1 addition & 0 deletions firestore-bigquery-export/scripts/import/src/worker.ts
@@ -73,6 +73,7 @@ async function processDocuments(
    wildcardIds: true,
    skipInit: true,
    useNewSnapshotQuerySyntax: config.useNewSnapshotQuerySyntax,
    transformFunction: config.transformFunctionUrl,
  });

  // Process documents in batches until we've covered the entire partition