feat: process StateMarketDeals in Rust
Speed up the initial JSON parsing from ~1h to <3m.

Signed-off-by: Miroslav Bajtoš <[email protected]>
bajtos committed Dec 6, 2023
1 parent 42ab058 commit 105859e
Showing 5 changed files with 415 additions and 2 deletions.
5 changes: 5 additions & 0 deletions .gitignore
@@ -134,3 +134,8 @@ generated/ldn-deals.ndjson
generated/retrieval-tasks.ndjson
generated/StateMarketDeals.ndjson
generated/update-spark-db.sql


# Added by cargo

/target
317 changes: 317 additions & 0 deletions Cargo.lock

11 changes: 11 additions & 0 deletions Cargo.toml
@@ -0,0 +1,11 @@
[package]
name = "fil-deal-ingester"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
env_logger = "0.10.1"
json-event-parser = "0.1.1"
log = "0.4.20"
10 changes: 8 additions & 2 deletions README.md
@@ -30,10 +30,16 @@ The output is committed to git, see [./generated/ldn-clients.csv](./generated/ld

WARNING: The decompressed file has over 23 GB

-3. Run
+3. Build the tool for converting `StateMarketDeals.json` to newline-delimited JSON
+
+```sh
+cargo build --release
+```
+
+4. Run

```sh
-jq --stream -c 'fromstream(1|truncate_stream(inputs))' StateMarketDeals.json > generated/StateMarketDeals.ndjson
+./target/release/fil-deal-ingester StateMarketDeals.json > generated/StateMarketDeals.ndjson
```

WARNING: This will take very long (more than 1 hour).
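
The Rust source added by this commit is not shown in the hunks above, so the following is only a minimal sketch of how a streaming JSON-to-NDJSON conversion could work, not the commit's implementation. It uses the standard library alone rather than the `json-event-parser` crate listed in `Cargo.toml`. The function name `split_top_level` is hypothetical, and the sketch assumes well-formed input whose root is a single object (or array) whose values are themselves objects or arrays. Like the replaced `jq` filter, it drops the root keys and writes one compact value per line.

```rust
use std::io::{self, BufReader, Read, Write};

/// Hypothetical helper: copies every object or array nested directly under
/// the root value to `output`, one compact JSON value per line.
/// Returns the number of records written. Input is assumed to be valid JSON.
fn split_top_level<R: Read, W: Write>(input: R, mut output: W) -> io::Result<u64> {
    let mut depth = 0usize; // current nesting level; the root container is level 1
    let mut in_string = false; // true while inside a JSON string literal
    let mut escaped = false; // true right after a backslash inside a string
    let mut records = 0u64;

    for byte in BufReader::new(input).bytes() {
        let b = byte?;

        if in_string {
            // Copy string bytes verbatim, but only when inside a record;
            // root-level keys (depth 1) are dropped.
            if depth >= 2 {
                output.write_all(&[b])?;
            }
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }

        match b {
            b'"' => {
                in_string = true;
                if depth >= 2 {
                    output.write_all(&[b])?;
                }
            }
            b'{' | b'[' => {
                depth += 1;
                if depth >= 2 {
                    output.write_all(&[b])?;
                }
            }
            b'}' | b']' => {
                if depth >= 2 {
                    output.write_all(&[b])?;
                }
                depth = depth.saturating_sub(1);
                // Dropping back to level 1 means a whole record was copied.
                if depth == 1 {
                    output.write_all(b"\n")?;
                    records += 1;
                }
            }
            // Whitespace outside strings is insignificant; skip it for compact output.
            _ if b.is_ascii_whitespace() => {}
            // Everything else (digits, literals, ':' and ',') is copied inside records only.
            _ => {
                if depth >= 2 {
                    output.write_all(&[b])?;
                }
            }
        }
    }
    Ok(records)
}
```

A caller would typically pass an opened `std::fs::File` as `input` and a `std::io::BufWriter` around standard output as `output`. Because the input is consumed byte by byte and only the nesting depth and string state are tracked, memory use stays constant regardless of file size.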