Update documentation
janheinrichmerker committed Nov 24, 2023
1 parent 27ffc43 commit 7485c73
Showing 1 changed file with 9 additions and 1 deletion.
10 changes: 9 additions & 1 deletion README.md
@@ -252,7 +252,7 @@ A pointer to the WARC file is stored in the SERP index so that we can quickly ac
<!-- TODO: Add instructions on how to parse the SERPs' contents from the WARC files. -->


-#### Import from AQL-22
+### Imports

We support automatically importing providers and parsers from the AQL-22 YAML-file format
(see [`data/selected-services.yaml`](data/selected-services.yaml)).
@@ -263,6 +263,8 @@ aql providers import
aql parsers url-query import
aql parsers url-page import
aql parsers url-offset import
aql parsers warc-query import
aql parsers warc-snippets import
```

We also support importing a previous crawl of captures from the AQL-22 file system backend:
@@ -271,6 +273,12 @@ We also support importing a previous crawl of captures from the AQL-22 file syst
aql captures import aql-22
```

Last, we support importing all archives from the [Archive-It]() web archive service:

```shell
aql archives import archive-it
```
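
The import commands above can be chained in a small wrapper script. The following is a minimal sketch and not part of the repository: the `run` helper, the `aql_import_all` function, and the `AQL_RUN` variable are hypothetical names, and running the imports in the order they are listed here is an assumption, since the README does not say whether the order matters. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: run every import command listed in this section, in listed order.
# Assumption: the order is not stated to be significant by the README.
set -eu

# Hypothetical helper: execute the command only if AQL_RUN=1,
# otherwise print it (dry run).
run() {
    if [ "${AQL_RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Hypothetical wrapper around the documented import commands.
aql_import_all() {
    run aql providers import
    for parser in url-query url-page url-offset warc-query warc-snippets; do
        run aql parsers "$parser" import
    done
    run aql captures import aql-22
    run aql archives import archive-it
}

aql_import_all
```

Running the script as-is lists the eight commands; setting `AQL_RUN=1` executes them and, because of `set -e`, stops at the first command that fails.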

### Cluster (Helm/Kubernetes)
Running the Archive Query Log on a cluster is recommended for large-scale crawls.
We provide a Helm chart that automatically starts crawling and parsing jobs for you