hide implementation under internal dir (close #55)
dimus committed Oct 30, 2022
1 parent 77a5ab7 commit 9232675
Showing 8,426 changed files with 121 additions and 188 deletions.
The diff you're trying to view is too large. We only load the first 3000 changed files.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,11 @@

## Unreleased

## [v1.0.0-RC1] - 2022-10-30 Sun

- Add [#55] - refactor the directory structure using `internal` directory
to hide code not suitable for public use.

## [v0.13.2] - 2022-09-12 Mon

- Add [#53] - classification ranks and ids in dump files.
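
The `internal` directory mentioned in the v1.0.0-RC1 entry relies on a rule enforced by the `go` command: a package under `.../internal/...` can only be imported by code rooted at the parent of that `internal` directory. The following is a minimal sketch of that rule applied to import paths, not the `go` command's own implementation (which compares directories). The helper names and the `github.com/example/other` importer are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// internalParent returns the import path of the directory that holds the
// last "internal" element of path, and whether such an element exists.
func internalParent(path string) (string, bool) {
	switch {
	case strings.HasSuffix(path, "/internal"):
		return strings.TrimSuffix(path, "/internal"), true
	case strings.Contains(path, "/internal/"):
		return path[:strings.LastIndex(path, "/internal/")], true
	case path == "internal", strings.HasPrefix(path, "internal/"):
		return "", true
	}
	return "", false
}

// canImport reports whether importer may import imported under the
// internal-directory rule (simplified to plain import paths).
func canImport(importer, imported string) bool {
	parent, ok := internalParent(imported)
	if !ok || parent == "" {
		// No internal element, or a root-level one this sketch does not model.
		return true
	}
	return importer == parent || strings.HasPrefix(importer, parent+"/")
}

func main() {
	target := "github.com/gnames/bhlindex/internal/config"
	// cmd lives inside the module, so the import is allowed.
	fmt.Println(canImport("github.com/gnames/bhlindex/cmd", target))
	// A hypothetical third-party module is rejected.
	fmt.Println(canImport("github.com/example/other", target))
}
```

This is why the `cmd` package in this commit can keep using `config`, `dbio` and the other packages after they move, while external modules no longer can.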
15 changes: 0 additions & 15 deletions Dockerfile

This file was deleted.

21 changes: 4 additions & 17 deletions Makefile
@@ -14,7 +14,7 @@ all: install

tools: deps
@echo Installing tools from tools.go
@cat bhlindex/tools.go | grep _ | awk -F'"' '{print $$2}' | xargs -tI % go install %
@cat tools.go | grep _ | awk -F'"' '{print $$2}' | xargs -tI % go install %

deps:
@echo Download go.mod dependencies
@@ -26,29 +26,16 @@ test: deps install
go test -race ./...

build:
cd bhlindex; \
$(GOCLEAN); \
$(FLAGS_SHARED) $(GOBUILD);

install:
@echo Building and Installing bhlindex
cd bhlindex; \
$(FLAGS_SHARED) $(GOINSTALL); \
$(GOCLEAN);

release: dockerhub
@echo Building releases for Linux, Mac, Windows
cd bhlindex; \
release:
@echo Building release for Linux
$(GOCLEAN); \
$(FLAGS_SHARED) GOOS=linux $(GOBUILD); \
tar zcvf /tmp/bhlindex-${VER}-linux.tar.gz bhlindex;

docker: build
docker build -t gnames/bhlindex:latest -t gnames/bhlindex:$(VERSION) .; \
cd bhlindex; \
$(GOCLEAN);

dockerhub: docker
docker push gnames/bhlindex; \
docker push gnames/bhlindex:$(VERSION)

tar zcvf /tmp/bhlindex-${VER}-linux.tar.gz bhlindex;
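
The `tools` target above extracts import paths from `tools.go` with `grep _` and `awk`, which matches any line containing an underscore. As an illustration only, the same list could be obtained more precisely with the standard `go/parser` package. This is a sketch, not part of the repository; the sample file contents and the `example.com/hypothetical/...` paths are made up because the real `tools.go` is not shown in this diff.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// sampleTools stands in for tools.go; its import paths are hypothetical.
const sampleTools = `//go:build tools

package tools

import (
	_ "example.com/hypothetical/cmd/toolone"
	_ "example.com/hypothetical/cmd/tooltwo"
	"fmt"
)
`

// blankImports returns the paths of all blank ("_") imports in src.
func blankImports(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	// ImportsOnly stops parsing after the import declarations.
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var paths []string
	for _, imp := range f.Imports {
		if imp.Name == nil || imp.Name.Name != "_" {
			continue // skip regular and named imports
		}
		p, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		paths = append(paths, p)
	}
	return paths, nil
}

func main() {
	paths, err := blankImports("tools.go", sampleTools)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, p := range paths {
		fmt.Println("go install", p)
	}
}
```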
67 changes: 31 additions & 36 deletions README.md
@@ -1,4 +1,6 @@
# Biodiversity Heritage Library Scientific Names Index
# Biodiversity Heritage Library Scientific Names Index (BHLindex)

[![Doc Status][doc-img]][doc]

Creates an index of scientific names occurring in the collection of literature
in Biodiversity Heritage Library
@@ -95,53 +97,45 @@ bhlindec verify -y

Dump data into tab-separated files

```bash
bhlindex dump
# to compress and save on disk
bhlindex dump | gzip > bhlindex-dump.csv.gz
Three files will be created: `pages`, `names`, `occurrences`. They
will have an extension according to the selected output format (CSV is the
default). If you need to filter verified results by data sources, the list of
sources and their IDs can be found on the [gnverifier sources page].

Uncompressed dump files take more than 30GB of space.

# -f overrides configuration file settings for output format
bhlindex dump -f tsv | gzip > bhlindex-dump.tsv.gz
bhlindex dump -f json | gzip > bhlindex-dump.json.gz
```bash
# Dump files to a designated directory.
bhlindex dump -d ~/bhlindex-dump
# or
bhlindex dump --dir ~/bhlindex-dump

# Dump records verified to particular data-sources of `gnverifier`.
# In this case verified names are filtered by `The Catalogue of Life` (ID=1)
# and `The Encyclopedia of Life` (ID=12).
bhlindex dump -d ~/bhlindex-dump -s 1,12
# or
bhlindex dump --dir ~/bhlindex-dump --sources 1,12

# Dump using JSON or TSV formats.
bhlindex dump -f tsv -d ~/bhlindex-dump
bhlindex dump -f json -d ~/bhlindex-dump
# or
bhlindex dump --format tsv --dir ~/bhlindex-dump
```
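
The `-s`/`--sources` flag above takes a comma-separated list of data-source IDs such as `1,12`. How `bhlindex` parses that value is not shown in this diff; the sketch below only illustrates one plausible way to turn such a string into integers with the standard library. The function name `parseSources` is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSources converts a comma-separated list like "1,12" into IDs.
// Empty fields are skipped; a non-numeric field yields an error.
func parseSources(s string) ([]int, error) {
	var ids []int
	for _, field := range strings.Split(s, ",") {
		field = strings.TrimSpace(field)
		if field == "" {
			continue
		}
		id, err := strconv.Atoi(field)
		if err != nil {
			return nil, fmt.Errorf("invalid data-source ID %q: %w", field, err)
		}
		ids = append(ids, id)
	}
	return ids, nil
}

func main() {
	ids, err := parseSources("1,12")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(ids)
}
```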

To run all commands together

```bash
bhlindex find -y && \
bhlindex verify -y && \
bhlindex dump | gzip > bhlindex-dump.csv.gz
bhlindex dump -d output-dir
```

Serve detected items, pages, verified names, names occurrences via RESTful
interface (default port is 8080).

```bash
bhlindex rest
# using different port
bhlindex rest -p 8000
```

## RESTful API endpoints

- `/api/v1/items`
- `/api/v1/pages`
- `/api/v1/names`
- `/api/v1/occurrences`

| Query | Usage |
| --------------------------------------------- | --------------------------------------------------------------------- |
| items?offset_id=11&limit=100 | get items with ids 11-110 |
| pages?offset_id=11&limit=10 | get pages of items with ids 11-20 |
| names?offset_id=1&limit=10 | get verified names with ids 1-10 |
| names?offset_id=1&limit=10&data_sources=1 | get verified names with ids 1-10 verified to the "Catalogue of Life" |
| occurrences?offset=21&limit=10 | get detected names with ids 21-30 |
| occurrences?offset=21&limit=10&data_sources=1 | get detected names with ids 21-30 verified to the "Catalogue of Life" |

### Testing

Testing requires PostgreSQL database `bhlindex_test`.
Testing will delete all data from the database.
Testing will delete all data from the test database.

```bash
go test
@@ -150,4 +144,5 @@ go test
[bhl-ocr]: http://opendata.globalnames.org/dumps/
[bhlindex-latest]: https://github.com/gnames/bhlindex/releases/latest
[bhl-test]: https://github.com/gnames/bhlindex/tree/master/testdata/bhl/ocr
[readme]: https://github.com/gnames/bhlindex/tree/master/bhlindex
[doc-img]: https://godoc.org/github.com/gnames/bhlindex?status.png
[doc]: https://godoc.org/github.com/gnames/bhlindex
Binary file added bhlindex
Binary file not shown.
21 changes: 0 additions & 21 deletions bhlindex/LICENSE

This file was deleted.

34 changes: 16 additions & 18 deletions bhlindex/cmd/bhlindex.yaml → cmd/bhlindex.yaml
@@ -3,18 +3,10 @@
#
# BHLdir:

# OutputFormat is the format of the detected names dump.
# Can take values of:
# csv - comma-separated values
# tsv - tab-separated values
# json - JSON format (one JSON-encoded line per record)
#
# OutputFormat: csv

# OutputDir is the directory where to place output dump files.
# Default is the current directory
# Jobs is the number of parallel processes running for the name-finding.
# Default is 4
#
# OutputDir: .
# Jobs: 4

# PgHost is the IP or a name of a computer running PostgreSQL database.
# Default is "0.0.0.0"
@@ -36,19 +28,25 @@
#
# PgDatabase: bhlindex

# Jobs is the number of parallel processes running for the name-finding.
# Default is 4
###### OPTIONAL PARAMETERS ############

# OutputFormat is the format of the detected names dump.
# Can take values of:
# csv - comma-separated values
# tsv - tab-separated values
# json - JSON format (one JSON-encoded line per record)
#
# Jobs: 4
# OutputFormat: csv

# OutputDir is the directory where to place output dump files.
# Default is the current directory
#
# OutputDir: .

# VerifierURL points to a remote GNverifier service.
#
# VerifierURL: https://verifier.globalnames.org/api/v1

# WithWebLogs can be set to true, if logs from RESTful service are required.
# If it set to false, the logs are silenced.
# WithWebLogs: false

# WithoutConfirm can be set to true to avoid confirmation dialogs before
# destructive processes.
#
8 changes: 4 additions & 4 deletions bhlindex/cmd/dump.go → cmd/dump.go
@@ -22,10 +22,10 @@ THE SOFTWARE.
package cmd

import (
"github.com/gnames/bhlindex"
"github.com/gnames/bhlindex/config"
"github.com/gnames/bhlindex/io/dbio"
"github.com/gnames/bhlindex/io/dumpio"
"github.com/gnames/bhlindex/internal"
"github.com/gnames/bhlindex/internal/config"
"github.com/gnames/bhlindex/internal/io/dbio"
"github.com/gnames/bhlindex/internal/io/dumpio"
"github.com/rs/zerolog/log"
"github.com/spf13/cobra"
)
10 changes: 5 additions & 5 deletions bhlindex/cmd/find.go → cmd/find.go
@@ -25,11 +25,11 @@ import (
"fmt"
"os"

"github.com/gnames/bhlindex"
"github.com/gnames/bhlindex/config"
"github.com/gnames/bhlindex/io/dbio"
"github.com/gnames/bhlindex/io/finderio"
"github.com/gnames/bhlindex/io/loaderio"
"github.com/gnames/bhlindex/internal"
"github.com/gnames/bhlindex/internal/config"
"github.com/gnames/bhlindex/internal/io/dbio"
"github.com/gnames/bhlindex/internal/io/finderio"
"github.com/gnames/bhlindex/internal/io/loaderio"
"github.com/rs/zerolog/log"
"github.com/spf13/cobra"
)
6 changes: 1 addition & 5 deletions bhlindex/cmd/opts.go → cmd/opts.go
@@ -5,7 +5,7 @@ import (
"strconv"
"strings"

"github.com/gnames/bhlindex/config"
"github.com/gnames/bhlindex/internal/config"
"github.com/gnames/gnfmt"
"github.com/gnames/gnsys"
"github.com/rs/zerolog/log"
@@ -67,10 +67,6 @@ func getOpts(cfgPath string) {
opts = append(opts, config.OptVerifierURL(cfg.VerifierURL))
}

if cfg.WithWebLogs {
opts = append(opts, config.OptWithWebLogs(true))
}

if cfg.WithoutConfirm {
opts = append(opts, config.OptWithoutConfirm(true))
}
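
The `opts = append(opts, config.OptVerifierURL(...))` calls in `getOpts` follow the functional-options pattern. The `config` package itself is not part of the visible diff, so the sketch below is an assumption about its general shape rather than its actual code: the `Config` fields, the `Option` type and `New` are hypothetical, and only the option names and the default verifier URL are taken from this commit.

```go
package main

import "fmt"

// Config is a hypothetical stand-in for the real config.Config.
type Config struct {
	VerifierURL    string
	WithoutConfirm bool
}

// Option mutates a Config; each setting gets its own constructor.
type Option func(*Config)

// OptVerifierURL sets the URL of the remote GNverifier service.
func OptVerifierURL(url string) Option {
	return func(c *Config) { c.VerifierURL = url }
}

// OptWithoutConfirm disables confirmation dialogs.
func OptWithoutConfirm(b bool) Option {
	return func(c *Config) { c.WithoutConfirm = b }
}

// New starts from defaults and applies the given options in order.
func New(opts ...Option) Config {
	cfg := Config{VerifierURL: "https://verifier.globalnames.org/api/v1"}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	// Mirror getOpts: append an option only when a value was configured.
	var opts []Option
	withoutConfirm := true
	if withoutConfirm {
		opts = append(opts, OptWithoutConfirm(true))
	}
	fmt.Printf("%+v\n", New(opts...))
}
```

With this shape, dropping a setting such as `WithWebLogs` only requires deleting its option constructor and the corresponding `if` block, which matches what this hunk does.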
6 changes: 2 additions & 4 deletions bhlindex/cmd/root.go → cmd/root.go
@@ -27,8 +27,8 @@ import (
"os"
"path/filepath"

"github.com/gnames/bhlindex"
"github.com/gnames/bhlindex/config"
bhlindex "github.com/gnames/bhlindex/internal"
"github.com/gnames/bhlindex/internal/config"
"github.com/gnames/gnsys"
"github.com/rs/zerolog/log"
"github.com/spf13/cobra"
@@ -52,7 +52,6 @@ type cfgData struct {
PgDatabase string
Jobs int
VerifierURL string
WithWebLogs bool
WithoutConfirm bool
}

@@ -122,7 +121,6 @@ func initConfig() {
_ = viper.BindEnv("PgDatabase", "BHLI_PG_DATABASE")
_ = viper.BindEnv("Jobs", "BHLI_JOBS")
_ = viper.BindEnv("VerifierURL", "BHLI_VERIFIER_URL")
_ = viper.BindEnv("WithWebLogs", "BHLI_WITH_WEB_LOGS")
_ = viper.BindEnv("WithoutConfirm", "BHLI_WITHOUT_CONFIRM")
viper.AutomaticEnv() // read in environment variables that match
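
The `viper.BindEnv` calls above map configuration keys to `BHLI_*` environment variables, so a variable such as `BHLI_JOBS` can override the `Jobs` value from `bhlindex.yaml`. The sketch below shows that precedence using only the standard library instead of viper. The function `jobs` and its fallback logic are hypothetical; the default of 4 comes from the comments in `bhlindex.yaml`.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultJobs matches the default documented in bhlindex.yaml.
const defaultJobs = 4

// jobs resolves the Jobs setting: a valid BHLI_JOBS value wins, then the
// config-file value, then the default. lookup is injected to ease testing.
func jobs(lookup func(string) (string, bool), fromFile int) int {
	if v, ok := lookup("BHLI_JOBS"); ok {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	if fromFile > 0 {
		return fromFile
	}
	return defaultJobs
}

func main() {
	// 0 means "not set in the config file" in this sketch.
	fmt.Println("jobs:", jobs(os.LookupEnv, 0))
}
```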

8 changes: 4 additions & 4 deletions bhlindex/cmd/verify.go → cmd/verify.go
@@ -25,10 +25,10 @@ import (
"fmt"
"os"

"github.com/gnames/bhlindex"
"github.com/gnames/bhlindex/config"
"github.com/gnames/bhlindex/io/dbio"
"github.com/gnames/bhlindex/io/verifio"
"github.com/gnames/bhlindex/internal"
"github.com/gnames/bhlindex/internal/config"
"github.com/gnames/bhlindex/internal/io/dbio"
"github.com/gnames/bhlindex/internal/io/verifio"
"github.com/rs/zerolog/log"
"github.com/spf13/cobra"
)
14 changes: 7 additions & 7 deletions bhlindex.go → internal/bhlindex.go
@@ -7,13 +7,13 @@ import (
"path/filepath"
"sync"

"github.com/gnames/bhlindex/config"
"github.com/gnames/bhlindex/ent/finder"
"github.com/gnames/bhlindex/ent/item"
"github.com/gnames/bhlindex/ent/loader"
"github.com/gnames/bhlindex/ent/name"
"github.com/gnames/bhlindex/ent/output"
"github.com/gnames/bhlindex/ent/verif"
"github.com/gnames/bhlindex/internal/config"
"github.com/gnames/bhlindex/internal/ent/finder"
"github.com/gnames/bhlindex/internal/ent/item"
"github.com/gnames/bhlindex/internal/ent/loader"
"github.com/gnames/bhlindex/internal/ent/name"
"github.com/gnames/bhlindex/internal/ent/output"
"github.com/gnames/bhlindex/internal/ent/verif"
"github.com/gnames/gnfmt"
"github.com/gnames/gnlib/ent/gnvers"
"github.com/rs/zerolog/log"
12 changes: 6 additions & 6 deletions bhlindex_test.go → internal/bhlindex_test.go
@@ -4,12 +4,12 @@ import (
"fmt"
"testing"

"github.com/gnames/bhlindex"
"github.com/gnames/bhlindex/config"
"github.com/gnames/bhlindex/io/dbio"
"github.com/gnames/bhlindex/io/finderio"
"github.com/gnames/bhlindex/io/loaderio"
"github.com/gnames/bhlindex/io/verifio"
bhlindex "github.com/gnames/bhlindex/internal"
"github.com/gnames/bhlindex/internal/config"
"github.com/gnames/bhlindex/internal/io/dbio"
"github.com/gnames/bhlindex/internal/io/finderio"
"github.com/gnames/bhlindex/internal/io/loaderio"
"github.com/gnames/bhlindex/internal/io/verifio"
"github.com/stretchr/testify/assert"
)
