add optional log aggregation to NSQ-messaging service (close #77)
dimus committed Feb 6, 2022
1 parent ed11c1e commit 6eb9b04
Showing 11 changed files with 248 additions and 72 deletions.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2020-2021 gnames
Copyright (c) 2020-2022 gnames

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
105 changes: 74 additions & 31 deletions README.md
@@ -15,7 +15,7 @@ search feature.
* [Installation](#installation)
* [Using Homebrew on Mac OS X, Linux, and Linux on Windows ([WSL2])](#using-homebrew-on-mac-os-x-linux-and-linux-on-windows-wsl2)
* [MS Windows](#ms-windows)
* [Linux and Mac](#linux-and-mac)
* [Linux and Mac (without Homebrew)](#linux-and-mac-without-homebrew)
* [Compile from source](#compile-from-source)
* [Usage](#usage)
* [As a web service](#as-a-web-service)
@@ -33,6 +33,8 @@ search feature.
* [only_preferred](#only_preferred)
* [quiet](#quiet)
* [sources](#sources)
* [web-logs](#web-logs)
* [nsqd-tcp](#nsqd-tcp)
* [Configuration file](#configuration-file)
* [Advanced Search Query Language](#advanced-search-query-language)
* [Examples of searches](#examples-of-searches)
@@ -49,15 +51,15 @@ If you want to cite GNverifier, use [DOI generated by Zenodo][Zenodo DOI]:
* Small and fast app to verify scientific names against many biodiversity
databases. The app is a client to a [verifier API].
* It provides 6 different match levels:
* Exact: complete match with a canonical form or a full name-string from a
data source.
* Fuzzy: if exact match did not happen, it tries to match name-strings
assuming spelling errors.
* Partial: strips middle or last epithets from bi- or multi-nomial names
and tries to match what is left.
* PartialFuzzy: the same as Partial but assuming spelling mistakes.
* Virus: verification of virus names.
* FacetedSearch: marks [advanced-search](#advanced-search) queries.
* **Exact**: complete match with a canonical form or a full name-string
from a data source.
* **Fuzzy**: if no exact match is found, it tries to match name-strings
assuming spelling errors.
* **Partial**: strips middle or last epithets from bi- or multi-nomial names
and tries to match what is left.
* **PartialFuzzy**: the same as Partial but assuming spelling mistakes.
* **Virus**: verification of virus names.
* **FacetedSearch**: marks [advanced-search](#advanced-search) queries.
* Taxonomic resolution. If a database contains taxonomic information, it
returns the currently accepted name for the provided name-string.
* Best match is returned according to the match score. Data sources with some
@@ -76,7 +78,7 @@ If you want to cite GNverifier, use [DOI generated by Zenodo][Zenodo DOI]:
to find abbreviated names, search by author, year etc.
* Supports feeding data via operating system pipes, which allows chaining
the program with other tools.
* `GNverifier` includes a web-based graphical user interface identical to its
* [GNverifier] includes a web-based graphical user interface identical to its
"official" [web-service].

## Installation
@@ -88,7 +90,7 @@ developed for Mac OS X. Now it is also available on Linux, and can easily
be used on Windows 10, if Windows Subsystem for Linux (WSL) is
[installed][WSL install].

To use `GNverifier` with Homebrew:
To use [GNverifier] with Homebrew:

1. Install [Homebrew]

@@ -101,7 +103,7 @@ brew install gnverifier

### MS Windows

Download the latest release from [github], unzip.
Download the [latest release] from GitHub and unzip it.

One possible way would be to create a default folder for executables and place
``GNverifier`` there.
@@ -118,17 +120,17 @@ copy path_to\gnverifier.exe C:\Users\your_username\bin
environment variable.

Another, simpler way would be to run the ``cd C:\Users\your_username\bin`` command
in ``cmd`` terminal window. The ``GNverifier`` program then will be automatically
in a ``cmd`` terminal window. The [GNverifier] program will then be found
automatically by Windows when you run its commands from that
directory.

You can also read a more detailed guide for Windows users in
[a PDF document][win-pdf].

### Linux and Mac
### Linux and Mac (without Homebrew)

Download the latest release from [github], untar, and install binary somewhere
in your path.
If [Homebrew] is not installed, download the [latest release] from GitHub,
untar it, and install the binary somewhere in your `PATH`.

```bash
tar xvf gnverifier-linux-0.1.0.tar.xz
@@ -146,8 +148,8 @@ go get github.com/gnames/gnverifier/gnverifier

## Usage

``GNverifier`` takes one name-string or a text file with one name-string per
line as an argument, sends a query with these data to [remote ``gnames``
[GNverifier] takes one name-string or a text file with one name-string per
line as an argument, sends a query with these data to a [remote GNames
server][gnames] to match the name-strings against many different biodiversity
databases and returns results to STDOUT either in JSON, CSV or TSV format.

@@ -278,7 +280,7 @@ significantly speeds up parsing of the JSON on the user side.

#### jobs

If the list of names if very large, it is possible to tell GNverifier to
If the list of names is very large, it is possible to tell [GNverifier] to
run requests in parallel. In this example GNverifier will run 8 processes
simultaneously. The order of returned names will be somewhat randomized.

@@ -317,7 +319,7 @@ Removes log messages from the output. Note that results of verification go
to STDOUT, while log messages go to STDERR. So instead of using `-q` flag
STDERR can be redirected to `/dev/null`:

```
```bash
gnverifier "Puma concolor" -q >verif-results.csv

#or
@@ -327,11 +329,11 @@ gnverifier "Puma concolor" 2>/dev/null >verif-results.csv
#### sources
By default ``GNverifier`` returns only one "best" result of a match. If a user
By default [GNverifier] returns only one "best" result of a match. If a user
has a particular interest in a data set, s/he can set it with this option, and
all matches that exist for this source will be returned as well. You need to
provide a data source id for a dataset. Ids can be found at the following
[URL][data_source_ids]. Some of them are provided in the ``GNverifier`` help
[URL][data_source_ids]. Some of them are provided in the GNverifier help
output as well.
Data from such sources will be returned in preferred_results section of JSON
@@ -354,19 +356,57 @@ gnverifier "Bubo bubo" -s 0
# potentially even more results get returned by adding --all_matches flag
gnverifier "Bubo bubo" -s 0 -M
```
The `sources` option overrides `ds:` settings in advanced search queries.
#### web-logs
Requires `--port`. Enables log output for the web service.
```bash
gnverifier -p 8777 --web-logs
```
#### nsqd-tcp
Requires `--port`. Redirects web-service log output to the TCP endpoint of an
[NSQ] messaging server (`nsqd`). This is handy for aggregating logs
from [GNverifier] web services running inside Docker containers or
Kubernetes pods.
```bash
gnverifier -p 8777 --nsqd-tcp=localhost:4150
# also print logs to STDERR
gnverifier -p 8777 --nsqd-tcp=localhost:4150 --web-logs
```
### Configuration file
If you find yourself using the same flags over and over again, it makes sense
to edit the configuration file instead. It is located at
`$HOME/.config/gnverifier.yaml`. After that you do not need to use command line
options and flags.
options and flags. The configuration file is self-documented; the [default
gnverifier.yaml] is available on GitHub.
```bash
gnverifier file.txt
```
If [GNverifier] runs as a web-based user interface, it can also be
configured with environment variables:
| Environment variable      | Config field         |
| :------------------------ | :------------------- |
| `GNV_FORMAT`              | `Format`             |
| `GNV_PREFERRED_ONLY`      | `PreferredOnly`      |
| `GNV_DATA_SOURCES`        | `DataSources`        |
| `GNV_WITH_ALL_MATCHES`    | `WithAllMatches`     |
| `GNV_WITH_CAPITALIZATION` | `WithCapitalization` |
| `GNV_VERIFIER_URL`        | `VerifierURL`        |
| `GNV_JOBS`                | `Jobs`               |
| `GNV_WEB_LOGS_NSQD_TCP`   | `WebLogsNsqdTCP`     |
| `GNV_WITH_WEB_LOGS`       | `WithWebLogs`        |
### Advanced Search Query Language
Example: `g:M. sp:gallop. au:Oliv. y:1750-1799` or `n:M. gallop. Oliv. 1750-1799`
@@ -409,7 +449,7 @@ It includes following operators:
`tx:Magnoliopsida`).
`all:`
: If true, [gnverifier] will show all results, not only the best ones.
: If true, [GNverifier] will show all results, not only the best ones.
The setting can be `true` or `false` (`all:t`, `all:f`). This setting
will become true if `sources` command line option is set to `0`.
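To illustrate how a query such as `g:M. sp:gallop. au:Oliv. y:1750-1799` might be split into operator/value pairs, here is a hypothetical Go tokenizer. It is not GNverifier's parser: it assumes operators are one to three lowercase letters followed by `:` (as in `g:`, `sp:`, `au:`, `y:`, `n:`, `tx:`, `ds:`, `isp:`, `all:`), and that words without an operator prefix belong to the preceding operator, which is what lets `n:M. gallop. Oliv. 1750-1799` keep its spaces.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// opRe matches a word that starts a new field, e.g. "g:M." or "ds:1,12".
// The 1-3 lowercase letter operator shape is an assumption.
var opRe = regexp.MustCompile(`^([a-z]{1,3}):(.*)$`)

// parseQuery splits an advanced-search string into operator -> value.
// Words without an operator prefix are appended to the previous value.
func parseQuery(q string) map[string]string {
	res := make(map[string]string)
	cur := ""
	for _, tok := range strings.Fields(q) {
		if m := opRe.FindStringSubmatch(tok); m != nil {
			cur = m[1]
			res[cur] = m[2]
			continue
		}
		if cur == "" {
			continue // words before the first operator are ignored
		}
		if res[cur] == "" {
			res[cur] = tok
		} else {
			res[cur] += " " + tok
		}
	}
	return res
}

func main() {
	fmt.Println(parseQuery("g:M. sp:gallop. au:Oliv. y:1750-1799"))
	fmt.Println(parseQuery("n:M. gallop. Oliv. 1750-1799"))
}
```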
@@ -444,25 +484,28 @@ gnverifier "g:Cara. isp:daurica ds:1,12"
Authors: [Dmitry Mozzherin][dimus]
Copyright © 2020-2021 Dmitry Mozzherin. See [LICENSE] for further
Copyright © 2020-2022 Dmitry Mozzherin. See [LICENSE] for further
details.
[WSL2]: https://docs.microsoft.com/en-us/windows/wsl/install
[verifier API]: https://apidoc.globalnames.org/gnames-beta
[Catalogue of Life]: https://catalogueoflife.org/
[GBIF]: https://www.gbif.org/
[GNverifier]: https://github.com/gnames/gnverifier
[Homebrew]: https://brew.sh/
[LICENSE]: https://github.com/gnames/gnverifier/blob/master/LICENSE
[NSQ]: https://nsq.io/overview/quick_start.html
[WSL install]: https://docs.microsoft.com/en-us/windows/wsl/install-win10
[WSL2]: https://docs.microsoft.com/en-us/windows/wsl/install
[WoRMS]: https://marinespecies.org/
[Zenodo DOI]: https://zenodo.org/badge/latestdoi/297323648
[data_source_ids]: https://verifier.globalnames.org/data_sources
[default gnverifier.yaml]: https://github.com/gnames/gnverifier/blob/master/gnverifier/cmd/gnverifier.yaml
[dimus]: https://github.com/dimus
[github]: https://github.com/gnames/gnverifier/releases/latest
[gnames]: https://hub.apitree.com/dimus/gnames/
[latest release]: https://github.com/gnames/gnverifier/releases/latest
[gnames]: https://apidoc.globalnames.org/gnames-beta
[go-install]: https://golang.org/doc/install
[LICENSE]: https://github.com/gnames/gnverifier/blob/master/LICENSE
[test directory]: https://github.com/gnames/gnverifier/tree/master/testdata
[uBio]: https://ubio.org/
[verifier API]: https://apidoc.globalnames.org/gnames-beta
[web-service]: https://verifier.globalnames.org
[win-pdf]: https://github.com/gnames/gnverifier/blob/master/use-gnverifier-windows.pdf
[winpath]: https://www.computerhope.com/issues/ch000549.htm
66 changes: 47 additions & 19 deletions config/config.go
@@ -6,18 +6,43 @@ import (

// Config collects and stores external configuration data.
type Config struct {
// Batch is the size of the string slices fed into input channel for
// verification.
Batch int

// DataSources are IDs of data sources that are important for the
// user. Normally only one "best" result is returned. If the user gives
// preferred sources, matches from these sources are returned as well.
DataSources []int

// Format determines the output. It can be JSON, CSV, or TSV.
Format gnfmt.Format

// Jobs is the number of verification jobs to run in parallel.
Jobs int

// NamesNumThreshold is the number of names after which POST gets
// redirected to GET.
NamesNumThreshold int

// VerifierURL is the URL of the gnames verification service. It only
// needs to be changed if the user runs a local instance of gnames.
VerifierURL string

// PreferredOnly hides BestResult if the user wants to see only
// preferred results.
PreferredOnly bool

// DataSources are IDs of DataSources that are important for
// user. Normally only one "the best" reusult returns. If user gives
// preferred sources, then matches from these sources are also
// returned.
DataSources []int
// WebLogsNsqdTCP provides the address of the NSQ messaging service's TCP
// endpoint. If this value is set and valid, web logs are published to NSQ.
// The option is ignored if `Port` is not set.
//
// If WithWebLogs is set to `false` but `WebLogsNsqdTCP` is set to a valid
// address, logs are sent to the NSQ messaging service but do not appear in
// STDERR output.
// Example: `127.0.0.1:4150`
WebLogsNsqdTCP string

// WithAllMatches flag; if true, results include all matches per source,
// not only the best match.
@@ -27,20 +52,9 @@ type Config struct {
// will be capitalized when appropriate.
WithCapitalization bool

// VerifierURL URL for gnames verification service. It only needs to
// be changed if user sets local version of gnames.
VerifierURL string

// Jobs is the number of verification jobs to run in parallel.
Jobs int

// Batch is the size of the string slices fed into input channel for
// verification.
Batch int

// NamesNumThreshold the number of names after which POST gets redirected
// to GET.
NamesNumThreshold int
// WithWebLogs flag enables logs when running web-service. This flag is
// ignored if `Port` value is not set.
WithWebLogs bool
}

// Option is a type of all options for Config.
@@ -104,6 +118,20 @@ func OptNamesNumThreshold(i int) Option {
}
}

// OptWebLogsNsqdTCP sets the address of the NSQ messaging service (nsqd TCP endpoint).
func OptWebLogsNsqdTCP(s string) Option {
return func(cfg *Config) {
cfg.WebLogsNsqdTCP = s
}
}

// OptWithWebLogs sets the WithWebLogs field.
func OptWithWebLogs(b bool) Option {
return func(cfg *Config) {
cfg.WithWebLogs = b
}
}

// New is a Config constructor that takes external options to
// update default values to external ones.
func New(opts ...Option) Config {