ERROR: cannot load lexicon file #84

Open
janwillemb opened this issue Sep 23, 2023 · 5 comments

Comments

janwillemb commented Sep 23, 2023

docker-compose up leads to a successful build and the container is listening on port 80.
But when browsing to port 80: bad gateway.

In the log tscan complains:

tscan-tscan-1  | 11:09:06.40: Ready.
tscan-tscan-1  | 11:09:07.34: Starting wopr 1.43
tscan-tscan-1  | 11:09:07.34: Timbl support built in.
tscan-tscan-1  | 11:09:07.34: Based on timbl 6.9
tscan-tscan-1  | 11:09:07.34: Based on libfolia 2.17
tscan-tscan-1  | 11:09:07.34: ICU support, version 70.1
tscan-tscan-1  | 11:09:07.34: std::numeric_limits<int>::max() = 2147483647
tscan-tscan-1  | 11:09:07.34: std::numeric_limits<long>::max() = 9223372036854775807
tscan-tscan-1  | 11:09:07.34: PID:   2005 PPID:   2004
tscan-tscan-1  | 11:09:07.34: Running: xmlserver
tscan-tscan-1  | 11:09:07.34: xmlserver. Returns a FoLiA document over a sequence.
tscan-tscan-1  | 11:09:07.34:  ibasefile: /usr/local/share/tscan/sonar_newspapercorp_tokenized.3.txt.l2r0_-a4+D.ibase
tscan-tscan-1  | 11:09:07.34:  port:      7020
tscan-tscan-1  | 11:09:07.34:  keep:      1
tscan-tscan-1  | 11:09:07.34:  moses:     0
tscan-tscan-1  | 11:09:07.34:  lb:        1
tscan-tscan-1  | 11:09:07.34:  lc:        2
tscan-tscan-1  | 11:09:07.34:  rc:        0
tscan-tscan-1  | 11:09:07.34:  verbose:   2
tscan-tscan-1  | 11:09:07.34:  timbl:
tscan-tscan-1  | 11:09:07.34:  lexicon    /usr/local/share/tscan/sonar_newspapercorp_tokenized.3.txt.lex
tscan-tscan-1  | 11:09:07.34:  hapax:     0
tscan-tscan-1  | 11:09:07.34:  skip_sm:   false
tscan-tscan-1  | 11:09:07.34: ERROR: cannot load lexicon file.
tscan-tscan-1  | 11:09:07.34: Result = -1
tscan-tscan-1  | 11:09:07.34: Running for 00s

Indeed, there is no .lex file in the directory /usr/local/share/tscan/.
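
For anyone who wants to confirm the same thing inside their own container, here is a minimal sketch in Python. The directory and the two file names are copied from the wopr log above; the helper name `missing_files` is made up for illustration and is not part of tscan or wopr.

```python
from pathlib import Path

# Directory and file names as reported in the wopr log above.
DATA_DIR = Path("/usr/local/share/tscan")
EXPECTED = [
    "sonar_newspapercorp_tokenized.3.txt.l2r0_-a4+D.ibase",
    "sonar_newspapercorp_tokenized.3.txt.lex",
]


def missing_files(directory, names):
    """Return the names that do not exist as regular files in directory.

    Hypothetical helper for illustration only.
    """
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    # Print one line per file that wopr would fail to load.
    for name in missing_files(DATA_DIR, EXPECTED):
        print(f"missing: {DATA_DIR / name}")
```

Run it inside the container (assuming Python is available there); no output means both files are present.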

arianpasquali commented Sep 26, 2023

Same here. I ran docker-compose up and everything looks fine except it fails to load the lexicon file.
Browsing to http://localhost:8830 gives me a 502 Bad Gateway as well.

Collaborator

kosloot commented Sep 26, 2023

I noticed:

tscan-tscan-1 | 11:09:07.34: Starting wopr 1.43

So you are using the latest version from Git. Unfortunately, I did a lot of code cleanup on it without a proper test bed.
Therefore I haven't released it (nor am I planning a release).

I suggest trying to revert to release 1.42.
But that won't fix the missing files, of course.

arianpasquali commented Sep 28, 2023

@kosloot do you recommend any specific branch to build the docker image?

We are using wopr as it is specified in the Dockerfile line 130.
It is simply cloning the wopr repo.

I've also checked previous releases like https://github.com/UUDigitalHumanitieslab/tscan/releases/tag/v0.9.8, but they look deprecated since they do not support Docker.

Collaborator

kosloot commented Sep 29, 2023

> @kosloot do you recommend any specific branch to build the docker image?

No, I have no say at all in the Tscan releases and/or the Docker builds.
I only wanted to note that the Git version of Wopr might not be the first choice. It is old and flaky software anyway.
But switching to an older version will not resolve the missing files; they should be provided in the image, I assume.

Karel-Kroeze commented Feb 6, 2025

If anyone else runs into this: it appears these files are supposed to be grabbed as part of the build step by the downloaddata.sh script, but that doesn't seem to happen.

Manually running the script, or in my case just downloading the archive from https://dhstatic.hum.uu.nl/tscan/tscan-bigdata.tar.bz2 and extracting the files within into ./data, seems to fill in the missing files.

Note that you should probably do this before building, as adding these files invalidates part of the build cache and triggers a (partial) new build on docker compose up.
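
The manual extraction step can be sketched in Python as below, assuming the archive has already been downloaded next to the compose file. The function name `extract_bigdata` is hypothetical, and the internal layout of the archive is not described in this thread, so if the files end up under an extra top-level folder they would still need to be moved into ./data by hand.

```python
import tarfile
from pathlib import Path


def extract_bigdata(archive, dest="data"):
    """Extract a .tar.bz2 archive into dest and return its member names.

    Hypothetical helper for illustration; it is not part of tscan.
    """
    dest_dir = Path(dest)
    # Create ./data (or the given destination) if it does not exist yet.
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:bz2") as tar:
        names = tar.getnames()
        # Only extract archives from a source you trust.
        tar.extractall(dest_dir)
    return names
```

For example, `extract_bigdata("tscan-bigdata.tar.bz2", "data")` run from the repository root, before `docker compose up`, so the build picks the files up in one go.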
