Skip to content

Commit

Permalink
Update README.md & docs
Browse files Browse the repository at this point in the history
  • Loading branch information
jeromewu committed May 8, 2019
1 parent 9206c02 commit b322feb
Show file tree
Hide file tree
Showing 5 changed files with 195 additions and 204 deletions.
214 changes: 10 additions & 204 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -56,205 +56,13 @@ or
# Documentation

* [Examples](./docs/examples.md)
* [Tesseract.recognize](#tesseractrecognizeimage-imagelike-options---tesseractjob)
+ [Simple Example](#simple-example)
+ [More Complicated Example](#more-complicated-example)
* [Tesseract.detect](#tesseractdetectimage-imagelike---tesseractjob)
* [ImageLike](#imagelike)
* [TesseractJob](#tesseractjob)
+ [TesseractJob.progress](#tesseractjobprogresscallback-function---tesseractjob)
+ [TesseractJob.then](#tesseractjobthencallback-function---tesseractjob)
+ [TesseractJob.catch](#tesseractjobcatchcallback-function---tesseractjob)
+ [TesseractJob.finally](#tesseractjobfinallycallback-function---tesseractjob)
* [Local Installation](#local-installation)
+ [corePath](#corepath)
+ [workerPath](#workerpath)
+ [langPath](#langpath)
* [Contributing](#contributing)
+ [Development](#development)
+ [Building Static Files](#building-static-files)
+ [Send us a Pull Request!](#send-us-a-pull-request)
* [Image Format](./docs/image-format.md)
* [API](./docs/api.md)
* [Local Installation](./docs/local-installation.md)

# Contributing

## Tesseract.recognize(image: [ImageLike](#imagelike)[, options]) -> [TesseractJob](#tesseractjob)
Figures out what words are in `image`, where the words are in `image`, etc.
> Note: `image` should be sufficiently high resolution.
> Often, the same image will get much better results if you upscale it before calling `recognize`.
- `image` is any [ImageLike](#imagelike) object.
- `options` is either absent (in which case it is interpreted as `'eng'`), a string specifing a language short code from the [language list](./docs/tesseract_lang_list.md), or a flat json object that may:
+ include properties that override some subset of the [default tesseract parameters](./docs/tesseract_parameters.md)
+ include a `lang` property with a value from the [list of lang parameters](./docs/tesseract_lang_list.md)

Returns a [TesseractJob](#tesseractjob) whose `then`, `progress`, `catch` and `finally` methods can be used to act on the result.

### Simple Example:
```javascript
Tesseract.recognize(myImage)
.then(function(result){
console.log(result)
})
```

### More Complicated Example:
```javascript
// if we know our image is of spanish words without the letter 'e':
Tesseract.recognize(myImage, {
langs: 'spa',
tessedit_char_blacklist: 'e'
})
.then(function(result){
console.log(result)
})
```




## Tesseract.detect(image: [ImageLike](#imagelike)) -> [TesseractJob](#tesseractjob)

Figures out what script (e.g. 'Latin', 'Chinese') the words in image are written in.

- `image` is any [ImageLike](#imagelike) object.

Returns a [TesseractJob](#tesseractjob) whose `then`, `progress`, `catch` and `finally` methods can be used to act on the result of the script.


```javascript
Tesseract.detect(myImage)
.then(function(result){
console.log(result)
})
```


## ImageLike

The main Tesseract.js functions take an `image` parameter, which should be something that is like an image. What's considered "image-like" differs depending on whether it is being run from the browser or through NodeJS.

On a browser, an image can be:
- an `img`, `video`, or `canvas` element
- a `File` object (from a file `<input>` or drag-drop event)
- a path or URL to an accessible image (the image must either be hosted locally)

In Node.js, an image can be
- a path to a local image


## TesseractJob

A TesseractJob is an object returned by a call to `recognize` or `detect`. It's inspired by the ES6 Promise interface and provides `then` and `catch` methods. It also provides `finally` method, which will be fired regardless of the job fate. One important difference is that these methods return the job itself (to enable chaining) rather than new.

Typical use is:
```javascript
Tesseract.recognize(myImage)
.progress(message => console.log(message))
.catch(err => console.error(err))
.then(result => console.log(result))
.finally(resultOrError => console.log(resultOrError))
```

Which is equivalent to:
```javascript
var job1 = Tesseract.recognize(myImage);

job1.progress(message => console.log(message));

job1.catch(err => console.error(err));

job1.then(result => console.log(result));

job1.finally(resultOrError => console.log(resultOrError));
```



### TesseractJob.progress(callback: function) -> TesseractJob
Sets `callback` as the function that will be called every time the job progresses.
- `callback` is a function with the signature `callback(progress)` where `progress` is a json object.

For example:
```javascript
Tesseract.recognize(myImage)
.progress(function(message){console.log('progress is: ', message)})
```

The console will show something like:
```javascript
progress is: {loaded_lang_model: "eng", from_cache: true}
progress is: {initialized_with_lang: "eng"}
progress is: {set_variable: Object}
progress is: {set_variable: Object}
progress is: {recognized: 0}
progress is: {recognized: 0.3}
progress is: {recognized: 0.6}
progress is: {recognized: 0.9}
progress is: {recognized: 1}
```


### TesseractJob.then(callback: function) -> TesseractJob
Sets `callback` as the function that will be called if and when the job successfully completes.
- `callback` is a function with the signature `callback(result)` where `result` is a json object.


For example:
```javascript
Tesseract.recognize(myImage)
.then(function(result){console.log('result is: ', result)})
```

The console will show something like:
```javascript
result is: {
blocks: Array[1]
confidence: 87
html: "<div class='ocr_page' id='page_1' ..."
lines: Array[3]
oem: "DEFAULT"
paragraphs: Array[1]
psm: "SINGLE_BLOCK"
symbols: Array[33]
text: "Hello World↵from beyond↵the Cosmic Void↵↵"
version: "3.04.00"
words: Array[7]
}
```

### TesseractJob.catch(callback: function) -> TesseractJob
Sets `callback` as the function that will be called if the job fails.
- `callback` is a function with the signature `callback(error)` where `error` is a json object.

### TesseractJob.finally(callback: function) -> TesseractJob
Sets `callback` as the function that will be called regardless if the job fails or success.
- `callback` is a function with the signature `callback(resultOrError)` where `resultOrError` is a json object.

## Local Installation

In the browser, `tesseract.js` simply provides the API layer. Internally, it opens a WebWorker to handle requests. That worker itself loads code from the Emscripten-built `tesseract.js-core` which itself is hosted on a CDN. Then it dynamically loads language files hosted on another CDN.

Because of this we recommend loading `tesseract.js` from a CDN. But if you really need to have all your files local, you can use the `Tesseract.create` function which allows you to specify custom paths for workers, languages, and core.

```javascript
window.Tesseract = Tesseract.create({
workerPath: '/path/to/worker.js',
langPath: 'https://cdn.jsdelivr.net/gh/naptha/tessdata@gh-pages/3.02/',
corePath: 'https://cdn.jsdelivr.net/gh/naptha/[email protected]/index.js',
})
```

### corePath
A string specifying the location of the [tesseract.js-core library](https://github.com/naptha/tesseract.js-core), with default value 'https://cdn.jsdelivr.net/gh/naptha/[email protected]/index.js'. Set this string before calling `Tesseract.recognize` and `Tesseract.detect` if you want Tesseract.js to use a different file.

### workerPath
A string specifying the location of the [worker.js](./dist/worker.js) file. Set this string before calling `Tesseract.recognize` and `Tesseract.detect` if you want Tesseract.js to use a different file.

### langPath
A string specifying the location of the tesseract language files, with default value 'https://cdn.jsdelivr.net/gh/naptha/tessdata@gh-pages/3.02/'. Language file URLs are calculated according to the formula `langPath + langCode + '.traineddata.gz'`. Set this string before calling `Tesseract.recognize` and `Tesseract.detect` if you want Tesseract.js to use different language files.


## Contributing
### Development
## Development
To run a development copy of tesseract.js, first clone this repo.
```shell
> git clone https://github.com/naptha/tesseract.js.git
Expand All @@ -269,18 +77,16 @@ Then, `cd tesseract.js && npm install && npm start`

Starting up http-server, serving ./
Available on:
http://127.0.0.1:7355
http://[your ip]:7355
http://127.0.0.1:3000
http://[your ip]:3000

```

Then open `http://localhost:7355/examples/file-input/demo.html` in your favorite browser. The devServer automatically rebuilds `tesseract.js` and `tesseract.worker.js` when you change files in the src folder.
Then open `http://localhost:3000/examples/browser/demo.html` in your favorite browser. The devServer automatically rebuilds `tesseract.dev.js` and `worker.min.js` when you change files in the src folder.

### Building Static Files
## Building Static Files
After you've cloned the repo and run `npm install` as described in the [Development Section](#development), you can build static library files in the dist folder with

```shell
> npm run build
```

### Send us a Pull Request!
Thanks :)
146 changes: 146 additions & 0 deletions docs/api.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,146 @@
# API

## Tesseract.recognize(image [, options]) -> [TesseractJob](#tesseractjob)
Figures out what words are in `image`, where the words are in `image`, etc.
> Note: `image` should be sufficiently high resolution.
> Often, the same image will get much better results if you upscale it before calling `recognize`.
- `image` see [Image Format](./image-format.md) for more details.
- `options` is either absent (in which case it is interpreted as `'eng'`), a string specifing a language short code from the [language list](./tesseract_lang_list.md), or a flat json object that may:
+ include properties that override some subset of the [default tesseract parameters](./tesseract_parameters.md)
+ include a `lang` property with a value from the [list of lang parameters](./tesseract_lang_list.md), you can use multiple languages separated by '+', ex. `eng+chi_tra`

Returns a [TesseractJob](#tesseractjob) whose `then`, `progress`, `catch` and `finally` methods can be used to act on the result.

### Simple Example:
```javascript
const worker = new Tessearct.TesseractWorker();
worker
.recognize(myImage)
.then(function(result){
console.log(result);
});
```

### More Complicated Example:
```javascript
const worker = new Tessearct.TesseractWorker();
// if we know our image is of spanish words without the letter 'e':
worker
.recognize(myImage, {
lang: 'spa',
tessedit_char_blacklist: 'e',
})
.then(function(result){
console.log(result);
});
```

## Tesseract.detect(image) -> [TesseractJob](#tesseractjob)

Figures out what script (e.g. 'Latin', 'Chinese') the words in image are written in.

- `image` see [Image Format](./image-format.md) for more details.

Returns a [TesseractJob](#tesseractjob) whose `then`, `progress`, `catch` and `finally` methods can be used to act on the result of the script.

```javascript
const worker = new Tessearct.TesseractWorker();
worker
.detect(myImage)
.then(function(result){
console.log(result);
});
```

## TesseractJob

A TesseractJob is an object returned by a call to `recognize` or `detect`. It's inspired by the ES6 Promise interface and provides `then` and `catch` methods. It also provides `finally` method, which will be fired regardless of the job fate. One important difference is that these methods return the job itself (to enable chaining) rather than new.

Typical use is:
```javascript
const worker = new Tessearct.TesseractWorker();
worker.recognize(myImage)
.progress(message => console.log(message))
.catch(err => console.error(err))
.then(result => console.log(result))
.finally(resultOrError => console.log(resultOrError));
```

Which is equivalent to:
```javascript
const worker = new Tessearct.TesseractWorker();
const job1 = worker.recognize(myImage);

job1.progress(message => console.log(message));

job1.catch(err => console.error(err));

job1.then(result => console.log(result));

job1.finally(resultOrError => console.log(resultOrError));
```



### TesseractJob.progress(callback: function) -> TesseractJob
Sets `callback` as the function that will be called every time the job progresses.
- `callback` is a function with the signature `callback(progress)` where `progress` is a json object.

For example:
```javascript
const worker = new Tessearct.TesseractWorker();
worker.recognize(myImage)
.progress(function(message){console.log('progress is: ', message)});
```

The console will show something like:
```javascript
progress is: {loaded_lang_model: "eng", from_cache: true}
progress is: {initialized_with_lang: "eng"}
progress is: {set_variable: Object}
progress is: {set_variable: Object}
progress is: {recognized: 0}
progress is: {recognized: 0.3}
progress is: {recognized: 0.6}
progress is: {recognized: 0.9}
progress is: {recognized: 1}
```


### TesseractJob.then(callback: function) -> TesseractJob
Sets `callback` as the function that will be called if and when the job successfully completes.
- `callback` is a function with the signature `callback(result)` where `result` is a json object.


For example:
```javascript
const worker = new Tessearct.TesseractWorker();
worker.recognize(myImage)
.then(function(result){console.log('result is: ', result)});
```

The console will show something like:
```javascript
result is: {
blocks: Array[1]
confidence: 87
html: "<div class='ocr_page' id='page_1' ..."
lines: Array[3]
oem: "DEFAULT"
paragraphs: Array[1]
psm: "SINGLE_BLOCK"
symbols: Array[33]
text: "Hello World↵from beyond↵the Cosmic Void↵↵"
version: "3.04.00"
words: Array[7]
}
```

### TesseractJob.catch(callback: function) -> TesseractJob
Sets `callback` as the function that will be called if the job fails.
- `callback` is a function with the signature `callback(error)` where `error` is a json object.

### TesseractJob.finally(callback: function) -> TesseractJob
Sets `callback` as the function that will be called regardless if the job fails or success.
- `callback` is a function with the signature `callback(resultOrError)` where `resultOrError` is a json object.
2 changes: 2 additions & 0 deletions docs/examples.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
# Tesseract.js Examples

You can also check [examples](../examples) folder.

### basic

```javascript
Expand Down
Loading

0 comments on commit b322feb

Please sign in to comment.