
Commit f81b80d

fix some typos
ngxson committed Apr 11, 2024
1 parent e3a0164 commit f81b80d
Showing 2 changed files with 7 additions and 7 deletions.
README.md (12 changes: 6 additions & 6 deletions)
@@ -4,7 +4,7 @@

Another WebAssembly binding for [llama.cpp](https://github.com/ggerganov/llama.cpp). Inspired by [tangledgroup/llama-cpp-wasm](https://github.com/tangledgroup/llama-cpp-wasm), but unlike it, **Wllama** aims to support **low-level API** features such as (de)tokenization, embeddings, and more.
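
To give a feel for what that low-level API covers, here is a minimal sketch in the style of the README's own examples. It is not part of this commit: the `CONFIG_PATHS` value and the example URL are placeholders, and the method names (`tokenize`, `detokenize`, `createEmbedding`) are assumptions based on the project description, so check the package's API reference for the exact signatures.

```js
import { Wllama } from './esm/index.js';

// Placeholder: maps the wasm builds shipped with the package to their URLs.
// The exact keys are defined by the package; see its full README example.
const CONFIG_PATHS = {
  'single-thread/wllama.wasm': './esm/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': './esm/multi-thread/wllama.wasm',
};

const wllama = new Wllama(CONFIG_PATHS);
await wllama.loadModelFromUrl('https://example.com/my_model.gguf'); // placeholder URL

// Assumed low-level calls: tokenize text, turn tokens back into text,
// and compute an embedding vector for a piece of text.
const tokens = await wllama.tokenize('Hello, world!');
const text = await wllama.detokenize(tokens);
const embedding = await wllama.createEmbedding('Hello, world!');
```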

- ## Breaking changes
+ ## Recent changes

- Version 1.5.0
- Support split model using [gguf-split tool](https://github.com/ggerganov/llama.cpp/tree/master/examples/gguf-split)
@@ -87,21 +87,21 @@ import { Wllama } from './esm/index.js';

Cases where we want to split the model:
- Due to [size restriction of ArrayBuffer](https://stackoverflow.com/questions/17823225/do-arraybuffers-have-a-maximum-length), the size limitation of a file is 2GB. If your model is bigger than 2GB, you can split the model into small files.
- - Even with a small model, splitting into chunks allows the browser to download multiple chunks in parallel, thus making downloading process faster.
+ - Even with a small model, splitting into chunks allows the browser to download multiple chunks in parallel, thus making the download process a bit faster.

We use [gguf-split tool](https://github.com/ggerganov/llama.cpp/tree/master/examples/gguf-split) to split a big gguf file into smaller files:

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make gguf-split
- # Split the model into chunks of 1GB
- ./gguf-split --split-max-size 1G ./my_model.gguf ./my_model
+ # Split the model into chunks of 512 Megabytes
+ ./gguf-split --split-max-size 512M ./my_model.gguf ./my_model
```

This will output files ending with `-00001-of-00003.gguf`, `-00002-of-00003.gguf`,...

- You can then give a list of uplaoded files to `loadModelFromUrl`:
+ You can then give a list of uploaded files to `loadModelFromUrl`:

```js
await wllama.loadModelFromUrl(
@@ -111,7 +111,7 @@ await wllama.loadModelFromUrl(
'https://huggingface.co/ngxson/tinyllama_split_test/resolve/main/stories15M-q8_0-00003-of-00003.gguf',
],
{
-     n_download_parallel: 5, // optional: maximum files to download in parallel
+     n_download_parallel: 5, // optional: maximum files to download in parallel (default: 3)
},
);
```
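
Once the shards are fetched and loaded, the model behaves like any single-file model. Continuing from the snippet above, a hedged sketch of a follow-up completion call is shown below; `createCompletion` and its option names (`nPredict`, `sampling`) are assumptions about the package's API rather than something this commit touches, so verify them against the current docs.

```js
// Assumed API: generate a short completion from the freshly loaded split model.
const output = await wllama.createCompletion('Once upon a time,', {
  nPredict: 50,                                   // cap the number of generated tokens
  sampling: { temp: 0.7, top_k: 40, top_p: 0.9 }, // basic sampling settings
});
console.log(output);
```
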
package.json (2 changes: 1 addition & 1 deletion)
@@ -1,6 +1,6 @@
{
"name": "@wllama/wllama",
"version": "1.5.0",
"version": "1.5.1",
"description": "Low-level WASM binding for llama.cpp",
"main": "index.js",
"type": "module",
