
0.12.8

@b4rtaz released this 04 Mar 10:15 · a91745d

This version extends the metrics reported in inference mode. As shown in the example output below, each evaluation and prediction step now reports its compute time, synchronization time, and the amount of data sent and received, followed by summary statistics (a short aggregation sketch follows the summary).

...
💿 Weights loaded
Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing
🔷️ Eval  534 ms Sync  100 ms | Sent  6912 kB Recv 12540 kB | (24 tokens)
🔶 Pred   68 ms Sync   25 ms | Sent   288 kB Recv   522 kB |  them
🔶 Pred   58 ms Sync   15 ms | Sent   288 kB Recv   522 kB |  with
🔶 Pred   57 ms Sync   11 ms | Sent   288 kB Recv   522 kB |  TP
🔶 Pred   43 ms Sync   18 ms | Sent   288 kB Recv   522 kB | .
...
🔶 Pred   47 ms Sync   15 ms | Sent   288 kB Recv   522 kB |  used
🔶 Pred   52 ms Sync   32 ms | Sent   288 kB Recv   522 kB |  in
🔶 Pred   42 ms Sync   11 ms | Sent   288 kB Recv   522 kB |  deep
🔶 Pred   44 ms Sync   10 ms | Sent   288 kB Recv   522 kB |  learning

Evaluation
   nBatches: 32
    nTokens: 24
   tokens/s: 37.83 (26.43 ms/tok)
Prediction
    nTokens: 40
   tokens/s: 16.10 (62.10 ms/tok)
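
The summary figures follow from the per-step lines: the total time of a phase is its compute time plus its sync time, and tokens/s is the token count divided by that total. The sketch below is illustrative only, not part of dllama; the regex and helper names are assumptions based on the line format shown above. It aggregates the per-step metrics and reproduces the evaluation throughput from the 534 ms eval + 100 ms sync line (small differences from the printed 37.83 come from rounding in the log).

```python
import re

# Matches the per-step lines printed above (🔷 = prompt evaluation, 🔶 = prediction).
# The format is an assumption taken from the example output, not a documented API.
LINE = re.compile(
    r"(Eval|Pred)\s+(\d+)\s*ms\s+Sync\s+(\d+)\s*ms\s*\|\s*"
    r"Sent\s+(\d+)\s*kB\s+Recv\s+(\d+)\s*kB"
)

def summarize(log_lines):
    """Aggregate per-step metrics into totals per phase (Eval / Pred)."""
    totals = {"Eval": {"ms": 0, "sent_kb": 0, "recv_kb": 0, "steps": 0},
              "Pred": {"ms": 0, "sent_kb": 0, "recv_kb": 0, "steps": 0}}
    for line in log_lines:
        m = LINE.search(line)
        if not m:
            continue
        phase, compute_ms, sync_ms, sent_kb, recv_kb = m.groups()
        t = totals[phase]
        t["ms"] += int(compute_ms) + int(sync_ms)  # total time = compute + sync
        t["sent_kb"] += int(sent_kb)
        t["recv_kb"] += int(recv_kb)
        t["steps"] += 1
    return totals

def throughput(n_tokens, total_ms):
    """tokens/s and ms/token, as in the summary block."""
    return n_tokens / (total_ms / 1000.0), total_ms / n_tokens

# Aggregate numbers from the log above: 24 prompt tokens evaluated
# in 534 ms of compute plus 100 ms of synchronization.
tok_s, ms_tok = throughput(24, 534 + 100)
print(f"Evaluation: {tok_s:.2f} tokens/s ({ms_tok:.2f} ms/tok)")  # ≈ 37.85 (26.42)
```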