This version extends the metrics reported in inference mode.
...
💿 Weights loaded
Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing
🔷️ Eval 534 ms Sync 100 ms | Sent 6912 kB Recv 12540 kB | (24 tokens)
🔶 Pred 68 ms Sync 25 ms | Sent 288 kB Recv 522 kB | them
🔶 Pred 58 ms Sync 15 ms | Sent 288 kB Recv 522 kB | with
🔶 Pred 57 ms Sync 11 ms | Sent 288 kB Recv 522 kB | TP
🔶 Pred 43 ms Sync 18 ms | Sent 288 kB Recv 522 kB | .
...
🔶 Pred 47 ms Sync 15 ms | Sent 288 kB Recv 522 kB | used
🔶 Pred 52 ms Sync 32 ms | Sent 288 kB Recv 522 kB | in
🔶 Pred 42 ms Sync 11 ms | Sent 288 kB Recv 522 kB | deep
🔶 Pred 44 ms Sync 10 ms | Sent 288 kB Recv 522 kB | learning
Evaluation
nBatches: 32
nTokens: 24
tokens/s: 37.83 (26.43 ms/tok)
Prediction
nTokens: 40
tokens/s: 16.10 (62.10 ms/tok)
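The summary figures follow from the per-token log lines: throughput is the token count divided by the total wall time of the phase, where each token's cost is its compute time plus its synchronization time. A minimal sketch of that arithmetic, using the evaluation line above (the function name and the assumption that the summary sums exactly these two fields are mine, not the project's; small rounding differences against the printed 37.83 tokens/s are expected):

```python
def phase_throughput(n_tokens: int, compute_ms: float, sync_ms: float) -> tuple[float, float]:
    """Return (tokens/s, ms/token) for one inference phase.

    Assumes the reported phase time is compute time plus sync time,
    which matches the evaluation line in the log within rounding.
    """
    total_ms = compute_ms + sync_ms
    ms_per_token = total_ms / n_tokens
    return 1000.0 / ms_per_token, ms_per_token

# Evaluation phase from the log: 24 tokens, Eval 534 ms + Sync 100 ms
tok_s, ms_tok = phase_throughput(24, 534.0, 100.0)
print(f"tokens/s: {tok_s:.2f} ({ms_tok:.2f} ms/tok)")
```

The same formula applied to the prediction summary (16.10 tokens/s at 62.10 ms/tok over 40 tokens) implies roughly 2.48 s of total prediction time, consistent with the per-token Pred + Sync values shown.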