Llama-Vision: Enable tracing, refactor generation code #15005

cglagovichTT · 2024-11-13T15:42:24Z

Ticket

What's changed

This PR changes the Llama-Vision interface to make it easier to add batch>1 inference, continuous batching, and vLLM integration.

- Refactored Llama-Vision demos
  - Implemented prefill/decode wrapper in vision_generator.py
  - Use new generator wrapper in all demos
  - Added simple_vision_demo.py for easy testing and e2e perf measurement
- Refactored Llama cross attention tests
  - Added support for batch>1 xattn cache generation
- Enable tracing in Llama-Vision
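
For readers unfamiliar with the prefill/decode split, here is a minimal sketch of what such a wrapper can look like. It is not the code in vision_generator.py: `ToyModel`, `SketchGenerator`, and every method name below are hypothetical, and the toy model simply predicts "last token + 1" so the control flow can be run without any device or ttnn dependency.

```python
from typing import List, Sequence, Tuple

VOCAB_SIZE = 16  # toy vocabulary size, chosen arbitrarily for this sketch


def argmax(values: Sequence[float]) -> int:
    """Return the index of the largest value (greedy sampling)."""
    return max(range(len(values)), key=values.__getitem__)


class ToyModel:
    """Hypothetical stand-in model: predicts (last token + 1) mod VOCAB_SIZE."""

    def prefill(self, tokens: Sequence[int]) -> Tuple[List[float], List[int]]:
        # Process the whole prompt once and build the cache.
        cache = list(tokens)
        return self._logits(cache[-1]), cache

    def decode(self, token: int, position: int, cache: List[int]) -> List[float]:
        # Process exactly one new token, extending the existing cache.
        assert position == len(cache), "decode position must follow the cache"
        cache.append(token)
        return self._logits(token)

    @staticmethod
    def _logits(last_token: int) -> List[float]:
        logits = [0.0] * VOCAB_SIZE
        logits[(last_token + 1) % VOCAB_SIZE] = 1.0
        return logits


class SketchGenerator:
    """Hypothetical wrapper exposing prefill and decode as separate steps."""

    def __init__(self, model: ToyModel) -> None:
        self.model = model

    def prefill_forward(self, prompt_tokens: Sequence[int]):
        if not prompt_tokens:
            raise ValueError("prompt must contain at least one token")
        return self.model.prefill(prompt_tokens)

    def decode_forward(self, token: int, position: int, cache: List[int]):
        return self.model.decode(token, position, cache)

    def generate(self, prompt_tokens: Sequence[int], max_new_tokens: int) -> List[int]:
        if max_new_tokens <= 0:
            return []
        # One prefill call yields the first generated token.
        logits, cache = self.prefill_forward(prompt_tokens)
        token = argmax(logits)
        generated = [token]
        # Every later token comes from a single-token decode step.
        for _ in range(max_new_tokens - 1):
            logits = self.decode_forward(token, len(cache), cache)
            token = argmax(logits)
            generated.append(token)
        return generated


print(SketchGenerator(ToyModel()).generate([1, 2, 3], max_new_tokens=4))  # [4, 5, 6, 7]
```

In this sketch a demo only ever calls `prefill_forward`, `decode_forward`, or `generate`. Because the decode step is one small function with a fixed call shape, it is a natural place to later add batching or trace capture without touching each demo.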

Checklist

Post commit CI https://github.com/tenstorrent/tt-metal/actions/runs/11821812054
- A few unit tests fail, but the failures are unrelated to the changes in this PR
T3K unit, frequent, demo https://github.com/tenstorrent/tt-metal/actions/runs/11821820087

ayerofieiev-tt

please don't forget to update PR title

mtairum

An interface matching the one from Meta's Llama is great!

cglagovichTT added 19 commits November 13, 2024 15:38

#14519: Use FlashDecode in LlamaVision xattn

e6651e9

#14519: WIP create simpler interface for LlamaVision

7ed2318

#14519: Update xattn test input shapes since masks with non-causal FlashDecode are different

42f87a7

#14519: Change TMs in xattn. Naive TMs now fail xattn test, so this commit goes back to nlp_tms to create/concat heads.

97a9d45

#14519: Simple vision demo is functional, with llama_vision_model supporting mask shapes required by non-causal FlashDecode

236b5f8

#14519: unit tests for xattn, xblock, and xtransformer now support batch > 1. WIP, since these changes have now broken the full model and demos

555f27d

#14519: Fix up Llama vision model. Simple demo works again with batch=1, nonworking for batch>1.

ef53146

#14519: Refactor LlamaVision class to clean up separation of input prep and model execution with ttnn tensors.

7d50e6c

#14519: Don't pass full token tensor into decode and prefill

ff2f7ba

#14519: Fix rebase issues

51e2c89

#14519: Refactored decode input preparation to separate host tensor creation and device tensor transformations. Enabled tracing in simple_vision_demo with an easy trace function

0b7807f

#14519: Implement LlamaVision generation class which plugs into existing generation pipelines.

71f0727

#14519: Fix test script now that pytest params changed

81cc9b1

#14519: Remove breakpoint

1149598

#14519: license

9e5d0b9

#14519: Remove trace decorator

ac7ffcb

#14519: remove batch option from rot mat

751e4b1

#14519: Add traced demo to CI

659c111

#14519: Fix merge bug in xblock test

31944db

cglagovichTT force-pushed the cglagovich/14519_noopt branch from 60fdc3a to 31944db Compare November 13, 2024 16:43

cglagovichTT marked this pull request as ready for review November 13, 2024 17:25

cglagovichTT requested review from yieldthought, mtairum, uaydonat, ayerofieiev-tt, dmakoviichuk-tt, cfjchu, TT-BrianLiu, blozano-tt and ttmchiou as code owners November 13, 2024 17:25

cfjchu approved these changes Nov 13, 2024

ttmchiou approved these changes Nov 14, 2024

ayerofieiev-tt approved these changes Nov 14, 2024

cglagovichTT changed the title ~~Cglagovich/14519 noopt~~ Llama-Vision: Enable tracing, refactor generation code Nov 14, 2024

mtairum approved these changes Nov 14, 2024

cglagovichTT merged commit 758f8c9 into main Nov 14, 2024
149 of 152 checks passed

cglagovichTT deleted the cglagovich/14519_noopt branch November 14, 2024 18:04

uaydonat mentioned this pull request Nov 15, 2024

#13332: add ttnn implementation for Bert-Tiny model #13471

Merged

5 tasks
