[LLM][NPU] Fixed GENERATE_HINT (#1526)
- *Don't always pass `GENERATE_HINT`, because `GENERATE_CONFIG` can't be used together with it*

Related PRs:
- *openvinotoolkit/openvino#28385*
AsyaPronina authored Jan 10, 2025
1 parent 77611da commit 4ac98b8
Showing 1 changed file with 1 addition and 2 deletions.
3 changes: 1 addition & 2 deletions src/cpp/src/llm_pipeline_static.cpp
@@ -718,7 +718,6 @@ std::shared_ptr<ov::CompiledModel> StatefulLLMPipeline::setupAndCompileModel(
const uint32_t kMaxPromptLen = pop_int_and_cast(pipeline_config, "MAX_PROMPT_LEN").value_or(1024u);
const uint32_t kMinResponseLen = pop_int_and_cast(pipeline_config, "MIN_RESPONSE_LEN").value_or(128u);
m_kvcache_total = kMaxPromptLen + kMinResponseLen;
-std::string generate_hint = pop_or_default<std::string>(pipeline_config, "GENERATE_HINT", "FAST_COMPILE");

update_config(pipeline_config, {"NPU_USE_NPUW", "YES"});
update_config(pipeline_config, {"NPUW_LLM", "YES"});
@@ -729,7 +728,6 @@ std::shared_ptr<ov::CompiledModel> StatefulLLMPipeline::setupAndCompileModel(

update_config(pipeline_config, {"NPUW_LLM_MAX_PROMPT_LEN", kMaxPromptLen});
update_config(pipeline_config, {"NPUW_LLM_MIN_RESPONSE_LEN", kMinResponseLen});
-update_config(pipeline_config, {"NPUW_LLM_GENERATE_HINT", generate_hint});

// NB: Try to apply opt transpose only for Llama-2-7b-chat-hf model
if ( model_desc.name_or_path == "meta-llama/Llama-2-7b-chat-hf" ||
@@ -739,6 +737,7 @@ std::shared_ptr<ov::CompiledModel> StatefulLLMPipeline::setupAndCompileModel(

rename_key(pipeline_config, "PREFILL_CONFIG", "NPUW_LLM_PREFILL_CONFIG");
rename_key(pipeline_config, "GENERATE_CONFIG", "NPUW_LLM_GENERATE_CONFIG");
+rename_key(pipeline_config, "GENERATE_HINT", "NPUW_LLM_GENERATE_HINT");

// Replace CACHE_DIR option if NPUW is enabled
set_npuw_cache_dir(pipeline_config);
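The behavioural difference this change makes can be sketched in isolation. The snippet below is a simplified illustration, not the code from `llm_pipeline_static.cpp`: `Config` is a plain `std::map` standing in for `ov::AnyMap`, the two helpers are hypothetical re-implementations written only for this example (the real helpers may differ in signature and semantics), and `forward_before` / `forward_after` are made-up names for the old and new forwarding logic.

```cpp
#include <map>
#include <string>

// Simplified stand-in for ov::AnyMap (assumption: string values only).
using Config = std::map<std::string, std::string>;

// Hypothetical helper: remove `key` and return its value, or `fallback` if absent.
std::string pop_or_default(Config& cfg, const std::string& key, const std::string& fallback) {
    auto it = cfg.find(key);
    if (it == cfg.end()) {
        return fallback;
    }
    std::string value = it->second;
    cfg.erase(it);
    return value;
}

// Hypothetical helper: move the value stored under `old_key` to `new_key`.
// Does nothing when `old_key` is absent.
void rename_key(Config& cfg, const std::string& old_key, const std::string& new_key) {
    auto it = cfg.find(old_key);
    if (it == cfg.end()) {
        return;
    }
    std::string value = it->second;
    cfg.erase(it);
    cfg[new_key] = value;
}

// Old behaviour: the hint is always written, falling back to FAST_COMPILE,
// so it ends up next to NPUW_LLM_GENERATE_CONFIG even if the user never set it.
Config forward_before(Config cfg) {
    std::string hint = pop_or_default(cfg, "GENERATE_HINT", "FAST_COMPILE");
    cfg["NPUW_LLM_GENERATE_HINT"] = hint;
    rename_key(cfg, "GENERATE_CONFIG", "NPUW_LLM_GENERATE_CONFIG");
    return cfg;
}

// New behaviour: the hint is forwarded only when the user supplied it.
Config forward_after(Config cfg) {
    rename_key(cfg, "GENERATE_CONFIG", "NPUW_LLM_GENERATE_CONFIG");
    rename_key(cfg, "GENERATE_HINT", "NPUW_LLM_GENERATE_HINT");
    return cfg;
}
```

With a user config that contains only `GENERATE_CONFIG`, `forward_before` produces both `NPUW_LLM_GENERATE_CONFIG` and `NPUW_LLM_GENERATE_HINT`, which is the combination the commit message says cannot be used, while `forward_after` produces only `NPUW_LLM_GENERATE_CONFIG`.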