-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate llama.cpp in Aprapipes and a module ImageToTextXForm which can describe an image #345
base: main
Are you sure you want to change the base?
Conversation
#include "EncoderModelAbstract.h" | ||
|
||
class ClipEncoderProps : public EncoderModelAbstractProps { | ||
public: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move all the definition to cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
base/include/EncoderModelAbstract.h
Outdated
VIVIT // Video Vision Transformer | ||
}; | ||
|
||
enum DataType { TEXT = 0, IMAGE, AUDIO, TEXT_EMBEDDING, IMAGE_EMBEDDING, AUDIO_EMBEDDING }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maintain all the enum in one single place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
base/CMakeLists.txt
Outdated
@@ -608,6 +625,9 @@ target_link_libraries(aprapipesut | |||
liblzma::liblzma | |||
bigint::bigint | |||
sfml-audio | |||
${COMMON_LIB} | |||
llama |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix formatting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
base/include/EncoderModelAbstract.h
Outdated
|
||
EncoderModelAbstractProps() { | ||
modelArchitecture = ModelArchitectureType::BERT; | ||
inputTypes = {DataType::TEXT}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move all the declaration in cpp file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
base/include/EncoderModelAbstract.h
Outdated
VIVIT // Video Vision Transformer | ||
}; | ||
|
||
enum DataType { TEXT = 0, IMAGE, AUDIO, TEXT_EMBEDDING, IMAGE_EMBEDDING, AUDIO_EMBEDDING }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use FrameType instead of DataType
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return true; | ||
} | ||
} | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throw AIP Exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
base/src/Llava.cpp
Outdated
updateProps(_props); | ||
} | ||
|
||
void updateProps(LlavaProps &_props) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename it to setModelProps
Remove from Constructor of Detail class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
base/src/Llava.cpp
Outdated
bool Llava::modelInit() { | ||
llama_backend_init(false /*NUMA Architecure set to false*/); | ||
|
||
mDetail->mLlavaModelParams = llama_model_default_params(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put in setModelProps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
base/src/Llava.cpp
Outdated
auto frame = frames.begin()->second; | ||
auto frameType = frame->getMetadata()->getFrameType(); | ||
int nPast = 0; | ||
std::string systemPrompt = "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.\nUSER:"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joiskash
System prompt will differ for different version of same model,it may come from Model Strategy class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
system prompt should be a property
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
base/src/Llava.cpp
Outdated
std::cout << "\n"; | ||
|
||
/*Prediction token by token*/ | ||
for(int i = 0; i < nPredict; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mraduldubey @joiskash
Need to have different strategy for:-
- Chunk based Implementation
- One big Chunk of data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, we will only have the 2nd implementation
base/include/ModelStrategy.h
Outdated
# pragma once | ||
|
||
#include "Module.h" | ||
#include "ClipEncoder.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove headers which we are not using
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add specific headers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
base/include/SceneDescriptorXForm.h
Outdated
#include "Module.h" | ||
|
||
class SceneDescriptorXFormProps : public ModuleProps { | ||
public: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of Crating new Enum, directly use Model strategies enum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
} | ||
|
||
/*LLAVA SCENE-DESCRIPTOR STRATEGY*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joiskash @mraduldubey
-> We can keep this constructor as default, but user should also have an option to use different model weights, offloading layers into gpu, prompts
-> Don't use hard coded values
base/src/ModelStrategy.cpp
Outdated
SceneDescriptorModelStrategy::SceneDescriptorModelStrategy() : ModelStrategy() { | ||
auto clipProps = ClipEncoderProps("./data/llm/llava/llava-v1.6-7b/mmproj-model-f16.gguf"); | ||
auto llavaProps = LlavaProps("./data/llm/llava/llava-v1.6-7b/llava-v1.6-mistral-7b.Q8_0.gguf", "Describe the image", 2048, 512, 0.8, 10, 256); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add checks for path
base/src/ModelStrategy.cpp
Outdated
} | ||
|
||
/*LLAVE TEXT-TO-TEXT STRATEGY*/ | ||
LlavaTextToTextModelStrategy::LlavaTextToTextModelStrategy() : ModelStrategy() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as SceneDescriptorModelStrategy
base/src/SceneDescriptorXForm.cpp
Outdated
} | ||
return Module::term(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look into ClipEncoder comments
base/src/SceneDescriptorXForm.cpp
Outdated
bool SceneDescriptorXForm::process(frame_container &frames) { | ||
/*Encoder Model*/ | ||
mDetail->modelStrategy->encoderModel->push(frames); | ||
mDetail->modelStrategy->encoderModel->step(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opt 1
-> Push Should Internally call step
Opt 2
-> If Que is not empty , then we should automatically take up the job
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
base/src/SceneDescriptorXForm.cpp
Outdated
mDetail->modelStrategy->encoderModel->step(); | ||
auto clipFrame = | ||
makeFrame(mDetail->modelStrategy->encoderModel->getFrameSize()); | ||
auto clipMetaData = boost::shared_ptr<FrameMetadata>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Push should Take 2 arguments (input Buffer & reference to output buffer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
base/src/SceneDescriptorXForm.cpp
Outdated
mDetail->modelStrategy->llmModel->step(); | ||
auto outFrame = makeFrame(mDetail->modelStrategy->llmModel->getFrameSize()); | ||
mDetail->modelStrategy->llmModel->getFrames(outFrame); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Input -> Frame Container
Output -> Frame Container
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
#include "ModelStrategy.h" | ||
#include "Module.h" | ||
#include "ExternalSinkModule.h" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add More Tests for "getProps" & "setProps"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have left a few comments please address them, thanks!
find_library(COMMON_LIB NAMES common_llama.lib libcommon_llama.a REQUIRED) | ||
find_library(LLAVA_LIB NAMES llavalib.lib libllavalib.a REQUIRED) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a separate find lib if we are using find package for Llama?
auto outputPinId = inputFrameContainer.begin()->first; | ||
auto inputFrame = inputFrameContainer.begin()->second; | ||
mDetail->storedData = llava_image_embed_make_with_bytes( | ||
mDetail->mClipContext, 8, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add 8 number of threads to props
float *char_buffer = (float *)frame->data(); | ||
char_buffer += sizeof(llava_image_embed); | ||
memcpy(char_buffer, mDetail->storedData->embed, | ||
clip_embd_nbytes(mDetail->mClipContext)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clear storedData if not needed
frame_container clipFrames; | ||
mDetail->modelStrategy->encoderModel->push(frames, clipFrames, [&](size_t size) -> frame_sp {return makeFrame(size, mDetail->mOutputPinId); }); | ||
|
||
/*LLM Model*/ | ||
frame_container llavaFrames; | ||
mDetail->modelStrategy->llmModel->push(clipFrames, llavaFrames, [&](size_t size) -> frame_sp {return makeFrame(size, mDetail->mOutputPinId); }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a queue in the models? Since everything is blocking and looks
|
||
bool EncoderModelAbstract::term() | ||
{ | ||
mQue->clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mQueue
|
||
void ImageToTextXForm::setProps(ImageToTextXFormProps &props) | ||
{ | ||
if (props.modelStrategyType != mDetail->mProps.modelStrategyType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not allow gpuLayers and modelStratgeyType to change in between
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove all hardcoded values
IMPORTANT: All PRs must be linked to an issue (except for extremely trivial and straightforward changes).
Fixes #[Issue]
Description
Alternative(s) considered
Have you considered any alternatives? And if so, why have you chosen the approach in this PR?
Type
Type Choose one: (Feature)
Screenshots (if applicable)
Checklist