
[Feature Request] Cogvlm v7 and internlm-xcomposer2-vl-7b #2

Open
311-code opened this issue Jun 12, 2024 · 2 comments
Labels
enhancement (New feature or request), question (Further information is requested)

Comments

311-code commented Jun 12, 2024

I have used CogVLM v7 (through a guy on Patreon) to caption 120,000 images and it does a very good job; it even captions some uncensored photos (in detail) if I use a special prompt with English and Chinese characters. I also found internlm-xcomposer2-vl-7b to be pretty good, but it seemed a bit more censored.

Would love to see these two models added as options to this repo. Also, not sure if this is just sci-fi yet, but: an ability for a CLIP vision model to follow directions, such as, in scenarios where the image will not crop to 1024x1024 without cutting out the subject, putting black bars on the left and right side (poor man's masking). That always worked well for me, as long as most of the dataset wasn't like this.

In scenarios where it's a full body shot and the subject is next to someone, a direction like: "crop the image so that the entire person is in view and any other person is not; move them to the side of the image."

Maybe you could have a long context for certain scenarios and have the AI/CLIP vision follow your preset rules for cropping.
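The "black bars" idea above can be sketched without any vision model at all: scale the image so its longest side fits the target square, then centre it on a black canvas instead of cropping. This is a minimal sketch using Pillow; the function name `pad_to_square` is hypothetical and not part of any project discussed here.

```python
# Hypothetical helper illustrating the "poor man's masking" trick:
# pad an image to a 1024x1024 square with black bars instead of
# cropping the subject out. Assumes Pillow is installed.
from PIL import Image


def pad_to_square(img: Image.Image, size: int = 1024) -> Image.Image:
    """Scale the longest side to `size`, then centre the result on a
    black square canvas, so no part of the subject is cut away."""
    scale = size / max(img.width, img.height)
    resized = img.resize(
        (round(img.width * scale), round(img.height * scale))
    )
    canvas = Image.new("RGB", (size, size), (0, 0, 0))  # black background
    # Centre the resized image; the remaining area stays black (the "bars").
    canvas.paste(
        resized,
        ((size - resized.width) // 2, (size - resized.height) // 2),
    )
    return canvas
```

For a landscape input the bars end up on the top and bottom; for a portrait input, on the left and right, as described above.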

P.S. The SD3 dataset was captioned using 50 percent CogVLM. I had no issues running it on my 4090, but I forgot how many parameters this one was. I believe it may have been thudm--cogvlm-chat-hf.

@Trevor-Z

What was that special prompt, if I may ask?

@mikeknapp (Owner)

Thanks so much for these suggestions @brentjohnston!

I do like the idea of adding black bars -- and initially I got very excited about doing this -- but then I wondered: do you really need to? My understanding was that, for LoRAs at least, when using Kohya_ss, bucketing happens automatically, so you don't need to worry too much about the crop. Can you explain why you do that? Does it improve quality?

As for integrating the other models, I'm not so familiar with those. I take it these are LLM/more advanced taggers of some sort? Are you typically dealing with 100K+ datasets?

I probably need to clarify in the description that I'm thinking Candy Machine is more focused on smaller datasets (<1k images), and really only for manual / semi-automated tagging. The project where you had 120,000 images is probably out of scope.

Curious to learn more about how you'd want to use this, though.

Also, on the LLM/tagging-model side of things, are you a coder yourself? I'm wondering whether, if I exposed some kind of API, people could integrate these models themselves.

@mikeknapp mikeknapp added enhancement New feature or request question Further information is requested labels Jun 13, 2024
3 participants