
LLM AI #3

Open
1 of 10 tasks
parsehex opened this issue Dec 31, 2023 · 1 comment

Comments

@parsehex
Owner

parsehex commented Dec 31, 2023

- [x] (API) Fix `return_prompt` option to always return a value
- [ ] (Client/API) Return the options that were actually passed to the model (excluding unsupported options that were dropped)
- [ ] Implement `ban_eos_token`
- [ ] Implement `custom_token_bans`
- [ ] Better (more generic) handling of API keys for the corresponding clients (e.g. the OpenAI client has a `hasKey` method)
- [ ] Return token count from local clients (is `.generate()` the only way to get it?)
- [ ] Add a "prefix" to the LLM client. If defined and `.list_models` is implemented, prepend it to model names when listing all models (replaces the hardcoded `"openai:"`)
- [ ] Work on model downloading. It worked the one time I tried it, but it was pretty basic. At minimum we need things like setting the name to use for the downloaded file/folder; also, allow passing a link to a `.gguf` to download it into `models_dir`
- [ ] Figure out a system for picking the number of layers for GPU offloading, for models that support it
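The "prefix" item above could be sketched roughly like this. All names here (`LLMClient`, `listAllModels`) are illustrative, not the project's actual API: each client may declare a `prefix`, and the aggregator prepends it when listing all models, instead of hardcoding `"openai:"`.

```typescript
// Hypothetical sketch of the "prefix" idea: clients that implement model
// listing may also declare a prefix, which gets prepended to their model names.
interface LLMClient {
  prefix?: string;
  listModels?(): Promise<string[]>;
}

async function listAllModels(clients: LLMClient[]): Promise<string[]> {
  const all: string[] = [];
  for (const client of clients) {
    if (!client.listModels) continue; // client doesn't support listing
    const models = await client.listModels();
    const prefix = client.prefix ? `${client.prefix}:` : "";
    all.push(...models.map((m) => `${prefix}${m}`));
  }
  return all;
}

// Example: an OpenAI-style client with a prefix, and a local client without one
const openai: LLMClient = {
  prefix: "openai",
  listModels: async () => ["gpt-3.5-turbo", "gpt-4"],
};
const local: LLMClient = {
  listModels: async () => ["llama-2-7b.Q4_K_M.gguf"],
};

listAllModels([openai, local]).then((models) => console.log(models));
// ["openai:gpt-3.5-turbo", "openai:gpt-4", "llama-2-7b.Q4_K_M.gguf"]
```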
@parsehex changed the title from "Maintainability Improvements" to "LLM AI" on Jan 6, 2024
@parsehex
Owner Author

parsehex commented Jan 8, 2024

In calculating the num-gpu-layers:

  1. The available VRAM factors in
  2. For gguf, the parameter count (e.g. 7b, 13b) and the quant (e.g. Q4, Q5_K_M) factor in

I suppose for a given model size and quant we could come up with a total VRAM estimate, but how much VRAM does each individual layer actually take?
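One rough heuristic (an assumption on my part, not a measured formula): treat VRAM per layer as roughly the model's file size divided by its layer count — the file size already bakes in both the parameter count and the quant — then fit as many layers as the free VRAM allows, keeping some headroom for the KV cache. The function name and the 1 GiB reserve are illustrative guesses.

```typescript
// Heuristic sketch: bytes-per-layer ≈ file size / layer count, then fill
// whatever free VRAM remains after reserving headroom for the KV cache.
function estimateGpuLayers(
  modelFileBytes: number, // size of the .gguf on disk
  totalLayers: number, // e.g. 32 for a 7b llama
  freeVramBytes: number,
  kvCacheReserveBytes = 1 * 1024 ** 3, // ~1 GiB headroom (guess)
): number {
  const bytesPerLayer = modelFileBytes / totalLayers;
  const usable = Math.max(0, freeVramBytes - kvCacheReserveBytes);
  return Math.min(totalLayers, Math.floor(usable / bytesPerLayer));
}

// Example: a 7b Q4_K_M (~4.1 GiB file, 32 layers) with 4 GiB of free VRAM
const layers = estimateGpuLayers(4.1 * 1024 ** 3, 32, 4 * 1024 ** 3);
console.log(layers); // 23 — offloads 23 of 32 layers
```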

--

This is an MIT-licensed Go library for parsing gguf files. Maybe do something to run it from the server?
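Alternatively, the fixed-size gguf header could be read directly without a Go dependency. A minimal sketch per the GGUF spec (magic, version, tensor count, metadata KV count); a full metadata parser, which is what you'd need to pull the actual layer/block count, would have to continue from here:

```typescript
// Minimal GGUF header reader, per the spec: 4-byte magic "GGUF",
// uint32 LE version, uint64 LE tensor count, uint64 LE metadata KV count.
function parseGgufHeader(buf: Buffer) {
  const magic = buf.toString("ascii", 0, 4);
  if (magic !== "GGUF") throw new Error("not a GGUF file");
  return {
    version: buf.readUInt32LE(4),
    tensorCount: buf.readBigUInt64LE(8),
    metadataKvCount: buf.readBigUInt64LE(16),
  };
}

// Example with a synthetic header (version 3, 2 tensors, 5 metadata keys);
// in practice you'd read the first 24 bytes of a real .gguf file.
const header = Buffer.alloc(24);
header.write("GGUF", 0, "ascii");
header.writeUInt32LE(3, 4);
header.writeBigUInt64LE(2n, 8);
header.writeBigUInt64LE(5n, 16);

console.log(parseGgufHeader(header));
// { version: 3, tensorCount: 2n, metadataKvCount: 5n }
```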
