Android App Llama-v3-2-3B-Chat Quantized returns garbled text #34

Open
zhehui-chen opened this issue Dec 30, 2024 · 3 comments
@zhehui-chen

I followed the guide to build the ChatApp with Llama-v3-2-3B-Chat Quantized. The QNN version I used is 2.28.2.

I successfully ran the ChatApp on my Android device (OnePlus 13 with Snapdragon 8 Elite).

However, whenever I chat with the app, it returns garbled text like the following.

image

Does anyone have any idea about this problem?

@franklyd

franklyd commented Jan 3, 2025

I have faced a similar issue. I found that the generation quality degraded a lot compared to running it with onnx-runtime in 4-bit.

@gustavla

gustavla commented Jan 7, 2025

Hi @zhehui-chen and @franklyd,

Sorry to hear you are seeing poor results through the app.

We know that the app has issues on some consumer devices (especially on Android 14 or earlier). The app needs two underlying features in the Android "metabuild", which is why the app README says it only works on Android 15. If you can tell us exactly which device you are using, which Android version, and ideally the exact Android build (it should be listed in the settings), that would be really helpful so that we can investigate further why it's not working, especially if it's on Android 15, where we expect it to work. Thanks!
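
For anyone gathering the details requested above, here is a minimal Python sketch that reads them over `adb`. It assumes `adb` is on the PATH and that exactly one device is connected with USB debugging enabled. The property names are standard Android system properties, but the dictionary labels and function names are hypothetical and chosen only for this illustration.

```python
import subprocess
from typing import Callable, Dict

# Standard Android system properties; the labels on the left are
# illustrative names chosen for this sketch.
PROPS = {
    "device_model": "ro.product.model",
    "android_version": "ro.build.version.release",
    "build_id": "ro.build.display.id",
    "build_fingerprint": "ro.build.fingerprint",
}


def adb_getprop(prop: str) -> str:
    """Read one system property from the connected device via adb."""
    result = subprocess.run(
        ["adb", "shell", "getprop", prop],
        capture_output=True,
        text=True,
        check=True,  # raise if adb exits with a non-zero status
    )
    return result.stdout.strip()


def collect_device_info(
    getprop: Callable[[str], str] = adb_getprop,
) -> Dict[str, str]:
    """Return a label -> value mapping for the properties listed in PROPS."""
    return {label: getprop(prop) for label, prop in PROPS.items()}
```

Calling `collect_device_info()` with a device attached and printing each label and value gives a block of text that can be pasted into a reply. The `getprop` parameter exists only so that the function can be exercised without a device.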

@franklyd

Thanks! Actually, I was running genie-t2t-run directly on an Android device (gen 3).
The instruction for the LLM was to generate JSON output, but I observed much poorer quality compared to onnx q4. In particular, it could not follow the instruction to output JSON.
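
One way to put a number on the "cannot follow the instruction to output JSON" observation is to run the same prompts through both backends and count how many responses parse as JSON. The sketch below is only an illustration under that assumption: the function names are hypothetical, and it checks syntactic validity only, not whether the JSON matches the requested schema.

```python
import json
from typing import Iterable


def is_valid_json(text: str) -> bool:
    """Return True if the whole string parses as a JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def json_validity_rate(outputs: Iterable[str]) -> float:
    """Fraction of model outputs that parse as JSON (0.0 for an empty input)."""
    outputs = list(outputs)
    if not outputs:
        return 0.0
    return sum(is_valid_json(o) for o in outputs) / len(outputs)
```

Comparing `json_validity_rate` over the outputs collected from each backend for an identical prompt set would show whether the gap is consistent or limited to a few prompts.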
