Beekee Learning By Doing projects, as part of AI-for-Education.org
Watch the webinar 'Getting AI to Work Offline' [here](https://www.youtube.com/watch?v=NS7Odcer2M8) part of the Coffee Chats series ☕
Assessing Performance and Educational eXperience with Lightweight AI in Low-Connectivity Settings (APEX)

Through the APEX project, Beekee studies the challenge of leveraging lightweight generative artificial intelligence applications for education in areas with limited or no internet access, using single-board computers such as the Raspberry Pi. This R&D project has successfully identified a first task suited to the available computing power (text generation) and to the end-users (up to two concurrent users), and has identified, implemented, and assessed models based on general performance metrics (temperature, CPU load, and, importantly, energy usage) as well as task-timing metrics (model loading, input evaluation, and response generation). Next steps will cover the identification and analysis of additional education-oriented tasks on low-cost hardware (e.g. computer vision tasks with TPUs), the use of Retrieval-Augmented Generation (RAG), and fine-tuning an LLM for offline use that pairs questions from Sierra Leone’s teachers with answers currently generated by ChatGPT 3.5 (to be regenerated with ChatGPT 4).
The purpose of APEX Phase 1 is to verify the feasibility and initial pedagogical value of making LLMs work offline on single-board computers (e.g. Raspberry Pi or equivalents). If successful, this would suggest that AI applications could be made available offline, opening up access to AI on the edge in schools and locations in Low- and Middle-Income Countries (LMICs) where meaningful connectivity and power are often lacking.
Our primary goal was to explore the feasibility and performance of generative AI models on low-capacity devices such as Raspberry Pi. We reached the following conclusions:
- Model Evaluation: We evaluated various quantized versions of the Llama-2 model (2-bit, 4-bit, 6-bit, and 8-bit) to determine their performance on Raspberry Pi 5. The models were assessed based on memory consumption, CPU load, power consumption, and CPU temperature.
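The assessment metrics above (memory consumption, CPU temperature) can be sampled on the Pi itself while a model runs. A minimal sketch, assuming the sysfs/procfs paths found on standard Raspberry Pi OS (those paths are the only assumption here); it would be polled periodically during inference:

```python
# Sketch: sample memory use and CPU temperature on a Raspberry Pi.
# Assumes Linux paths as shipped by Raspberry Pi OS.
from pathlib import Path

THERMAL = Path("/sys/class/thermal/thermal_zone0/temp")  # millidegrees Celsius
MEMINFO = Path("/proc/meminfo")

def parse_temp_c(raw: str) -> float:
    """Convert the sysfs millidegree reading (e.g. '52306\n') to Celsius."""
    return int(raw.strip()) / 1000.0

def parse_mem_used_kb(meminfo: str) -> int:
    """MemTotal - MemAvailable from /proc/meminfo text, in kB."""
    fields = {}
    for line in meminfo.splitlines():
        key, _, rest = line.partition(":")
        if key in ("MemTotal", "MemAvailable"):
            fields[key] = int(rest.split()[0])
    return fields["MemTotal"] - fields["MemAvailable"]

def sample() -> dict:
    """Take one live sample; call in a loop while inference runs."""
    return {
        "temp_c": parse_temp_c(THERMAL.read_text()),
        "mem_used_kb": parse_mem_used_kb(MEMINFO.read_text()),
    }
```

Power consumption was measured externally in the project; a software-only sampler like this covers the on-device metrics.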
- Implementation: We successfully deployed the llama.cpp framework (written in C++) for efficient execution of large language models (LLMs) on Raspberry Pi. This included developing Bash scripts to facilitate inference execution and data management.
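As an illustration of what such a wrapper might look like, here is a hedged Python sketch that shells out to llama.cpp's command-line binary. The binary name and flag set vary across llama.cpp versions (older builds ship `./main` rather than `llama-cli`), and the model path is a placeholder, not the project's actual file:

```python
# Sketch: invoke llama.cpp's CLI from a wrapper script.
# Assumptions: binary name "./llama-cli" and flags -m/-p/-n/-t as in
# recent llama.cpp builds; adjust for the version actually installed.
import subprocess

def build_cmd(model_path: str, prompt: str, n_predict: int = 128, threads: int = 4) -> list[str]:
    """Assemble the CLI invocation as an argument list."""
    return ["./llama-cli", "-m", model_path, "-p", prompt,
            "-n", str(n_predict), "-t", str(threads)]

def run_inference(model_path: str, prompt: str) -> str:
    """Run one inference and return the raw stdout."""
    result = subprocess.run(build_cmd(model_path, prompt),
                            capture_output=True, text=True, check=True)
    return result.stdout
```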
- Performance Metrics: We defined and measured key performance metrics, including model loading time, input evaluation time, and response generation time. These metrics provided insights into the feasibility and efficiency of deploying text generation models on resource-limited devices.
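The three timing metrics above can be captured with a generic timing helper; a sketch, in which the loading/evaluation/generation callables stand in for whatever backend is used:

```python
# Sketch: time the three phases measured in the project
# (model loading, input evaluation, response generation).
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    t0 = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - t0

def tokens_per_second(n_tokens: int, seconds: float) -> float:
    """Throughput figure derived from the response-generation time."""
    return n_tokens / seconds if seconds > 0 else 0.0

# Usage pattern (names are placeholders, not a real API):
# model, t_load = timed(load_model, "model.gguf")
# _, t_eval = timed(evaluate_prompt, model, prompt)
# answer, t_gen = timed(generate_response, model)
```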
- Applications Explored: We investigated various AI applications on Raspberry Pi, such as text generation, question-answering, document summarization, translation, and code generation. This exploration highlighted the versatility and potential of AI on low-capacity devices, but raised a major concern about the validity, educational quality, and relevance of 'out-of-the-box' usage.
- We learned that a single-board computer like the Raspberry Pi can be used effectively for lightweight generative AI tasks, but only for a limited number of users -- realistically one. This initially scopes offline AI applications to a single teacher or to teacher-support roles.
- We learned that, to cope with the memory constraints of single-board computers, asynchronous paradigms should be explored further to enable more users (e.g. students) to benefit from offline AI models.
- For instance, a first-in-first-out queue system with an estimated waiting time could be explored (similar to a ticket system). For this approach to work, a substantial amount of upstream guidance and verification of submitted prompts is required, to avoid misuse of resources that would affect other users in the queue.
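One way to sketch this ticket idea (illustrative only; the average-time constant is an assumption, not a measured figure, and a real system would update it from observed generation times):

```python
# Sketch: ticketed first-in-first-out queue for one shared offline device.
# One worker (the single Pi) serves prompts in order; each user receives
# a ticket number and an estimated wait based on queue position.
from collections import deque

class TicketQueue:
    def __init__(self, avg_seconds_per_job: float = 30.0):
        self.jobs = deque()
        self.avg = avg_seconds_per_job  # assumed average generation time

    def submit(self, user: str, prompt: str) -> dict:
        """Enqueue a prompt and return the ticket with an estimated wait."""
        self.jobs.append((user, prompt))
        position = len(self.jobs)
        return {"ticket": position, "est_wait_s": position * self.avg}

    def next_job(self):
        """Pop the oldest job for the worker, or None if the queue is empty."""
        return self.jobs.popleft() if self.jobs else None
```

A production version would also need the upstream prompt checks mentioned above before `submit` accepts a job.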
- We learned that hallucinations are highly prevalent when using the readily available models directly, which raises concerns for educators and for users who lack critical thinking skills.
- A techno-centric mitigation is Retrieval-Augmented Generation (RAG), which remains to be practically explored in a later project stage (APEX Phase 2).
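To make the idea concrete, a deliberately minimal sketch of RAG's retrieval step, using word overlap instead of the embeddings a real system would use; the passages and prompt format are invented for illustration:

```python
# Sketch: retrieval step of Retrieval-Augmented Generation.
# Pick the vetted passage most similar to the question and prepend it,
# so the model answers from curated material rather than its weights alone.
import re

def tokens(text: str) -> set[str]:
    """Lowercased word set, punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question: str, passages: list[str]) -> str:
    """Return the passage with the largest word overlap with the question."""
    q = tokens(question)
    return max(passages, key=lambda p: len(q & tokens(p)))

def build_prompt(question: str, passages: list[str]) -> str:
    context = retrieve(question, passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```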
- A human-centered solution, which could be used in conjunction with RAG, lies in training teachers and learners not only in prompting but also in the limitations of lightweight offline generative AI and, by extension, of AI in general. The fact that these systems are limited in their capacities also provides an opportunity to develop learners' critical thinking skills: learners should judge answers from AI systems against discipline-specific criteria.