Retrieval-Augmented Generation (RAG) is a compelling approach that combines the generative capabilities of Large Language Models (LLMs) with contextual data fetched from external sources. By retrieving relevant information from a knowledge base at query time, RAG improves the accuracy and relevance of chatbot responses.
In this project, I set out to build and deploy a RAG-enabled chatbot on a Raspberry Pi 5. Leveraging optimized open-source tools such as llama-cpp-python and FAISS, the chatbot delivers meaningful, context-aware responses using the Llama-3.1-8B model. The aim was to achieve high-performance inference on constrained hardware using efficient model quantization and Arm-specific optimizations.
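To make the retrieve-then-generate flow concrete, here is a minimal, dependency-free Python sketch. It is an illustration only: the documents, the toy bag-of-words "embedding", and the function names are hypothetical stand-ins, whereas the actual project uses dense sentence embeddings indexed in FAISS and passes the assembled prompt to Llama-3.1-8B via llama-cpp-python.

```python
import math
from collections import Counter

# Tiny in-memory knowledge base standing in for the FAISS index.
# These documents are illustrative, not taken from the project.
KNOWLEDGE_BASE = [
    "The Raspberry Pi 5 uses an Arm Cortex-A76 quad-core CPU.",
    "FAISS performs fast nearest-neighbour search over dense vectors.",
    "GGUF quantization shrinks LLM weights for low-memory inference.",
]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'. A real RAG pipeline would use a
    sentence-embedding model and store the vectors in FAISS."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(KNOWLEDGE_BASE,
                    key=lambda doc: cosine(q, embed(doc)),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Assemble the augmented prompt that would be handed to the LLM
    for generation (llama-cpp-python in the real project)."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What CPU does the Raspberry Pi 5 use?"))
```

The shape is the same at any scale: embed the query, find the nearest stored documents, and prepend them to the prompt so the model grounds its answer in retrieved facts rather than its parametric memory alone.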
This past weekend, I had the incredible opportunity to volunteer at the FIRST Tech Challenge UK, serving as both the Lead Field Inspector and FIRST Technical Advisor (FTA). It was an inspiring experience that reminded me why I’m passionate about robotics and STEM education. Being part of an event that empowers students to innovate, collaborate, and lead was nothing short of energizing.
About a month ago, I had the privilege of returning to the University of Cambridge—not as a student this time, but as a guest lecturer. I was invited to speak at an Engineering Applications lecture, where I demonstrated an AI agent running on an Arm-based CPU. The experience was especially meaningful as it took place in the very same lecture theatre where I once sat as an engineering student.
In the age of mobile-first experiences, running large language models (LLMs) directly on devices represents a powerful shift in how we approach AI applications. My latest project, LLM-on-iOS, demonstrates this paradigm by integrating the Gemma 2B language model into an iOS application using MediaPipe Tasks GenAI. This setup allows for efficient, on-device natural language processing and generation—entirely offline and privacy-preserving.
In the summer of 2023, I had the incredible opportunity to work as a Machine Learning Research Intern at the Auto-ID Labs, KAIST in Daejeon, South Korea. Over the course of three months, I was immersed in cutting-edge AI research, deploying models on edge devices, and contributing to real-world applications. This blog post reflects on my technical journey, the challenges I faced, and the valuable lessons I learned.