Bringing Large Language Models to iOS: Running Gemma 2B with MediaPipe Tasks GenAI

Introduction

In the age of mobile-first experiences, running large language models (LLMs) directly on devices represents a powerful shift in how we approach AI applications. My latest project, LLM-on-iOS, demonstrates this paradigm by integrating the Gemma 2B language model into an iOS application using MediaPipe Tasks GenAI. This setup allows for efficient, on-device natural language processing and generation—entirely offline and privacy-preserving.

Project Overview

LLM-on-iOS is a Swift-based mobile application that showcases how LLMs can operate locally on iOS devices. The project uses MediaPipe’s GenAI tools to facilitate low-latency, customizable inference of the Gemma 2B model. By removing the reliance on cloud services, this approach enhances user privacy and reduces network dependencies.

Key Features

  • On-Device Inference: Runs the Gemma 2B model locally on an iPhone or iPad.
  • Customizable Generation: Supports tuning of inference parameters like maximum tokens and temperature.
  • Seamless UI Integration: Offers a smooth user experience through a native iOS interface.

Getting Started

Requirements

To build and run the app, you’ll need:

  • Xcode 15.0 or newer
  • iOS 16.0 or later
  • CocoaPods for dependency management (a sample Podfile sketch follows this list)
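
The MediaPipe Tasks GenAI pods are distributed through CocoaPods. A minimal Podfile might look like the sketch below; the target name is a placeholder, and the pod names follow MediaPipe's GenAI setup guide:

    platform :ios, '16.0'

    target 'llm-on-ios' do
      use_frameworks!
      pod 'MediaPipeTasksGenAI'
      pod 'MediaPipeTasksGenAIC'
    end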

Installation Steps

  1. Clone the repository:

    git clone https://github.com/yourusername/llm-on-ios.git
    cd llm-on-ios
    
  2. Install dependencies:

    pod install
    
  3. Open the project in Xcode:

    open llm-on-ios.xcworkspace
    
  4. Add the Gemma 2B model file (gemma-2b-it-cpu-int4.bin) to your project (a verification snippet follows these steps):

    • Drag the file into the Xcode project navigator.
    • Check “Copy items if needed.”
    • Make sure it’s included in the main target.
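
Once the model is in place, a quick runtime check (a hypothetical snippet, not part of the repository) confirms the file was actually copied into the app bundle:

    // Verify the Gemma model file made it into the app bundle.
    if let path = Bundle.main.path(forResource: "gemma-2b-it-cpu-int4", ofType: "bin") {
        print("Model found at: \(path)")
    } else {
        print("Model missing: re-check Target Membership and 'Copy items if needed'.")
    }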

Usage Example

Using the model is straightforward. The LLMManager class handles all interactions with Gemma 2B:

// Create the manager and load the Gemma 2B model into memory.
let llmManager = LLMManager()
llmManager.initializeLLM()
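
The repository's LLMManager internals aren't reproduced here, but a minimal sketch of what initializeLLM() and a generation call might look like on top of MediaPipe's LlmInference API is shown below (the class body is an illustrative assumption, not the project's actual code):

import MediaPipeTasksGenAI

// Hypothetical sketch of LLMManager's internals.
final class LLMManager {
    private var llmInference: LlmInference?

    // Load the bundled Gemma 2B model and create the inference engine.
    func initializeLLM() {
        guard let modelPath = Bundle.main.path(forResource: "gemma-2b-it-cpu-int4",
                                               ofType: "bin") else {
            print("Model file not found in bundle")
            return
        }
        do {
            llmInference = try LlmInference(options: LlmInference.Options(modelPath: modelPath))
        } catch {
            print("Failed to initialize LLM: \(error)")
        }
    }

    // Run a single synchronous generation call.
    func generate(prompt: String) throws -> String? {
        try llmInference?.generateResponse(inputText: prompt)
    }
}

With a wrapper like this, try llmManager.generate(prompt: "Explain on-device inference in one sentence.") returns the completion as a plain String.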

You can adjust inference settings to suit your application’s needs:

  • Max Tokens: Caps the token budget for a request; in MediaPipe's API this covers input and output tokens combined.
  • Temperature: Controls randomness; lower values give more focused, deterministic text, higher values more varied output (both settings appear in the sketch below).
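
In the MediaPipe Swift API these settings live on LlmInference.Options and are fixed when the engine is created. A tuned setup might look like this sketch (values are illustrative, and the property names assume the API version current at the time of writing):

import MediaPipeTasksGenAI

// Illustrative tuning; adjust for your app and target devices.
let modelPath = Bundle.main.path(forResource: "gemma-2b-it-cpu-int4", ofType: "bin")!
let options = LlmInference.Options(modelPath: modelPath)
options.maxTokens = 512      // total budget for input plus output tokens
options.temperature = 0.7    // lower = more deterministic, higher = more varied
options.topk = 40            // sample from the 40 most likely next tokens

do {
    let llmInference = try LlmInference(options: options)
    // llmInference is now ready for generateResponse(inputText:) calls.
} catch {
    print("Engine creation failed: \(error)")
}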

Performance Notes

  • The Gemma 2B model file is large (over a gigabyte even with int4 quantization), so make sure the device has enough free storage.
  • Expect a noticeable delay on the first inference while the model loads, especially on older devices.
  • For an accurate performance assessment, always test on real hardware rather than the simulator.

Reflections & Learnings

This project reinforced the potential of running sophisticated AI models locally on consumer-grade hardware. MediaPipe’s GenAI toolkit significantly simplifies integration, making it accessible even for solo developers. One of the key takeaways was balancing model performance with device constraints—optimization and configuration are essential.

Conclusion

With LLM-on-iOS, I’ve explored the exciting frontier of on-device AI. By leveraging Gemma 2B and MediaPipe Tasks GenAI, this project highlights how powerful language models can be deployed efficiently and securely on mobile platforms. I look forward to expanding this work and exploring further optimizations and applications.

You can find the source code and contribute on GitHub.