Bringing Large Language Models to iOS: Running Gemma 2B with MediaPipe Tasks GenAI

Introduction

In the age of mobile-first experiences, running large language models (LLMs) directly on devices represents a powerful shift in how we approach AI applications. My latest project, LLM-on-iOS, demonstrates this paradigm by integrating the Gemma 2B language model into an iOS application using MediaPipe Tasks GenAI. This setup allows for efficient, on-device natural language processing and generation—entirely offline and privacy-preserving.

Project Overview

LLM-on-iOS is a Swift-based mobile application that showcases how LLMs can operate locally on iOS devices. The project uses MediaPipe’s GenAI tools to facilitate low-latency, customizable inference of the Gemma 2B model. By removing the reliance on cloud services, this approach enhances user privacy and reduces network dependencies.

Key Features

  • On-Device Inference: Runs the Gemma 2B model locally on an iPhone or iPad.
  • Customizable Generation: Supports tuning of inference parameters like maximum tokens and temperature.
  • Seamless UI Integration: Offers a smooth user experience through a native iOS interface.

Getting Started

Requirements

To build and run the app, you’ll need:

  • Xcode 15.0 or newer
  • iOS 16.0 or later
  • CocoaPods for dependency management (a sample Podfile sketch follows this list)
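
The MediaPipe Tasks GenAI pods are distributed through CocoaPods. A minimal Podfile might look like the sketch below; the target name is a placeholder, and the pod names follow MediaPipe's GenAI setup guide:

    platform :ios, '16.0'

    target 'llm-on-ios' do
      use_frameworks!
      pod 'MediaPipeTasksGenAI'
      pod 'MediaPipeTasksGenAIC'
    end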

Installation Steps

  1. Clone the repository:

    git clone https://github.com/yourusername/llm-on-ios.git
    cd llm-on-ios
    
  2. Install dependencies:

    pod install
    
  3. Open the project in Xcode:

    open llm-on-ios.xcworkspace
    
  4. Add the Gemma 2B model file (gemma-2b-it-cpu-int4.bin) to your project (a verification snippet follows these steps):

    • Drag the file into the Xcode project navigator.
    • Check “Copy items if needed.”
    • Make sure it’s included in the main target.
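
Once the model is in place, a quick runtime check (a hypothetical snippet, not part of the repository) confirms the file was actually copied into the app bundle:

    // Verify the Gemma model file made it into the app bundle.
    if let path = Bundle.main.path(forResource: "gemma-2b-it-cpu-int4", ofType: "bin") {
        print("Model found at: \(path)")
    } else {
        print("Model missing: re-check Target Membership and 'Copy items if needed'.")
    }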

Usage Example

Using the model is straightforward. The LLMManager class handles all interactions with Gemma 2B:

// Create the manager and load the Gemma 2B model into memory.
let llmManager = LLMManager()
llmManager.initializeLLM()
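
The repository's LLMManager internals aren't reproduced here, but a minimal sketch of what initializeLLM() and a generation call might look like on top of MediaPipe's LlmInference API is shown below (the class body is an illustrative assumption, not the project's actual code):

import MediaPipeTasksGenAI

// Hypothetical sketch of LLMManager's internals.
final class LLMManager {
    private var llmInference: LlmInference?

    // Load the bundled Gemma 2B model and create the inference engine.
    func initializeLLM() {
        guard let modelPath = Bundle.main.path(forResource: "gemma-2b-it-cpu-int4",
                                               ofType: "bin") else {
            print("Model file not found in bundle")
            return
        }
        do {
            llmInference = try LlmInference(options: LlmInference.Options(modelPath: modelPath))
        } catch {
            print("Failed to initialize LLM: \(error)")
        }
    }

    // Run a single synchronous generation call.
    func generate(prompt: String) throws -> String? {
        try llmInference?.generateResponse(inputText: prompt)
    }
}

With a wrapper like this, try llmManager.generate(prompt: "Explain on-device inference in one sentence.") returns the completion as a plain String.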

You can adjust inference settings to suit your application’s needs:

  • Max Tokens: Caps the token budget for a request; in MediaPipe's API this covers input and output tokens combined.
  • Temperature: Controls randomness; lower values give more focused, deterministic text, higher values more varied output (both settings appear in the sketch below).
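
In the MediaPipe Swift API these settings live on LlmInference.Options and are fixed when the engine is created. A tuned setup might look like this sketch (values are illustrative, and the property names assume the API version current at the time of writing):

import MediaPipeTasksGenAI

// Illustrative tuning; adjust for your app and target devices.
let modelPath = Bundle.main.path(forResource: "gemma-2b-it-cpu-int4", ofType: "bin")!
let options = LlmInference.Options(modelPath: modelPath)
options.maxTokens = 512      // total budget for input plus output tokens
options.temperature = 0.7    // lower = more deterministic, higher = more varied
options.topk = 40            // sample from the 40 most likely next tokens

do {
    let llmInference = try LlmInference(options: options)
    // llmInference is now ready for generateResponse(inputText:) calls.
} catch {
    print("Engine creation failed: \(error)")
}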

Performance Notes

  • The Gemma 2B model file is large (over a gigabyte even with int4 quantization), so make sure the device has enough free storage.
  • Expect a noticeable delay on the first inference while the model loads, especially on older devices.
  • For an accurate performance assessment, always test on real hardware rather than the simulator.

Reflections & Learnings

This project reinforced the potential of running sophisticated AI models locally on consumer-grade hardware. MediaPipe’s GenAI toolkit significantly simplifies integration, making it accessible even for solo developers. One of the key takeaways was balancing model performance with device constraints—optimization and configuration are essential.

Conclusion

With LLM-on-iOS, I’ve explored the exciting frontier of on-device AI. By leveraging Gemma 2B and MediaPipe Tasks GenAI, this project highlights how powerful language models can be deployed efficiently and securely on mobile platforms. I look forward to expanding this work and exploring further optimizations and applications.

You can find the source code and contribute on GitHub.