PocketPal AI: On-Device LLMs

See how to run large language models entirely on your phone, no server needed. This talk covers on-device TTS, agentic loops, and a dev pipeline that benchmarks performance across devices.

llama, React Native, ONNX, Appium, Claude Code

Overview

PocketPal AI is an open-source mobile app that runs LLMs fully on-device on both iOS and Android. It needs no server or API key, and it works in airplane mode.

Live demo:

Running models on a phone: chat with a local model, on-device TTS (ONNX), and an agentic loop that builds a simple web page, all on the phone.
The dev pipeline (pocketpal-dev-team): a multi-agent Claude Code pipeline that builds the app, then automatically runs e2e tests, captures screenshots, and benchmarks the build across real devices to produce a per-release baseline report.
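
One way a per-release baseline report could be used is to flag devices whose benchmark numbers dropped since the previous release. The sketch below is illustrative only: the `BenchResult` fields, the device labels, the 10% tolerance, and the sample numbers are all assumptions, not the actual pocketpal-dev-team format or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchResult:
    device: str            # hypothetical device label
    tokens_per_sec: float  # generation speed measured in one benchmark run


def find_regressions(baseline, current, tolerance=0.10):
    """Return (device, relative_drop) for devices slower than baseline by more than `tolerance`."""
    base_by_device = {r.device: r.tokens_per_sec for r in baseline}
    regressions = []
    for result in current:
        base = base_by_device.get(result.device)
        if base is None:
            continue  # new device: nothing to compare against yet
        drop = (base - result.tokens_per_sec) / base
        if drop > tolerance:
            regressions.append((result.device, round(drop, 3)))
    return regressions


# Made-up numbers, purely to exercise the function.
baseline = [BenchResult("phone-a", 20.0), BenchResult("phone-b", 10.0)]
current = [BenchResult("phone-a", 19.0), BenchResult("phone-b", 8.0)]
regressions = find_regressions(baseline, current)
print(regressions)
```

Here "phone-a" slows by 5% and stays within tolerance, while "phone-b" slows by 20% and is reported.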

Then I show the stack underneath.

Links

https://github.com/a-ghorbani/pocketpal-ai
PocketPal AI runs local LLMs on-device using React Native.

Tech stack

llama

Meta's family of open-weights LLMs for local deployment and custom fine-tuning; the Llama 3.1 release spans 8B to 405B parameters.

Llama 3.1's flagship model has 405B parameters and was trained on more than 15 trillion tokens. It supports a 128k-token context window, which suits long documents and other large inputs. Developers use Llama for tasks such as multilingual translation, Python code generation, and complex reasoning, and hosting the weights locally keeps data under their own control. The ecosystem includes Llama Stack for agentic workflows and smaller 8B and 70B models, which target hardware ranging from consumer machines to enterprise clusters.
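
To put these parameter counts in perspective for local deployment, the memory needed just to hold the weights can be estimated as parameters times bits per weight. This is a back-of-the-envelope sketch: it ignores the KV cache, activations, and runtime overhead, and it assumes a flat bit width, whereas real quantized formats typically use slightly more bits per weight than their nominal figure.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


for name, n_params in [("8B", 8e9), ("70B", 70e9), ("405B", 405e9)]:
    fp16 = weight_memory_gib(n_params, 16)
    q4 = weight_memory_gib(n_params, 4)
    print(f"{name}: ~{fp16:.1f} GiB at 16-bit, ~{q4:.1f} GiB at 4-bit")
```

Under these assumptions an 8B model needs roughly 15 GiB at 16-bit precision but under 4 GiB at 4 bits per weight, which is why quantization matters so much for small devices.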

https://llama.meta.com/

React Native

React Native is an open-source framework for building native mobile applications (iOS, Android) using JavaScript and React.

React Native is a JavaScript framework, originally developed by Meta (Facebook), for building native iOS and Android applications from a single codebase. It uses React's declarative programming model and maps JavaScript components to each platform's native UI components, so apps keep a native look and feel. Teams can share most of their code across platforms (figures of up to 90% are commonly cited), which reduces development time and cost. Companies such as Meta, Microsoft, and Shopify use React Native in production apps.

https://reactnative.dev/

ONNX

ONNX (Open Neural Network Exchange) is an open-source format for representing machine learning models, designed for interoperability across frameworks, runtimes, and deployment hardware.

ONNX makes models portable. It defines a standardized computation graph and operator set, so developers can train a model in one framework (e.g., PyTorch or TensorFlow) and deploy it with a different runtime, which reduces framework lock-in. ONNX was introduced in 2017 by Facebook and Microsoft, with AWS joining shortly afterward, and now has contributions from companies such as NVIDIA, Intel, and Qualcomm. ONNX Runtime, a high-performance inference engine for ONNX models, powers AI inference in Microsoft products including Windows, Office, and Azure Cognitive Services, and it runs in the cloud, at the edge, and on mobile.
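
The idea of a computation graph plus a shared operator set can be illustrated with a toy interpreter. This is not the ONNX file format or the ONNX Runtime API: the dictionary layout and the scalar operator implementations are assumptions made for illustration, and only the operator names (Add, Mul, Relu) are borrowed from the real ONNX operator set.

```python
import operator

# Toy operator set. The names mirror real ONNX operators, but these
# implementations are simplified scalar stand-ins.
OPERATORS = {
    "Add": operator.add,
    "Mul": operator.mul,
    "Relu": lambda x: max(x, 0.0),
}

# A graph is an ordered list of nodes; each node names its operator,
# its input values, and the single value it produces.
GRAPH = [
    {"op_type": "Mul", "inputs": ["x", "w"], "output": "xw"},
    {"op_type": "Add", "inputs": ["xw", "b"], "output": "z"},
    {"op_type": "Relu", "inputs": ["z"], "output": "y"},
]


def run_graph(graph, feeds):
    """Evaluate nodes in order, looking each operator up by name."""
    values = dict(feeds)
    for node in graph:
        fn = OPERATORS[node["op_type"]]
        args = [values[name] for name in node["inputs"]]
        values[node["output"]] = fn(*args)
    return values


result = run_graph(GRAPH, {"x": 2.0, "w": -3.0, "b": 1.0})
print(result["y"])
```

Because the graph only refers to operators by name, any runtime that implements the same operator set can execute it, which is the portability argument in miniature.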

https://onnx.ai

Appium

Appium is an open-source, cross-platform test automation framework for native, hybrid, and mobile web applications.

Appium drives iOS, Android, and Windows apps through the W3C WebDriver protocol (the same standard behind Selenium). You do not need to recompile or modify your app to run tests, and you can write scripts in any language that has a WebDriver client (such as Java, Python, or JavaScript). Because Appium acts as a bridge to vendor-provided automation frameworks like Apple's XCUITest and Google's UiAutomator2, you test the same app package you ship to production.
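
As a rough illustration of what "driving apps over W3C WebDriver" means, the sketch below builds the JSON body a client would POST to the server's `/session` endpoint to start a test session. It only constructs the payload and does not contact a server. The app paths are placeholders, and the helper name `new_session_payload` is hypothetical; real test code would normally use an Appium client library rather than hand-built JSON.

```python
import json


def new_session_payload(platform, automation_name, app_path):
    """Build the JSON body for a W3C WebDriver 'New Session' request."""
    capabilities = {
        "platformName": platform,
        # Non-standard capabilities carry a vendor prefix; Appium uses "appium:".
        "appium:automationName": automation_name,
        "appium:app": app_path,
    }
    return {"capabilities": {"alwaysMatch": capabilities, "firstMatch": [{}]}}


# Placeholder paths: substitute the package you actually ship.
android = new_session_payload("Android", "UiAutomator2", "/path/to/app.apk")
ios = new_session_payload("iOS", "XCUITest", "/path/to/app.ipa")
print(json.dumps(android, indent=2))
```

The same request shape serves both platforms; only the platform name, the automation driver, and the app package change.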

https://appium.io

Claude Code

Anthropic's agentic coding tool: it runs Claude directly in your terminal or IDE so you can delegate complex, multi-step workflows with a single command.

Claude Code is Anthropic’s agentic coding assistant. It runs in your terminal, in an IDE (VS Code, JetBrains), or through a web interface, and lets you delegate tasks such as building features, fixing bugs, and navigating a codebase. The agent plans, edits files, executes commands, and creates commits while keeping track of your project structure. Anthropic has reported a 67% productivity increase among its own engineers who use Claude Code. Claude Code is included in the Pro and Max plans.

https://claude.com/code
