Running LLMs on a phone's NPU/GPU | Nürnberg

June 24, 2026 · Nürnberg

PocketPal AI: On-Device LLMs

See how to run large language models entirely on your phone, no server needed. This talk covers on-device TTS, agentic loops, and a dev pipeline that benchmarks performance across devices.

Tech stack
  • llama
    Meta's open-weights LLM family, released in sizes from 8B to 405B parameters and suited to local deployment and custom fine-tuning.
    Llama 3.1's flagship model has 405B parameters and was trained on more than 15 trillion tokens. It supports a 128k-token context window, which suits long documents and other large inputs. Developers use Llama for tasks such as multilingual translation, Python code generation, and complex reasoning, and local hosting keeps data under their own control. The ecosystem includes the Llama Stack for agentic workflows, plus 8B and 70B models aimed at consumer hardware and enterprise clusters.
  • React Native
    React Native is an open-source framework for building native mobile applications (iOS, Android) using JavaScript and React.
    React Native, originally developed by Meta (Facebook), lets developers build native mobile applications for iOS and Android from a single codebase. It uses React's declarative programming model and maps JavaScript code onto each platform's native UI components, which gives apps a native look and feel. Teams can share most of their code across platforms (figures of up to 90% are commonly cited), which reduces development time and cost. Companies such as Facebook, Microsoft, and Shopify use React Native in production apps.
  • ONNX
    ONNX (Open Neural Network Exchange) is an open-source format for representing machine learning models so that they can move between frameworks and deployment hardware.
    ONNX provides model portability. It defines a standard computation graph and operator set, so a model trained in one framework (e.g., PyTorch or TensorFlow) can be exported and run in a different runtime, which reduces framework lock-in. Introduced in 2017 by Facebook and Microsoft, with AWS joining shortly afterward, ONNX now has contributions from companies such as NVIDIA, Intel, and Qualcomm. ONNX Runtime, for example, powers AI inference in Microsoft products including Windows, Office, and Azure Cognitive Services, and it targets cloud, edge, and mobile deployments.
  • Appium
    Appium is an open-source, cross-platform test automation framework for native, hybrid, and mobile web applications.
    Appium simplifies mobile testing by driving iOS, Android, and Windows apps using the W3C WebDriver protocol (the same standard behind Selenium). Its main advantage lies in its philosophy: you do not need to recompile or modify your codebase to run tests, and you can write scripts in any language you prefer (such as Java, Python, or JavaScript). By acting as a bridge to vendor-provided automation frameworks like Apple's XCUITest and Google's UiAutomator2, Appium ensures you test the exact same app package you ship to production.
  • Claude Code
    Anthropic's agentic coding tool, which runs Claude in your terminal or IDE to automate multi-step development workflows.
    Claude Code is Anthropic’s agentic coding assistant. It runs in your terminal, in an IDE (VS Code, JetBrains), or through a web interface, and lets you delegate tasks such as feature building, bug fixing, and codebase navigation. The agent plans, edits files, executes commands, and creates commits while keeping track of the project structure. Anthropic has reported a 67% productivity increase among its own engineers using Claude Code. The tool is available to Pro and Max plan users.
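
The Llama entry above spans 8B to 405B parameters, and a rough calculation helps show which of those sizes could plausibly fit in a phone's memory. The following is a minimal sketch, not anything taken from the talk: the function name `weight_memory_gib` and the 16-bit and 4-bit widths are illustrative assumptions, and the estimate covers model weights only (no KV cache, activations, or runtime overhead).

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB.

    Assumption: every parameter is stored at the same bit width.
    Real quantization formats mix widths and add per-block metadata.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Parameter counts from the Llama entry; bit widths are illustrative.
    for params in (8, 70, 405):
        for bits in (16, 4):
            gib = weight_memory_gib(params, bits)
            print(f"{params}B @ {bits}-bit: {gib:.1f} GiB")
```

Under these assumptions an 8B model at 4 bits per weight needs roughly 3.7 GiB for its weights, while the 405B model at 16 bits needs roughly 754 GiB, which is why on-device work tends to focus on small, quantized models.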