A Weekend AI Project: Running Speech Recognition and a LLaMA-2 GPT on a Raspberry Pi

A fully offline use of the Whisper ASR and LLaMA-2 GPT models

Published in Towards Data Science

10 min read · Jan 20, 2024

Raspberry Pi running a LLaMA model, Image by author

Nowadays, nobody is surprised by a deep learning model running in the cloud. But things can get much more complicated on edge or consumer devices, for several reasons. First, cloud APIs require the device to be online all the time. This is not a problem for a web service, but it can be a dealbreaker for a device that needs to work without Internet access. Second, cloud APIs cost money, and customers will likely not be happy to pay yet another subscription fee. Last but not least, after several years the project may be discontinued, the API endpoints shut down, and the expensive hardware turned into a brick, which is naturally not friendly to customers, the ecosystem, or the environment. That’s why I am convinced that end-user hardware should be fully functional offline, without extra costs and without depending on online APIs (they can be optional, but not mandatory).

In this article, I will show how to run a LLaMA GPT model and automatic speech recognition (ASR) on a Raspberry Pi. This will allow us to ask the Raspberry Pi questions and get answers. And as promised, all of this will work fully offline.


Written by Dmitrii Eliuseev
