A Weekend AI Project: Running Speech Recognition and a LLaMA-2 GPT on a Raspberry Pi

Fully offline use of Whisper ASR and a LLaMA-2 GPT model

Dmitrii Eliuseev
Towards Data Science
10 min read · Jan 20, 2024

Raspberry Pi running a LLaMA model, Image by author

Nowadays, nobody is surprised by a deep learning model running in the cloud. But the situation can be much more complicated in the edge or consumer device world, for several reasons. First, using cloud APIs requires devices to always be online. This is not a problem for a web service, but it can be a dealbreaker for a device that needs to work without Internet access. Second, cloud APIs cost money, and customers are unlikely to be happy to pay yet another subscription fee. Last but not least, after several years the project may end, the API endpoints will be shut down, and the expensive hardware will turn into a brick, which is not friendly to customers, the ecosystem, or the environment. That’s why I am convinced that end-user hardware should be fully functional offline, without extra costs or mandatory use of online APIs (they can be optional, but not required).

In this article, I will show how to run a LLaMA GPT model and automatic speech recognition (ASR) on a Raspberry Pi. That will allow us to ask the Raspberry Pi questions and get answers. And as promised, all of this will work fully offline.

