
Tired of paying for costly subscriptions, or wary of sharing your personal data with OpenAI?
What if there were free and more secure alternatives using very capable open source models?
If you’re intrigued, then this guide is for you. Let’s build our own "ChatGPT" together, powered by the most capable open source models, right on your iPhone!
On the backend, we’ll leverage Ollama and Google Colab’s free T4 GPU to serve the LLMs. For the frontend, we’ll use Enchanted, an elegant open source iOS app, to interact with models such as Llama 2, Mistral, and Phi-2.
By the end of this guide, you’ll have a powerful AI at your fingertips – without spending a dime. And the best part? You can easily switch among the best open source models according to your needs!
Ready? Let’s dive in!
A Quick Overview of the Key Components
To build our open source "ChatGPT", we’ll use the following key components:
- Google Colab: a free cloud notebook environment, where we’ll run Ollama on a T4 GPU.
- Ollama: an open source tool for running open source large language models, such as Llama 2, locally.
- NGrok: a tool to expose a local development server to the Internet with minimal effort.
- Enchanted: an open source iOS/iPad mobile app for chatting with privately hosted models.
Google Colab
Google Colab is a free cloud service hosted by Google that allows anyone to write and execute Python code through the browser.
Even with a free account, it provides access to a T4 GPU and around 12GB of RAM, which is largely sufficient: running a 7B model such as Mistral 7B or Llama 2 7B requires about 8GB of RAM.
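If you want to verify what your runtime was assigned, a quick sanity check from a notebook cell looks like this (assuming a GPU runtime is already selected; nvidia-smi reports the GPU, free reports system RAM):
# Report the assigned GPU and its total memory
!nvidia-smi --query-gpu=name,memory.total --format=csv
# Report available system RAM
!free -h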
Ollama

Ollama is, for me, the best and easiest way to get up and running with open source LLMs. It supports the most capable models, such as Llama 2, Mistral, and Phi-2, and you can find the full list of available models on ollama.ai/library.
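To give you a feel for how simple it is, here’s a minimal sketch of the Ollama CLI on a machine where it’s already installed (we’ll run the Colab equivalent in the step-by-step guide below); pulling and querying a model are each one-liners:
# Download a model from the Ollama library
ollama pull mistral
# Send it a one-off prompt (omit the prompt for an interactive chat)
ollama run mistral "Summarize what a large language model is in one sentence."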
NGrok
Ngrok is an easy-to-use tool that enables developers to expose a local development server to the Internet with minimal effort. Essentially, it creates a secure tunnel to your localhost, allowing you to share a web service on your local development machine without altering firewall settings or deploying to a public server.
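As a minimal illustration (not the exact setup we’ll use later, which drives the ngrok CLI directly), here’s what a tunnel looks like with pyngrok, the Python wrapper we’ll install in the guide; it assumes a service is already listening on port 11434:
from pyngrok import ngrok

# Authenticate once with the authtoken from your NGrok dashboard
ngrok.set_auth_token("YOUR_NGROK_AUTHTOKEN")

# Open an HTTP tunnel: requests to the public URL are forwarded to localhost:11434
tunnel = ngrok.connect(11434)
print(tunnel.public_url)  # e.g. https://xxxx-xx-xx-xxx-xx.ngrok.io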
Enchanted

Enchanted is an iOS/iPad mobile app for chatting with open source models such as Llama 2 and Mistral.
It features a simple and elegant UI that connects to your private Ollama models. It provides the following functionalities:
- Supports the latest Ollama Chat API (see the example after this list)
- Conversation history included in the API calls
- Dark/Light mode
- Conversation history is stored on your device
- Markdown support (nicely displays tables/lists/code blocks)
- Voice prompts
- Image attachments for prompts
- Specify a system prompt used for every conversation
- Edit message content or resubmit a message with a different model
- Delete single conversation / delete all conversations
Source: Enchanted GitHub
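To make the first two items concrete, here’s a sketch of a call to Ollama’s Chat API, which is presumably close to what Enchanted sends under the hood; note how the previous turns are resent with every request so the model keeps context (the URL and messages are placeholders):
import requests

BASE_URL = "http://localhost:11434"  # or your public NGrok URL later on

payload = {
    "model": "mistral",
    "messages": [  # the whole conversation history travels with each call
        {"role": "user", "content": "Suggest a dish for Lunar New Year."},
        {"role": "assistant", "content": "How about dumplings (jiaozi)?"},
        {"role": "user", "content": "Give me a simple recipe for those."},
    ],
    "stream": False,  # return one JSON object instead of a token stream
}
response = requests.post(f"{BASE_URL}/api/chat", json=payload)
print(response.json()["message"]["content"])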
What You Will Need
Before diving into the technical setup, ensure you have the following:
1. Google Colab account
To use Google Colab, you first need to sign in to your Google account and access Google Colab through https://colab.research.google.com/.
2. NGrok Account and Token
Go to the NGrok website and sign up if you don’t have an account already. Otherwise, sign in and request an "Authtoken." Save this token securely; we’ll use it later in the step-by-step guide.
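Optionally, instead of pasting the token into a notebook cell later, you can store it in Colab’s Secrets panel (the key icon in the left sidebar) and read it back at runtime; a small sketch, assuming you saved it under the name NGROK_AUTHTOKEN:
from google.colab import userdata

# Read the token from Colab Secrets rather than hardcoding it in the notebook
NGROK_AUTHTOKEN = userdata.get('NGROK_AUTHTOKEN')
!ngrok config add-authtoken {NGROK_AUTHTOKEN}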

3. Download Enchanted from the App Store
Download and install the Enchanted App on your iPhone through the following link: https://apps.apple.com/gb/app/enchanted-llm/id6474268307.
Step-by-Step Guide
Step 1 – Run Ollama on Google Colab
Begin by opening a new Google Colab notebook. To ensure that we’re utilizing the GPU, click ‘Runtime’ in the menu, select ‘Change runtime type’, and choose ‘T4 GPU’ as the hardware accelerator.
Now, we need to install the NGrok-related dependencies and configure the authtoken.
!pip install aiohttp pyngrok
!ngrok config add-authtoken [YOUR_NGROK_AUTHTOKEN]
Then, we install Ollama with the following command.
!curl https://ollama.ai/install.sh | sh
Now, it’s time to run Ollama alongside NGrok. In this example, we’ll configure Ollama to run both the Mistral and Codellama models simultaneously. This setup allows us to switch between models later on, offering flexibility in our chat app’s capabilities.
To manage this, we’ll run them as separate subprocesses:
- Ollama will initiate the models on localhost:11434.
- NGrok will then expose port 11434 to the internet with a public URL.
Execute the code snippet below to start both Ollama and NGrok:
import os
import asyncio

# Set LD_LIBRARY_PATH so the system NVIDIA libraries are picked up
os.environ.update({'LD_LIBRARY_PATH': '/usr/lib64-nvidia'})

async def run_process(cmd):
    print('>>> starting', *cmd)
    p = await asyncio.subprocess.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )

    # Stream the subprocess output to the notebook as it arrives
    async def pipe(lines):
        async for line in lines:
            print(line.strip().decode('utf-8'))

    await asyncio.gather(
        pipe(p.stdout),
        pipe(p.stderr),
    )

# Start the Ollama server, pull both models, and open the NGrok tunnel concurrently
await asyncio.gather(
    run_process(['ollama', 'serve']),
    run_process(['ollama', 'pull', 'mistral']),
    run_process(['ollama', 'pull', 'codellama']),
    run_process(['ngrok', 'http', '--log', 'stderr', '11434']),
)
After running the code, keep an eye on the logs: you’ll find the public URL generated by NGrok. Note this URL, as we’ll use it in the next step to connect our models to the iPhone through the Enchanted app.
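Before switching to the phone, you can sanity-check the tunnel from the notebook (or any machine): Ollama’s /api/tags endpoint lists the models that have been pulled. Replace the placeholder with your own public URL:
# Should return a JSON list containing mistral and codellama
!curl https://bb51-34-29-149-225.ngrok.io/api/tags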

Step 2 – Configure endpoint in Enchanted App
This step will connect the LLM models with your iPhone.
Now, switch to your iPhone, open the Enchanted app, and go to the app settings. Enter the public URL we noted from the logs in the previous step as your server endpoint. It should look something like this: "https://bb51-34-29-149-225.ngrok.io".

Step 3 – Try It and Have Fun
So that’s it! Now we can chat with both Mistral 7B and Codellama using our iPhone.
First, let’s choose the Codellama model and ask it for a code example implementing a neural network in Python. Not bad!

Then, swiftly switch to Mistral 7B and ask it for some ideas on what typical dishes to prepare to celebrate the Lunar New Year (inspired by my friend Han HELOIR, Ph.D.’s latest article, "Celebrate with AI: Chinese New Year Tips from Mistral and LLaVA on Raspberry Pi").

Closing Thoughts
I’m quite happy with the results I’ve obtained with my open source ChatGPT, including the quality of the responses and the speed of token generation. I’m also delighted with the UI, which is really simple and easy to use.
I hope you enjoyed this guide and that it inspires you to explore the vast landscape of open source AI. Don’t hesitate to go back to Ollama and try out different models. Each has its unique strengths and capabilities, offering various experiences.
As usual, you can find my Google Colab notebook here.
Before you go! 🦸🏻‍♀️
If you liked my story and you want to support me: