How to Run Phi 4 on Raspberry Pi: Step-by-Step Guide (2026)
Running Phi 4 on Raspberry Pi 5 opens a world of edge AI possibilities. With 8GB of RAM and native ARM support, you can deploy a capable language model on a credit-card-sized device—perfect for robots, IoT systems, offline devices, and distributed edge networks. This guide shows you how to set up Phi 4 on Raspberry Pi and leverage it for real-world applications.
Why Run Phi 4 on Raspberry Pi?
Phi 4 is compact and efficient—ideal for edge deployment. Running it on Raspberry Pi 5 unlocks:
- Offline-first AI: Deploy language model intelligence in remote locations without internet
- Robotics integration: Add natural language understanding to physical robots and embedded systems
- IoT applications: Build smart devices that make local decisions without cloud latency
- Privacy at the edge: Process sensitive data on the device; nothing leaves your network
- Cost-effective: Raspberry Pi 5 starts at ~$60 (~$80 for the 8GB model this guide assumes); far cheaper than ongoing cloud API subscriptions
- Distributed networks: Deploy Phi 4 on hundreds of Pi devices for edge computing clusters
Hardware Requirements
Minimum setup:
- Raspberry Pi 5 (8GB recommended; 4GB may struggle)
- 64GB microSD card or NVMe SSD
- USB-C power supply (27W recommended)
- Cooling case (Phi 4 inference is CPU-intensive; avoid thermal throttling)
Optional but recommended:
- Wired Gigabit Ethernet (built into the Pi 5) for stable networking; faster than WiFi for model downloads
- NVMe SSD via an M.2 HAT (much faster than a microSD card for model storage)
Step 1: Prepare Your Raspberry Pi
Install the latest Raspberry Pi OS (64-bit) using the official imager. Download Raspberry Pi Imager from raspberrypi.com, select OS → Raspberry Pi OS (64-bit), and write to your SD card or SSD.
Boot your Pi and open a terminal. Update the system:
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl wget git python3-pip python3-venv
Verify your setup:
uname -a # Should show ARM64 architecture
Step 2: Increase Swap Space (Critical!)
Phi 4 is a 14-billion-parameter model; even the default quantized build, plus the OS, can exceed the Pi's 8GB of RAM. Increase swap to prevent out-of-memory crashes:
sudo dphys-swapfile swapoff
sudo nano /etc/dphys-swapfile
Find this line:
CONF_SWAPSIZE=100
Change it to:
CONF_SWAPSIZE=4096 # 4GB swap; adjust based on your drive size
Save (Ctrl+O, Enter, Ctrl+X). Then:
sudo dphys-swapfile setup
sudo dphys-swapfile swapon
free -h # Verify swap is active
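If you want to script the swap check above, a small helper can read SwapTotal from /proc/meminfo (a sketch; the 4GB threshold is this guide's recommendation, not a hard Ollama requirement):

```shell
#!/bin/sh
# Read SwapTotal (in kB) from a meminfo-style file; defaults to the
# live /proc/meminfo when no argument is given.
swap_kb() {
  awk '/^SwapTotal/ {print $2}' "${1:-/proc/meminfo}"
}

# Warn if active swap is below the 4GB this guide recommends.
min_kb=$((4 * 1024 * 1024))
current_kb=$(swap_kb)
if [ "${current_kb:-0}" -lt "$min_kb" ]; then
  echo "Swap below 4GB; raise CONF_SWAPSIZE and rerun dphys-swapfile setup"
else
  echo "Swap OK (${current_kb} kB)"
fi
```

Drop this into a provisioning script so misconfigured nodes fail loudly before you pull the model.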
Step 3: Install Ollama on Raspberry Pi
Ollama's install script works on Raspberry Pi ARM64:
curl -fsSL https://ollama.com/install.sh | sh
This installs Ollama and registers it as a systemd service. Verify:
ollama --version
Step 4: Download Phi 4 (Optimized for ARM)
Ollama's phi4 tag is a quantized build that runs on the Pi's ARM64 CPU. Pull it:
ollama pull phi4
This downloads roughly 9GB; expect 10–20 minutes over wired Gigabit Ethernet, longer on slower links. Monitor progress:
df -h / # Check remaining disk space
free -h # Watch memory usage
If downloads are slow over WiFi, switch to the Pi 5's built-in Gigabit Ethernet port.
Step 5: Run Phi 4 on Your Raspberry Pi
The install script usually registers Ollama as a systemd service, so the server may already be running. If not, start it manually:
ollama serve &
This runs in the background. In a new terminal, test Phi 4:
curl -X POST http://localhost:11434/api/generate \
-d '{
"model": "phi4",
"prompt": "Hello, what is your name?",
"stream": false
}' | jq '.response'
Expect a response in 30–120 seconds (slower than modern desktops, but functional for edge tasks). First-token latency is typically 15–30 seconds on Raspberry Pi 5.
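Because first-token latency dominates on the Pi, streaming the response makes the model feel far more responsive: tokens print as they arrive instead of after the full 30–120 seconds. A sketch using the same /api/generate endpoint with stream set to true (Ollama emits one JSON object per line):

```python
#!/usr/bin/env python3
import json

import requests

def join_stream_chunks(lines):
    """Concatenate the 'response' fields from Ollama's JSON-lines stream."""
    return "".join(json.loads(line)["response"] for line in lines if line)

def ask_phi4_streaming(prompt, host="http://localhost:11434"):
    """Print tokens as they arrive instead of waiting for the full reply."""
    with requests.post(f"{host}/api/generate",
                       json={"model": "phi4", "prompt": prompt, "stream": True},
                       stream=True, timeout=600) as resp:
        resp.raise_for_status()
        for raw in resp.iter_lines():
            if not raw:
                continue
            chunk = json.loads(raw)
            print(chunk.get("response", ""), end="", flush=True)
            if chunk.get("done"):
                print()
                break

if __name__ == "__main__":
    ask_phi4_streaming("Hello, what is your name?")
```

The generous timeout matters here: slow hardware plus a long prompt can easily exceed library defaults.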
Step 6: Integrate Phi 4 Into Your Raspberry Pi Projects
Python automation script:
#!/usr/bin/env python3
import requests
def ask_phi4(prompt):
response = requests.post('http://localhost:11434/api/generate', json={
'model': 'phi4',
'prompt': prompt,
'stream': False
})
return response.json()['response']
# Example: Device monitoring
status = ask_phi4("Summarize: Device temperature is 45C, RAM usage is 70%, CPU at 60%")
print(f"System status: {status}")
Bash automation for IoT tasks:
#!/bin/bash
sensor_data=$(cat /sys/class/thermal/thermal_zone0/temp)
response=$(curl -s -X POST http://localhost:11434/api/generate \
-d "{\"model\": \"phi4\", \"prompt\": \"Device temperature is $(($sensor_data/1000))C. Should I activate cooling?\", \"stream\": false}" \
| jq -r '.response')
echo "Decision: $response"
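To run the script above unattended, a crontab entry (added with crontab -e) can invoke it on a schedule; the script path and log location here are placeholders for wherever you saved it:

```shell
# m h dom mon dow  command — check temperature every 5 minutes
*/5 * * * * /home/pi/cooling_check.sh >> /home/pi/cooling.log 2>&1
```

Keep the interval generous: each invocation ties up the Pi's CPU for the length of one inference.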
Optimizing Phi 4 for Raspberry Pi Performance
Reduce the context window for faster responses. There is no command-line flag for this; set the num_ctx parameter inside an interactive session:
ollama run phi4
>>> /set parameter num_ctx 512
This limits the model to 512 tokens of context, which can substantially cut prompt-processing time and memory use. Ideal for quick decisions in IoT scenarios.
Limit threads to prevent CPU overload:
Thread count is a per-model parameter (num_thread) rather than an environment variable. Set it the same way:
ollama run phi4
>>> /set parameter num_thread 2
Using two of the Pi 5's four cores helps prevent thermal throttling and leaves headroom for other processes.
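Both tunings (context size and thread count) can also be sent per-request through the API's options field, which is handier for scripts than changing model defaults. A sketch of a payload builder, with values matching the suggestions above:

```python
import json

def build_generate_payload(prompt, model="phi4", num_ctx=512, num_thread=2):
    """Build an Ollama /api/generate body with per-request performance
    options instead of changing the model's defaults."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx, "num_thread": num_thread},
    }

# POST json=build_generate_payload(...) to http://localhost:11434/api/generate
payload = build_generate_payload("Should I activate cooling?")
print(json.dumps(payload, indent=2))
```

Per-request options only apply to that call, so a background IoT script can run lean while an interactive session keeps the defaults.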
Monitor CPU and memory in real-time:
watch -n 1 'free -h && ps aux | grep ollama'
Real-World Raspberry Pi + Phi 4 Applications
Smart home automation:
Deploy Phi 4 on Raspberry Pi to interpret natural language commands—"Turn on the lights in 5 minutes"—without sending data to cloud services.
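One hedged way to wire this up is to ask Phi 4 for a JSON action and parse whatever comes back; the device/action/delay schema here is invented for illustration, and small models often wrap JSON in prose, so the parser scans for the first object rather than trusting the raw reply:

```python
import json
import re

# Hypothetical prompt template; the keys are this sketch's invention.
COMMAND_PROMPT = (
    'Convert this home command to JSON with keys '
    '"device", "action", "delay_minutes". Reply with JSON only.\n'
    "Command: {command}"
)

def extract_action(model_output):
    """Pull the first JSON object out of a model reply, tolerating
    surrounding prose; returns None if no object is found."""
    match = re.search(r"\{.*\}", model_output, re.DOTALL)
    return json.loads(match.group(0)) if match else None
```

Pair this with the ask_phi4 helper from Step 6 and validate the parsed dict before actuating anything physical.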
Robot intelligence:
Equip a Raspberry Pi-powered robot with Phi 4 for local language understanding. The robot understands commands, asks clarifying questions, and makes decisions without internet.
Offline edge analytics:
Process sensor streams and generate insights locally. Example: Analyze weather data and generate a local weather forecast without API calls.
Distributed edge network:
Deploy Phi 4 across 10–100 Raspberry Pis to create a decentralized inference network. Aggregate results locally without central servers.
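A minimal sketch of the client side, assuming each Pi runs Ollama on the default port; the hostnames are placeholders, and requests rotate round-robin with no failover or health checks:

```python
import itertools

import requests

class PiCluster:
    """Round-robin Phi 4 queries across several Raspberry Pi nodes."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self._cycle = itertools.cycle(self.hosts)

    def next_host(self):
        # Rotate through the configured nodes in order.
        return next(self._cycle)

    def ask(self, prompt):
        host = self.next_host()
        resp = requests.post(
            f"http://{host}:11434/api/generate",
            json={"model": "phi4", "prompt": prompt, "stream": False},
            timeout=600,
        )
        resp.raise_for_status()
        return resp.json()["response"]

# cluster = PiCluster(["pi-node-1.local", "pi-node-2.local"])
# print(cluster.ask("Summarize today's sensor readings"))
```

A production version would add retries and skip unreachable nodes, but round-robin alone already spreads load across a small fleet.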
Troubleshooting Raspberry Pi + Phi 4
Issue: "Out of memory" or model fails to load
Solution: Verify swap is active (free -h) and set to 4GB+. Reduce the context window, e.g. /set parameter num_ctx 256 inside an ollama run session.
Issue: Phi 4 responses take 2+ minutes
Solution: This is normal on Raspberry Pi 5. If it's unacceptable, try a smaller model such as Phi 3 Mini (ollama pull phi3). Alternatively, spread requests across multiple Pi 5s.
Issue: Ollama service crashes or doesn't start
Solution: Check logs: journalctl -u ollama -n 50. Restart service: sudo systemctl restart ollama
Issue: WiFi downloads are extremely slow
Solution: Use the Pi 5's built-in Gigabit Ethernet port. Real-world WiFi throughput on the Pi 5 is typically in the 100–200 Mbps range; wired Ethernet reaches 1 Gbps.
Going Further: Edge AI Clusters
Once you master Phi 4 on a single Raspberry Pi, scale to edge clusters:
- Deploy Phi 4 on 10–100 Pi devices across geographic locations
- Use distributed inference to handle high query volumes
- Add a coordinator to distribute load and aggregate results
- Monitor the entire network with Prometheus + Grafana
Conclusion
Phi 4 on Raspberry Pi democratizes edge AI. For under $100 in hardware, you gain a capable language model that operates offline, respects privacy, and scales horizontally across devices. From smart home automation to robot intelligence to distributed sensor networks, Phi 4 on Raspberry Pi unlocks applications previously reserved for expensive cloud infrastructure.
Deploy local AI at the edge. Start with a single Pi today.