How to Run Phi 4 on Windows: Step-by-Step Guide (2026)
Running Phi 4 on Windows is easier than ever in 2026. This efficient language model delivers strong performance on consumer-grade hardware—including gaming laptops, workstations, and desktop PCs—without cloud dependencies. This guide walks you through installing and optimizing Phi 4 on Windows with Ollama and leveraging your GPU for fast inference.
What Is Phi 4 and Why Run It on Windows?
Phi 4, developed by Microsoft, is a compact yet capable language model optimized for local deployment. At 14B parameters, it excels at reasoning, coding, and general-purpose tasks while fitting comfortably on mid-range Windows hardware—perfect for developers, analysts, and anyone seeking privacy-first AI.
Benefits of running Phi 4 locally on Windows:
- GPU acceleration: NVIDIA CUDA and AMD ROCm support for dramatically faster inference than CPU-only
- Complete offline operation: Works without internet after initial download
- Zero API costs: Run unlimited prompts after a one-time setup
- Data privacy: All processing stays on your machine
- Multi-GPU support: Scale to enterprise workloads
System Requirements for Windows
Minimum: Windows 10 or 11, 16GB RAM, 20GB free disk space
Recommended: NVIDIA RTX 3070 or better (8GB+ VRAM), or AMD Radeon RX 6700 XT
Check your GPU: Right-click Desktop → NVIDIA/AMD Control Panel, or run this in PowerShell:
Get-CimInstance Win32_VideoController | Select-Object Name, AdapterRAM
Note your GPU model and VRAM. AdapterRAM is reported in bytes and can under-report on cards with more than 4GB, so cross-check with Task Manager if the number looks low; knowing your VRAM helps you choose the right Ollama configuration.
Step 1: Install Ollama on Windows
Visit ollama.com and download the Windows installer. Run the .exe file and follow the setup wizard; Ollama detects compatible GPUs automatically.
After installation, open PowerShell and verify Ollama is ready:
ollama --version
Expected output looks like ollama version is 0.1.45 (your version number will differ).
Step 2: Configure GPU Acceleration
For NVIDIA GPUs (CUDA):
Ollama automatically detects CUDA-capable GPUs. Verify GPU recognition by running:
ollama serve
Watch the startup log for a line reporting your GPU model and available VRAM. This confirms Ollama will use your GPU for inference.
For AMD GPUs (ROCm):
Install current AMD Radeon drivers first (see rocmdocs.amd.com for ROCm details). Recent Ollama builds for Windows ship with ROCm support for supported Radeon GPUs, so no extra environment variable is normally required. Run ollama serve and check the startup log to verify the AMD GPU is detected.
Step 3: Download Phi 4
Open PowerShell and pull the Phi 4 model:
ollama pull phi4
This downloads roughly 9GB of model weights; download time depends on your connection, typically several minutes to half an hour on broadband. Progress displays in real time:
pulling manifest
pulling 1bd56c8a5c54... 25% ██
...
Step 4: Run Phi 4 Interactively
Start a chat session with Phi 4:
ollama run phi4
You'll see a prompt where you can type questions and receive GPU-accelerated responses. Try this:
>>> Write a Python function to validate email addresses
[Phi 4 responds with working code...]
Type /bye (or press Ctrl+D) to close the session.
Step 5: Access Phi 4 via API for Integrations
For production use, access Phi 4 programmatically. Keep a PowerShell window open running ollama serve (it listens on localhost:11434), then query it from Python, JavaScript, or any HTTP client. From PowerShell itself, Invoke-RestMethod avoids the quote-escaping pitfalls of passing a JSON body to curl.exe:
Invoke-RestMethod -Method Post -Uri http://localhost:11434/api/generate -Body '{
  "model": "phi4",
  "prompt": "List five benefits of local LLMs",
  "stream": false
}'
Phi 4 returns JSON with the complete response. This enables adding local AI to web apps, automation scripts, or internal tools.
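The same request works from Python with only the standard library; a minimal sketch, assuming the default Ollama server is running on localhost:11434:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_request(prompt: str, model: str = "phi4", stream: bool = False) -> dict:
    """Payload shape expected by Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": stream}

def generate(prompt: str) -> str:
    """POST a prompt to the local Ollama server and return the response text."""
    data = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires `ollama serve` to be running):
# print(generate("List five benefits of local LLMs"))
```

With "stream": false the server returns one JSON object whose response field holds the full completion, which keeps the client code simple.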
Performance Tuning for Windows
Tune batch size and context window:
These are model parameters rather than command-line flags. In an interactive ollama run phi4 session, raise the context window for longer documents with:
/set parameter num_ctx 8192
Over the API, pass them in the request's options field (for example "num_ctx": 8192, "num_batch": 512); a larger batch can improve throughput at the cost of VRAM.
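These generation parameters can also be set per request through the API's options field; a minimal sketch of the request body (num_ctx and num_batch are Ollama's documented option keys):

```python
import json

def tuned_payload(prompt: str, num_ctx: int = 8192, num_batch: int = 512) -> dict:
    """Request body for /api/generate with per-request performance options."""
    return {
        "model": "phi4",
        "prompt": prompt,
        "stream": False,
        # Ollama generation options: context length and batch size.
        "options": {"num_ctx": num_ctx, "num_batch": num_batch},
    }

print(json.dumps(tuned_payload("Summarize the attached report."), indent=2))
```

Setting options per request, rather than globally, lets one server handle both short chat turns and long-document jobs.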
Monitor GPU usage in real-time:
Open Task Manager (Ctrl+Shift+Esc) → Performance tab → GPU to watch VRAM and utilization while Phi 4 runs.
Multi-GPU setup:
If you have multiple GPUs, set:
$env:CUDA_VISIBLE_DEVICES = "0,1"
This lets Ollama split the model across both GPUs, which helps when a model is too large for a single card.
Troubleshooting Common Windows Issues
Issue: GPU not detected / using CPU only
Solution: Update GPU drivers (NVIDIA GeForce Experience or AMD Radeon Software), then restart Ollama.
Issue: "port 11434 already in use"
Solution: Ollama is already running. Close all Ollama windows and try again, or use: netstat -ano | findstr :11434 to find and kill the process.
Issue: Out-of-memory errors despite sufficient VRAM
Solution: Reduce the context window or batch size, e.g. /set parameter num_ctx 2048 in an interactive session, or "options": {"num_ctx": 2048, "num_batch": 256} in an API request.
Building Production Workflows
Once Phi 4 runs reliably, expand into production-grade applications:
- Document analysis: Build a RAG system to answer questions about your private documents
- Code generation: Use Phi 4's coding strengths to auto-generate boilerplate or refactor code
- Customer support: Deploy a local chatbot that handles common requests without cloud overhead
- Data labeling: Use Phi 4 to pre-label datasets for machine learning projects
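As a taste of the last item, here is a minimal data-labeling sketch against the local API; the sentiment label set and prompt wording are illustrative assumptions, not part of Ollama:

```python
import json
import urllib.request

LABELS = ("positive", "negative", "neutral")  # illustrative label set

def label_prompt(text: str) -> str:
    """Constrain Phi 4 to answer with exactly one label."""
    return (
        f"Classify the sentiment of the text as one of: {', '.join(LABELS)}. "
        f"Answer with a single word.\n\nText: {text}"
    )

def classify(text: str, url: str = "http://localhost:11434/api/generate") -> str:
    """Ask the local Phi 4 server for a label; fall back to 'unknown'."""
    body = json.dumps({"model": "phi4", "prompt": label_prompt(text), "stream": False})
    req = urllib.request.Request(
        url, data=body.encode("utf-8"), headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        answer = json.loads(resp.read())["response"].strip().lower()
    return answer if answer in LABELS else "unknown"

# Usage (requires `ollama serve` to be running):
# print(classify("The update made everything faster. Love it!"))
```

Normalizing the response and rejecting anything outside the label set keeps noisy model output from polluting the dataset.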
Conclusion
Your Windows PC is a powerful AI workstation. With Phi 4 and Ollama, you have access to enterprise-grade language model capabilities—offline, private, and free. Leverage GPU acceleration to run Phi 4 at production speed, integrate it into your applications, and build AI systems that never leave your control.
Start building today. No cloud required.