How to Run Phi 4 on Windows: Step-by-Step Guide (2026)
Running Phi 4 on Windows is easier than ever in 2026. This efficient language model delivers strong performance on consumer-grade hardware—including gaming laptops, workstations, and desktop PCs—without cloud dependencies. This guide walks you through installing and optimizing Phi 4 on Windows with Ollama and leveraging your GPU for fast inference.
What Is Phi 4 and Why Run It on Windows?
Phi 4, developed by Microsoft, is a compact yet capable language model optimized for local deployment. At 14B parameters, it excels at reasoning, coding, and general-purpose tasks while fitting comfortably on mid-range Windows hardware—perfect for developers, analysts, and anyone seeking privacy-first AI.
Benefits of running Phi 4 locally on Windows:
- GPU acceleration: NVIDIA CUDA and AMD ROCm support for dramatically faster inference than CPU-only
- Complete offline operation: Works without internet after initial download
- Zero API costs: Run unlimited prompts after a one-time setup
- Data privacy: All processing stays on your machine
- Multi-GPU support: Scale to enterprise workloads
System Requirements for Windows
Minimum: Windows 10 or 11, 16GB RAM, 20GB free disk space
Recommended: NVIDIA RTX 3070 or better (8GB+ VRAM), or AMD Radeon RX 6700 XT
Check your GPU: Right-click Desktop → NVIDIA/AMD Control Panel, or run this in PowerShell:
Get-CimInstance Win32_VideoController | Select-Object Name, AdapterRAM
Note your GPU model and VRAM. AdapterRAM is reported in bytes and can under-report on cards with more than 4GB, so cross-check with Task Manager if the number looks low; knowing your VRAM helps you choose the right Ollama configuration.
Step 1: Install Ollama on Windows
Visit ollama.com and download the Windows installer. Run the .exe file and follow the setup wizard; Ollama detects compatible GPUs automatically.
After installation, open PowerShell and verify Ollama is ready:
ollama --version
Expected output looks like ollama version is 0.1.45 (your version number will differ).
Step 2: Configure GPU Acceleration
For NVIDIA GPUs (CUDA):
Ollama automatically detects CUDA-capable GPUs. Verify GPU recognition by running:
ollama serve
Watch the startup log for a line reporting your GPU model and available VRAM. This confirms Ollama will use your GPU for inference.
For AMD GPUs (ROCm):
Install current AMD Radeon drivers first (see rocmdocs.amd.com for ROCm details). Recent Ollama builds for Windows ship with ROCm support for supported Radeon GPUs, so no extra environment variable is normally required. Run ollama serve and check the startup log to verify the AMD GPU is detected.
Step 3: Download Phi 4
Open PowerShell and pull the Phi 4 model:
ollama pull phi4
This downloads roughly 9GB of model weights; download time depends on your connection, typically several minutes to half an hour on broadband. Progress displays in real time:
pulling manifest
pulling 1bd56c8a5c54... 25% ██
...
Step 4: Run Phi 4 Interactively
Start a chat session with Phi 4:
ollama run phi4
You'll see a prompt where you can type questions and receive GPU-accelerated responses. Try this:
>>> Write a Python function to validate email addresses
[Phi 4 responds with working code...]
Type /bye (or press Ctrl+D) to close the session.
Step 5: Access Phi 4 via API for Integrations
For production use, access Phi 4 programmatically. Keep a PowerShell window open running ollama serve (it listens on localhost:11434), then query it from Python, JavaScript, or any HTTP client. From PowerShell itself, Invoke-RestMethod avoids the quote-escaping pitfalls of passing a JSON body to curl.exe:
Invoke-RestMethod -Method Post -Uri http://localhost:11434/api/generate -Body '{
  "model": "phi4",
  "prompt": "List five benefits of local LLMs",
  "stream": false
}'
Phi 4 returns JSON with the complete response. This enables adding local AI to web apps, automation scripts, or internal tools.
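The same request works from Python with only the standard library; a minimal sketch, assuming the default Ollama server is running on localhost:11434:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_request(prompt: str, model: str = "phi4", stream: bool = False) -> dict:
    """Payload shape expected by Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": stream}

def generate(prompt: str) -> str:
    """POST a prompt to the local Ollama server and return the response text."""
    data = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires `ollama serve` to be running):
# print(generate("List five benefits of local LLMs"))
```

With "stream": false the server returns one JSON object whose response field holds the full completion, which keeps the client code simple.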
Performance Tuning for Windows
Tune batch size and context window:
These are model parameters rather than command-line flags. In an interactive ollama run phi4 session, raise the context window for longer documents with:
/set parameter num_ctx 8192
Over the API, pass them in the request's options field (for example "num_ctx": 8192, "num_batch": 512); a larger batch can improve throughput at the cost of VRAM.
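These generation parameters can also be set per request through the API's options field; a minimal sketch of the request body (num_ctx and num_batch are Ollama's documented option keys):

```python
import json

def tuned_payload(prompt: str, num_ctx: int = 8192, num_batch: int = 512) -> dict:
    """Request body for /api/generate with per-request performance options."""
    return {
        "model": "phi4",
        "prompt": prompt,
        "stream": False,
        # Ollama generation options: context length and batch size.
        "options": {"num_ctx": num_ctx, "num_batch": num_batch},
    }

print(json.dumps(tuned_payload("Summarize the attached report."), indent=2))
```

Setting options per request, rather than globally, lets one server handle both short chat turns and long-document jobs.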
Monitor GPU usage in real-time:
Open Task Manager (Ctrl+Shift+Esc) → Performance tab → GPU to watch VRAM and utilization while Phi 4 runs.
Multi-GPU setup:
If you have multiple GPUs, set:
$env:CUDA_VISIBLE_DEVICES = "0,1"
This lets Ollama split the model across both GPUs, which helps when a model is too large for a single card.
Troubleshooting Common Windows Issues
Issue: GPU not detected / using CPU only
Solution: Update GPU drivers (NVIDIA GeForce Experience or AMD Radeon Software), then restart Ollama.
Issue: "port 11434 already in use"
Solution: Ollama is already running. Close all Ollama windows and try again, or use: netstat -ano | findstr :11434 to find and kill the process.
Issue: Out-of-memory errors despite sufficient VRAM
Solution: Reduce the context window or batch size, e.g. /set parameter num_ctx 2048 in an interactive session, or "options": {"num_ctx": 2048, "num_batch": 256} in an API request.
Building Production Workflows
Once Phi 4 runs reliably, expand into production-grade applications:
- Document analysis: Build a RAG system to answer questions about your private documents
- Code generation: Use Phi 4's coding strengths to auto-generate boilerplate or refactor code
- Customer support: Deploy a local chatbot that handles common requests without cloud overhead
- Data labeling: Use Phi 4 to pre-label datasets for machine learning projects
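As a taste of the last item, here is a minimal data-labeling sketch against the local API; the sentiment label set and prompt wording are illustrative assumptions, not part of Ollama:

```python
import json
import urllib.request

LABELS = ("positive", "negative", "neutral")  # illustrative label set

def label_prompt(text: str) -> str:
    """Constrain Phi 4 to answer with exactly one label."""
    return (
        f"Classify the sentiment of the text as one of: {', '.join(LABELS)}. "
        f"Answer with a single word.\n\nText: {text}"
    )

def classify(text: str, url: str = "http://localhost:11434/api/generate") -> str:
    """Ask the local Phi 4 server for a label; fall back to 'unknown'."""
    body = json.dumps({"model": "phi4", "prompt": label_prompt(text), "stream": False})
    req = urllib.request.Request(
        url, data=body.encode("utf-8"), headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        answer = json.loads(resp.read())["response"].strip().lower()
    return answer if answer in LABELS else "unknown"

# Usage (requires `ollama serve` to be running):
# print(classify("The update made everything faster. Love it!"))
```

Normalizing the response and rejecting anything outside the label set keeps noisy model output from polluting the dataset.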
Conclusion
Your Windows PC is a powerful AI workstation. With Phi 4 and Ollama, you have access to enterprise-grade language model capabilities—offline, private, and free. Leverage GPU acceleration to run Phi 4 at production speed, integrate it into your applications, and build AI systems that never leave your control.
Start building today. No cloud required.