How to Install Ollama on Windows (2026): Complete Setup Guide
Installing Ollama on Windows gives you instant access to powerful language models—no Docker, no Python virtualenvs, no cloud subscriptions. In minutes, your Windows PC becomes a full AI workstation. This guide walks through installation for all Windows PCs, GPU setup for maximum speed, and everything you need to start using models like Llama 2, Phi 4, and Qwen2.5.
What Is Ollama and Why Use It on Windows?
Ollama simplifies local AI. It handles model downloads, GPU optimization, and serves models via a clean REST API—all from a single command-line tool. On Windows, Ollama runs as a background service that stays available even after you restart your PC.
Why Windows users love Ollama in 2026:
- NVIDIA CUDA support: Full GPU acceleration on gaming graphics cards
- No containers required: Native Windows binary; no Docker overhead
- Automatic GPU detection: Plug-and-play VRAM management
- Background service: Ollama runs automatically at startup
- Developer-friendly: Simple REST API for integrations
System Requirements
Minimum: Windows 10 or 11, 16GB RAM, 20GB free disk space
GPU (recommended): NVIDIA RTX 2080 or newer (8GB+ VRAM), or a recent AMD Radeon card (RX 6000 series or newer; Ollama's ROCm support list is narrower than its CUDA list)
Check your GPU: Right-click Desktop → NVIDIA Control Panel, or run in PowerShell:
Get-CimInstance Win32_VideoController | Select-Object Name, AdapterRAM
Note your GPU model and available VRAM. This determines which models run comfortably on your machine.
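If you want the figure in gigabytes directly, here is a small PowerShell sketch. Note that AdapterRAM is a 32-bit WMI field, so cards with more than 4GB of VRAM may report only 4GB; on NVIDIA cards, nvidia-smi shows the true total.
# Print each GPU with VRAM converted from bytes to GB.
# AdapterRAM caps at 4 GB because it is a 32-bit field.
Get-CimInstance Win32_VideoController |
    Select-Object Name, @{ Name = 'VRAM_GB'; Expression = { [math]::Round($_.AdapterRAM / 1GB, 1) } }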
Step 1: Download Ollama for Windows
Visit ollama.com in your browser. Click "Download" and select "Windows." This downloads OllamaSetup.exe (the exact size varies by release, since it bundles the GPU runtime libraries Ollama needs).
The installer does not install graphics drivers; make sure your NVIDIA or AMD driver is up to date before you run it.
Step 2: Run the Installer
Double-click OllamaSetup.exe. The installer is minimal and needs no administrator rights, because it installs per-user:
1. Accept the license agreement if prompted
2. Keep the default installation location (C:\Users\[YourUsername]\AppData\Local\Programs\Ollama)
3. Click "Install"
The installer runs for 1–2 minutes. When it finishes, it launches Ollama automatically (a llama icon appears in the system tray) and adds it to your Start menu.
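To confirm the CLI landed on your PATH, open a new PowerShell window (PATH changes don't reach shells that were already open) and check the version:
# Prints the installed Ollama version if the CLI is on your PATH.
ollama --version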
Step 3: Verify Ollama Is Running
After installation, check that Ollama started successfully. Open PowerShell (Windows key → PowerShell) and run:
curl.exe http://localhost:11434/api/tags
(Use curl.exe explicitly: in Windows PowerShell, bare curl is an alias for Invoke-WebRequest.)
If Ollama is running, this returns a JSON list of installed models. If you see a connection error, Ollama may not have started. Manually launch it:
& "$env:LOCALAPPDATA\Programs\Ollama\ollama app.exe"
Step 4: Download Your First Model
Open PowerShell and download Llama 2 7B:
ollama pull llama2
This downloads ~4GB. Progress displays in real-time. On a typical broadband connection, expect 5–10 minutes:
pulling manifest
pulling 1a9242... 45% ████
...
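Once a pull finishes, you can confirm what's installed and how much disk each model takes, either with ollama list or via the API (the API reports sizes in bytes):
# List installed models with sizes converted to GB.
(Invoke-RestMethod 'http://localhost:11434/api/tags').models |
    Select-Object name, @{ Name = 'SizeGB'; Expression = { [math]::Round($_.size / 1GB, 1) } }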
Step 5: Run Your First Model
Start an interactive session:
ollama run llama2
You'll see a prompt. Try it out:
>>> How do I learn to code?
[Llama 2 responds with a detailed learning roadmap...]
Type "exit" or press Ctrl+C to close the chat.
Step 6: Enable GPU Acceleration (NVIDIA)
For NVIDIA GPUs, verify CUDA acceleration is active. The background app already runs the server, so quit it first (right-click the llama tray icon → Quit Ollama); otherwise ollama serve will complain that the port is in use. Then open PowerShell and run:
ollama serve
In the startup output, look for an "inference compute" line that names your GPU and its total VRAM, for example:
msg="inference compute" ... library=cuda name="NVIDIA GeForce RTX 3080" total="10.0 GiB"
This confirms your GPU will be used for inference. Press Ctrl+C to stop the server, then relaunch Ollama from the Start menu.
For AMD GPUs: recent Ollama builds bundle ROCm support for supported Radeon cards, so no extra environment variable is required. Install the latest Radeon drivers from amd.com, then run ollama serve and look for library=rocm in the same startup output.
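Whichever GPU you have, the quickest runtime check is ollama ps, which shows where a loaded model actually landed:
# Load a model, then ask Ollama where it placed it (GPU vs. CPU).
ollama run llama2 "warm up" | Out-Null
ollama ps    # the PROCESSOR column reads "100% GPU" when acceleration is active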
Explore Popular Models for Windows
Qwen2.5 14B: Best overall—strong reasoning, multilingual, fast on mid-range GPUs
ollama pull qwen2.5:14b
Phi 4: Most efficient—excellent for 8GB GPUs, strong coding
ollama pull phi4
Mistral 7B: Fastest inference—ideal for real-time applications
ollama pull mistral
Llama 2 70B: Most powerful; the 4-bit download is roughly 39GB, so plan on ~40GB of VRAM (with less, layers spill into system RAM and inference slows sharply)
ollama pull llama2:70b
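Not sure what fits your card? A rough rule of thumb (a planning assumption, not an official Ollama formula): a 4-bit quantized model needs about 0.6GB of VRAM per billion parameters, plus overhead for the KV cache and runtime.
# Rough planning estimate only; actual usage varies with quantization
# and context size.
function Get-EstimatedVramGB([double]$BillionParams) {
    [math]::Round($BillionParams * 0.6 + 1.5, 1)
}
Get-EstimatedVramGB 7     # ~5.7 GB: comfortable on an 8 GB card
Get-EstimatedVramGB 70    # ~43.5 GB: needs workstation GPUs or CPU offload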
Access Models via API for Developer Integration
Keep Ollama running in the background (the tray app keeps the API available on port 11434), then query models from PowerShell, Python, C#, Node.js, or any HTTP client:
PowerShell example:
$body = @{
    model  = "llama2"
    prompt = "Write a C# function that reverses a string"
    stream = $false
} | ConvertTo-Json
(Invoke-RestMethod -Uri 'http://localhost:11434/api/generate' -Method Post -ContentType 'application/json' -Body $body).response
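The example above disables streaming. With "stream": true, Ollama instead returns one JSON object per line as tokens arrive; a minimal sketch that prints them live:
# Stream tokens as they arrive; each stdout line is one JSON object.
$body = @{ model = "llama2"; prompt = "Count to five"; stream = $true } | ConvertTo-Json
$body | curl.exe -s -N -X POST http://localhost:11434/api/generate --data-binary '@-' |
    ForEach-Object { Write-Host -NoNewline ($_ | ConvertFrom-Json).response }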
Python example:
import requests
response = requests.post('http://localhost:11434/api/generate', json={
'model': 'llama2',
'prompt': 'Explain machine learning to a 10-year-old',
'stream': False
})
answer = response.json()['response']
print(answer)
C# example:
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;

var client = new HttpClient();
var payload = JsonSerializer.Serialize(new { model = "llama2", prompt = "List five benefits of AI", stream = false });
var response = await client.PostAsync("http://localhost:11434/api/generate",
    new StringContent(payload, Encoding.UTF8, "application/json"));
var json = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
Console.WriteLine(json.RootElement.GetProperty("response").GetString());
Optimize Ollama for Your Windows PC
Increase context window for document analysis (from inside an ollama run session):
/set parameter num_ctx 4096
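The same parameter is available per-request through the API's options field, which is handy when different calls need different context sizes:
# Request a 4096-token context for this call only via options.num_ctx.
$body = @{
    model   = "llama2"
    prompt  = "Summarize the following document: ..."
    stream  = $false
    options = @{ num_ctx = 4096 }
} | ConvertTo-Json
(Invoke-RestMethod -Uri 'http://localhost:11434/api/generate' -Method Post -ContentType 'application/json' -Body $body).response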
Run smaller models on limited VRAM:
ollama pull mistral # 7B is faster and lighter than 70B
Monitor GPU usage in Task Manager:
Open Task Manager (Ctrl+Shift+Esc) → Performance → GPU. Watch VRAM usage while models run. This helps you choose appropriately-sized models.
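To watch VRAM from the terminal instead, NVIDIA's driver ships with nvidia-smi, which you can poll from PowerShell:
# Print used/total VRAM every 2 seconds (Ctrl+C to stop).
while ($true) {
    nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader
    Start-Sleep -Seconds 2
}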
Use multiple NVIDIA GPUs if available (set the variable in the same shell that launches the server, with the tray app stopped so this instance owns the port):
$env:CUDA_VISIBLE_DEVICES = "0,1"
ollama serve
Keep Ollama Running at Startup
Ollama is not a classic Windows service; it installs a tray application ("ollama app.exe") that launches when you sign in. To verify:
1. Open Task Manager (Ctrl+Shift+Esc)
2. Select the "Startup apps" tab
3. Look for "Ollama" in the list
4. Verify its status is "Enabled"
If it is disabled, right-click → Enable so Ollama starts with your PC.
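You can also confirm the startup entry from PowerShell; Win32_StartupCommand covers both the Startup folder and the registry Run keys:
# Look for Ollama among registered startup entries.
Get-CimInstance Win32_StartupCommand |
    Where-Object Name -like '*Ollama*' |
    Select-Object Name, Command, Location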
Troubleshooting Windows Installation
Issue: Installer fails or GPU not detected
Solution: Update your GPU drivers (NVIDIA GeForce Experience or AMD Radeon Software) before installing Ollama. Restart your PC, then reinstall Ollama.
Issue: "Port 11434 already in use"
Solution: Another instance (usually the background tray app) already owns the port; ollama serve is only needed when the tray app isn't running. To clear it, open PowerShell and run taskkill /IM "ollama app.exe" /F followed by taskkill /IM ollama.exe /F, then relaunch Ollama.
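If the port turns out to be held by something other than Ollama, identify the owner before killing anything:
# Find which process is listening on 11434, inspect it, then stop it.
$owner = (Get-NetTCPConnection -LocalPort 11434 -State Listen).OwningProcess
Get-Process -Id $owner            # inspect before killing
Stop-Process -Id $owner -Force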
Issue: Out-of-memory errors despite sufficient VRAM
Solution: Close other GPU-intensive apps (browsers with many tabs, games, video editors). If the error persists, shrink the context window (/set parameter num_ctx 2048 inside a session) or switch to a smaller model or quantization.
Issue: Very slow model downloads
Solution: Check your internet speed (speedtest.net). If consistently slow, try at off-peak hours. Some models are 40GB+; this is normal.
Next Steps: Build Production Applications
With Ollama running, you can:
- Build a local chatbot: Create a web app with Ollama backend
- Implement RAG: Index your documents and answer questions about them locally
- Automate workflows: Use Ollama in scripts for email drafting, code review, data analysis
- Deploy to production: Scale beyond your PC using Docker containers
Conclusion
Your Windows PC is a capable AI platform. With Ollama installed, you have access to state-of-the-art language models—offline, private, and free. Leverage GPU acceleration for instant inference, integrate models into your applications via REST API, and build AI systems fully under your control.
Install Ollama today. Transform your Windows PC into an AI workstation.