How to Install Ollama on Windows (2026): Complete Setup Guide
Installing Ollama on Windows gives you instant access to powerful language models—no Docker, no Python virtualenvs, no cloud subscriptions. In minutes, your Windows PC becomes a full AI workstation. This guide walks through installation for all Windows PCs, GPU setup for maximum speed, and everything you need to start using models like Llama 2, Phi 4, and Qwen2.5.
What Is Ollama and Why Use It on Windows?
Ollama simplifies local AI. It handles model downloads, GPU optimization, and serves models via a clean REST API—all from a single command-line tool. On Windows, Ollama runs as a background service that stays available even after you restart your PC.
Why Windows users love Ollama in 2026:
- NVIDIA CUDA support: Full GPU acceleration on gaming graphics cards
- No containers required: Native Windows binary; no Docker overhead
- Automatic GPU detection: Plug-and-play VRAM management
- Background service: Ollama runs automatically at startup
- Developer-friendly: Simple REST API for integrations
System Requirements
Minimum: Windows 10 or 11, 16GB RAM, 20GB free disk space
GPU (recommended): NVIDIA RTX 2080 or newer (8GB+ VRAM), or a recent AMD Radeon card (RX 6000 series or newer; Ollama's ROCm support list is narrower than its CUDA list)
Check your GPU: Right-click Desktop → NVIDIA Control Panel, or run in PowerShell:
Get-CimInstance Win32_VideoController | Select-Object Name, AdapterRAM
Note your GPU model and available VRAM. This determines which models run comfortably on your machine.
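If you want the figure in gigabytes directly, here is a small PowerShell sketch. Note that AdapterRAM is a 32-bit WMI field, so cards with more than 4GB of VRAM may report only 4GB; on NVIDIA cards, nvidia-smi shows the true total.
# Print each GPU with VRAM converted from bytes to GB.
# AdapterRAM caps at 4 GB because it is a 32-bit field.
Get-CimInstance Win32_VideoController |
    Select-Object Name, @{ Name = 'VRAM_GB'; Expression = { [math]::Round($_.AdapterRAM / 1GB, 1) } }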
Step 1: Download Ollama for Windows
Visit ollama.com in your browser. Click "Download" and select "Windows." This downloads OllamaSetup.exe (the exact size varies by release, since it bundles the GPU runtime libraries Ollama needs).
The installer does not install graphics drivers; make sure your NVIDIA or AMD driver is up to date before you run it.
Step 2: Run the Installer
Double-click OllamaSetup.exe. The installer is minimal and needs no administrator rights, because it installs per-user:
1. Accept the license agreement if prompted
2. Keep the default installation location (C:\Users\[YourUsername]\AppData\Local\Programs\Ollama)
3. Click "Install"
The installer runs for 1–2 minutes. When it finishes, it launches Ollama automatically (a llama icon appears in the system tray) and adds it to your Start menu.
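To confirm the CLI landed on your PATH, open a new PowerShell window (PATH changes don't reach shells that were already open) and check the version:
# Prints the installed Ollama version if the CLI is on your PATH.
ollama --version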
Step 3: Verify Ollama Is Running
After installation, check that Ollama started successfully. Open PowerShell (Windows key → PowerShell) and run:
curl.exe http://localhost:11434/api/tags
(Use curl.exe explicitly: in Windows PowerShell, bare curl is an alias for Invoke-WebRequest.)
If Ollama is running, this returns a JSON list of installed models. If you see a connection error, Ollama may not have started. Manually launch it:
& "$env:LOCALAPPDATA\Programs\Ollama\ollama app.exe"
Step 4: Download Your First Model
Open PowerShell and download Llama 2 7B:
ollama pull llama2
This downloads ~4GB. Progress displays in real-time. On a typical broadband connection, expect 5–10 minutes:
pulling manifest
pulling 1a9242... 45% ████
...
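Once a pull finishes, you can confirm what's installed and how much disk each model takes, either with ollama list or via the API (the API reports sizes in bytes):
# List installed models with sizes converted to GB.
(Invoke-RestMethod 'http://localhost:11434/api/tags').models |
    Select-Object name, @{ Name = 'SizeGB'; Expression = { [math]::Round($_.size / 1GB, 1) } }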
Step 5: Run Your First Model
Start an interactive session:
ollama run llama2
You'll see a prompt. Try it out:
>>> How do I learn to code?
[Llama 2 responds with a detailed learning roadmap...]
Type "exit" or press Ctrl+C to close the chat.
Step 6: Enable GPU Acceleration (NVIDIA)
For NVIDIA GPUs, verify CUDA acceleration is active. The background app already runs the server, so quit it first (right-click the llama tray icon → Quit Ollama); otherwise ollama serve will complain that the port is in use. Then open PowerShell and run:
ollama serve
In the startup output, look for an "inference compute" line that names your GPU and its total VRAM, for example:
msg="inference compute" ... library=cuda name="NVIDIA GeForce RTX 3080" total="10.0 GiB"
This confirms your GPU will be used for inference. Press Ctrl+C to stop the server, then relaunch Ollama from the Start menu.
For AMD GPUs: recent Ollama builds bundle ROCm support for supported Radeon cards, so no extra environment variable is required. Install the latest Radeon drivers from amd.com, then run ollama serve and look for library=rocm in the same startup output.
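Whichever GPU you have, the quickest runtime check is ollama ps, which shows where a loaded model actually landed:
# Load a model, then ask Ollama where it placed it (GPU vs. CPU).
ollama run llama2 "warm up" | Out-Null
ollama ps    # the PROCESSOR column reads "100% GPU" when acceleration is active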
Explore Popular Models for Windows
Qwen2.5 14B: Best overall—strong reasoning, multilingual, fast on mid-range GPUs
ollama pull qwen2.5:14b
Phi 4: Most efficient—excellent for 8GB GPUs, strong coding
ollama pull phi4
Mistral 7B: Fastest inference—ideal for real-time applications
ollama pull mistral
Llama 2 70B: Most powerful; the 4-bit download is roughly 39GB, so plan on ~40GB of VRAM (with less, layers spill into system RAM and inference slows sharply)
ollama pull llama2:70b
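Not sure what fits your card? A rough rule of thumb (a planning assumption, not an official Ollama formula): a 4-bit quantized model needs about 0.6GB of VRAM per billion parameters, plus overhead for the KV cache and runtime.
# Rough planning estimate only; actual usage varies with quantization
# and context size.
function Get-EstimatedVramGB([double]$BillionParams) {
    [math]::Round($BillionParams * 0.6 + 1.5, 1)
}
Get-EstimatedVramGB 7     # ~5.7 GB: comfortable on an 8 GB card
Get-EstimatedVramGB 70    # ~43.5 GB: needs workstation GPUs or CPU offload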
Access Models via API for Developer Integration
Keep Ollama running in the background (the tray app keeps the API available on port 11434), then query models from PowerShell, Python, C#, Node.js, or any HTTP client:
PowerShell example:
$body = @{
    model  = "llama2"
    prompt = "Write a C# function that reverses a string"
    stream = $false
} | ConvertTo-Json
(Invoke-RestMethod -Uri 'http://localhost:11434/api/generate' -Method Post -ContentType 'application/json' -Body $body).response
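The example above disables streaming. With "stream": true, Ollama instead returns one JSON object per line as tokens arrive; a minimal sketch that prints them live:
# Stream tokens as they arrive; each stdout line is one JSON object.
$body = @{ model = "llama2"; prompt = "Count to five"; stream = $true } | ConvertTo-Json
$body | curl.exe -s -N -X POST http://localhost:11434/api/generate --data-binary '@-' |
    ForEach-Object { Write-Host -NoNewline ($_ | ConvertFrom-Json).response }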
Python example:
import requests
response = requests.post('http://localhost:11434/api/generate', json={
'model': 'llama2',
'prompt': 'Explain machine learning to a 10-year-old',
'stream': False
})
answer = response.json()['response']
print(answer)
C# example:
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;

var client = new HttpClient();
var payload = JsonSerializer.Serialize(new { model = "llama2", prompt = "List five benefits of AI", stream = false });
var response = await client.PostAsync("http://localhost:11434/api/generate",
    new StringContent(payload, Encoding.UTF8, "application/json"));
var json = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
Console.WriteLine(json.RootElement.GetProperty("response").GetString());
Optimize Ollama for Your Windows PC
Increase context window for document analysis (from inside an ollama run session):
/set parameter num_ctx 4096
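The same parameter is available per-request through the API's options field, which is handy when different calls need different context sizes:
# Request a 4096-token context for this call only via options.num_ctx.
$body = @{
    model   = "llama2"
    prompt  = "Summarize the following document: ..."
    stream  = $false
    options = @{ num_ctx = 4096 }
} | ConvertTo-Json
(Invoke-RestMethod -Uri 'http://localhost:11434/api/generate' -Method Post -ContentType 'application/json' -Body $body).response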
Run smaller models on limited VRAM:
ollama pull mistral # 7B is faster and lighter than 70B
Monitor GPU usage in Task Manager:
Open Task Manager (Ctrl+Shift+Esc) → Performance → GPU. Watch VRAM usage while models run. This helps you choose appropriately-sized models.
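To watch VRAM from the terminal instead, NVIDIA's driver ships with nvidia-smi, which you can poll from PowerShell:
# Print used/total VRAM every 2 seconds (Ctrl+C to stop).
while ($true) {
    nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader
    Start-Sleep -Seconds 2
}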
Use multiple NVIDIA GPUs if available (set the variable in the same shell that launches the server, with the tray app stopped so this instance owns the port):
$env:CUDA_VISIBLE_DEVICES = "0,1"
ollama serve
Keep Ollama Running at Startup
Ollama is not a classic Windows service; it installs a tray application ("ollama app.exe") that launches when you sign in. To verify:
1. Open Task Manager (Ctrl+Shift+Esc)
2. Select the "Startup apps" tab
3. Look for "Ollama" in the list
4. Verify its status is "Enabled"
If it is disabled, right-click → Enable so Ollama starts with your PC.
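You can also confirm the startup entry from PowerShell; Win32_StartupCommand covers both the Startup folder and the registry Run keys:
# Look for Ollama among registered startup entries.
Get-CimInstance Win32_StartupCommand |
    Where-Object Name -like '*Ollama*' |
    Select-Object Name, Command, Location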
Troubleshooting Windows Installation
Issue: Installer fails or GPU not detected
Solution: Update your GPU drivers (NVIDIA GeForce Experience or AMD Radeon Software) before installing Ollama. Restart your PC, then reinstall Ollama.
Issue: "Port 11434 already in use"
Solution: Another instance (usually the background tray app) already owns the port; ollama serve is only needed when the tray app isn't running. To clear it, open PowerShell and run taskkill /IM "ollama app.exe" /F followed by taskkill /IM ollama.exe /F, then relaunch Ollama.
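If the port turns out to be held by something other than Ollama, identify the owner before killing anything:
# Find which process is listening on 11434, inspect it, then stop it.
$owner = (Get-NetTCPConnection -LocalPort 11434 -State Listen).OwningProcess
Get-Process -Id $owner            # inspect before killing
Stop-Process -Id $owner -Force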
Issue: Out-of-memory errors despite sufficient VRAM
Solution: Close other GPU-intensive apps (browsers with many tabs, games, video editors). If the error persists, shrink the context window (/set parameter num_ctx 2048 inside a session) or switch to a smaller model or quantization.
Issue: Very slow model downloads
Solution: Check your internet speed (speedtest.net). If consistently slow, try at off-peak hours. Some models are 40GB+; this is normal.
Next Steps: Build Production Applications
With Ollama running, you can:
- Build a local chatbot: Create a web app with Ollama backend
- Implement RAG: Index your documents and answer questions about them locally
- Automate workflows: Use Ollama in scripts for email drafting, code review, data analysis
- Deploy to production: Scale beyond your PC using Docker containers
Conclusion
Your Windows PC is a capable AI platform. With Ollama installed, you have access to state-of-the-art language models—offline, private, and free. Leverage GPU acceleration for instant inference, integrate models into your applications via REST API, and build AI systems fully under your control.
Install Ollama today. Transform your Windows PC into an AI workstation.