How to Install Ollama on Mac (2026): Complete Setup Guide
Installing Ollama on your Mac is the fastest way to run powerful language models locally. No Docker, no cloud subscriptions, and no complicated setup: a single installer gives you access to popular open models like Llama 2, Phi 4, Qwen, and Mistral. This guide covers installation for all Mac hardware, from an M1 MacBook Air to an Intel iMac, plus everything you need to start using Ollama immediately.
What Is Ollama?
Ollama is the simplest way to run large language models on your Mac. It bundles model downloads, inference optimization, and a REST API into a single command-line tool with a native macOS app. Unlike Docker or conda environments, Ollama requires no complex setup—just download, install, and run.
In 2026, Ollama is the de facto standard for local LLM inference on consumer hardware.
System Requirements
For Apple Silicon Macs (M1, M2, M3, M4): 8GB RAM minimum; 16GB+ recommended
For Intel Macs: 16GB RAM minimum; 32GB+ for smooth performance with larger models
Disk space: Models range from 4GB to 45GB; ensure 100GB+ free for flexibility
Check your Mac's specs: Click the Apple menu → About This Mac. Note your chip type and available memory.
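As a rough sanity check before downloading anything, you can estimate how much memory a model will need. The sketch below uses a common rule of thumb (parameters times bits-per-weight, plus overhead for the KV cache and runtime); the 20% overhead factor is an assumption for illustration, not an official Ollama figure.

```python
def estimated_ram_gb(params_billion, quant_bits=4, overhead=1.2):
    """Rough RAM needed to run a quantized model.

    Rule of thumb (an approximation, not an official figure):
    parameters x bits-per-weight / 8 bytes, plus ~20% for the
    KV cache and runtime overhead.
    """
    return params_billion * quant_bits / 8 * overhead

# A 7B model at 4-bit quantization needs roughly 4.2 GB:
print(round(estimated_ram_gb(7), 1))
# A 70B model at 4-bit needs roughly 42 GB, beyond a 32 GB machine:
print(round(estimated_ram_gb(70), 1))
```

This is why 7B models are the sweet spot for 8 to 16 GB Macs, while 70B models belong on high-memory configurations.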
Step 1: Download Ollama for Mac
Visit ollama.com in your browser (the older ollama.ai address redirects there) and click the "Download" button. The site detects macOS automatically and offers a single universal installer that runs natively on both Apple Silicon (M1–M4) and Intel Macs.
The download is ~200MB. After it completes, open your Downloads folder.
Step 2: Install the Ollama App
Double-click the .dmg file. This opens an installer window showing:
- Ollama icon (left)
- Applications folder icon (right) with an arrow between them
Drag the Ollama icon to the Applications folder. Installation takes seconds.
After dragging, eject the disk image (click the eject icon in Finder sidebar).
Step 3: Launch Ollama
In Finder, open Applications (Cmd+Shift+A) and double-click "Ollama." On first launch, the app may ask to install its command-line tool; approve this so the ollama command works in Terminal. Once running, Ollama adds a small llama icon to your menu bar (top-right of screen).
This means Ollama is running and ready to use. You can now download and run models.
Step 4: Open Terminal and Download Your First Model
Open Terminal (Applications → Utilities → Terminal) and run:
ollama pull llama2
This downloads the 7B version of Llama 2 (~4GB). Progress displays in real-time. First-time downloads take 5–10 minutes depending on internet speed.
Once complete, start chatting:
ollama run llama2
You'll see a prompt (>>>). Type a question and press Enter:
>>> What are the best ways to learn machine learning?
[Llama 2 responds with thoughtful advice...]
Type "/bye" or press Ctrl+D to quit the chat. (Typing "exit" is sent to the model as a prompt; REPL commands start with a slash.)
Step 5: Verify Metal GPU Acceleration (Apple Silicon Macs)
If you're on Apple Silicon, you can confirm that Ollama uses the Metal GPU backend by checking the server log. When the menu bar app is running, its log lives at ~/.ollama/logs/server.log. Alternatively, quit the app and start the server manually in Terminal:
ollama serve
(If this prints an "address already in use" error, the menu bar app is still running; read its log file instead.) In the startup output, look for the Metal initialization lines, which report your GPU and its available memory, for example:
metal memory: 16.0 GiB
The exact wording varies by Ollama version, but any Metal lines confirm GPU acceleration is active. Press Ctrl+C to stop a manually started server.
Step 6: Keep Ollama Running in the Background
For continuous access to models, keep Ollama running in the background. The macOS app does this automatically—just leave the menu bar icon visible.
To verify Ollama is still running, check the menu bar (top-right). If you see the llama icon, Ollama is active. If the icon is missing, relaunch Ollama from the Applications folder.
Explore Popular Models
Ollama hosts hundreds of models. Here are some popular picks for 2026:
Qwen2.5 14B (Best all-rounder): Strong reasoning, multilingual, fast
ollama pull qwen2.5
Phi 4 (Most efficient): Great for MacBook Air, coding-focused
ollama pull phi4
Mistral 7B (Fastest): Quick responses, good for real-time applications
ollama pull mistral
Llama 2 70B (Most powerful): Best reasoning; the 4-bit weights alone are roughly 39GB, so plan on 64GB+ RAM
ollama pull llama2:70b
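If you're unsure which of the models above fits your machine, a small helper can match approximate download sizes against your RAM. This is a hypothetical sketch: the sizes are rough 4-bit figures, and the 1.5x headroom multiplier is a guess to leave room for the OS and KV cache, not a published requirement.

```python
# Approximate 4-bit download sizes for the models above, largest first.
# These numbers are rough estimates, not official figures.
MODELS = [
    ("llama2:70b", 39.0),
    ("phi4", 9.1),
    ("qwen2.5", 9.0),
    ("mistral", 4.1),
]

def pick_model(ram_gb, headroom=1.5):
    """Return the largest listed model whose weights fit in RAM with
    some headroom for the OS and KV cache (headroom is a rough guess)."""
    for name, size_gb in MODELS:
        if size_gb * headroom <= ram_gb:
            return name
    return None

print(pick_model(16))  # a 16 GB Mac lands on phi4
print(pick_model(8))   # an 8 GB MacBook Air lands on mistral
```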
Access Models via REST API for Developer Integration
Developers can query Ollama programmatically. Make sure the server is up (the menu bar app runs it automatically; on a headless setup, start it with ollama serve), then use any HTTP client:
curl -X POST http://localhost:11434/api/generate \
-H "Content-Type: application/json" \
-d '{
"model": "llama2",
"prompt": "Write a Python function to check if a number is prime",
"stream": false
}'
Python integration example:
import requests
response = requests.post('http://localhost:11434/api/generate', json={
'model': 'llama2',
'prompt': 'Explain quantum computing in simple terms',
'stream': False
})
print(response.json()['response'])
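With "stream": true (the API's default), /api/generate instead returns one JSON object per line, each carrying a "response" fragment, with "done": true on the last. A small stdlib-only sketch of reassembling such a stream, run here against canned lines so it needs no live server:

```python
import json

def join_stream(ndjson_lines):
    """Reassemble a streamed /api/generate response.

    Each line is a JSON object with a "response" fragment; the final
    object has "done": true. This stitches the fragments together.
    """
    parts = []
    for line in ndjson_lines:
        obj = json.loads(line)
        parts.append(obj.get("response", ""))
        if obj.get("done"):
            break
    return "".join(parts)

# Canned example of the wire format (shortened for illustration):
sample = [
    '{"response": "Hello", "done": false}',
    '{"response": ", world", "done": false}',
    '{"response": "!", "done": true}',
]
print(join_stream(sample))  # Hello, world!
```

In a real client you would iterate over the HTTP response body line by line instead of a list, which lets you print tokens as they arrive.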
Optimize Ollama for Your Mac
Increase the context window for longer documents. The ollama run command has no context-size flag; set the parameter from inside a chat session instead:
ollama run llama2
>>> /set parameter num_ctx 4096
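Over the REST API, the same setting travels in the "options" object of the request body; num_ctx is a documented Ollama model option. A minimal payload sketch:

```python
import json

# Context-window override for /api/generate: model options such as
# num_ctx go in the "options" field of the request body.
payload = {
    "model": "llama2",
    "prompt": "Summarize this long document...",
    "stream": False,
    "options": {"num_ctx": 4096},
}
print(json.dumps(payload, indent=2))
```

POST this to http://localhost:11434/api/generate exactly as in the curl example above.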
On Intel Macs with limited VRAM, reduce model size:
ollama pull mistral # 7B model is smaller and faster than 14B alternatives
Run multiple models simultaneously (memory permitting). Backgrounding an interactive session with & does not work, because the chat needs the terminal; open a second Terminal tab instead:
ollama run llama2   # first Terminal tab
ollama run phi4     # second Terminal tab
Troubleshooting Common Issues
Issue: "ollama: command not found" in Terminal
Solution: The ollama command-line tool is typically installed (into /usr/local/bin) the first time the app launches. Launch Ollama from Applications, approve the command-line install if prompted, then open a new Terminal window.
Issue: Out of memory errors with large models
Solution: Close other applications. Switch to smaller models (7B instead of 70B), or shrink the context window from inside the session with /set parameter num_ctx 1024.
Issue: Model downloads are very slow
Solution: Check your internet connection. If using WiFi, try Ethernet (via USB adapter) for faster downloads.
Issue: Responses are slow / high latency
Solution: Verify Metal is enabled (Apple Silicon). For Intel Macs, close background apps. Consider using a smaller model.
Next Steps: Build Real Applications
Now that Ollama runs locally, you can:
- Build a chatbot: Create a web app that queries Ollama's API
- Implement RAG: Index your documents and use Ollama to answer questions about them
- Fine-tune models: Specialize Ollama models for your domain
- Integrate into workflows: Add AI to scripts, automation tools, and productivity apps
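For the chatbot idea above, the main bookkeeping is conversation state: Ollama's /api/chat endpoint expects the full, growing list of messages on every request. A minimal sketch of that bookkeeping, with the actual network call left out since it needs a live server (the helper name add_exchange is our own, not part of any API):

```python
def add_exchange(history, user_text, assistant_text):
    """Append one user/assistant turn to the running message list
    in the role/content shape used by Ollama's /api/chat endpoint."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})
    return history

history = []
add_exchange(history, "Hi", "Hello! How can I help?")
add_exchange(history, "What is 2+2?", "4")
print(len(history))  # 4 messages: two complete turns
```

In a real chatbot you would POST {"model": ..., "messages": history} to http://localhost:11434/api/chat after each user message and append the model's reply before the next turn.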
Conclusion
Ollama transforms your Mac into a powerful AI workstation in minutes. With native GPU acceleration on Apple Silicon and support for hundreds of models, you have everything needed to run state-of-the-art language models locally—offline, private, and free.
Install Ollama today. Your Mac's AI capabilities await.