Ollama

Run Llama 3, Mistral, Gemma, and other models locally.

Overview

Ollama is a lightweight, extensible framework for building and running language models on your local machine. It provides a simple CLI and a local REST API (port 11434 by default) that other tools can call.

Capabilities & Support

text, vision, audio, image, video

Popular Models

Model Name	Type	Size	Best For
Gemma 4 E4B	text, vision, audio	5.6 GB	Google's April 2026 sweet spot. Native audio ASR/translation plus vision. Perfect for M4 MacBooks.
Qwen 3.6-Plus	text, vision	24B	Alibaba's April 2026 flagship. State-of-the-art coding and multimodal understanding.
Llama 4 Maverick	text	80B	Meta's legendary reasoning model (2025). Highly reliable for complex agentic workflows.
DeepSeek V4-Lite	text	16B	Fast reasoning model from the DeepSeek family, optimized for 2026 hardware.
Gemma 4 E2B	text, vision, audio	3.6 GB	Ultra-fast multimodal inference. Runs perfectly on 16 GB RAM devices.

Installation Guide

Download & Install

Go to ollama.com and download the installer for your OS. Run the installer and finish the setup.

https://ollama.com/download

Verify Installation

Open your terminal and run the command below. If the installation succeeded, it prints the installed Ollama version.

ollama --version
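
The same check can be scripted. The following is a minimal Python sketch, assuming the ollama binary is on your PATH; the helper name ollama_version is made up for this example, and the exact wording of the version string is not assumed.

```python
import shutil
import subprocess
from typing import Optional


def ollama_version(binary: str = "ollama") -> Optional[str]:
    """Return the output of `<binary> --version`, or None if the check fails.

    `ollama_version` is a hypothetical helper for this guide, not part of Ollama.
    """
    # shutil.which returns None when the executable is not on PATH.
    path = shutil.which(binary)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    # Return whatever the CLI printed; the format is not parsed here.
    return result.stdout.strip() or result.stderr.strip() or None


if __name__ == "__main__":
    version = ollama_version()
    print(version if version else "ollama was not found or did not respond")
```

A return value of None usually means the installer did not finish or the terminal needs to be reopened so that PATH is refreshed.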

Run your first model

Download and run the Llama 3.2 model directly from the CLI. The first run downloads the model, then opens an interactive prompt.

ollama run llama3.2
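
ollama run also accepts a prompt as an extra argument, which answers once and exits instead of opening an interactive session. A minimal Python sketch of calling it from a script follows; build_run_command and run_once are illustrative names, and the 300-second timeout is an arbitrary assumption to leave room for the first download.

```python
import subprocess
from typing import List


def build_run_command(model: str, prompt: str) -> List[str]:
    """Build the argument list for a one-shot `ollama run MODEL PROMPT` call."""
    return ["ollama", "run", model, prompt]


def run_once(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Run a single prompt through the CLI and return the reply as text."""
    result = subprocess.run(
        build_run_command(model, prompt),
        capture_output=True,
        text=True,
        timeout=timeout,  # assumed budget; the first run may download the model
        check=True,       # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


if __name__ == "__main__":
    try:
        print(run_once("llama3.2", "Explain what a context window is in one sentence."))
    except FileNotFoundError:
        print("ollama is not installed or not on PATH")
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as error:
        print(f"ollama run failed: {error}")
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains spaces or quotes.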

Key Features

Simple CLI interface
Local REST API (port 11434)
Large model library (ollama.com/library)
Automatic GPU acceleration
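
As an illustration of the local REST API, the sketch below sends one prompt to the /api/generate endpoint using only the Python standard library. It assumes the Ollama server is running on the default port 11434 and that llama3.2 has already been downloaded; the function names are chosen for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed: server on the default port


def build_generate_request(model: str, prompt: str,
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST request for /api/generate with streaming turned off."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False the reply is one JSON object with a "response" field.
    return body["response"]


if __name__ == "__main__":
    try:
        print(generate("llama3.2", "Why is the sky blue?"))
    except OSError as error:  # covers connection refused, HTTP errors, timeouts
        print(f"Could not reach the Ollama server: {error}")
```

Setting "stream" to False keeps the example short. By default the endpoint streams the answer as a sequence of JSON objects, which suits chat interfaces better.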

Usage Examples

List downloaded models

ollama list
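
The same information is available over the REST API through the /api/tags endpoint. A minimal Python sketch, assuming the server is running on the default port; model_names and list_models are illustrative helpers, and only the "name" field of each entry is read.

```python
import json
import urllib.request
from typing import Any, Dict, List

OLLAMA_URL = "http://localhost:11434"  # assumed: server on the default port


def model_names(tags_response: Dict[str, Any]) -> List[str]:
    """Extract the model names from a decoded /api/tags response."""
    return [entry["name"] for entry in tags_response.get("models", [])]


def list_models(base_url: str = OLLAMA_URL, timeout: float = 10.0) -> List[str]:
    """Fetch the names of the models downloaded to this machine."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return model_names(json.loads(response.read().decode("utf-8")))


if __name__ == "__main__":
    try:
        for name in list_models():
            print(name)
    except OSError as error:  # covers connection refused, HTTP errors, timeouts
        print(f"Could not reach the Ollama server: {error}")
```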

Run a vision model (downloaded automatically on first use)

ollama run llama3.2-vision
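
Once a vision model is available, images can be sent to it through the REST API as base64 strings in the "images" field of /api/generate. The sketch below assumes the server is running on the default port and that llama3.2-vision is downloaded; the helper names and the file name photo.jpg are placeholders for this example.

```python
import base64
import json
import urllib.request
from typing import Any, Dict

OLLAMA_URL = "http://localhost:11434"  # assumed: server on the default port


def build_vision_payload(model: str, prompt: str, image_bytes: bytes) -> Dict[str, Any]:
    """Build an /api/generate payload that carries one base64-encoded image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"model": model, "prompt": prompt, "images": [encoded], "stream": False}


def describe_image(path: str, model: str = "llama3.2-vision",
                   prompt: str = "Describe this image.",
                   timeout: float = 300.0) -> str:
    """Read an image file, send it to the model, and return the reply text."""
    with open(path, "rb") as handle:
        payload = build_vision_payload(model, prompt, handle.read())
    request = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]
```

A call such as describe_image("photo.jpg") returns the model's description as a string. It raises an OSError subclass if the file is missing or the server cannot be reached.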