Ollama

Installation

Install Ollama on Linux or macOS.

curl -fsSL https://ollama.com/install.sh | sh

Start the server manually if needed.

ollama serve

💡 Tip: On Linux, Ollama installs as a systemd service. Use systemctl status ollama to check status.

Model Management

Pull a Model

ollama pull [model]

Pull a specific tag.

ollama pull llama3.2:3b

Pull a quantized variant.

ollama pull mistral:7b-instruct-q4_K_M

List Downloaded Models

ollama list

Show Model Details

ollama show [model]

Show only the Modelfile.

ollama show [model] --modelfile

Copy a Model

ollama cp [model] my-custom-name

Remove a Model

ollama rm [model]

Push a Model to Registry

ollama push username/[model]

Running Models

Interactive Chat (REPL)

ollama run [model]

Run with an initial prompt.

ollama run [model] "[prompt]"

💡 Tip: Inside the REPL, type /help to see commands like /set, /save, /load, /bye.

Pipe Input

echo "[prompt]" | ollama run [model]

Summarize a file piped into the model.

cat document.txt | ollama run [model] "Summarize this document."

Multiline Prompt

ollama run [model] <<'EOF'
Explain the following Rust error in plain English:

error[E0502]: cannot borrow `x` as mutable because it is also borrowed as immutable
EOF

Run with Custom Parameters

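Print timing and token-rate statistics after each response. Sampling parameters such as temperature are set inside the REPL with /set parameter, for example /set parameter temperature 0.1.

ollama run [model] --verbose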

REST API

The Ollama server exposes a REST API at http://localhost:11434.

Generate (Single-Turn)

Parameters:

model (Required): Model name.
prompt (Required): The input prompt.
stream (Optional): Stream tokens as they generate (default: true).
options (Optional): Model parameters (temperature, top_p, etc.).

curl [host]/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "[model]",
    "prompt": "[prompt]",
    "stream": false
  }'
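
Since stream defaults to true, leaving it out makes the server reply with one JSON object per line rather than a single body. A minimal Python sketch of consuming that stream, assuming each line carries a response fragment and the final line has done set to true. The helper names join_stream and generate_stream are illustrative and not part of Ollama, and the default base URL is the address stated above.

```python
import json
import urllib.request


def join_stream(lines):
    """Concatenate the `response` fragments from newline-delimited JSON chunks."""
    parts = []
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line:
            continue  # skip blank lines between chunks
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final chunk marks the end of the generation
    return "".join(parts)


def generate_stream(model, prompt, base_url="http://localhost:11434"):
    """POST to /api/generate with streaming left on and return the full text."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return join_stream(resp)  # the response object iterates line by line
```

join_stream accepts any iterable of lines, so it can be exercised on captured output without a running server. To show tokens as they arrive, print each fragment inside the loop instead of collecting them.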

Generate with Options

curl [host]/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "[model]",
    "prompt": "[prompt]",
    "stream": false,
    "options": {
      "temperature": 0.1,
      "top_p":       0.9,
      "top_k":       40,
      "num_predict": 512,
      "seed":        42
    }
  }'

Chat (Multi-Turn)

curl [host]/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "[model]",
    "stream": false,
    "messages": [
      {"role": "system",    "content": "You are a concise Linux expert."},
      {"role": "user",      "content": "What does the 2>/dev/null trick do?"}
    ]
  }'

Chat with History

curl [host]/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "[model]",
    "stream": false,
    "messages": [
      {"role": "user",      "content": "What is a closure?"},
      {"role": "assistant", "content": "A closure is a function that captures variables from its enclosing scope..."},
      {"role": "user",      "content": "Show me an example in Rust."}
    ]
  }'
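
The chat endpoint keeps no state between calls, which is why the request above resends the earlier turns. A small Python sketch of that bookkeeping, assuming a non-streaming response with the reply under message. The function names chat_payload, record_reply, and ask are hypothetical helpers, not an official client.

```python
import json
import urllib.request


def chat_payload(model, messages):
    """Build a non-streaming /api/chat request body from the running history."""
    return {"model": model, "stream": False, "messages": list(messages)}


def record_reply(messages, response):
    """Append the assistant message from a /api/chat response to the history."""
    message = response["message"]
    messages.append({"role": message["role"], "content": message["content"]})
    return messages


def ask(model, messages, user_text, base_url="http://localhost:11434"):
    """Send one user turn with the full history and return the reply text."""
    messages.append({"role": "user", "content": user_text})
    body = json.dumps(chat_payload(model, messages)).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        response = json.load(resp)
    record_reply(messages, response)
    return messages[-1]["content"]
```

Calling ask repeatedly with the same messages list grows the history by two entries per turn, so long sessions eventually need trimming to stay inside the context window.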

Embeddings

curl [host]/api/embed \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "input": "Rust ownership rules explained."
  }'
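
The response holds the vectors under embeddings, one per input. Embeddings are usually compared with cosine similarity, so here is a short Python sketch of that step, assuming the vectors have already been read from the response. The names cosine_similarity and rank_by_similarity are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return candidate indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
```

For a simple search, embed the query and each document with the same model, then pass the query vector and the list of document vectors to rank_by_similarity.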

List Local Models (API)

curl [host]/api/tags
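
The reply is a JSON object with a models array. A Python sketch that reduces it to names and readable sizes, assuming each entry has a name and a size in bytes. The helpers human_size and summarize_models are hypothetical, and the decimal units are a presentation choice.

```python
def human_size(num_bytes):
    """Format a byte count using decimal units (1 GB = 10**9 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1000:
            return f"{size:.1f} {unit}"
        size /= 1000
    return f"{size:.1f} TB"


def summarize_models(tags_response):
    """Return (name, size) pairs sorted by name from a parsed /api/tags body."""
    models = tags_response.get("models", [])
    rows = [(m["name"], human_size(m["size"])) for m in models]
    return sorted(rows)
```

Feed it the parsed body, for example the result of json.loads on the curl output.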

Show Model Info (API)

curl [host]/api/show \
  -H "Content-Type: application/json" \
  -d '{"name": "[model]"}'

Pull a Model (API)

curl [host]/api/pull \
  -H "Content-Type: application/json" \
  -d '{"name": "[model]", "stream": false}'

Delete a Model (API)

curl -X DELETE [host]/api/delete \
  -H "Content-Type: application/json" \
  -d '{"name": "[model]"}'

Check Running Models

curl [host]/api/ps

Modelfile

Create a custom model with a Modelfile:

Basic Modelfile

FROM [model]

SYSTEM """
You are an expert Rust engineer. You write safe, idiomatic Rust.
You always explain why a design decision was made.
Never suggest code that uses unwrap() without justification.
"""

PARAMETER temperature 0.1
PARAMETER top_p 0.9
PARAMETER num_predict 2048

Build and register the custom model.

ollama create my-rust-expert -f ./Modelfile

Test the custom model.

ollama run my-rust-expert "How should I handle errors in a CLI app?"
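
When several variants share one base model, the Modelfile can be generated instead of hand-edited. A minimal Python sketch, assuming only the FROM, SYSTEM, and PARAMETER instructions shown above. The function render_modelfile is a hypothetical helper, not an Ollama tool.

```python
def render_modelfile(base, system=None, parameters=None):
    """Render Modelfile text with FROM, an optional SYSTEM block, and PARAMETER lines.

    Assumes the system prompt contains no triple quotes, which would end the block early.
    """
    lines = [f"FROM {base}"]
    if system:
        lines += ["", 'SYSTEM """', system.strip(), '"""']
    if parameters:
        lines.append("")
        for key, value in parameters.items():
            lines.append(f"PARAMETER {key} {value}")
    return "\n".join(lines) + "\n"
```

Write the returned text to a file and pass that file to ollama create with -f, as in the command above.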

Modelfile Parameters

Parameter	Description	Example
`temperature`	Randomness (0 = deterministic)	`0.1`
`top_p`	Nucleus sampling	`0.9`
`top_k`	Top-K sampling	`40`
`num_ctx`	Context window size (tokens)	`8192`
`num_predict`	Max tokens to generate	`1024`
`repeat_penalty`	Penalise token repetition	`1.1`
`seed`	Fixed seed for reproducibility	`42`
`stop`	Stop sequences	`"<

Modelfile with Template

Customise the prompt template for models that use special tokens:

FROM mistral

TEMPLATE """[INST] {{ .System }} {{ .Prompt }} [/INST]"""

PARAMETER temperature 0.7

Environment Variables

Variable	Default	Description
`OLLAMA_HOST`	`127.0.0.1:11434`	Interface and port to listen on
`OLLAMA_MODELS`	`~/.ollama/models`	Path to model storage directory
`OLLAMA_NUM_PARALLEL`	`1`	Max concurrent request handlers
`OLLAMA_MAX_QUEUE`	`512`	Max queued requests
`OLLAMA_KEEP_ALIVE`	`5m`	How long to keep model in memory
`OLLAMA_DEBUG`	`false`	Enable debug logging

Expose Ollama on all interfaces.

OLLAMA_HOST=0.0.0.0 ollama serve
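
Once the server listens on another interface or port, client scripts need the matching base URL. A simplified Python sketch that derives it from an OLLAMA_HOST-style value, assuming clients follow the same variable. The function base_url is illustrative: it always falls back to port 11434 and does not handle IPv6 literals. Note that 0.0.0.0 is a listen address, so clients on other machines should use the server's real hostname or IP.

```python
import os


def base_url(env=None):
    """Turn an OLLAMA_HOST-style value into a URL a client can call."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST", "").strip() or "127.0.0.1:11434"
    scheme = "http"
    if "://" in host:
        scheme, host = host.split("://", 1)  # keep an explicit scheme
    host = host.rstrip("/")
    if ":" not in host:
        host = f"{host}:11434"  # no port given, fall back to the default
    return f"{scheme}://{host}"
```

The env argument takes any mapping, which keeps the function easy to check without touching the real environment.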

Increase the default context window.

OLLAMA_CONTEXT_LENGTH=16384 ollama serve

GPU Configuration

Check whether a loaded model is running on the GPU. The PROCESSOR column shows the CPU/GPU split.

ollama ps

Force CPU-only mode.

CUDA_VISIBLE_DEVICES="" ollama serve

Select a specific GPU on a multi-GPU system.

CUDA_VISIBLE_DEVICES=1 ollama serve

Popular Models

Model	Pull Command	Best For
Llama 3.2 3B	`ollama pull llama3.2:3b`	Fast, general purpose
Llama 3.3 70B	`ollama pull llama3.3`	High quality reasoning
Mistral 7B	`ollama pull mistral`	Instruction following
Phi-4	`ollama pull phi4`	Compact, strong reasoning
Gemma 3	`ollama pull gemma3`	Google, strong coding
Qwen2.5-Coder	`ollama pull qwen2.5-coder`	Code generation
DeepSeek-R1	`ollama pull deepseek-r1`	Deep reasoning
nomic-embed-text	`ollama pull nomic-embed-text`	Text embeddings
