Use Ollama for quick local model serving with a simple CLI or Docker-based deployment.
Ollama uses GGUF models and supports GPU acceleration (CUDA, Metal, ROCm).

Installation

Download directly from ollama.com/download.
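On Linux, the official install script is an alternative to the download page, and ollama --version confirms the CLI is on your PATH:
curl -fsSL https://ollama.com/install.sh | sh
ollama --version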

Using LFM2 Models

Ollama can load GGUF models directly from Hugging Face or from local files.

Running GGUFs

You can run LFM2 models directly from Hugging Face:
ollama run hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF
See the Models page for all available GGUF repositories. To use a local GGUF file, first download a model from Hugging Face:
pip install huggingface-hub
hf download LiquidAI/LFM2.5-1.2B-Instruct-GGUF {quantization}.gguf --local-dir .
Replace {quantization} with your preferred quantization level (e.g., q4_k_m, q8_0). Then run the local model:
ollama run /path/to/model.gguf
For custom configurations (specific quantization, chat template, or parameters), create a Modelfile: a plain text file named Modelfile (no extension) with the following content:
FROM /path/to/model.gguf

TEMPLATE """<|startoftext|><|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|endoftext|>"
Import the model with the Modelfile:
ollama create my-model -f Modelfile
Then run it:
ollama run my-model
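To verify that your template and parameters were applied, print the stored Modelfile back:
ollama show my-model --modelfile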

Basic Usage

Interact with models through the command-line interface.

Interactive Chat

ollama run hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF
Type your messages and press Enter. Use /bye to exit.
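Other in-chat commands are available: /? lists them, /set parameter adjusts sampling on the fly, and /show info prints model details (the temperature value below is just an example):
>>> /?
>>> /set parameter temperature 0.5
>>> /show info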

Single Prompt

ollama run hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF "What is machine learning?"
If you imported a model with a custom name using a Modelfile, use that name instead (e.g., ollama run my-model).

Serving Models

Ollama automatically starts a server on http://localhost:11434 with an OpenAI-compatible API for programmatic access.
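To confirm the server is running (and list the models it knows about), query the native tags endpoint:
curl http://localhost:11434/api/tags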

Python Client

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF",
    messages=[
        {"role": "user", "content": "Explain quantum computing."}
    ],
    temperature=0.7
)
print(response.choices[0].message.content)
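For longer generations you may want to stream tokens as they arrive; a minimal variation of the example above using the OpenAI client's stream=True flag:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="not-needed"
)

# Stream the response chunk by chunk instead of waiting for the full reply
stream = client.chat.completions.create(
    model="hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF",
    messages=[{"role": "user", "content": "Explain quantum computing."}],
    temperature=0.7,
    stream=True
)
for chunk in stream:
    delta = chunk.choices[0].delta.content if chunk.choices else None
    if delta:
        print(delta, end="", flush=True)
print()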
Ollama provides two native API endpoints.
Generate API (simple completion):
curl http://localhost:11434/api/generate -d '{
  "model": "hf.co/LiquidAI/LFM2-1.2B-GGUF",
  "prompt": "What is artificial intelligence?"
}'
Chat API (conversational format):
curl http://localhost:11434/api/chat -d '{
  "model": "hf.co/LiquidAI/LFM2-1.2B-GGUF",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ]
}'
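If you prefer to call these native endpoints from Python without the OpenAI compatibility layer, the official ollama package (pip install ollama) wraps them; a minimal sketch:

import ollama

# Equivalent to the /api/chat call above
response = ollama.chat(
    model="hf.co/LiquidAI/LFM2-1.2B-GGUF",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response["message"]["content"])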

Vision Models

LFM2-VL GGUF models can also be used for multimodal inference with Ollama.
Run a vision model directly and provide images in the chat:
ollama run hf.co/LiquidAI/LFM2.5-VL-1.6B-GGUF
In the interactive chat, ask questions about an image by including its file path in your prompt:
>>> What's in this image? /path/to/image.jpg
>>> Describe the contents of ~/Downloads/photo.png
To send images programmatically, use the OpenAI-compatible API with base64-encoded image data:
from openai import OpenAI
import base64

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="not-needed"
)

# Encode image to base64
with open("image.jpg", "rb") as image_file:
    image_data = base64.b64encode(image_file.read()).decode("utf-8")

response = client.chat.completions.create(
    model="hf.co/LiquidAI/LFM2.5-VL-1.6B-GGUF",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}},
                {"type": "text", "text": "What's in this image?"}
            ]
        }
    ]
)
print(response.choices[0].message.content)
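The native chat API also accepts images: each message may include an images array of base64-encoded strings (the placeholder below stands in for real image data):

curl http://localhost:11434/api/chat -d '{
  "model": "hf.co/LiquidAI/LFM2.5-VL-1.6B-GGUF",
  "messages": [
    {
      "role": "user",
      "content": "What is in this image?",
      "images": ["<base64-encoded image data>"]
    }
  ]
}'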

Model Management

List installed models:
ollama list
Remove a model:
ollama rm hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF
Show model information:
ollama show hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF
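See which models are currently loaded in memory, and unload one without deleting it from disk:
ollama ps
ollama stop hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF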