โ† Back to Text Models LFM2-8B-A1B is Liquid AIโ€™s Mixture-of-Experts model, combining 8B total parameters with only 1.5B active parameters per forward pass. This delivers the quality of larger models with the speed and efficiency of smaller onesโ€”ideal for on-device deployment.

Specifications

Property         Value
Parameters       8B total (1.5B active)
Context Length   32K tokens
Architecture     LFM2 (MoE)

- MoE Efficiency: 8B-class quality at a 1.5B inference cost
- On-Device: runs on phones and laptops
- Tool Calling: native function calling support
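The efficiency claim above comes from sparse expert routing: a small router scores every expert per token, but only the top-k experts actually run, so inference cost scales with active parameters rather than total parameters. A minimal pure-Python sketch of top-k routing (illustrative only; the function names and k value are assumptions, not LFM2's actual router):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(scores, k=2):
    """Pick the top-k experts by router score and renormalize their weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1;
    only these k experts would run in the forward pass.
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    probs = softmax(scores)
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

# 8 experts, but only k=2 run per token, so only ~2/8 of expert
# parameters are active for this token.
chosen = route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3], k=2)
print(chosen)  # experts 1 and 4 have the highest scores
```

The token's output is then the weighted sum of the chosen experts' outputs; the remaining experts contribute no compute at all.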

Quick Start

Install (transformers is pinned to a commit that includes LFM2 MoE support):
pip install git+https://github.com/huggingface/transformers.git@0c9a72e4576fe4c84077f066e585129c97bfd4e6 torch
Download & Run:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer; device_map="auto" places weights on GPU if available
model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2-8B-A1B", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2-8B-A1B")

# Format the conversation with the model's built-in chat template
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is machine learning?"}],
    add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate up to 256 new tokens and decode the full sequence
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
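The tool-calling feature listed above typically follows a loop on the application side: describe your tools in the prompt, parse the call the model emits, execute it, and feed the result back as a tool message. A minimal dispatcher sketch, independent of the model itself (the `get_weather` tool and the JSON call format here are illustrative assumptions; consult the model's chat template for the exact schema it emits):

```python
import json

# Hypothetical tool: in a real application this would call a weather API
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# Registry mapping tool names the model may emit to local functions
TOOLS = {"get_weather": get_weather}

def dispatch(tool_call_json: str) -> str:
    """Execute a model-emitted tool call of the form
    {"name": "...", "arguments": {...}} and return its result."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# Example: suppose the model emitted this JSON inside its tool-call tokens
result = dispatch('{"name": "get_weather", "arguments": {"city": "Boston"}}')
print(result)  # Sunny in Boston
```

The returned string would then be appended to the conversation as a tool-result message before calling `generate` again so the model can compose its final answer.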