LFM2-700M

LFM2-700M is a compact model that balances capability and efficiency. It is suitable for deployment on a wide range of devices, including phones, tablets, and laptops with limited resources.

Available formats: HF, GGUF, MLX, ONNX

Specifications

Property	Value
Parameters	700M
Context Length	32K tokens
Architecture	LFM2 (Dense)
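
To make the parameter count concrete for device planning, the arithmetic below gives a rough estimate of the memory needed for the weights alone at a few storage precisions. It is a back-of-the-envelope sketch, not a measurement: it assumes "700M" means exactly 700,000,000 parameters, the bits-per-weight values are illustrative, and it ignores the KV cache, activations, and runtime overhead. Real quantized files (for example GGUF variants) usually use slightly more bits per weight than the nominal figure.

```python
# Assumption: "700M" is treated as exactly 700e6 parameters (nominal figure).
PARAMS = 700_000_000

# Assumed bits per weight for common storage formats (illustrative values).
BITS_PER_WEIGHT = {"fp32": 32, "fp16/bf16": 16, "int8": 8, "4-bit": 4}


def weight_memory_gib(params: int, bits: int) -> float:
    """Approximate weight storage in GiB: params * bits / 8 bytes."""
    return params * bits / 8 / 2**30


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:>10}: {weight_memory_gib(PARAMS, bits):.2f} GiB")
```

Under these assumptions the 16-bit weights come to roughly 1.3 GiB, which is why lower-precision formats matter on phones and tablets.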

Edge Deployment

Optimized for resource-constrained devices

Low Latency

Fast inference for real-time applications

Fine-tunable

TRL compatible (SFT, DPO, GRPO)
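
As a starting point for SFT, the sketch below prepares training records in the conversational `messages` format that TRL's `SFTTrainer` accepts. It only builds and serializes the data; it does not run any training. The helper name `to_messages`, the output layout (JSON Lines), and the example pair are assumptions made for illustration, and the response text is a placeholder rather than real training data.

```python
import json


def to_messages(prompt: str, response: str) -> dict:
    """Wrap one prompt/response pair as a conversational record."""
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": response},
        ]
    }


# Placeholder pair for illustration only; replace with your own data.
pairs = [("What is machine learning?", "<your reference answer here>")]

records = [to_messages(p, r) for p, r in pairs]

# One JSON object per line (JSON Lines), a common layout for loading
# with the `datasets` library.
jsonl = "\n".join(json.dumps(rec) for rec in records)
print(jsonl)
```

DPO and GRPO expect different record layouts (for example preference pairs), so this format applies to the SFT case only.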

Quick Start

Transformers
llama.cpp
vLLM

Install (Transformers):

pip install transformers torch accelerate

Download & Run:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2-700M", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2-700M")

input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is machine learning?"}],
    add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
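
For a causal language model, `generate` returns the prompt tokens followed by the continuation, so decoding `output[0]` prints the question as well as the answer. The sketch below shows the slicing step that keeps only the newly generated ids. It uses plain lists with made-up token ids so it runs without downloading the model; the helper name `new_tokens` is hypothetical.

```python
def new_tokens(output_ids, prompt_len):
    """Return only the ids generated after the first `prompt_len` prompt ids."""
    return output_ids[prompt_len:]


# Stand-in values (not real LFM2 token ids): a 4-token prompt
# followed by 3 generated tokens.
prompt_ids = [1, 15, 27, 8]
generated = prompt_ids + [42, 43, 2]

reply_ids = new_tokens(generated, len(prompt_ids))
print(reply_ids)  # [42, 43, 2]
```

With the Transformers snippet, the equivalent would be `tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)`.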

Install (llama.cpp):

brew install llama.cpp
pip install huggingface-hub

Download:

huggingface-cli download LiquidAI/LFM2-700M-GGUF \
  lfm2-700m-q4_k_m.gguf --local-dir .

Run:

llama-cli -m lfm2-700m-q4_k_m.gguf \
  -p "What is machine learning?" -n 256

Install (vLLM):

pip install vllm

Run:

from vllm import LLM, SamplingParams

llm = LLM(model="LiquidAI/LFM2-700M")
messages = [{"role": "user", "content": "What is machine learning?"}]
output = llm.chat(messages, SamplingParams(max_tokens=256))
print(output[0].outputs[0].text)
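
vLLM is typically used for batched generation, so a natural next step is sending several prompts in one call. The sketch below is a minimal illustration under assumptions: it relies on `LLM.chat` accepting a list of conversations (supported in recent vLLM releases; check your installed version), and the helper names `build_conversations` and `run_batch` are hypothetical. The vLLM import sits inside `run_batch`, so the message-building part runs without vLLM installed.

```python
def build_conversations(prompts):
    """Turn plain prompt strings into one single-turn conversation each."""
    return [[{"role": "user", "content": p}] for p in prompts]


def run_batch(prompts, model="LiquidAI/LFM2-700M", max_tokens=256):
    """Generate one reply per prompt (requires vLLM and a supported device)."""
    # Imported lazily so the helper above stays usable without vLLM.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model)
    outputs = llm.chat(
        build_conversations(prompts),
        SamplingParams(max_tokens=max_tokens),
    )
    return [o.outputs[0].text for o in outputs]


conversations = build_conversations(
    ["What is machine learning?", "What is deep learning?"]
)
print(len(conversations))  # 2
```

Calling `run_batch([...])` with the same prompt list would return the replies in input order.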
