Installation Issues
ImportError: cannot import name 'LfmForCausalLM'
Ensure you have the latest version of transformers installed (see the command below). If you're using an older version, the LFM model classes may not be available.
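For example, upgrade with pip:

```bash
pip install --upgrade transformers
```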
CUDA out of memory errors
Try these solutions in order:
- Use a smaller model: Try LFM2-350M instead of LFM2-1.2B
- Enable quantization (see the sketch after this list)
- Reduce batch size or sequence length
- Use gradient checkpointing for training (also shown in the sketch after this list)
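A minimal sketch of the quantization and gradient-checkpointing options, assuming the bitsandbytes package is installed and LiquidAI/LFM2-1.2B as the model ID:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the weights in 4-bit to cut GPU memory use (requires bitsandbytes).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "LiquidAI/LFM2-1.2B",           # assumed model ID; substitute the checkpoint you use
    quantization_config=bnb_config,
    device_map="auto",
)

# For training: recompute activations in the backward pass instead of storing them.
model.gradient_checkpointing_enable()
```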
Model download fails or times out
- Check your internet connection
- Try using `huggingface-cli login` if the model requires authentication
- Set a longer timeout: `HF_HUB_DOWNLOAD_TIMEOUT=600`
- Try downloading with `snapshot_download` (see the sketch after this list)
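A minimal sketch using huggingface_hub, assuming LiquidAI/LFM2-1.2B as the repo ID:

```python
from huggingface_hub import snapshot_download

# Downloads the full repository into the local HF cache; interrupted downloads resume.
snapshot_download(repo_id="LiquidAI/LFM2-1.2B")
```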
Inference Issues
Model generates repetitive or low-quality output
Adjust generation parameters (see the sketch after this list). Key parameters to tune:
- `temperature`: lower (0.3-0.5) for factual output, higher (0.7-1.0) for creative output
- `top_p`: 0.9 is a good default
- `repetition_penalty`: 1.1-1.2 helps avoid loops
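An example of passing these parameters to generate(), assuming LiquidAI/LFM2-1.2B as the model ID:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-1.2B"  # assumed model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Summarize the benefits of on-device inference.", return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.5,         # 0.3-0.5 for factual, 0.7-1.0 for creative
    top_p=0.9,
    repetition_penalty=1.1,  # 1.1-1.2 discourages repetition loops
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```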
Slow inference speed
Optimization strategies:
- Use GGUF models with llama.cpp for CPU inference
- Use MLX models on Apple Silicon
- Enable Flash Attention, if available (see the sketch after this list)
- Use vLLM for high-throughput serving
- Use smaller quantization levels (Q4 vs Q8)
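A sketch of the Flash Attention option, assuming the flash-attn package is installed, your GPU and the model implementation support it, and LiquidAI/LFM2-1.2B as the model ID:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "LiquidAI/LFM2-1.2B",                     # assumed model ID
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # raises an error if flash-attn is not installed
    device_map="auto",
)
```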
Output is cut off or incomplete
Increase `max_new_tokens` (a short sketch follows). Also check that your input isn't too long - LFM models support a 32k context window, but very long inputs leave less room for output.
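For example, building on the generation call above (assumes `model` and `inputs` are defined as in that example):

```python
# Give the model more room to finish its answer; default budgets are often too small.
output = model.generate(**inputs, max_new_tokens=1024)
```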
Fine-tuning Issues
Training loss not decreasing
Common causes and solutions:
- Learning rate too high/low: Try 2e-4 for LoRA, 2e-5 for full fine-tuning
- Dataset format issues: Verify your data matches the expected chat template (see the sketch after this list)
- Insufficient data: Ensure you have enough training examples
- Check for data leakage: Make sure eval data isn’t in training set
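For the dataset format point, one quick sanity check is to render a training example through the tokenizer's chat template and inspect the result (model ID assumed):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2-1.2B")  # assumed model ID
example = [
    {"role": "user", "content": "What is LFM2?"},
    {"role": "assistant", "content": "A family of Liquid Foundation Models."},
]
# Print the fully formatted training string to confirm it matches what your trainer receives.
print(tokenizer.apply_chat_template(example, tokenize=False))
```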
Out of memory during fine-tuning
Memory optimization strategies:
- Use QLoRA instead of full fine-tuning (see the sketch after this list)
- Reduce batch size and increase gradient accumulation
- Enable gradient checkpointing
- Use a smaller model (LFM2-350M for experiments)
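A minimal QLoRA sketch combining these points; the model ID, LoRA settings, and batch sizes are assumptions to adjust for your hardware and data:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# QLoRA: load the base model in 4-bit and train only small LoRA adapter weights.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "LiquidAI/LFM2-350M",              # assumed model ID; smaller model for experiments
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

args = TrainingArguments(
    output_dir="lfm2-qlora",
    per_device_train_batch_size=1,     # small batch...
    gradient_accumulation_steps=16,    # ...with accumulation to keep the effective batch size
    gradient_checkpointing=True,       # recompute activations to save memory
    learning_rate=2e-4,                # typical LoRA learning rate
)
# Pass `args`, the model, and your dataset to the trainer of your choice (e.g. TRL's SFTTrainer).
```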
llama.cpp / GGUF Issues
Model fails to load in llama.cpp
- Ensure you’re using a compatible llama.cpp version
- Check that the GGUF file downloaded completely
- Try a different quantization level (e.g., Q4_K_M)
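If you suspect a truncated download, re-fetching the file with huggingface-cli shows progress and resumes partial downloads; the repo and file names below are assumptions, substitute the ones you actually use:

```bash
huggingface-cli download LiquidAI/LFM2-1.2B-GGUF LFM2-1.2B-Q4_K_M.gguf --local-dir .
```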
Very slow inference with llama.cpp
- Ensure you compiled with GPU support if available
- Use an appropriate thread count, e.g. `-t $(nproc)` (see the examples after this list)
- Try a more aggressive quantization (Q4_0)
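For example, a CPU run using all cores, or a GPU build with layers offloaded; the binary name and flags follow recent llama.cpp builds, and the GGUF path is an assumption:

```bash
# CPU: use all available threads
./llama-cli -m LFM2-1.2B-Q4_K_M.gguf -t $(nproc) -p "Hello"

# GPU build: offload all layers to the GPU
./llama-cli -m LFM2-1.2B-Q4_K_M.gguf -ngl 99 -p "Hello"
```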
Still Stuck?
- Discord: Join our Discord community for real-time help
- GitHub Issues: Report bugs at github.com/Liquid4All/docs/issues
- Cookbook: Check examples for working code