Installation Issues
ImportError: cannot import name 'LfmForCausalLM'
Ensure you have the latest version of transformers installed (see the command below). If you're using an older version, the LFM model classes may not be available.
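For example, upgrade with pip:

```bash
pip install --upgrade transformers
```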
CUDA out of memory errors
Try these solutions in order:
- Use a smaller model: Try LFM2-350M instead of LFM2-1.2B
- Enable quantization (see the sketch after this list)
- Reduce batch size or sequence length
- Use gradient checkpointing for training (also shown in the sketch after this list)
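A minimal sketch of the quantization and gradient-checkpointing options, assuming the bitsandbytes package is installed and LiquidAI/LFM2-1.2B as the model ID:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the weights in 4-bit to cut GPU memory use (requires bitsandbytes).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "LiquidAI/LFM2-1.2B",           # assumed model ID; substitute the checkpoint you use
    quantization_config=bnb_config,
    device_map="auto",
)

# For training: recompute activations in the backward pass instead of storing them.
model.gradient_checkpointing_enable()
```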
Model download fails or times out
- Check your internet connection
- Try using `huggingface-cli login` if the model requires authentication
- Set a longer timeout: `HF_HUB_DOWNLOAD_TIMEOUT=600`
- Try downloading with `snapshot_download` (see the sketch after this list)
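A minimal sketch using huggingface_hub, assuming LiquidAI/LFM2-1.2B as the repo ID:

```python
from huggingface_hub import snapshot_download

# Downloads the full repository into the local HF cache; interrupted downloads resume.
snapshot_download(repo_id="LiquidAI/LFM2-1.2B")
```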
Inference Issues
Model generates repetitive or low-quality output
Adjust generation parameters (see the sketch after this list). Key parameters to tune:
- `temperature`: lower (0.3-0.5) for factual output, higher (0.7-1.0) for creative output
- `top_p`: 0.9 is a good default
- `repetition_penalty`: 1.1-1.2 helps avoid loops
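An example of passing these parameters to generate(), assuming LiquidAI/LFM2-1.2B as the model ID:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-1.2B"  # assumed model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Summarize the benefits of on-device inference.", return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.5,         # 0.3-0.5 for factual, 0.7-1.0 for creative
    top_p=0.9,
    repetition_penalty=1.1,  # 1.1-1.2 discourages repetition loops
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```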
Slow inference speed
Optimization strategies:
- Use GGUF models with llama.cpp for CPU inference
- Use MLX models on Apple Silicon
- Enable Flash Attention, if available (see the sketch after this list)
- Use vLLM for high-throughput serving
- Use smaller quantization levels (Q4 vs Q8)
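A sketch of the Flash Attention option, assuming the flash-attn package is installed, your GPU and the model implementation support it, and LiquidAI/LFM2-1.2B as the model ID:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "LiquidAI/LFM2-1.2B",                     # assumed model ID
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # raises an error if flash-attn is not installed
    device_map="auto",
)
```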
Output is cut off or incomplete
Increase `max_new_tokens` (a short sketch follows). Also check that your input isn't too long - LFM models support a 32k context window, but very long inputs leave less room for output.
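For example, building on the generation call above (assumes `model` and `inputs` are defined as in that example):

```python
# Give the model more room to finish its answer; default budgets are often too small.
output = model.generate(**inputs, max_new_tokens=1024)
```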
Fine-tuning Issues
Training loss not decreasing
Common causes and solutions:
- Learning rate too high/low: Try 2e-4 for LoRA, 2e-5 for full fine-tuning
- Dataset format issues: Verify your data matches the expected chat template (see the sketch after this list)
- Insufficient data: Ensure you have enough training examples
- Check for data leakage: Make sure eval data isn’t in training set
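For the dataset format point, one quick sanity check is to render a training example through the tokenizer's chat template and inspect the result (model ID assumed):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2-1.2B")  # assumed model ID
example = [
    {"role": "user", "content": "What is LFM2?"},
    {"role": "assistant", "content": "A family of Liquid Foundation Models."},
]
# Print the fully formatted training string to confirm it matches what your trainer receives.
print(tokenizer.apply_chat_template(example, tokenize=False))
```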
Out of memory during fine-tuning
Memory optimization strategies:
- Use QLoRA instead of full fine-tuning (see the sketch after this list)
- Reduce batch size and increase gradient accumulation
- Enable gradient checkpointing
- Use a smaller model (LFM2-350M for experiments)
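A minimal QLoRA sketch combining these points; the model ID, LoRA settings, and batch sizes are assumptions to adjust for your hardware and data:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# QLoRA: load the base model in 4-bit and train only small LoRA adapter weights.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "LiquidAI/LFM2-350M",              # assumed model ID; smaller model for experiments
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

args = TrainingArguments(
    output_dir="lfm2-qlora",
    per_device_train_batch_size=1,     # small batch...
    gradient_accumulation_steps=16,    # ...with accumulation to keep the effective batch size
    gradient_checkpointing=True,       # recompute activations to save memory
    learning_rate=2e-4,                # typical LoRA learning rate
)
# Pass `args`, the model, and your dataset to the trainer of your choice (e.g. TRL's SFTTrainer).
```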
llama.cpp / GGUF Issues
Model fails to load in llama.cpp
- Ensure you’re using a compatible llama.cpp version
- Check that the GGUF file downloaded completely
- Try a different quantization level (e.g., Q4_K_M)
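If you suspect a truncated download, re-fetching the file with huggingface-cli shows progress and resumes partial downloads; the repo and file names below are assumptions, substitute the ones you actually use:

```bash
huggingface-cli download LiquidAI/LFM2-1.2B-GGUF LFM2-1.2B-Q4_K_M.gguf --local-dir .
```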
Very slow inference with llama.cpp
- Ensure you compiled with GPU support if available
- Use an appropriate thread count, e.g. `-t $(nproc)` (see the examples after this list)
- Try a more aggressive quantization (Q4_0)
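For example, a CPU run using all cores, or a GPU build with layers offloaded; the binary name and flags follow recent llama.cpp builds, and the GGUF path is an assumption:

```bash
# CPU: use all available threads
./llama-cli -m LFM2-1.2B-Q4_K_M.gguf -t $(nproc) -p "Hello"

# GPU build: offload all layers to the GPU
./llama-cli -m LFM2-1.2B-Q4_K_M.gguf -ngl 99 -p "Hello"
```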
Still Stuck?
- Discord: Join our Discord community for real-time help
- GitHub Issues: Report bugs at github.com/Liquid4All/docs/issues
- Cookbook: Check examples for working code