LFM2.5-Audio-1.5B

LFM2.5-Audio-1.5B is Liquid AI’s flagship audio model, featuring a custom LFM-based audio detokenizer. It delivers natural speech synthesis, multilingual speech recognition, and fully interleaved voice chat with reasoning capabilities in a single compact model.

Specifications

Property             Value
Parameters           1.5B (1.2B LM + 115M audio encoder)
Context Length       32K tokens
Audio Output         24 kHz
Supported Language   English

- Text-to-Speech: natural speech synthesis
- Speech Recognition: multilingual ASR
- Voice Chat: interleaved audio/text

Voice chat is demonstrated in the Quick Start below; text-to-speech and speech-recognition sketches follow it.

Quick Start

Install:
pip install liquid-audio
pip install "liquid-audio[demo]"  # optional, for demo dependencies
pip install flash-attn --no-build-isolation  # optional, for flash attention 2

Gradio Demo:
liquid-audio-demo
# Starts a web server on http://localhost:7860/

Multi-Turn Chat:
import torch
import torchaudio
from liquid_audio import LFM2AudioModel, LFM2AudioProcessor, ChatState

# Load models
HF_REPO = "LiquidAI/LFM2.5-Audio-1.5B"
processor = LFM2AudioProcessor.from_pretrained(HF_REPO).eval()
model = LFM2AudioModel.from_pretrained(HF_REPO).eval()

# Set up chat
chat = ChatState(processor)
chat.new_turn("system")
chat.add_text("Respond with interleaved text and audio.")
chat.end_turn()

chat.new_turn("user")
wav, sampling_rate = torchaudio.load("question.wav")
chat.add_audio(wav, sampling_rate)
chat.end_turn()

chat.new_turn("assistant")

# Generate interleaved outputs: text tokens arrive as single-element
# tensors, audio tokens as multi-element tensors of audio codes
text_out, audio_out = [], []
for t in model.generate_interleaved(**chat, max_new_tokens=512, audio_temperature=1.0, audio_top_k=4):
    if t.numel() == 1:  # text token: decode and stream to stdout
        print(processor.text.decode(t), end="", flush=True)
        text_out.append(t)
    else:  # audio token
        audio_out.append(t)

# Detokenize audio and save, dropping the trailing end-of-audio token
audio_codes = torch.stack(audio_out[:-1], 1).unsqueeze(0)
waveform = processor.decode(audio_codes)
torchaudio.save("answer.wav", waveform.cpu(), 24_000)
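
The same interface covers the text-to-speech use case listed above. The sketch below reuses the Quick Start setup but sends text in and keeps only the audio tokens; the system and user prompt wordings, the output filename, and the generation parameters are illustrative assumptions, not documented values.

import torch
import torchaudio
from liquid_audio import LFM2AudioModel, LFM2AudioProcessor, ChatState

HF_REPO = "LiquidAI/LFM2.5-Audio-1.5B"
processor = LFM2AudioProcessor.from_pretrained(HF_REPO).eval()
model = LFM2AudioModel.from_pretrained(HF_REPO).eval()

chat = ChatState(processor)
chat.new_turn("system")
chat.add_text("Respond with audio.")  # assumed prompt wording
chat.end_turn()

chat.new_turn("user")
chat.add_text("Read aloud: the quick brown fox jumps over the lazy dog.")
chat.end_turn()

chat.new_turn("assistant")

# Keep only audio tokens (multi-element tensors)
audio_out = []
for t in model.generate_interleaved(**chat, max_new_tokens=512, audio_temperature=1.0, audio_top_k=4):
    if t.numel() > 1:  # audio token
        audio_out.append(t)

# Drop the trailing end-of-audio token, decode, and save at 24 kHz
audio_codes = torch.stack(audio_out[:-1], 1).unsqueeze(0)
torchaudio.save("tts.wav", processor.decode(audio_codes).cpu(), 24_000)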
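
Speech recognition is the reverse: feed audio in and keep only the text tokens. Again a sketch, continuing from the processor and model loaded above; the transcription prompt and input filename are hypothetical.

chat = ChatState(processor)
chat.new_turn("system")
chat.add_text("Transcribe the user's speech.")  # assumed prompt wording
chat.end_turn()

chat.new_turn("user")
wav, sampling_rate = torchaudio.load("speech.wav")  # hypothetical input file
chat.add_audio(wav, sampling_rate)
chat.end_turn()

chat.new_turn("assistant")

# Keep only text tokens (single-element tensors) and decode them one by one
pieces = []
for t in model.generate_interleaved(**chat, max_new_tokens=512, audio_temperature=1.0, audio_top_k=4):
    if t.numel() == 1:  # text token
        pieces.append(processor.text.decode(t))
print("".join(pieces))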