LFM2-1.2B-Extract is optimized for extracting structured data (JSON, XML, YAML) from unstructured documents. It handles complex nested schemas and multi-field extraction with high accuracy.

Specifications

- Parameters: 1.2B
- Context Length: 32K tokens
- Task: Structured Extraction
- Output Formats: JSON, XML, YAML

Use Cases

- Document Parsing: extract fields from documents
- Data Entry: automate form filling
- Schema Mapping: handle complex nested structures

Prompting Recipe

Use temperature=0 (greedy decoding) for best results. This model is intended for single-turn conversations only.
System Prompt Format:
Identify and extract information matching the following schema.
Return data as a JSON object. Missing data should be omitted.

Schema:
- field_name: "Description of what to extract"
- nested_object:
  - nested_field: "Description"
If no system prompt is provided, the model defaults to JSON output. Specify the format (JSON, XML, or YAML) and a schema in the system prompt for better accuracy.
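For nested schemas, keeping the prompt layout consistent by building it programmatically can help. A minimal sketch (the `build_system_prompt` and `render_schema` helpers are illustrative, not part of any library):

```python
def render_schema(schema, indent=0):
    """Render a (possibly nested) dict of field -> description into
    the dash-indented schema layout shown above."""
    lines = []
    pad = "  " * indent
    for name, desc in schema.items():
        if isinstance(desc, dict):
            # Nested object: emit the field name, then recurse one level deeper.
            lines.append(f"{pad}- {name}:")
            lines.extend(render_schema(desc, indent + 1))
        else:
            lines.append(f'{pad}- {name}: "{desc}"')
    return lines

def build_system_prompt(schema, fmt="JSON"):
    """Assemble the full system prompt: fixed header plus rendered schema."""
    header = (
        "Identify and extract information matching the following schema.\n"
        f"Return data as a {fmt} object. Missing data should be omitted.\n\n"
        "Schema:\n"
    )
    return header + "\n".join(render_schema(schema))
```

For example, `build_system_prompt({"field_name": "Description of what to extract", "nested_object": {"nested_field": "Description"}})` reproduces the prompt shown above.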

Quick Start

Install:
pip install transformers torch accelerate
Run:
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "LiquidAI/LFM2-1.2B-Extract"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

system_prompt = """Identify and extract information matching the following schema.
Return data as a JSON object. Missing data should be omitted.

Schema:
- name: "Person's full name"
- email: "Email address"
- company: "Company name"
"""

user_input = "Contact John Smith at john.smith@acme.com. He works at Acme Corp."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_input}
]

inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# do_sample=False gives greedy decoding (equivalent to temperature 0)
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, not the prompt
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
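The response is plain text, so a small parsing step guards against stray tokens around the JSON object before the data is used downstream. A minimal sketch (the `parse_extraction` helper is illustrative, and assumes JSON output as configured in the system prompt above):

```python
import json
import re

def parse_extraction(response: str) -> dict:
    """Pull the JSON object out of the model's reply and parse it.

    The regex tolerates any leading/trailing text the model might emit
    around the object; json.loads then validates the payload.
    """
    match = re.search(r"\{.*\}", response, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))
```

Fields omitted by the model (per the "Missing data should be omitted" instruction) simply won't appear as keys, so downstream code should use `dict.get` rather than direct indexing.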