Use Fal for serverless cloud deployments with lightning-fast inference, autoscaling, and easy API access.
Clone the repository
git clone https://github.com/Liquid4All/lfm-inference
Deployment
cd lfm-inference/fal
# Run a one-off, ephemeral server
fal run deploy-lfm2.py::serve
# Deploy a persistent production server
fal deploy deploy-lfm2.py::serve --app-name lfm2-8b --auth private
The first run takes extra time because Fal has to download the Docker image and the model weights.
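For orientation, the ::serve target refers to a Fal app defined in deploy-lfm2.py. Below is a minimal, hypothetical sketch of the shape such an app can take, using the fal.App class and @fal.endpoint decorator from the fal client library; the class name, GPU tier, keep-alive value, request schema, and stubbed method bodies are illustrative assumptions, not the repository's actual code.

# Hypothetical sketch only -- the real deploy-lfm2.py may be organized differently.
import fal
from pydantic import BaseModel


class ChatRequest(BaseModel):
    model: str
    messages: list[dict]
    max_tokens: int = 512
    temperature: float = 1.0


class LFM2Server(fal.App, keep_alive=300):  # keep-alive value is an assumption
    machine_type = "GPU-H100"  # assumed GPU tier; the repo script sets its own

    def setup(self) -> None:
        # Runs once per container start: pull the model weights and boot the
        # inference engine. This is the slow part of the first run.
        ...

    @fal.endpoint("/v1/chat/completions")
    def chat(self, request: ChatRequest) -> dict:
        # Run inference and return an OpenAI-style chat completion payload.
        ...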
Test call
First, create a Fal API key in the Fal dashboard.
Then run the following cURL commands:
export FAL_API_KEY=<your-fal-api-key>
# List the deployed models
curl https://fal.run/<org-id>/<app-id>/v1/models -H "Authorization: Key $FAL_API_KEY"
# Query the deployed LFM model
curl -X POST https://fal.run/<org-id>/<app-id>/v1/chat/completions \
  -H "Authorization: Key $FAL_API_KEY" \
  --json '{
    "model": "LiquidAI/LFM2-8B-A1B",
    "messages": [
      {
        "role": "user",
        "content": "What is the melting temperature of silver?"
      }
    ],
    "max_tokens": 32,
    "temperature": 0
  }'
Note that Fal endpoints expect the Key prefix (rather than the usual Bearer) in the Authorization header. The --json flag requires curl 7.82 or newer; on older versions, pass -H "Content-Type: application/json" and -d with the same payload.
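The same call can be made from Python. This is a minimal sketch using the requests library: the org and app IDs are placeholders to substitute, and the response is assumed to follow the standard OpenAI chat-completions shape, consistent with the /v1/chat/completions route above.

import os

import requests

BASE_URL = "https://fal.run/<org-id>/<app-id>"  # substitute your org and app IDs

response = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": f"Key {os.environ['FAL_API_KEY']}"},  # Key, not Bearer
    json={
        "model": "LiquidAI/LFM2-8B-A1B",
        "messages": [
            {"role": "user", "content": "What is the melting temperature of silver?"}
        ],
        "max_tokens": 32,
        "temperature": 0,
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])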