Use Fal for serverless cloud deployments with lightning-fast inference, autoscaling, and easy API access.
Clone the repository
git clone https://github.com/Liquid4All/lfm-inference
Deployment
cd lfm-inference/fal
# Run a one-off, ephemeral server
fal run deploy-lfm2.py::serve
# Deploy a persistent production server
fal deploy deploy-lfm2.py::serve --app-name lfm2-8b --auth private
The first run takes extra time because Fal has to download the Docker image and the model weights.
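For orientation, the ::serve target refers to a Fal app defined in deploy-lfm2.py. Below is a minimal, hypothetical sketch of the shape such an app can take, using the fal.App class and @fal.endpoint decorator from the fal client library; the class name, GPU tier, keep-alive value, request schema, and stubbed method bodies are illustrative assumptions, not the repository's actual code.

# Hypothetical sketch only -- the real deploy-lfm2.py may be organized differently.
import fal
from pydantic import BaseModel


class ChatRequest(BaseModel):
    model: str
    messages: list[dict]
    max_tokens: int = 512
    temperature: float = 1.0


class LFM2Server(fal.App, keep_alive=300):  # keep-alive value is an assumption
    machine_type = "GPU-H100"  # assumed GPU tier; the repo script sets its own

    def setup(self) -> None:
        # Runs once per container start: pull the model weights and boot the
        # inference engine. This is the slow part of the first run.
        ...

    @fal.endpoint("/v1/chat/completions")
    def chat(self, request: ChatRequest) -> dict:
        # Run inference and return an OpenAI-style chat completion payload.
        ...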
Test call
First, create a Fal API key in the Fal dashboard.
Then run the following cURL commands:
export FAL_API_KEY=<your-fal-api-key>
# List the deployed models
curl https://fal.run/<org-id>/<app-id>/v1/models -H "Authorization: Key $FAL_API_KEY"
# Query the deployed LFM model
curl -X POST https://fal.run/<org-id>/<app-id>/v1/chat/completions \
  -H "Authorization: Key $FAL_API_KEY" \
  --json '{
    "model": "LiquidAI/LFM2-8B-A1B",
    "messages": [
      {
        "role": "user",
        "content": "What is the melting temperature of silver?"
      }
    ],
    "max_tokens": 32,
    "temperature": 0
  }'
Note that Fal endpoints expect the Key prefix (rather than the usual Bearer) in the Authorization header. The --json flag requires curl 7.82 or newer; on older versions, pass -H "Content-Type: application/json" and -d with the same payload.
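The same call can be made from Python. This is a minimal sketch using the requests library: the org and app IDs are placeholders to substitute, and the response is assumed to follow the standard OpenAI chat-completions shape, consistent with the /v1/chat/completions route above.

import os

import requests

BASE_URL = "https://fal.run/<org-id>/<app-id>"  # substitute your org and app IDs

response = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": f"Key {os.environ['FAL_API_KEY']}"},  # Key, not Bearer
    json={
        "model": "LiquidAI/LFM2-8B-A1B",
        "messages": [
            {"role": "user", "content": "What is the melting temperature of silver?"}
        ],
        "max_tokens": 32,
        "temperature": 0,
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])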