The SDK uses GGUF manifests for loading models (recommended for all new projects due to superior inference performance and better default generation parameters).
Legacy ExecuTorch bundle support for existing projects is covered in the section below.
The LEAP Edge SDK supports directly downloading LEAP models in GGUF format. Given the model name and quantization method (which you can find in the LEAP Model Library), the SDK will automatically download the necessary GGUF files along with generation parameters for optimal performance.
```swift
import LeapSDK
import LeapModelDownloader
import Combine

@MainActor
final class ChatViewModel: ObservableObject {
  @Published var isLoading = false
  @Published var conversation: Conversation?

  private var modelRunner: ModelRunner?
  private var generationTask: Task<Void, Never>?

  func loadModel() async {
    isLoading = true
    defer { isLoading = false }
    do {
      // LEAP will download the model if needed or reuse a cached copy.
      let modelRunner = try await Leap.load(
        model: "LFM2-1.2B",
        quantization: "Q5_K_M",
        downloadProgressHandler: { progress, speed in
          // progress: Double (0...1)
          // speed: bytes per second
        }
      )
      conversation = modelRunner.createConversation(
        systemPrompt: "You are a helpful travel assistant."
      )
      self.modelRunner = modelRunner
    } catch {
      print("Failed to load model: \(error)")
    }
  }

  func send(_ text: String) {
    guard let conversation else { return }
    generationTask?.cancel()
    let userMessage = ChatMessage(role: .user, content: [.text(text)])
    generationTask = Task { [weak self] in
      do {
        for try await response in conversation.generateResponse(
          message: userMessage,
          generationOptions: GenerationOptions(temperature: 0.7)
        ) {
          self?.handle(response)
        }
      } catch {
        print("Generation failed: \(error)")
      }
    }
  }

  func stopGeneration() {
    generationTask?.cancel()
  }

  @MainActor
  private func handle(_ response: MessageResponse) {
    switch response {
    case .chunk(let delta):
      print(delta, terminator: "")
      // Update UI binding here
    case .reasoningChunk(let thought):
      print("Reasoning:", thought)
    case .audioSample(let samples, let sr):
      print("Received audio samples \(samples.count) at sample rate \(sr)")
    case .functionCall(let calls):
      print("Requested calls: \(calls)")
    case .complete(let completion):
      if let stats = completion.stats {
        print("Finished with \(stats.totalTokens) tokens")
      }
      let text = completion.message.content.compactMap { part -> String? in
        if case .text(let value) = part { return value }
        return nil
      }.joined()
      print("Final response:", text)
      // completion.message.content may also include `.audio` entries
      // you can persist or replay
    }
  }
}
```
Legacy: ExecuTorch Bundles
Browse the Leap Model Library and download a .bundle file for the model/quantization you want. .bundle packages contain metadata plus assets for the ExecuTorch backend. You can either:
Ship it with the app - drag the bundle into your Xcode project and ensure it is added to the main target.
Download at runtime - use LeapModelDownloader to fetch bundles on demand.
Alternative: Download at runtime
```swift
import LeapModelDownloader

let model = await LeapDownloadableModel.resolve(
  modelSlug: "lfm2-350m-enjp-mt",
  quantizationSlug: "lfm2-350m-enjp-mt-20250904-8da4w"
)

if let model {
  let downloader = ModelDownloader()
  downloader.requestDownloadModel(model)

  let status = await downloader.queryStatus(model)
  switch status {
  case .downloaded:
    let bundleURL = downloader.getModelFile(model)
    try await runModel(at: bundleURL)
  case .downloadInProgress(let progress):
    print("Progress: \(Int(progress * 100))%")
  case .notOnLocal:
    print("Waiting for download...")
  }
}
```
Use Leap.load(url:options:) inside an async context. Passing a .bundle loads the model through the ExecuTorch backend.
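For a bundle shipped inside the app, a minimal sketch looks like this. The resource name `lfm2-350m-enjp-mt` is a placeholder for whatever .bundle file you added to your target, and the call assumes the `options:` parameter of `Leap.load(url:options:)` has a default:

```swift
import LeapSDK

// Sketch: load a .bundle that was added to the app's main target.
// The resource name below is a placeholder, not a required value.
func loadBundledModel() async throws -> ModelRunner {
  guard let bundleURL = Bundle.main.url(
    forResource: "lfm2-350m-enjp-mt",
    withExtension: "bundle"
  ) else {
    throw CocoaError(.fileNoSuchFile)
  }
  // A .bundle URL routes loading through the ExecuTorch backend.
  return try await Leap.load(url: bundleURL)
}
```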
send(_:) (shown above) launches a Task that consumes the AsyncThrowingStream returned by Conversation.generateResponse. Each MessageResponse case maps to a UI update, tool execution, or completion metadata. Cancel the task (for example via stopGeneration()) to interrupt generation early. You can also observe conversation.isGenerating to disable UI controls while a request is in flight.
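As a rough sketch, the view model above could be wired into SwiftUI like this. The view, bindings, and layout are illustrative, not part of the SDK:

```swift
import SwiftUI
import LeapSDK

// Illustrative only: drives ChatViewModel (defined earlier) from a minimal view.
struct ChatView: View {
  @StateObject private var viewModel = ChatViewModel()
  @State private var draft = ""

  var body: some View {
    VStack {
      TextField("Ask something...", text: $draft)
        .onSubmit {
          viewModel.send(draft)
          draft = ""
        }
        // Disable input while a response is streaming.
        .disabled(viewModel.conversation?.isGenerating ?? false)

      Button("Stop", action: viewModel.stopGeneration)
    }
    .task {
      // Kick off model loading when the view appears.
      await viewModel.loadModel()
    }
  }
}
```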
When the loaded model ships with multimodal weights (and companion files were detected), you can mix text, image, and audio content in the same message:
```swift
let message = ChatMessage(
  role: .user,
  content: [
    .text("Describe what you see."),
    .image(jpegData)  // Data containing JPEG bytes
  ]
)

let audioMessage = ChatMessage(
  role: .user,
  content: [
    .text("Transcribe and summarize this clip."),
    .audio(wavData)  // Data containing WAV bytes
  ]
)

let pcmMessage = ChatMessage(
  role: .user,
  content: [
    .text("Give feedback on my pronunciation."),
    ChatMessageContent.fromFloatSamples(samples, sampleRate: 16000)
  ]
)
```
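Multimodal messages stream through the same generateResponse API as plain text. A minimal sketch, assuming `conversation` was created from a model with vision weights and that `generationOptions` can be omitted for defaults:

```swift
// Sketch: stream a response to the image message defined above.
Task {
  do {
    for try await response in conversation.generateResponse(message: message) {
      // Only text chunks are handled here; other cases work as shown earlier.
      if case .chunk(let delta) = response {
        print(delta, terminator: "")
      }
    }
  } catch {
    print("Multimodal generation failed: \(error)")
  }
}
```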
You now have a project that loads an on-device model, streams responses, and is ready for advanced features like structured output and tool use.